Test Load Balancer (TLB)



This documentation covers the core concepts and all the configuration options for TLB.
Should you wish to refer to the documentation for a different release, please choose the corresponding version of this documentation.

Quick Start

Introduction

TLB has two primary components:
  • A Server: stores test data (test times, test results, etc.) and allows it to be queried
  • A Balancer: partitions and re-orders a suite of tests, given a server URL

TLB engages once the test-task has been invoked. This generally happens on multiple processes or machines, each of which is supposed to share the test-load (and hence participate in test-load-balancing by executing only a part of the whole). Since each of these partitions runs only a subset of the tests, the time taken to run each partition is a fraction of the total test time. TLB ensures each such subset takes about the same time to execute.

For instance, if total time (time required to execute all tests) is 50 minutes and there are 10 partitions participating in load-balancing of this test-suite, each one runs only a subset, which takes about 5 minutes to finish (50 mins / 10 partitions = 5 minutes each).

When all such partitions are started at the same time, they finish at about the same time too. This means a 50-minute test-task finishes in only about 5 minutes when parallelized. Because of the way TLB partitions tests, it is guaranteed that all tests have executed once all the partitions finish. The build should be considered red if any test fails on any partition. If all the tests pass on all partitions, the build can be considered green.
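The idea of time-based splitting can be illustrated with a minimal sketch. This is not TLB's actual balancing code: `split_by_time` is a hypothetical helper that uses a simple greedy strategy (slowest test first, always into the lightest partition), and the test names and times are made up to match the 50-minute example above.

```ruby
# Hypothetical helper, not part of TLB: greedy time-based split.
# times: Hash of test name => historical run time in minutes.
def split_by_time(times, total_partitions)
  buckets = Array.new(total_partitions) { { tests: [], minutes: 0.0 } }
  # Place the slowest tests first, each into the currently lightest bucket.
  times.sort_by { |name, minutes| [-minutes, name] }.each do |name, minutes|
    lightest = buckets.min_by { |bucket| bucket[:minutes] }
    lightest[:tests] << name
    lightest[:minutes] += minutes
  end
  buckets
end

# Made-up data: 10 tests of 3 minutes and 10 tests of 2 minutes (50 minutes in total).
times = {}
10.times { |i| times[format("SlowTest%02d", i)] = 3.0 }
10.times { |i| times[format("FastTest%02d", i)] = 2.0 }

partitions = split_by_time(times, 10)
partitions.each_with_index do |partition, index|
  puts "partition #{index + 1}: #{partition[:tests].join(', ')} (#{partition[:minutes]} min)"
end
```

Every test lands in exactly one partition, which is what makes it safe to treat the build as green only when all partitions pass.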

NOTE: TLB only understands splitting the given set of tests into smaller subsets and reordering them within those subsets. It has nothing to do with actually running the tests (which is the job of the build tool and testing framework), or with parallelizing them across machines or managing virtual machines (which are typically taken care of by a CI/build server like Hudson, Go, TeamCity, Bamboo, etc.).


Basic Setup

Having explained what TLB is and isn't, here are the steps to follow to incorporate test-load-balancing into your project's test-suite:

  1. Ensure JRE 1.6 is installed, because the TLB core is written in Java. (Note: the JDK bundles a JRE as well, so if you have a JDK installed, you are all set.)
  2. Download TLB distribution from the Downloads page.
    If you want to use TLB on a Ruby project, you can install the relevant TLB gem using one of:
    $ gem install tlb-testunit19   # for a Test::Unit test suite on Ruby 1.9.x
    $ gem install tlb-testunit18   # for a Test::Unit test suite on Ruby 1.8.x
    $ gem install tlb-cucumber     # for a Cucumber test suite
    $ gem install tlb-rspec1       # for an RSpec-1.x (1.2.x, 1.3.x, etc.) test suite
    $ gem install tlb-rspec2       # for an RSpec-2.x (2.3, 2.4, etc.) test suite
  3. Start the TLB Server using the server.sh (for *nix) or server.bat (for Windows) script present in the TLB distribution. You'd want to do something similar to:
    For *nix:    $ tlb-x.x/server.sh start
    For Windows: > tlb-x.x/server.bat start

    This will start a very lightweight HTTP RESTlet server bound to port 7019 (or to whatever port you have overridden it to). These scripts contain commonly used environment variables relevant to tlb-server configuration, which you can tweak if need be. If you wish to use the Go support instead (with Go-server support, you won't need the TLB Server), please go through the corresponding documentation to understand how to have the balancer work in Go-server-support mode.

  4. Once the server is up, you need to add Balancing to your build.

    The Balancer gets a list of all the tests that need to be executed from the build script (after the build script has been invoked, and before tests start running). It then prunes that list to make a subset, using the historical test information obtained from the Server. This smaller subset is passed on to the test framework to execute. The Balancer continues to listen to events published by the test framework as these tests execute, to record the result and time taken by each test. This data is then posted to the TLB server and acts as seed data for balancing/ordering future builds.
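The flow described above (fetch history, prune, run, record, post) can be sketched as follows. All names here (`FakeServer`, `choose_subset`, `run_balanced`) are hypothetical stand-ins and not TLB classes; the in-memory server and the selection rule (slowest first, then round-robin) are assumptions made only to keep the sketch self-contained.

```ruby
# Hypothetical stand-in for the TLB server: keeps test times in memory.
class FakeServer
  def initialize
    @times = {}
  end

  # Historical run times recorded so far (test name => seconds).
  def history
    @times.dup
  end

  # Store the time reported for one test.
  def record(test_name, seconds)
    @times[test_name] = seconds
  end
end

# Assumed selection rule: order by historical time (slowest first, then name)
# and deal tests out round-robin; partition_number is one-based.
def choose_subset(all_tests, history, partition_number, total_partitions)
  ordered = all_tests.sort_by { |test| [-(history[test] || 0), test] }
  ordered.each_with_index
         .select { |_, index| index % total_partitions == partition_number - 1 }
         .map(&:first)
end

# Prune the full list, run only the subset, and report timings back.
def run_balanced(all_tests, server, partition_number, total_partitions)
  subset = choose_subset(all_tests, server.history, partition_number, total_partitions)
  subset.each do |test|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield test # the test framework would execute the test here
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    server.record(test, elapsed) # seed data for future builds
  end
  subset
end

server = FakeServer.new
executed = []
subset = run_balanced(%w[c_test a_test b_test d_test], server, 1, 2) { |test| executed << test }
puts "partition 1 ran: #{subset.join(', ')}"
```

Note that this sketch reads the history at the moment each partition starts, so two partitions starting at different times could see different data. That is the problem the job version, described later in this guide, is meant to solve.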

    Example of TLB balancer configuration for some of the supported frameworks

    Feel free to copy the build task fragment relevant to the platform and tool-set your project uses into your build script and tweak the details to fit your project needs. In addition to project specific changes, you'll need to make other changes as suggested by the inline comments in relevant fragment(s).


    JUnit on Ant

        <!-- Change the 'load.balanced.classpath' so that it is your test classpath along with the TLB jar and
        its dependencies. You can also tweak the 'depends' to fix the task dependencies of your build.
         You can change the fileset's includes pattern to include your tests. -->
        <target name="test.balanced" depends="compile, compile-tests">
            <typedef name="load-balanced-fileset" classname="tlb.ant.LoadBalancedFileSet" classpathref="load.balanced.classpath"/>
            <junit failureproperty="test.failure" printsummary="yes" haltonfailure="true" haltonerror="true"
                   showoutput="true" fork="true">
                <classpath refid="load.balanced.classpath"/>
                <batchtest todir="${reports.dir}">
                    <load-balanced-fileset dir="${test-classes.dir}" includes="**/*Test.class"/>
                    <formatter classname="tlb.ant.JunitDataRecorder"/>
                    <formatter type="plain"/>
                </batchtest>
            </junit>
        </target>
    

    Test::Unit on Rake

    require 'rake'
    require 'rubygems'
    if RUBY_VERSION =~ /^1\.9/
      gem 'tlb-testunit19'
    else
      gem 'tlb-testunit18'
    end
    require 'tlb/test_unit/test_task'
    
    Tlb::TestUnit::TestTask.new(:test_balanced) do |t|
      t.libs << "test"
      t.test_files = FileList['test/**/*_test.rb']
      t.verbose = true
    end
    
    load 'tasks/tlb.rake'
    
    task :bal => ['tlb:start', :test_balanced]
    

    RSpec 1.x on Rake

    # Use the task :bal to run the balanced test suite. You can change the FileList to match whatever tests
    # you need to run.
    require 'rubygems'
    gem 'tlb-rspec1'
    require 'tlb/spec/spec_task'
    
    Tlb::SpecTask.new(:balanced_specs) do |t|
      t.spec_files = FileList['spec/**/*_spec.rb']
      t.spec_opts << "--format progress"
    end
    
    load 'tasks/tlb.rake'
    desc "run specs load-balanced(based on environment variables)"
    task :bal => ['tlb:start', :balanced_specs]
    

    RSpec 2.x on Rake

    #Use the task :bal to run balanced test suite. You can change the t.pattern to match whatever tests you need
    #to run.
    require 'rubygems'
    gem 'tlb-rspec2'
    require 'tlb/rspec/spec_task'
    
    Tlb::RSpec::SpecTask.new(:run_balanced) do |t|
      t.pattern = 'spec/**/*_spec.rb'
    end
    
    load 'tasks/tlb.rake'
    desc "run specs load-balanced(based on environment variables)"
    task :bal => ['tlb:start', :run_balanced]
    

    Cucumber on Rake

    require 'rubygems'
    require 'cucumber'
    gem 'tlb-cucumber'
    require 'tlb/cucumber/rake/cucumber_task'
    
    Tlb::Cucumber::Rake::CucumberTask.new(:cucumber_tests) do |t|
      t.cucumber_opts = ["--format", "pretty"]
    end
    
    load 'tasks/tlb.rake'
    desc "Run Cucumber features in a load-balanced fashion (based on environment variables)"
    task :bal => ['tlb:start', :cucumber_tests]
    

  5. Once the necessary build script modifications are made, some TLB configuration environment variables need to be set before TLB can start heavy lifting. Feel free to tweak the value of these variables to what makes sense for your project and environment. These variables need to be set for each partition invocation (for example, if you have chosen to make 2 partitions of your test-suite, these values should be set on each partition before executing).

    A detailed description of the possible values and implications of all TLB configuration parameters (including the ones below) is available at Configuring TLB.

    We recommend the following values to start with.

    • TLB_BASE_URL=http://tlb-server-host.my-domain.com:7019

      This tells the TLB balancer where to obtain the test data from and where to post data to. Replace the value with the appropriate hostname and port.

    • TLB_TOTAL_PARTITIONS=2

      Total number of subsets/splits to be made. This should be equal to the number of machines/processes that are going to execute this test-task in parallel.

    • TLB_PARTITION_NUMBER=1

      Controls which of the $TLB_TOTAL_PARTITIONS partitions is the current one. It decides which of the computed subsets is given to the current process's test-runner to execute. Assuming you have decided to go with 2 partitions, this variable needs to be set to 1 for the first partition and 2 for the second. It is the one-based index of the current subset.

    • TLB_JOB_NAME=sample_job

      The same TLB server can be used by different builds from different projects at the same time. In order for the server to identify them uniquely, TLB uses a notion of namespace (we call it 'job name'). Make sure all job names are unique.

      For instance, say you want to configure two projects: a Java project called 'foo' that has a JUnit suite, and a Ruby project called 'bar' that uses TLB to balance Test::Unit based tests. The value of TLB_JOB_NAME can be 'foo' for the former and 'bar' for the latter. However, if 'foo' has two different test suites, one a JUnit unit test suite and the other a JUnit functional test suite, you'd want to distinguish between those as well. Sensible job names for that scenario would probably be 'foo-unit' and 'foo-functional'. Of course, you can choose to name them anything.

    • TLB_JOB_VERSION=foo-project-${build_number}

      Sometimes you may end up with multiple builds running at the same time, executing the same test-suite (i.e. with the same TLB_JOB_NAME). When the test data is sent to the server, there needs to be a way for the server to identify the instance of the build that is reporting. All partitions running the same build instance should have the same value of TLB_JOB_VERSION, and each build instance should use a unique TLB_JOB_VERSION. Here is a scenario that will help in understanding the importance of this variable, and the contract it enforces:

      • Jen and Matt are two developers on a certain project
      • The project's build has a unit-test task which is configured to use TLB, and the CI server is configured to have 2 different build-machines execute the test task with the right set of environment variables, so that they balance
      • Let's say the TLB_JOB_NAME for the unit-test task is 'project-tests'
      • Matt triggers build number 74 with his changes to source, the first partition of which completes and posts test-data back to the TLB server, whereas the second partition is still running
      • Jen triggers build number 75 in the meantime with her changes, and the build reaches a stage where the first build-machine has downloaded the test-data from the shared TLB server, has decided upon a subset, and has started running tests, but the second partition has not pulled down the data yet
      • At this point, the other partition from Matt's build finishes, and uploads data back to the server
      • The second partition of Jen's build now downloads an updated copy of the data (which is different from the data the first partition pulled)

      Now this is a problem, because the second partition of Jen's build is going to balance differently from the first one. Although it runs as the second partition, it may run some tests that the first partition has already executed and, worse, may skip a few tests that the first partition did not run (because according to the data the first partition saw, they fell in the second partition). This is where job version comes in. Job version forces the TLB server to store a snapshot of the data as seen by one of the partitions of a particular build instance, and the TLB server then ensures that every other partition using the same job version sees the exact same snapshot. In the example above, the build number can be used as the job version, which ensures that the second partition of build number 75 gets the same snapshot of data that was served to the first partition of the same build.

      Phew!
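The snapshot contract can be modelled in a few lines. This is only an illustrative in-memory sketch, not TLB server code: `SnapshotStore`, `data_for` and `post` are hypothetical names, and the test name and timings are made up to replay the Jen and Matt scenario.

```ruby
# Hypothetical in-memory model of the job-version contract; not TLB server code.
class SnapshotStore
  def initialize
    @latest = Hash.new { |hash, job| hash[job] = {} } # job name => { test => seconds }
    @snapshots = {}                                   # [job name, job version] => frozen copy
  end

  # The first partition of a build instance freezes the data; later ones reuse it.
  def data_for(job_name, job_version)
    @snapshots[[job_name, job_version]] ||= @latest[job_name].dup.freeze
  end

  # Results posted by any build update only the latest data, never a snapshot.
  def post(job_name, test, seconds)
    @latest[job_name][test] = seconds
  end
end

store = SnapshotStore.new
store.post("project-tests", "FooTest", 10.0)   # data from an earlier build
first = store.data_for("project-tests", "75")  # first partition of Jen's build
store.post("project-tests", "FooTest", 42.0)   # Matt's build 74 posts its results late
second = store.data_for("project-tests", "75") # second partition of Jen's build
puts "partitions of build 75 agree: #{first == second}"
```

A later build instance (say, version "76") would take its own snapshot and therefore see the newer data.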

    For a list of all the configuration variables and more details on the configuration options, refer to Configuring TLB.

  6. Ensure that the environment variables are set before the build starts. During the build, all the partitions should execute in parallel for TLB to give the best results (remember: the more the parallelization, the shorter the build).
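As a rough illustration of how the per-partition environment fits together, here is a small sketch. `tlb_env` is a hypothetical helper, not something TLB provides; the job name, build number and server URL are placeholder values taken from the examples above.

```ruby
# Hypothetical helper (not part of TLB): builds the environment for one partition.
def tlb_env(partition_number, total_partitions, job_name:, build_number:, base_url:)
  unless (1..total_partitions).cover?(partition_number)
    raise ArgumentError, "partition number must be between 1 and #{total_partitions}"
  end

  {
    "TLB_BASE_URL"         => base_url,
    "TLB_TOTAL_PARTITIONS" => total_partitions.to_s,
    "TLB_PARTITION_NUMBER" => partition_number.to_s, # one-based index
    "TLB_JOB_NAME"         => job_name,              # same for every build of this suite
    "TLB_JOB_VERSION"      => "#{job_name}-#{build_number}" # same for all partitions of one build
  }
end

# Two partitions of build 75 of the 'foo-unit' suite.
envs = (1..2).map do |number|
  tlb_env(number, 2, job_name: "foo-unit", build_number: 75,
                     base_url: "http://tlb-server-host.my-domain.com:7019")
end
envs.each { |env| puts env.map { |key, value| "#{key}=#{value}" }.join(" ") }
```

Each hash could then be handed to whatever starts that partition, for example as the first argument to Ruby's `Process.spawn(env, "rake", "bal")` on the machine or process responsible for it. In practice the CI/build server usually sets these variables.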