Test Load Balancer (TLB)




What is Test Load Balancer (TLB)?

Test Load Balancer (TLB) is a tool that automatically partitions tests into multiple subsets, each of which can be executed in parallel. The subsets can run on different physical or virtual machines, or on the same machine as different processes or threads. The more partitions you make, the fewer tests each one executes, and since all partitions start at the same time (and finish at almost the same time), the overall test-execution time is roughly divided by the number of partitions. Running tests is by far the longest step in most (if not all) builds, so cutting down test-running time speeds up the build and therefore the feedback loop. TLB can be used for any kind of test suite: unit, integration or functional.

In addition to balancing, TLB does other interesting things, such as re-ordering the tests within a subset (the set of tests that run on one partition) before they are executed. For instance, it re-arranges tests so that the ones that failed in the previous build execute first, ensuring early feedback.
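The failed-first re-ordering described above can be sketched in a few lines. This is only an illustration of the idea, not TLB's actual implementation: the function name `failed_first` and the test names are hypothetical, and the set of previously failed tests is assumed to be already known.

```python
def failed_first(subset, previously_failed):
    """Return the subset with previously failing tests moved to the front.

    The relative order inside each group is preserved.
    """
    failed = [t for t in subset if t in previously_failed]
    others = [t for t in subset if t not in previously_failed]
    return failed + others


# Hypothetical subset assigned to one partition.
subset = ["AccountTest", "CartTest", "LoginTest", "SearchTest"]
# Hypothetical failures recorded from the previous build.
previously_failed = {"LoginTest", "SearchTest"}

ordered = failed_first(subset, previously_failed)
print(ordered)
```

The re-ordering never adds or drops a test; it only changes the order in which one partition runs its own subset.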

Here is the slide deck that we use for introductory talks at conferences.


3.. 2.. 1.. Quick Start


LICENSE

TLB is released under the BSD (2-clause version) license. Check out the License.


What Language(s)/Platform(s) does TLB support?

TLB is written in Java, which means it can be used on pretty much every platform that Java runs on.

However, TLB was written from the ground up to support both JVM and non-JVM languages and runtimes. The Balancer can run as a standalone process in what we call an alien environment. The build/test framework and the programming language the tests are written in do not matter.

'tlb.rb', for instance, supports MRI (CRuby) using the standalone balancer. However, since TLB-core is written in Java, you will need Java installed on the box running the tests (so that the balancer process can be launched).

The supported tools and environments section below lists all frameworks and environments TLB currently supports.


What testing-tools/build-tools does TLB support?

This list is updated as upstream evolves. Please check version specific documentation to find out what a particular version supports.

TLB supports:

Testing Tool            Build Tool(s)   Programming Language(s) or Platform(s)
JUnit                   Ant, Buildr     Java
Twist                   Ant, Buildr     Java
RSpec-1.x & RSpec-2.x   Rake            Ruby (MRI/CRuby) & JRuby (both 1.9 and 1.8)
Test::Unit              Rake            Ruby (MRI/CRuby) & JRuby (both 1.9 and 1.8)
Cucumber                Rake            Ruby (MRI/CRuby) & JRuby (both 1.9 and 1.8)

Work in progress:
The TLB team is working on adding support for the following tool combinations:

Testing Tool   Build Tool(s)   Programming Language(s) or Platform(s)
JUnit          Maven           Java
NUnit          NAnt            .NET

We plan to add support for:
We have not yet started work on adding support for the tools mentioned in this section.
MSTest (.NET), MSBuild (.NET), Maven (Java), unittest (Python), FiveAM (Common Lisp), CppUnit (C++) etc.

And of course, anything else you can mail us patches for... :-)


Dedicated support

If you need help getting up and running with TLB or require help tuning your build, do write to us at singh.janmejay@gmail.com or itspanzi@gmail.com. You can also drop us a note on the project mailing list.

What problem does TLB solve?

Most build servers (such as Hudson, Go, TeamCity and Bamboo) provide parallel execution capability, that is, the ability to execute commands on different machines at the same time. However, parallelizing tests needs a tool that can decide which tests to run in each of these parallel processes across machines. This is where TLB comes in.

Given that you have a way to invoke the test command (the project's test target) on multiple machines at the same time (using a CI server's agent grid, or even a tiny utility script run from your terminal), you can use TLB to ensure that each of these invocations executes only a few tests (and not all of them). TLB ensures that these few tests for each partition are selected such that:

  • No test is run on more than one partition: Mutual Exclusion
  • No test is missed out (every test is selected by at least one partition): Collective Exhaustion
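The two properties above can be illustrated with a minimal sketch. It is not TLB's actual selection algorithm: `select_subset` is a hypothetical helper that assumes every invocation sees the same list of test names and knows its own 1-based partition number and the total number of partitions.

```python
def select_subset(all_tests, partition_number, total_partitions):
    """Pick the tests for one partition (partition_number is 1-based).

    Every invocation sorts the same list the same way, so all partitions
    agree on the ordering without talking to each other.
    """
    if not 1 <= partition_number <= total_partitions:
        raise ValueError("partition_number must be in 1..total_partitions")
    ordered = sorted(all_tests)
    # Take every total_partitions-th test, starting at this partition's offset.
    return ordered[partition_number - 1::total_partitions]


# Hypothetical test names.
all_tests = ["CartTest", "LoginTest", "AccountTest", "SearchTest", "OrderTest"]
total = 3
subsets = [select_subset(all_tests, n, total) for n in range(1, total + 1)]

selected = [t for s in subsets for t in s]
# Mutual exclusion: no test appears in more than one subset.
assert len(selected) == len(set(selected))
# Collective exhaustion: every test appears in some subset.
assert set(selected) == set(all_tests)
```

Splitting by position like this balances only the number of tests per partition; it does not take into account how long each test takes to run.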

The problem TLB solves for you is that of slicing and dicing your test suites optimally, allowing you to get the best out of parallel execution (and not that of launching parallel processes on single or multiple machines). TLB engages once your test target is launched and remains agnostic to what triggers it.


How does TLB work?

TLB has two primary components.

  • A Server: stores test data (test times, test results etc.) and allows it to be queried
  • A Balancer: partitions and re-orders a suite of tests, given a server URL

The Balancer hooks into your build framework and testing framework to do the actual work, whereas the Server is primarily a data repository that the Balancer talks to. Data from historical runs is used by the Balancer to partition and re-order the current run, and in turn, data from the current run is posted back so that it can be used as historical data for future runs.

This forms a cycle: historical data is used to partition accurately, and data from each run is captured to seed future runs, so accuracy is maintained (as the data is updated with every new run).
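A rough sketch of this cycle follows. It is an illustration under stated assumptions, not TLB's implementation: a plain dictionary stands in for the Server, the greedy "slowest test to the lightest partition" rule is one common balancing heuristic chosen for the example, and the names `balance_by_time`, `record_run` and `default_time` are hypothetical.

```python
def balance_by_time(tests, history, total_partitions, default_time=1.0):
    """Greedy split: give the slowest remaining test to the lightest partition.

    Tests with no recorded time fall back to default_time (an assumption).
    """
    def time_of(name):
        return history.get(name, default_time)

    partitions = [[] for _ in range(total_partitions)]
    loads = [0.0] * total_partitions
    # Slowest first; ties are broken by name so the result is deterministic.
    for name in sorted(tests, key=lambda n: (-time_of(n), n)):
        lightest = loads.index(min(loads))
        partitions[lightest].append(name)
        loads[lightest] += time_of(name)
    return partitions, loads


def record_run(history, measured_times):
    """Return new history with this run's timings merged in."""
    updated = dict(history)
    updated.update(measured_times)
    return updated


# Historical timings in seconds (hypothetical); OrderTest is new and has none.
history = {"CartTest": 30.0, "LoginTest": 20.0, "SearchTest": 10.0}
tests = ["CartTest", "LoginTest", "SearchTest", "OrderTest"]

partitions, loads = balance_by_time(tests, history, 2, default_time=5.0)

# After the run, the measured time for OrderTest is posted back ...
history = record_run(history, {"OrderTest": 25.0})
# ... and the next run is balanced with the updated data.
next_partitions, next_loads = balance_by_time(tests, history, 2, default_time=5.0)
```

In the first run OrderTest is placed using the guessed default; once its real time is recorded, the second run moves tests around so that the expected loads reflect the measured data.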

Figure 1: Pictorial expression of aforementioned interaction between Server and Balancer, to show where Server and Balancer fit in the entire act of load balancing.


Deep dive: Understanding and Configuring TLB

While Quick Start is a good place to start off, sooner or later you will want to understand the basic concepts involved, and find out more about configurable parameters TLB exposes to allow you to tune it best for your project and environment. Getting a good grasp of TLB concepts will enable you to leverage a lot of very useful features TLB exposes.

The Concepts in TLB section covers TLB concepts. See the Configuring TLB section for documentation that covers the configuration parameters in detail.


Philosophy and Inspiration

Running tests is by far the biggest time slice in any project's build, unless the project has very few or no tests at all (which, if that is the case, is obviously a bigger problem).

Over a period of time, functionality and features grow, and so do tests. Eventually, because of this huge number of tests, the time taken by the end-to-end build starts eating into the productivity pie. Every upstream change-set goes through a build process that is several minutes (or worse, hours) long, which makes the cost of fixing a failing test really high. This is particularly true for automated functional tests or integration tests, simply because they generally take more time to execute than unit tests.

When things fail in a slow build, fixing them, getting a green build and having development and testing teams back on track can be an extremely frustrating and time-consuming process. It takes an awful lot of time to attempt fixes (especially when dealing with an indirect or complicated issue that takes multiple attempts), and a slow build often forces developers to wait for hours on end to get a single good build.

Given that hardware is so cheap nowadays, and assuming tests are independent of each other and insensitive to order, most teams can cut their build time by parallelizing test execution. No matter how many tests the project has, you can keep build time really low just by throwing more hardware at it. Parallelization and scaling out (distributing computation across several commodity machines) is a terrific strategy for conquering independent computing problems that take a long time when executed serially, and running tests is one such problem.

Before coming up with TLB, we faced the slow serial build problem on every single project we worked on, and couldn't find anything out there capable of solving it satisfactorily. TLB has been written from the ground up to solve this problem for every language and every testing framework.

TLB is our final answer to the slow builds problem.