Test Load Balancer (TLB)





This documentation covers core concepts and all the configuration options for TLB.

Configuration Parameters:

This page explains the purpose each configurable environment variable serves, the scenarios in which it is useful, and its possible values and their implications.
TLB is a completely non-GUI, non-CLI, no-config-file kind of tool. It's very configurable (lots of knobs to turn) and makes it very easy to write your own implementations and plug them in if you want to.

TLB uses environment variables for every configurable parameter it exposes. When configuring something that is an algorithm, we usually use the fully qualified Java class name of the class (which makes it easy for you to write alternate implementations, drop them on the classpath, and have TLB load them just by changing an environment variable's value).

TLB allows variable definition in terms of other variables.

It is often convenient to define TLB variables in terms of other environment variables set by the execution environment. For instance, if the CI server promises to set BUILD_NUMBER and PROJECT_NAME, it may make sense to define TLB_JOB_NAME as 'ant-${PROJECT_NAME}' or '${PROJECT_NAME}-test', and TLB_JOB_VERSION (or JOB_VERSION before 0.3) as '${PROJECT_NAME}-${BUILD_NUMBER}', etc.

Please check this detailed example to get an in-depth understanding of the variable interpolation feature.
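
TLB performs this resolution itself. Purely to illustrate the idea, here is a Java sketch (not TLB's implementation; the class name EnvInterpolator is hypothetical) that expands ${NAME} references from an environment map, including references nested inside referenced values, and rejects undefined or cyclic references:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not TLB's own code: expands ${NAME} references using the
// given environment, resolving references inside referenced values as well.
final class EnvInterpolator {
    private static final Pattern REF = Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    static String interpolate(String value, Map<String, String> env) {
        return expand(value, env, new HashSet<>());
    }

    private static String expand(String value, Map<String, String> env, Set<String> resolving) {
        Matcher m = REF.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String raw = env.get(name);
            if (raw == null) {
                throw new IllegalArgumentException("Undefined variable: " + name);
            }
            if (!resolving.add(name)) {
                throw new IllegalArgumentException("Cyclic reference through: " + name);
            }
            String resolved = expand(raw, env, resolving); // referenced values may hold references too
            resolving.remove(name);
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, EnvInterpolator.interpolate("${PROJECT_NAME}-${BUILD_NUMBER}", System.getenv()) would produce something like foo-10 on a CI server that sets both variables.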

| Environment Variable | Purpose | Balancer against TLB Server | Balancer against Go Server | TLB Server |
|---|---|---|---|---|
| TYPE_OF_SERVER | Governs the type of server to use for partitioning (TLB Server or Go Server) | Applicable & Required | Applicable & Required | Not Applicable |
| TLB_SPLITTER | Algorithm to use for balancing | Applicable & Required | Applicable & Required | Not Applicable |
| TLB_ORDERER | Algorithm to use for ordering | Applicable (but not mandatory) | Applicable (but not mandatory) | Not Applicable |
| TLB_PREFERRED_SPLITTERS | List of algorithms to try (most preferred first) when DefaultingTestSplitter is used for balancing | Applicable (but used conditionally) | Applicable (but used conditionally) | Not Applicable |
| TLB_TMP_DIR | Directory to use for storing temporary files | Applicable (but not mandatory) | Applicable (but not mandatory) | Not Applicable |
| TLB_BALANCER_PORT | TCP port the balancing server binds to | Applicable & Required | Applicable & Required | Not Applicable |
| TLB_SERVER_PORT | TCP port the TLB Server binds to | Not Applicable | Not Applicable | Applicable (but not mandatory) |
| TLB_DATA_DIR | Directory the TLB Server uses to store test data posted by partitions | Not Applicable | Not Applicable | Applicable (but not mandatory) |
| TLB_VERSION_LIFE_IN_DAYS | How long a test-data version is kept from the time it is created | Not Applicable | Not Applicable | Applicable (but not mandatory) |
| TLB_SMOOTHING_FACTOR | How aggressively test-time data is smoothed | Applicable (but not mandatory) | Applicable (but not mandatory) | Not Applicable |
| TLB_TOTAL_PARTITIONS | Number of partitions to be made | Applicable & Required | Not Applicable | Not Applicable |
| TLB_PARTITION_NUMBER | For any partition, which one of the TLB_TOTAL_PARTITIONS it is | Applicable & Required | Not Applicable | Not Applicable |
| TLB_BASE_URL | Locator the balancer uses to reach the TLB Server | Applicable & Required | Not Applicable | Not Applicable |
| TLB_JOB_NAME | Namespace shared by all partitions of a balanced test task (must remain the same across invocations, as data is stored under this name) | Applicable & Required | Not Applicable | Not Applicable |
| TLB_JOB_VERSION | Version identifier the TLB Server uses to track the partitions of a particular test-task invocation (should be unique for every invocation) | Applicable & Required | Not Applicable | Not Applicable |
| TLB_USERNAME | Username the balancer uses to log on to the Go Server | Not Applicable | Applicable (but not mandatory) | Not Applicable |
| TLB_PASSWORD | Password the balancer uses to log on to the Go Server | Not Applicable | Applicable (but not mandatory) | Not Applicable |
| GO_STAGE_FEED_MAX_SEARCH_DEPTH | Limits the number of pages the balancer may crawl to find the previous instance of a stage on the Go Server | Not Applicable | Applicable (but not mandatory) | Not Applicable |
| Other Go support variables | Set by Go before starting the execution of any 'job' | Not Applicable | Applicable & Required | Not Applicable |

Details:

ON_BALANCER_SIDE refers to the process that is running tests, so being on the balancer side means being on the test-runner side.

ON_SERVER_SIDE refers to the TLB Server process.

  • TYPE_OF_SERVER: To split tests into subsets based on data (say, test time) or to order tests based on, say, the results of the last run (for instance, running tests that failed in the last run first), the test times and test results need to be stored somewhere. While balancing, TLB fetches this historical data from storage, decides how to balance and reorder the tests based on it, and posts feedback back to storage (so it can be used to balance the test suite next time). This variable controls which server the balancing instance (the process running tests) talks to in order to fetch and publish data.

    TLB loads the class that TYPE_OF_SERVER points to and uses an instance of it to talk to the service. The contract for this class is enforced by a Java interface called Server.


    Type: Fully qualified Java class name (for instance foo.bar.Baz). When a variable points to a class name, it can also be used to plug in a user-written custom class (one that is not bundled with TLB), provided it is available on the classpath and implements the relevant interface.


    Example: tlb.service.TlbServer or tlb.service.GoServer
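
    TLB's actual loading code is not shown on this page. As a rough sketch of the general mechanism, assuming plain Java reflection and a no-argument constructor, a class named by an environment variable can be loaded and checked against its contract as follows. The Server interface below is a hypothetical stand-in with a made-up method, and InMemoryServer and PluginLoader are hypothetical names:

```java
import java.util.Map;

// Hypothetical stand-in for TLB's Server contract; its real methods are not reproduced here.
interface Server {
    String describe();
}

// A user-written implementation that could be dropped on the classpath (hypothetical).
final class InMemoryServer implements Server {
    public InMemoryServer() {}

    @Override
    public String describe() {
        return "in-memory";
    }
}

final class PluginLoader {
    // Instantiates the class named by `variable`, insisting that it implements `contract`.
    static <T> T load(Map<String, String> env, String variable, Class<T> contract) {
        String className = env.get(variable);
        if (className == null || className.isBlank()) {
            throw new IllegalStateException(variable + " is not set");
        }
        try {
            // asSubclass throws ClassCastException if the class does not implement the contract.
            Class<? extends T> type = Class.forName(className).asSubclass(contract);
            return type.getDeclaredConstructor().newInstance(); // requires a no-arg constructor
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```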

  • TLB_SPLITTER: This variable dictates the algorithm TLB uses to balance the test set. How the tests are split (say, based on time or count) is governed by the class this variable points to.

    Defaults to No-Op Criteria (which means no balancing: all tests run on all partitions).


    Type: Fully qualified Java class name (for instance foo.bar.Baz). When a variable points to a class name, it can also be used to plug in a user-written custom class (one that is not bundled with TLB), provided it is available on the classpath and implements the relevant interface.


    Example: tlb.splitter.DefaultingTestSplitter, tlb.splitter.CountBasedTestSplitter or tlb.splitter.TimeBasedTestSplitter
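
    To make balancing by count concrete, here is a hypothetical sketch (not the source of tlb.splitter.CountBasedTestSplitter) that gives each partition a contiguous, near-equal share of a consistently sorted test list. Sorting first matters: every partition must compute the same order, or the subsets would not be mutually exclusive and collectively exhaustive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical count-based split; TLB's CountBasedTestSplitter may work differently.
final class CountSplitSketch {
    static List<String> subset(List<String> tests, int totalPartitions, int partitionNumber) {
        if (totalPartitions < 1 || partitionNumber < 1 || partitionNumber > totalPartitions) {
            throw new IllegalArgumentException("need 1 <= partitionNumber <= totalPartitions");
        }
        List<String> sorted = new ArrayList<>(tests);
        sorted.sort(null); // natural order, so every partition sees the same sequence
        int size = sorted.size();
        int base = size / totalPartitions;
        int extra = size % totalPartitions; // the first `extra` partitions get one more test
        int index = partitionNumber - 1;
        int start = index * base + Math.min(index, extra);
        int end = start + base + (index < extra ? 1 : 0);
        return new ArrayList<>(sorted.subList(start, end));
    }
}
```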

  • TLB_ORDERER: This variable dictates the algorithm TLB uses to reorder the test set.

    Defaults to the No-Op orderer.


    Type: Fully qualified Java class name (for instance foo.bar.Baz). When a variable points to a class name, it can also be used to plug in a user-written custom class (one that is not bundled with TLB), provided it is available on the classpath and implements the relevant interface.


    Example: tlb.orderer.FailedFirstOrderer
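
    As an illustration of what a "failed first" ordering means, here is a hypothetical sketch (not the source of tlb.orderer.FailedFirstOrderer). It assumes the names of the tests that failed in the last run are already known; how TLB actually retrieves them is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical failed-first ordering; not TLB's FailedFirstOrderer.
final class FailedFirstSketch {
    static List<String> order(List<String> tests, Set<String> failedLastRun) {
        List<String> ordered = new ArrayList<>(tests);
        // false sorts before true, so previously failed tests come first. List.sort is
        // stable, so tests keep their original relative order within each group.
        ordered.sort(Comparator.comparing((String t) -> !failedLastRun.contains(t)));
        return ordered;
    }
}
```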

  • TLB_PREFERRED_SPLITTERS: if $TLB_SPLITTER == 'tlb.splitter.DefaultingTestSplitter'
    This variable is relevant only when DefaultingTestSplitter is used as TLB_SPLITTER. The only role of DefaultingTestSplitter is to delegate: it hands the balancing work to the criteria listed in this value, in the order the user specified. If the balancer fails to balance using the first criterion, it moves on to the second and tries again, then the third, and so on.

    This is why we recommend putting a more powerful criterion like TimeBasedTestSplitter first. Since it needs historical data, it fails when that data is unavailable (which is the case during the very first build using TLB). For such situations, we recommend chaining it with a simpler criterion like CountBasedTestSplitter, which doesn't need any historical data but doesn't balance as well as TimeBasedTestSplitter either.

    Criterion class names are delimited by the colon (:) character.


    Type: Colon-separated list of fully qualified Java class names (for instance foo.bar.Baz:quux.bar.Foo:baz.quux.Bar).
    Example: tlb.splitter.TimeBasedTestSplitter:tlb.splitter.CountBasedTestSplitter
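
    The fallback behaviour can be sketched as follows. This is a hypothetical illustration, not DefaultingTestSplitter's source: UnaryOperator lambdas stand in for the criterion classes that TLB would load by name, and a criterion is assumed to signal that it cannot balance by throwing an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the fallback chain; not DefaultingTestSplitter's source.
final class DefaultingSketch {
    // Splits a TLB_PREFERRED_SPLITTERS value into class names, in order of preference.
    static List<String> parsePreferred(String value) {
        List<String> names = new ArrayList<>();
        for (String part : value.split(":")) {
            if (!part.isBlank()) {
                names.add(part.trim());
            }
        }
        return names;
    }

    // Tries each criterion in turn; a criterion signals "cannot balance" by throwing.
    static List<String> split(List<String> tests, List<UnaryOperator<List<String>>> preferred) {
        RuntimeException lastFailure = null;
        for (UnaryOperator<List<String>> criterion : preferred) {
            try {
                return criterion.apply(tests);
            } catch (RuntimeException e) {
                lastFailure = e; // e.g. no historical data yet: fall back to the next criterion
            }
        }
        throw new IllegalStateException("No preferred criterion could balance the tests", lastFailure);
    }
}
```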

  • TLB_TMP_DIR: if $TYPE_OF_SERVER == 'tlb.service.GoServer'
    Expects a temporary directory path. If it is not set, TLB uses the default Java temp directory (the java.io.tmpdir system property).

    You almost never want to set this one; the default is almost always the right thing to use. Except on... any guesses? Hold your breath! Yes, that's right: Windows. On some flavours of Windows the temp directory doesn't exist (or is not writable). On Windows, set it to a directory that you know exists and that the user TLB runs as can write to.


    Type: Path to an existing directory.
    Example: /tmp or C:\temp.
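
    A minimal sketch of the documented fallback, using a hypothetical helper named TmpDirSketch (not TLB code): use TLB_TMP_DIR when set, otherwise java.io.tmpdir, and fail early if the result is not an existing, writable directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical helper mirroring the documented default; not TLB's code.
final class TmpDirSketch {
    static Path resolve(Map<String, String> env) {
        String configured = env.get("TLB_TMP_DIR");
        Path dir = Paths.get(configured != null && !configured.isBlank()
                ? configured
                : System.getProperty("java.io.tmpdir"));
        // The directory must already exist and be writable (the usual Windows pitfall).
        if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
            throw new IllegalStateException("Not an existing, writable directory: " + dir);
        }
        return dir;
    }
}
```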

  • TLB_BALANCER_PORT: if ON_BALANCER_SIDE AND (using alien-language support)
    The balancer is actually a very lightweight HTTP server. The library that hooks up with the testing framework or build tool starts this server before running tests and shuts it down after all tests finish.

    That HTTP server needs a port to bind to, and you must configure it yourself: TLB doesn't want to assume a port and, in turn, fail tests that depend on that port being free.


    Type: TCP port (preferably greater than 1024, so that an unprivileged user can bind to it)
    Example: 4971
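
    The "no assumed port" rule can be sketched as a hypothetical helper (BalancerPortSketch is an illustrative name, not TLB code) that refuses to proceed without TLB_BALANCER_PORT:

```java
import java.util.Map;

// Hypothetical sketch: TLB_BALANCER_PORT has no default, so a missing value is an error.
final class BalancerPortSketch {
    static int requirePort(Map<String, String> env) {
        String raw = env.get("TLB_BALANCER_PORT");
        if (raw == null || raw.isBlank()) {
            // Deliberately no default: a guessed port could clash with tests that need it.
            throw new IllegalStateException("TLB_BALANCER_PORT must be set");
        }
        int port = Integer.parseInt(raw.trim());
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Not a valid TCP port: " + port);
        }
        if (port < 1024) {
            // On Unix-like systems, binding below 1024 normally needs elevated privileges.
            System.err.println("Warning: port " + port + " may require elevated privileges");
        }
        return port;
    }
}
```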

  • TLB_SERVER_PORT: if ON_SERVER_SIDE
    This is the TCP port you want the TLB Server to listen on. Changing it to a different value, say 9005, requires changing the value of $TLB_BASE_URL used by the TLB partitions running against the server.

    Defaults to 7019.


    Type: TCP port (preferably greater than 1024, so that an unprivileged user can bind to it)
    Example: 8157
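
    A hypothetical sketch of how the server's port default and the partitions' base URL relate (ServerPortSketch, serverPort and baseUrlFor are illustrative names only): whatever port the server ends up on must also appear in $TLB_BASE_URL.

```java
import java.util.Map;

// Hypothetical sketch linking TLB_SERVER_PORT and TLB_BASE_URL; not TLB's code.
final class ServerPortSketch {
    static final int DEFAULT_PORT = 7019; // documented default of TLB_SERVER_PORT

    static int serverPort(Map<String, String> env) {
        String raw = env.get("TLB_SERVER_PORT");
        return (raw == null || raw.isBlank()) ? DEFAULT_PORT : Integer.parseInt(raw.trim());
    }

    // The value partitions should use for TLB_BASE_URL when the server runs on `host`.
    static String baseUrlFor(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }
}
```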

  • TLB_DATA_DIR: if ON_SERVER_SIDE
    Relevant only for the TLB Server process. This is the directory the TLB Server uses to persist historical data so that it survives restarts and upgrades.

    You want this directory to survive machine restarts, so using something like /tmp is usually a bad idea.

    Defaults to a directory called TLB_DATA_DIR in the tlb-server's working directory.


    Type: Path to a directory.
    Example: /var/lib/tlb-data or C:\tlb-data etc.

  • TLB_VERSION_LIFE_IN_DAYS: if ON_SERVER_SIDE
    TLB supports a notion of versions for test times (and smoothed test times). This is important because, while balancing, every partition must see exactly the same data. Otherwise the partitions may not turn out to be mutually exclusive and collectively exhaustive, which means the same test suite may run twice (on two different partitions) or some test suites may not run at all.

    This variable is relevant on the TLB Server. It governs how long a version stays on the server; after that, the server purges it, and if it gets another request for that version after the purge, it creates a new version from the latest data snapshot.

    This must be significantly larger than the time a test-load-balanced build is expected to take, because the snapshot is taken when the very first partition queries the server with an unknown version, and it must live until the last of the balanced partitions is done fetching data.


    Defaults to 1 day.


    Type: A whole number (number of days)
    Example: 3 (means three days)
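
    The lifecycle described above can be modelled roughly as follows. This is a hypothetical in-memory sketch (VersionStoreSketch is an illustrative name), not the TLB Server's storage code; the current time is passed in explicitly so the expiry is easy to follow.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of versioned snapshots with a limited life; not TLB Server code.
final class VersionStoreSketch {
    private static final class Snapshot {
        final List<String> data;
        final Instant createdAt;

        Snapshot(List<String> data, Instant createdAt) {
            this.data = List.copyOf(data);
            this.createdAt = createdAt;
        }
    }

    private final Map<String, Snapshot> snapshots = new HashMap<>();
    private final Duration life;

    VersionStoreSketch(int versionLifeInDays) {
        this.life = Duration.ofDays(versionLifeInDays);
    }

    // The first request for an unknown (or purged) version freezes the latest data.
    List<String> dataFor(String version, List<String> latest, Instant now) {
        purgeExpired(now);
        return snapshots.computeIfAbsent(version, v -> new Snapshot(latest, now)).data;
    }

    // Drops versions older than the configured life.
    void purgeExpired(Instant now) {
        snapshots.values().removeIf(s -> s.createdAt.plus(life).isBefore(now));
    }
}
```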

  • TLB_SMOOTHING_FACTOR: if ON_BALANCER_SIDE
    In the TLB context, smoothing is the act of balancing with computed test run times, each of which is a weighted average of the last reported run time and the historical run time of that test.

    This keeps balancing in future runs from getting skewed because a machine was slower than usual in the current run. TLB uses a well-known algorithm called exponential smoothing.

    http://en.wikipedia.org/wiki/Exponential_smoothing explains this smoothing algorithm.

    Setting TLB_SMOOTHING_FACTOR for a partition ensures that data for that partition is smoothed with respect to historical data before it is posted. This does not affect other partitions of the same run in any way, so partitions in the same run are free to choose different smoothing factor values.

    The way we see it being used is this: if a machine is known to be intermittently slow, you set a low smoothing factor on that particular machine. Whenever one of the load-balanced partitions runs on that slow machine, it picks up the locally assigned smoothing factor and hence smooths more aggressively than the others.

    TLB uses 1 as the default value of TLB_SMOOTHING_FACTOR (alpha). But 1 means no smoothing, so you almost always want to override it with a sensible value. We chose 1 as the default because we want TLB's defaults to be as non-intrusive as possible.


    Type: A real number alpha (0 < alpha <= 1). [The lower the value, the more weight is given to historical data, and hence the more aggressive the smoothing.]
    Example: 0.5 (means equal weight for historical data and newly reported data)
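
    Written out, the smoothed time is alpha × reported + (1 - alpha) × historical, the standard exponential smoothing update. A hypothetical Java helper (SmoothingSketch is an illustrative name) implementing just that formula:

```java
// Hypothetical helper for the exponential smoothing update; not TLB's code.
final class SmoothingSketch {
    // alpha weights the newly reported time; (1 - alpha) weights the historical time.
    static double smooth(double alpha, double reportedTime, double historicalTime) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("TLB_SMOOTHING_FACTOR must satisfy 0 < alpha <= 1");
        }
        return alpha * reportedTime + (1.0 - alpha) * historicalTime;
    }
}
```

    For instance, with alpha = 0.25, a test whose historical time is 10 seconds but which took 30 seconds on a slow machine would be recorded as 0.25 × 30 + 0.75 × 10 = 15 seconds rather than 30.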

  • TLB_TOTAL_PARTITIONS: if on the 'balancer side' and 'balancing against TLB server', which means ON_BALANCER_SIDE AND ($TYPE_OF_SERVER == 'tlb.service.TlbServer')
    A balancer instance running against TlbServer needs to know how many partitions have been made in total.

    For instance, if a total of 100 tests are being partitioned across 5 test processes, the value of this variable should be 5.

    For people balancing against GoServer: this variable is not required when balancing against GoServer, because it is inferred from the way jobs in your stage are named.


    Type: A natural number (number of partitions).
    Example: 12

  • TLB_PARTITION_NUMBER: if on the 'balancer side' and 'balancing against TLB server', which means ON_BALANCER_SIDE AND ($TYPE_OF_SERVER == 'tlb.service.TlbServer')
    A balancer instance running against TlbServer needs to know which of the TLB_TOTAL_PARTITIONS partitions it is itself.

    Say, when running 5 partitions, the first one would have this value set to ‘1’ whereas the third one would have it set to ‘3’.

    For people balancing against GoServer: this variable is not required when balancing against GoServer, because it is inferred from the way jobs in your stage are named.


    Type: A natural number (which one of TLB_TOTAL_PARTITIONS the current partition is).
    Example: 4 (in general, 1 <= $TLB_PARTITION_NUMBER <= 7 if $TLB_TOTAL_PARTITIONS = 7)
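
    A hypothetical sketch of reading TLB_TOTAL_PARTITIONS and TLB_PARTITION_NUMBER together and enforcing 1 <= TLB_PARTITION_NUMBER <= TLB_TOTAL_PARTITIONS (PartitionConfigSketch is an illustrative name, not TLB's validation code):

```java
import java.util.Map;

// Hypothetical validation of the two partition variables; not TLB's code.
final class PartitionConfigSketch {
    final int total;
    final int number;

    PartitionConfigSketch(Map<String, String> env) {
        total = Integer.parseInt(required(env, "TLB_TOTAL_PARTITIONS"));
        number = Integer.parseInt(required(env, "TLB_PARTITION_NUMBER"));
        if (total < 1 || number < 1 || number > total) {
            throw new IllegalArgumentException(
                    "Expected 1 <= TLB_PARTITION_NUMBER <= TLB_TOTAL_PARTITIONS, got " + number + " of " + total);
        }
    }

    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(name + " must be set when balancing against TlbServer");
        }
        return value.trim();
    }
}
```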

  • TLB_BASE_URL: if on the 'balancer side' and 'balancing against TLB server', which means ON_BALANCER_SIDE AND ($TYPE_OF_SERVER == 'tlb.service.TlbServer')
    A balancer instance running against TlbServer needs to know the URL it must use to talk to the TlbServer (which is a RESTful HTTP service).

    This variable must point to the TlbServer base URL (for instance, if TlbServer is running on foo.bar.com on port 7019, this would be http://foo.bar.com:7019/).

    For people balancing against GoServer: this variable is not required when balancing against GoServer, because the balancer reaches the Go server through GO_SERVER_URL, which Go sets.


    Type: HTTP server base URL (URL of the TLB Server)
    Example: http://192.168.1.100:7019/

  • TLB_JOB_NAME: if on the 'balancer side' and 'balancing against TLB server', which means ON_BALANCER_SIDE AND ($TYPE_OF_SERVER == 'tlb.service.TlbServer')
    A balancer instance running against TlbServer needs to know which namespace all partitions running splits of a test suite must use. A single TlbServer instance can cater to several balancer instances (running different partitions of different test suites, which have nothing to do with each other).

    For instance, functional tests can run using namespace ‘func-tests’, smoke tests can use namespace ‘smoke-tests’ and unit tests can use namespace ‘unit-tests’. However, all partitions of a suite must use the same namespace: assuming there are 5 partitions of unit tests, they must all use one namespace, say ‘unit-tests’, so that the TlbServer knows they belong to the same family. Similarly, assuming smoke tests are partitioned across 3 instances, they must all use ‘smoke-tests’ (or something similar) as the value of this variable.

    For people balancing against GoServer: this variable is not required when balancing against GoServer, because it is inferred from the way jobs in your stage are named.


    Type: String (the namespace shared by the partitions of a test suite).
    Example: project-foo-integration-tests or project-bar-unit-tests

  • TLB_JOB_VERSION: if on the 'balancer side' and 'balancing against TLB server', which means ON_BALANCER_SIDE AND ($TYPE_OF_SERVER == 'tlb.service.TlbServer')
    When a suite is balanced against TlbServer, the dataset (test times, results, etc.) is maintained as one mutating set on the server. In a real-world balancing scenario, test runner instances (partitions) do not all start at the same time.

    Let's say we have 3 partitions: A, B and C. A may have started running tests and may have already reported results and times for a few tests by the time B and C start. Now, let's say B and C want to balance by time and hence want data from the server. B and C must balance based on exactly the same data that A started out with, not the updated data that includes feedback from A. This means that if new time data for a test is available from A, it must not be used for balancing on B; otherwise B may end up running the same test again, because it now appears faster. This is vital for the mutual exclusion and collective exhaustion principles that TLB follows.

    To solve this problem, TLB has a concept of versioning. When A starts running, it sends the TlbServer a version string, against which the server stores a snapshot of the data relevant to the corresponding TLB_JOB_NAME. When B or C queries data using the same version, they get the same data that A got. This ensures that all partitions see the same data, in spite of the server continuously receiving new data.

    Usually TLB_JOB_VERSION is set so that it changes between suite runs. For instance, the build number can be used as TLB_JOB_VERSION. In this case, A, B and C may all be running at version 10.

    Using a unique version ensures that frozen (hence stale) data is not used for balancing or ordering a new run of the same test suite. When the next build is triggered, all three partitions start with the corresponding build number, which may be 11, so the frozen snapshot of data from version 10 is not used. Recursive variable substitution can be used to build sensible and sufficiently unique version strings.

    For people balancing against GoServer: this variable is not required when balancing against GoServer, because it is inferred from the way jobs in your stage are named.

    TLB allows environment variable interpolation, which means you can use a string composed of references to other environment variables (ones that change between builds but remain the same for all jobs in a build).

    For instance, when balancing a Go stage using the TLB Server, $TLB_JOB_VERSION = ${GO_PIPELINE_COUNTER}-${GO_STAGE_COUNTER} can be a sensible value. TLB resolves the referenced values before using the variable.


    Type: Any unique string (one that changes across suite runs and remains the same across partitions within a given run).
    Example: 'foo-bar-<build-number>' (ie. foo-bar-10, foo-bar-11 etc)
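
    The A/B/C scenario can be modelled with a hypothetical in-memory sketch (JobDataSketch is an illustrative name, not TlbServer's implementation): live data accumulates per TLB_JOB_NAME, and the first fetch for a given (TLB_JOB_NAME, TLB_JOB_VERSION) pair freezes a copy that later fetches with the same pair keep seeing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of namespaced, versioned data; not TLB Server code.
final class JobDataSketch {
    // Live, continuously updated data per TLB_JOB_NAME.
    private final Map<String, List<String>> liveByJob = new HashMap<>();
    // Frozen copies keyed by (TLB_JOB_NAME, TLB_JOB_VERSION).
    private final Map<List<String>, List<String>> frozen = new HashMap<>();

    void post(String jobName, String record) {
        liveByJob.computeIfAbsent(jobName, k -> new ArrayList<>()).add(record);
    }

    List<String> fetch(String jobName, String jobVersion) {
        return frozen.computeIfAbsent(List.of(jobName, jobVersion),
                key -> List.copyOf(liveByJob.getOrDefault(jobName, List.of())));
    }
}
```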

  • TLB_USERNAME: if on the 'balancer side' and 'balancing against Go server', which means ON_BALANCER_SIDE AND ($TYPE_OF_SERVER == 'tlb.service.GoServer')
    A balancer instance running against a Go server that has authentication enabled needs credentials to access data from the last suite run. This variable captures the username the partitioned instance should use to log in to the Go server.

    It is required only for Go servers that have security turned on.


    Type: String(username)
    Example: tlb-user

  • TLB_PASSWORD: if on the 'balancer side' and 'balancing against Go server', which means ON_BALANCER_SIDE AND ($TYPE_OF_SERVER == 'tlb.service.GoServer')
    A balancer instance running against a Go server that has authentication enabled needs credentials to access data from the last suite run. This variable captures the password the partitioned instance should use to log in to the Go server.
    Type: String(password)
    Example: tlb-password

  • GO_STAGE_FEED_MAX_SEARCH_DEPTH: if on the 'balancer side' and 'balancing against Go server', which means ON_BALANCER_SIDE AND ($TYPE_OF_SERVER == 'tlb.service.GoServer')
    A balancer instance running against a Go server crawls the Go server's Stage Feed (an Atom feed of completed stages) to find the historical stage instance from which to download test run data (posted by the previous build).

    However, a Go server can have a long history of stages, and the stage being balanced may be a new one or a manual one (which doesn't run very often). In such situations, TLB must stop at a definite depth (number of pages) to keep itself from launching a 'Denial of Service' attack on the Go server feed.

    This variable controls that limit for a partition.

    Warning: Please do not set different values across partitions sharing a job name (partitions that are supposed to run parts of the same set of tests). Setting different values for the partitions of one invocation can mess up balancing on one or more partition instances while the others balance well, which can violate mutual exclusion and, more importantly, collective exhaustion.
    Obviously, it's safe to use different values across different stages or job families (for instance, jobs named job-1, job-2 and job-3 make one job family, and hello-1 and hello-2 make another).

    TLB uses 10 as the default value of GO_STAGE_FEED_MAX_SEARCH_DEPTH, which means it will go 10 pages back in history (and no further) before declaring that it has failed to find the historical stage.


    Type: A whole number (the number of stage Atom feed pages TLB is allowed to traverse to find the last stage run)
    Example: 25
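
    A hypothetical sketch of a depth-limited search (FeedSearchSketch is an illustrative name, not TLB's feed-crawling code). Here fetchPage stands in for downloading page N of the stage feed, and entries are assumed, purely for illustration, to look like "stage-name/counter":

```java
import java.util.List;
import java.util.Optional;
import java.util.function.IntFunction;

// Hypothetical depth-limited search over a paged feed; not TLB's or Go's API.
final class FeedSearchSketch {
    static final int DEFAULT_MAX_DEPTH = 10; // documented default of GO_STAGE_FEED_MAX_SEARCH_DEPTH

    static Optional<String> findPreviousStage(IntFunction<List<String>> fetchPage, String stageName, int maxDepth) {
        for (int page = 1; page <= maxDepth; page++) {
            List<String> entries = fetchPage.apply(page);
            if (entries.isEmpty()) {
                break; // the feed has no more history
            }
            for (String entry : entries) {
                if (entry.startsWith(stageName + "/")) {
                    return Optional.of(entry);
                }
            }
        }
        return Optional.empty(); // give up instead of crawling the whole feed
    }
}
```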

  • Other Go variables: If you are unfamiliar with Go and do not intend to use Go support, please feel free to ignore this text.

    if on the 'balancer side' and 'balancing against Go server', which means ON_BALANCER_SIDE AND ($TYPE_OF_SERVER == 'tlb.service.GoServer')
    You do not need to set these variables: the Go agent sets them before spawning Go tasks. As long as you don't reset them to other values, you'll be fine.

    GO_SERVER_URL: HTTPS URL of the Go server.
    GO_PIPELINE_NAME: Name of the pipeline the task running the tests belongs to.
    GO_STAGE_NAME: Name of the stage the task running the tests belongs to.
    GO_JOB_NAME: Name of the job the task running the tests belongs to.
    GO_PIPELINE_COUNTER: Counter (instance number) of the pipeline instance the task running the tests belongs to.
    GO_STAGE_COUNTER: Counter (instance number) of the stage instance the task running the tests belongs to.
    GO_PIPELINE_LABEL: Label (logical instance name) of the pipeline instance the task running the tests belongs to.

    TLB needs these variables because, when working against Go, it downloads and uploads test-related data as artifact files, which are stored under the corresponding job instance.


    Type: Strings
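
    Since TLB depends on these values, a pre-flight check can make a misconfigured agent easier to diagnose. A hypothetical sketch (GoEnvSketch is an illustrative name, not part of TLB) that reports which of the variables listed above are missing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for the Go-provided variables; not part of TLB.
final class GoEnvSketch {
    static final List<String> REQUIRED = List.of(
            "GO_SERVER_URL", "GO_PIPELINE_NAME", "GO_STAGE_NAME", "GO_JOB_NAME",
            "GO_PIPELINE_COUNTER", "GO_STAGE_COUNTER", "GO_PIPELINE_LABEL");

    // Returns the names of required variables that are unset or blank, in REQUIRED order.
    static List<String> missing(Map<String, String> env) {
        List<String> absent = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                absent.add(name);
            }
        }
        return absent;
    }
}
```

    Calling GoEnvSketch.missing(System.getenv()) at the start of a task would list any of these variables that were reset or cleared.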