Improving our integration tests

Over the last few weeks I focused on our integration test environment. For opsi 4.1 we are increasing the number of tests we run because, in addition to the tests described here, we added automated migration tests. All these tests should be not only reliable but also fast, so that it is possible to run them often.

One downside of our test implementation was that it spent a lot of time in sleep. If we knew a reboot would happen, we inserted a sleep 120 to wait 120 seconds, after which the machine was hopefully up. Two minutes is a lot of time, but under heavy load it sometimes wasn't enough for the machine to be up and accessible. So in some places this got increased to 180 to make really, really sure that the machine could be reached. This made the tests a little more reliable but also slower. The overall runtime for one of these tests is usually somewhere between two and four hours. Three minutes does not seem like much in comparison, but there is usually more than one sleep in each test.

So I went on to replace all the sleeps with something better. As mentioned before, the sleeps are usually inserted where we wait for a port to become reachable. This is usually SSH or opsiclientd, which we then use for further work.

My approach was to write a small Python script that does the waiting for us. It makes connection attempts, with a short delay in between checks, until the port becomes reachable. Once the port is reachable, the script exits. If a connection cannot be made within five minutes, it is assumed that something went wrong and the script ends with a non-zero exit code. The exit code helps when using this script inside our Jenkins stages, as it marks the step as failed.
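The polling loop described above could be sketched roughly like this. This is not the actual script: the names port_is_open, wait_for_port and main, the one-second delay and the two-second connect timeout are assumptions made for illustration. Only the five-minute limit and the non-zero exit code come from the description.

```python
import socket
import time


def port_is_open(host, port, connect_timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out: treat all of these as "not up yet".
        return False


def wait_for_port(host, port, timeout=300.0, delay=1.0):
    """Poll until the port is reachable or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        # The check comes first, so an already reachable port returns at once.
        if port_is_open(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(delay)


def main(argv):
    """Hypothetical command line: HOST PORT. Returns the process exit code."""
    host, port = argv[0], int(argv[1])
    return 0 if wait_for_port(host, port) else 1
```

A script built on this would end with something like sys.exit(main(sys.argv[1:])), so that a timeout produces exit code 1 and the calling Jenkins step is marked as failed.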

The script does have some tricks, though. If a port is reachable right away, it exits right away. The script also understands different kinds of waiting: it may wait for a port to come up, for a port to go down, or for a reboot to happen (the port goes down and then comes up again).
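The three kinds of waiting could be built on one polling helper, as in the following sketch. The mode names "up", "down" and "reboot", the function names, and the choice to share a single deadline across both phases of a reboot are assumptions for illustration, not details of the real script. The optional check parameter is also hypothetical and only exists so the logic can be tried without a real machine.

```python
import socket
import time


def port_is_open(host, port, connect_timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


def wait_for_state(check, want_open, deadline, delay):
    """Poll check() until it returns want_open or the deadline has passed."""
    while True:
        if check() == want_open:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(delay)


def wait(host, port, mode="up", timeout=300.0, delay=1.0, check=None):
    """Wait for a port to come up, go down, or go down and come up again."""
    if check is None:
        check = lambda: port_is_open(host, port)
    deadline = time.monotonic() + timeout
    if mode == "up":
        return wait_for_state(check, True, deadline, delay)
    if mode == "down":
        return wait_for_state(check, False, deadline, delay)
    if mode == "reboot":
        # Assumed behaviour: both phases share the same overall deadline.
        return (wait_for_state(check, False, deadline, delay)
                and wait_for_state(check, True, deadline, delay))
    raise ValueError("unknown mode: %r" % (mode,))
```

One thing to keep in mind with a design like this: in "reboot" mode the down phase can only be seen if the port stays closed for longer than the polling delay, so the delay should be short compared to the time a reboot takes.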

With the script in use we managed to reduce the overall runtime by some minutes. Not totally overwhelming, but in the end every minute counts. In addition, this also improves the reliability of our tests. There is another neat advantage: most of the tests are implemented as functions, so the changes made for the opsi 4.1 tests also affect the runtime of our opsi 4.0 tests.