Re-executing failed test cases and merging outputs with Robot Framework

In a previous post, I discussed solving intermittent issues aka building more robust automated tests. A solution I did not mention is the simple “just give it another chance”. When you have big and long suites of automated tests (quite classic to have suites in the 1000’s and lasting hours when doing functional tests), then you might get a couple of tests randomly failing for unknown reasons. Why not just launching only those failed tests again? If they fail once more, you are hitting a real problem. If they succeed, you might have hit an intermittent problem and you might decide to just ignore it.

Re-executing failed tests (–rerunfailed) appeared in Robot Framework 2.8. And since version 2.8.4 a new option (–merge) was added to rebot to merge output from different runs. Like explained in the User Guide, those 2 options make a lot of sense when used together:

# first execute all tests
pybot --output original.xml tests 
# then re-execute failing
pybot --rerunfailed original.xml --output rerun.xml tests 
# finally merge results
rebot --merge original.xml rerun.xml

This will produce a single report where the second execution of the failed test is replacing the first execution. So every test appears once and for those executed twice, we see the first and second execution message:


Here, I propose to go a little bit further and show how to use –rerunfailed and –merge while:

  • writing output files in an “output” folder instead of the execution one (use of –outputdir). This is quite a common practice to have the output files written in a custom folder but it makes the whole pybot call syntax a bit more complex.
  • giving access to log files from first and second executions via links displayed in the report (use of Metadata). Sometimes having the “new status” and “old status” (like in previous screenshot) is not enough and we want to have details on what went wrong in the execution, and having only the merged report is not enough.

To show this let’s use a simple unstable test:

*** Settings ***
Library  String

*** Test Cases ***
    should be true  ${True}

    ${bool} =  random_boolean
    should be true  ${bool}
*** Keywords ***
    ${nb_string} =  generate random string  1  [NUMBERS]
    ${nb_int} =  convert to integer  ${nb_string}
    Run keyword and return  evaluate  (${nb_int} % 2) == 0

The unstable_test will fail 50% of times and the stable test will always succeed.

And so, here is the script I propose to launch the suite:

# clean previous output files
rm -f output/output.xml
rm -f output/rerun.xml
rm -f output/first_run_log.html
rm -f output/second_run_log.html
echo "#######################################"
echo "# Running portfolio a first time      #"
echo "#######################################"
pybot --outputdir output $@
# we stop the script here if all the tests were OK
if [ $? -eq 0 ]; then
	echo "we don't run the tests again as everything was OK on first try"
	exit 0	
# otherwise we go for another round with the failing tests
# we keep a copy of the first log file
cp output/log.html  output/first_run_log.html
# we launch the tests that failed
echo "#######################################"
echo "# Running again the tests that failed #"
echo "#######################################"
pybot --outputdir output --nostatusrc --rerunfailed output/output.xml --output rerun.xml $@
# Robot Framework generates file rerun.xml
# we keep a copy of the second log file
cp output/log.html  output/second_run_log.html
# Merging output files
echo "########################"
echo "# Merging output files #"
echo "########################"
rebot --nostatusrc --outputdir output --output output.xml --merge output/output.xml  output/rerun.xml
# Robot Framework generates a new output.xml

and here is an example of execution (case where unstable test fails once and then succeeds):

$ ./ unstable_suite.robot

# Running portfolio a first time      #

Unstable Suite
stable_test                                       | PASS |
unstable_test                                     | FAIL |
'False' should be true.
Unstable Suite                                    | FAIL |
2 critical tests, 1 passed, 1 failed
2 tests total, 1 passed, 1 failed
Output:  path/to/output/output.xml
Log:     path/to/output/log.html
Report:  path/to/output/report.html

# Running again the tests that failed #

Unstable Suite
unstable_test                                     | PASS |
Unstable Suite                                    | PASS |
1 critical test, 1 passed, 0 failed
1 test total, 1 passed, 0 failed
Output:  path/to/output/rerun.xml
Log:     path/to/output/log.html
Report:  path/to/output/report.html

# Merging output files #

Output:  path/to/output/output.xml
Log:     path/to/output/log.html
Report:  path/to/output/report.html

So, the first part is done: we have a script that launch the suite twice if needed and put all the output in “output” folder. Now let’s update the “settings” section of our test to include links to first and second run logs:

*** Settings ***
Library   String
Metadata  Log of First Run   [first_run_log.html|first_run_log.html]
Metadata  Log of Second Run  [second_run_log.html|second_run_log.html]

 If we launch our script again, we will get a report with links to first and second run in the “summary information” section:


The script and the test can be found in a GitHub repository. Feel free to comment on that topic if you found out more tips on those Robot options.

Randomizing test execution order

Many testing framework offer optional randomization of test execution order. For example:
– Robot Framework with –randomize
– Rspec with –order random

I consider this option as very useful and use it by default for all the automated tests portfolio I run. The advantages I see are:

– we detect ordering dependencies as soon as possible.  If we execute tests A and B always in the same order, test B could work only because test A left the system in a state that is used by test B. If one day we invert the order (by renaming the tests for example, if order depend on alphabetical order) then the suite will fail and it will take us some time to understand the problem (because test A and B were maybe written months/years ago). This happens also if we insert a new test between A and B or refactor test A or B. If we run the tests in random order all the time, we will detect this issue very soon.

– we might detect bugs in the SUT that appear only in some specific sequence of actions that a random order of test could meet with luck. The problem then could be how to duplicate the bug we just bumped into. For Rspec, randomness can have predictability via seeding.  JUnit had a randomization problem when Java 7 came out, and had to think that over and came up with a deterministic but random order. There is no such thing for Robot Framework so we have to manually reproduce the test order that caused the failure.

– we won’t always run the same tests first. And usually when we read a test report, we start by the top and analyse error one by one. This could help us to not analyze the “Access”, “Audit” and “Authentication” tests first all the time…

One could argue that using a fixed test execution order is useful to run some smoke/sanity tests first, and then the rest of the porfolio. I think in that case, it is better to split that in 2 different jobs. A first “smoke test” job that runs quickly (5-10 minutes) and another “full test” that can take several hours. In Robot Framework, this can be easily achieved using tags.

Another reason to push for fixed tests execution order could be performance optimisation. Test A could be preparing the system for test B to start and when test B ends, the system is ready for test C to go. One of the reason this is a bad pattern is that you won’t be able to run only test B or only test C! If the setup of a given test is in the previous test, then you are doomed to run always the full portfolio. This is simply not bearable. When a full portfolio detect a couple of failed tests, we want to be able to run those tests once more to double-check they are failing and then start to analyse the problem.

We could also introduce randomness in the test themselves, but this is another topic… for a future post!

How to solve intermittent issues and get more robust automated tests

In the last episode of their podcast, Trish Khoo and Bruce McLeod discussed how to solve intermittent issues in automated test suites. Trish also made a presentation on this topic at a Selenium Conference. In those both media, Trish and Bruce go over different topics:

1) test design
2) logging
3) tolerance
4) have stable systems
5) some tips (prefer “wait for” to “sleep” for example)

I recommend you to consume those 2 presentations. Here are some notes on those topics with some comments on how I approach them with Robot Framework.

On test design, as automated tests are code, all the principle of clean code should be applied to tests: small functions, single responsibility, DRY, no side-effect. There is a specific chapter on “clean tests” in “clean code” by Uncle Bob where he discuss readability, use of DSL, single assert per tests and FIRST principle. In Robot Framework, creation of keywords is so quick and easy, that I tend to create as many as needed until I obtain test cases which are 10 lines long max with given-when-then sections clearly identified.

On logging, in Robot Framework, I have a keyword that I launch in the Teardown of every single test that check the ${TEST_STATUS} variable (filled by Robot engine). If the test if FAILED, then I create a backup folder where I backup a lot of data on the current state of the SUT (log files, audit files, configuration files, database content etc.).

On tolerance, in my previous company we were doing financial computation, and from version to version, the results of some functions could differ very slightly due to change in algorithm or compiler. So we agreed with Product Management team on threshold that were accepted and created smart comparison keyword that were taking 3 arguments: expected result, actual result, threshold. That made the tests way more stable.

On tips, the “wait for” rather than “sleep” tip is a one I use all the time. First because most of the fixed-time-sleep should be prohibited in the tests for performance reason (see my post on performance of tests) and then because sleep might be too long or too short on some systems. Another keyword/pattern I use in Robot Framework in the “wait until keyword succeeds“. This is the “wait for” applied to any keyword. When there are some timing issues in the tests, this keyword can be very very handy.

Finally, setup and teardown are a keystone of stable tests. They should be extracted from the tests and should leave the system in a state where any other test can be run afterward. This does not mean to leave the system in a identical state than when the test started (that could be very time consuming depending on the SUT and the actions performed by the tests). But we should be in a “known state” in which we know the following tests will be able to run OK.

All that being said, I have a couple of instable tests, so I’d better apply all those rules quickly…

Robot Framework and Excel

It can be useful to read Excel documents within test cases for two reasons : get some input for a test and check the output of a test when an Excel document is produced by the SUT. I propose to discuss quickly those 2 cases and see how easy it is to achieve it with Robot Framework.

The main reason I see for Excel document to be a good candidate for test cases input data is that Excel sheets are able to do computations (quite complicated ones in fact). In my previous job, the expected of our test cases were often some financial computation (derivate products, rates etc.) and the tool of choice for Product Management and QA team to compute the expected of different use cases was Excel. When we chose to automate those examples, we basically copy/pasted the results from Excel to our testing tool. Until one day we thought it would be even more convenient to read the Excel file directly from our testing tool. To be honest, the idea was stolen from a demo of a tester building a data-driven tests framework based on Quality Center and Excel (yes, I know…)

The other side of the coin is when the SUT is producing Excel documents that we want to check. In that case, the expected could be stored in our tests, or it could be another XLS file (and we could compare results found in both).

So, here is some Robot Framework code that you could use as a start:

*** Settings ***
library  openpyxl.reader.excel
*** variables ***
${expected_value}  expected result
*** test cases ***
Excel sheet
    ${wb} = load_workbook sample.xlsx
    ${ws} = set variable ${wb.get_active_sheet()}
    ${cell} = set variable ${ws.cell('A1')}
    ${actual_value} =  set variable ${cell.value}
    should be equal  ${expected_value}  ${actual_value}

Here I open the document sample.xlsx and check that the value in the first cell match a value that I stored in a variable. I am using the openpyxl library (warning : can read only XLSdocuments). This is, IMHO, quite compact and simple code. From there, you can write your own keyword to parse the document according to your needs.

Some final thoughts :

  1. Robot Framework text format for test suites/cases is great as it fits perfectly in SCM. And this is where we want to store our tests artifacts (quote from a great article from Elisabeth Hendrickson : “Technical artifacts including test automation and manual regression test scripts belong in the Source Control System versioned with the associated code”). Unfortunately, Excel document (even in the XLSX format, which is XML… but zipped!) are binary files and if we store them in SCM, we won’t be able to compare different versions. That’s a major limit to their use.
  2. as I swim in the open source world all day long, I could have come up with a post about how to read some files that have a more open format than XLS…. The thing is that sometimes we don’t have the choice of the tools (example, customers are asking for export as XLSX in my product, I will have to test it one way or another) and that is the tool with which I had the experience I wanted to share.
  3. It is interesting to see that this Excel topic is discussed for many testing tools. Some are reading Excel from Junit, some other from Cucumber and there was a proposition to include it by default in Robot Framework.

Oh and thanks to Jean-Charles and Matthieu for the testing dojo in which we came up with this little snippet of code !

Robot Framework, Selenium, Jenkins, and XVFB

Until recently, my test cases written with Robot Framework were not driven by the GUI. Those tests were interacting with the SUT using many different ways: process library, database access, file system, REST requests etc. Those access usually don’t change too often during the lifetime of a product as they often become a contract for users/tools communicating with product (i.e. API). Whereas driving the SUT through the UI can be brittle and engage the team in a constant refactoring of the tests. This is summarized in the famous pyramid of tests. But, having no automated GUI tests, my pyramid was headless! So I decided to invest a bit in the topic.

From there I had the choice between :

  • writing tc with Robot using the Robot Framework Selenium 2 library
    pro: integrated with other existing Robot tc + easy to start
    con: library a bit behind the official one + selenium users are mostly not using this API
  • writing tc in Java using the Selenium 2 Java binding
    pro: up to date + large community of users
    con: another set of tools to write/launch/report test cases in addition to Robot

I took the easy path and went for the Robot Library. There is an excellent demo of the library which helps to start-up. Before digging into the tests of my SUT, I added another 2 steps:

  1. going a bit further in my knowledge of Chrome Developer Tool. There is an excellent online course by Code School. Those tools are soon becoming very useful when creating GUI driven test case. Note that, Firefox dev console seems to offer the same service, but somehow I feel more ease with Chrome ones.
  2. running a testing dojo with a group of peers. We used Jenkins as SUT (as I already did to perform some stress test trials) and it worked quite well although Jenkins does not seem to be the easiest web app to test as we had to use XPATH to locate element which can become obsolete quite fast.

I was then able to create my first tests on the SUT I am testing. Those were more easy as expected as all the element I had to address had unique IDs! The biggest obstacle is to chain actions without using the horrible sleep keyword (see Martin again). With the Robot Selenium library, the easiest way is to use (and abuse) the “wait until page contains element” keyword.

Last but not least, I needed to launch the tests using Jenkins (very easy and quite nice with the plugin) on a linux VM…. without a display! Here enters PhantomJS that is a headless browser supported by the Robot Selenium library. But somehow, my tests were randomly failing with it, so I gave it up and went back to Firefox. And here is the last character of this little story: xvfb. This display server sends the graphic to memory so no need to have a full/real display available. The install is not immediate but some shared their feedback on it so it is quite smooth after all.

So the current configuration  of my Robot Selenium tc in Jenkins is:

  • xvfb running on the VM on which tests are going sending display to :89
  • in Jenkins, in my Robot task, in “Build Environment / Set environment variable” I put “DISPLAY=:89”

And that’s it, tests are running OK on this VM. And on the top of that, when there is a failure in the tests, the Robot Selenium lib is able to perform a screenshot and save it to disk to ease analysis of the issue!

Improve the performance of your automated tests

If you google “performance of automated tests”, you will get loads of articles about “automated performance tests” but very few about how to speed-up the execution of your test portfolio. Although this is one of the goals of a test suite: it should give a quick feedback. Of course we don’t expect the same performance for unit tests (measured in seconds or minutes) than for larger system tests (measured in dozens of minutes to several hours probably). But at any layer of the test pyramid, having faster tests is an advantage.

Here is a collection of ways to enhance the performance of the automated tests:

1) Execute Tests in Parallel
If the tests are independent (as they should be), you could split your test portfolio in pieces and run the different parts in parallel. You can start on the machine on which your tests are already running as it probably has a multi-core processor that could run different “testing threads” in parallel. In that case, beware to customize each install/set-up so that the different instances of your products don’t step on each other foot. You can also send the execution to several machines in your lab. Here are a couple of articles related to this topic:
– Pivotal Labs on how to parallelize RSpec tests
– blog post on how to parallelize JUnit tests
– How to distribute Selenium tests on multiple machine with Selenium Grid

2) Avoid Sleep
On the higher levels of the testing pyramid, we are testing user scenarios and our scripts might need some pauses in order to run successfully. For example, on the filesystem level, we could have to wait for a log file to be created before checking its content. On the UI level, we might need to wait for a button to be here to click it. An easy way to cope with this is to add sleep() all over the tests scripts until they pass. That, of course, should be avoided as much as possible. First reason is because it makes the tests brittle: when we will run the test on a slower machine the test might fail. Another reason is that pilling up those sleep() will make the test very slow. So we should use any kind of wait() instead that would regularly check for an object/event to be there. Here is how to do it in Selenium and Robot Framework:
– Presentation of WebDriverWait by Mozilla
– Wait until keyword succeeds in Robot Framework

3) Share Setup and Teardown
“Every test case should be independent” does not mean that every test case should handle a full setup and a full teardown. A good pattern is to share the setup and teardown on different levels to enhance the performance of the portfolio. For example, we could have a global setup that deploy a MySQL database that some tests will use later on. We could have another shared setup for a group of a dozen of test cases that add some lines in a table of that database. Finally each test case will finish its own setup by tweaking the database again before doing the test itself. The trick is to share the setup among a group of test cases that won’t modify what the setup configured! This is very convenient and easy to do with Robot Framework and with JUnit.

4) Focus your Optimization Effort
Another way to look at the test performance issue could be to start by identifying the tests or the functions/methods/keywords that are the more time consuming over your whole portfolio. Focussing your effort on those parts could lead to quick wins. Here are two examples:
– My humble code to measure most expensive keywords in a Robot Framework test suite
– A smart XSL on Stackoverflow to to identify the longest running unit tests on Junit

Hope this might help some,
and don’t hesitate to post a comment with other ideas/links!

EDIT : found some slides from David Gageot on the very same topic  :

Getting started with Gatling for stress test

I was looking for a stress tool to test a java product via its HTTP API.
Gatling was in my “to try list” for a couple of reasons:

I gave it a try and my first impressions were very good.
The wiki has a very good “getting started” and “first steps” sections with useful examples.
I propose here to show an alternate set of “first steps” by stress-testing Jenkins.

Here are my steps (on a Mac Book Pro with os x 10.8.3)

1) installation and configuration of jenkins

I discovered there is a one click install of Jenkins for MacOsX.
Jenkins is automatically launched and ready to be used.
Server can be accessed over HTTP on: http://localhost:8080/
I created a fake job that does nothing and launched it a couple of time to have some history:


2) installation of Gatling

I chose to download the 1.5.1 instead of the 2.0 as this last one is still under development and because 1.5.1 seems to have the feature I need.
I followed the “getting started” wiki page
Basically the installation and configuration was, here again, very quick:

  • unzip
  • launch 3 shell command :

    sudo sysctl -w kern.maxfilesperproc=200000
    sudo sysctl -w kern.maxfiles=200000
    sudo sysctl -w net.inet.ip.portrange.first=1024

and that’s it!

3) preparation of the simulation

For our test of jenkins, we will use the recorder to write the simulation script.
The recorder will allow us to create a script that mimick what users could do on Jenkins.
For this simple example, we will just click on the job, click on a build and click on “console output” for this build.

First, we set-up Gatling’s recorder:


Now that Gatling recorder is ready :

This creates a scala script in /path/to/gatling/user-files/simulations/Jenkins
You can take a look a the generated code (I added some comments in the file)

4) stress testing jenkins

If we launch the script as it, we will simulate the connection of a single user.
The reports gives a global view of the result of the test with the stats for the 22 requests performed.


To see what happens when 500 users connect simultaneously on my server, I just have to change the last part of the script to:


The result shows that half of the request1 performed failed (260 out of 240).


Failures are caused because Gatling will consider a request times out after 60 seconds.
Note that this timeout is defined in path/to/gatling/conf/gatling/conf

To be a little more realistic, we can simulate the fact that all the users are connecting gradually during 1 minutes (the whole 100 people of the R&D are checking the console output of the main job from 8h00 to 8h01 in the morning for example !). This can be scripted like that:


In that case, only 20 requests failed and other parts of the reports are interesting to look at.

  • Number of active active sessions:


=> we can see the growth during 1 minute and gradual shutdown

  • Number of requests per seconds


=> we can see that the 20 requests that fails are in the middle of the simulation when we reach the maximum number of active sessions

5) conclusion

This test is not very realistic for many reasons like:

  • there is no network involved as I do everything on my laptop
  • Jenkins user will retrieve bigger content on jenkins and have different behaviors

Yet, it is a good start and quite easy to perform.
Next steps would use more advanced features of Gatling.

Would you have any questions or experience to share, don’t hesitate to contact me.
If I have more to share on the topic, expect a follow-up.

MOOC on software testing

Not a day without Mooc being mentioned. Thought it was about time to try one. So I just started Udacity course on software testing. And it was quite a surprise to see how the topic is opened up. No long definition of the different types of tests (black box, white box, exploratory, manual, automated…). The first examples are unit-test-like ! the first tester is the developer and the first test are codes, hence (automated) unit-like tests. That can seem obvious but general acceptance of software testing is a bit different. Here is a section of the wikipedia article about software testing :

“Software testing, depending on the testing method employed, can be implemented at any time in the development process. Traditionally most of the test effort occurs after the requirements have been defined and the coding process has been completed, but in the Agile approaches most of the test effort is on-going”

So the whole Agile/XP way of programming is becoming so mainstream that it is not mentioned and is just the way software testing is presented.

Will see how is the rest of the course goes…

Robot Framework Introduction

Did a little presentation of Robot Framework at Human Talk Grenoble about Robot Framework. Quite difficult to sell that kind of tool to an audience into web development with no or few experience of tests. This is more easy when talking to bigger team working on more traditional (old) softwares. Have to understand more about the possibilities and constraint of web development to understand how Robot could be used for the business/functional logic of the web projects.

GRILOG conference about software quality

On october 18th, GRILOG (Grenoble Isère Logiciel) organized a mini-conference about software testing. There were six lightning talks performed by different Grenoble area software actors.

Agility was not mentioned in the topic of the night, nevertheless, most of the presenters agreed on numerous notions that are central to Agile Testing :
– quality/tests are not handled at the end at the project (Eric Pierrel, Itris Automation)
– to have feedbacks very early in the project (Anne Pellegrin, Multicom). She explained the Wizard of Oz experiment ! Very interesting. Made me think about Pretotyping by Alberto Savoia.
– tests are driving the projects (Remy Dujardin, Hardis) and the whole project team should be involved in it.

So it is interesting to see how Agile was over-represented once we started to talk about quality in software. Agile is still a big topic in the area. A recent prove is the craze to get the tickets for Agile Grenoble 2012 conference. It sold out 2 weeks before the events. 500 tickets for a city like Grenoble is quite an indicator.

Slides and video of the presentations (all in french) are available online.

After the  presentation, I had the chance to chat a bit with a local university teacher in software engineering. He was a bit skeptical about this “whole Agile thing” and was happy to outsource this part of the program to an external presenter… a guy from a local IT department. So, academic and business do talk together sometimes….(“please, help me present Agile !”)