Visual Regression Tests

The first rule of “UI automated regression tests” is “You do not perform any UI automated regression tests”.

The second rule of “UI automated regression tests” is “In some contexts, if you really know what you are doing, then some automated regression tests performed via the UI can be relevant”.

Martin Fowler explained it very well in his Testing Pyramid article: “a common problem is that teams conflate the concepts of end-to-end tests, UI tests, and customer facing tests. These are all orthogonal characteristics. For example a rich javascript UI should have most of its UI behaviour tested with javascript unit tests using something like Jasmine”. So the top of a testing pyramid should be:

  • built on top of a portfolio of unit and service/component/api tests (that should include some tests focussed on the UI layer)
  • a (small) set of end-2-end tests performed via the GUI to check that we did not miss anything with our more focussed tests.

For those end-2-end tests, the usual suspect in the open-source world is Selenium. Driving the browser through our app along the most common paths is a good way to gain some final confidence in the SUT. But one should understand that what Selenium checks is the presence of some elements on the page and the events associated with them: “if I fill in this input box and click on that button, then I expect to see a table containing these strings”. It does not check the visual aspect of the page. To put it another way, with Selenium we are checking the nervous system of our SUT, but not its skin. This is where visual regression testing tools come in.

The ThoughtWorks Technology Radar moved “visual regression testing tools” from “assess” to “trial” last July, with this comment: “Growing complexity in web applications has increased the awareness that appearance should be tested in addition to functionality. This has given rise to a variety of visual regression testing tools, including CSS Critic, dpxdt, Huxley, PhantomCSS, and Wraith. Techniques range from straightforward assertions of CSS values to actual screenshot comparison. While this is a field still in active development we believe that testing for visual regressions should be added to Continuous Delivery pipelines.”

So I wanted to give it a try and did a quick survey of the available tools. I wanted to know which projects were still active, what their language/ecosystem was and which browsers they supported. Here is the list I just built:

My shopping list was:

  • Python, as I already use Robot Framework in this language
  • Support for a browser other than PhantomJS, because my SUT does not render very well under PhantomJS at the moment

So I chose needle, which, according to its author, is “still being maintained, but I no longer use it for anything”. I gave it a try and it does work well. I now have a basic smoke test that tries to catch visual regressions. More on that soon if I have more feedback to share.
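Under the hood, screenshot-based tools like needle compare a freshly captured screenshot against a saved baseline image. The minimal Python sketch below shows only that comparison idea, on in-memory grids of RGB pixels rather than real image files. It is not needle's API: the function names and the max_ratio tolerance are hypothetical, and real tools rely on an imaging library and a browser driver to capture the pixels.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue)
Screenshot = Sequence[Sequence[Pixel]]  # rows of pixels


def diff_ratio(baseline: Screenshot, candidate: Screenshot) -> float:
    """Return the fraction of pixels that differ between two screenshots."""
    if len(baseline) != len(candidate) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(baseline, candidate)
    ):
        # A size change is itself a visual regression worth reporting.
        raise AssertionError("screenshots have different dimensions")
    total = sum(len(row) for row in baseline)
    changed = sum(
        1
        for row_a, row_b in zip(baseline, candidate)
        for pixel_a, pixel_b in zip(row_a, row_b)
        if pixel_a != pixel_b
    )
    return changed / total if total else 0.0


def assert_no_visual_regression(
    baseline: Screenshot, candidate: Screenshot, max_ratio: float = 0.0
) -> None:
    """Fail when more than `max_ratio` of the pixels changed since the baseline."""
    ratio = diff_ratio(baseline, candidate)
    if ratio > max_ratio:
        raise AssertionError(f"{ratio:.1%} of pixels changed (allowed: {max_ratio:.1%})")
```

A strict max_ratio of 0 catches every change; a small tolerance can absorb anti-aliasing noise between browsers, at the risk of hiding tiny real regressions.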

Create Jenkins Job for Robot Framework

Once you have created your first tests in Robot Framework, the next step is to include them in your Continuous Integration (CI) system. Here I will show the steps to do so in Jenkins.

Let’s assume you already have a Jenkins job that builds your SUT.

First, we create a new job to launch our Robot tests.

Once the job is created, we configure it.

  1. Set up Source Code Management for the source code of the tests.
  2. Set up a first “Build Trigger” on the success of the job that builds the SUT.
  3. Set up a second “Build Trigger” on changes in the source code of the tests.
    This way your tests will be launched either when there is a new build of the SUT or when your tests have changed. The second trigger is relevant because some modifications in your tests may have broken some of them, and you don’t want to wait for the next build of the SUT to find out. In other words, when a test fails in Jenkins, it is good to know whether this is a consequence of a change in the SUT or of a change in the tests (if both changed, the analysis will be trickier).
  4. Get the artefact from the project that builds your SUT, so that your SUT is available in the Jenkins workspace where the Robot tests will run. To do so you can either use the Jenkins Copy Artifact Plugin or write a piece of Batch/Shell script.
  5. Then comes the step in which the Robot tests are launched. For this you create an “Execute Shell” build step that contains, at least:
    pybot path/to/my_tests/

    plus all the --variable, --include, --exclude etc. options that you use to customise your run.
    One noteworthy command line option in the context of a CI server is --NoStatusRC, which forces Robot’s return code to zero even when some tests fail. This way the status of the Jenkins build can be driven by the Robot Framework Plugin, as you will see in the final step.

  6. Finally, to get more granular control over how the test results affect the build, and to keep a copy of the report/log of the test executions on the Jenkins server, you can use the Robot Framework Plugin. Once the plugin is installed, it is available in the list of “Post-build Actions”. A simple configuration is enough to start with, and after a couple of builds the project page displays the trend of the Robot test results.
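The Robot Framework Plugin lets you set pass-rate thresholds that decide whether a build is successful, unstable or failed, which is why --NoStatusRC is useful. As a rough illustration of that idea (not the plugin's code), here is a minimal Python sketch that reads the text of a Robot output.xml and maps its pass rate to a Jenkins-style result. It assumes each test element has a direct status child carrying PASS or FAIL, which is how output.xml is usually laid out, but check it against your Robot version; the function names and default thresholds are hypothetical.

```python
import xml.etree.ElementTree as ET


def pass_rate(output_xml: str) -> float:
    """Percentage of passing tests in the text of a Robot Framework output.xml."""
    root = ET.fromstring(output_xml)
    statuses = []
    for test in root.iter("test"):
        # find() only looks at direct children, so keyword-level <status>
        # elements nested inside <kw> do not count as test results.
        status = test.find("status")
        if status is not None:
            statuses.append(status.get("status"))
    if not statuses:
        return 0.0
    return 100.0 * statuses.count("PASS") / len(statuses)


def build_result(rate: float, pass_threshold: float = 100.0,
                 unstable_threshold: float = 80.0) -> str:
    """Map a pass rate (in %) to a Jenkins-style build result."""
    if rate >= pass_threshold:
        return "SUCCESS"
    if rate >= unstable_threshold:
        return "UNSTABLE"
    return "FAILURE"


# Trimmed-down, illustrative output.xml content (real files carry many more attributes).
SAMPLE = """<robot>
  <suite name="My Tests">
    <test name="Login works">
      <!-- a keyword failure that the test itself tolerated -->
      <kw name="Check Optional Banner"><status status="FAIL"/></kw>
      <status status="PASS"/>
    </test>
    <test name="Search works"><status status="PASS"/></test>
    <test name="Export works"><status status="PASS"/></test>
    <test name="Audit works"><status status="FAIL"/></test>
    <status status="FAIL"/>
  </suite>
</robot>"""
```

With this sample, three of four tests pass (75%), so the default thresholds report FAILURE, while an unstable threshold of 70% would report UNSTABLE instead.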

Once this basic setup works, you will discover many options in Jenkins and Robot Framework to get more value out of it. To give just one example, once the test portfolio becomes large and/or long to run, you might find that it is not efficient to launch the full regression suite at once whenever the SUT or the tests’ code changes. A good strategy is to have two Jenkins jobs. The first one (“smoke tests”) runs only a quick subset of the whole suite (say 5 to 10 minutes):

pybot path/to/my_tests/ --include smoke --exclude not_ready

and the second job (“full tests”) launches all the tests:

pybot path/to/my_tests/ --exclude not_ready

but is triggered only when the smoke tests succeed.

So if an essential feature of your SUT or your tests (covered by the smoke tests) is broken, you save your machine a “full test” run and, more importantly, the team gets quicker feedback on the quality of the SUT build.
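The smoke/full split relies on Robot's tag filtering. To make the --include/--exclude semantics explicit, here is a small Python model of it: a test is selected when it has at least one included tag (or when no include is given at all), and an excluded tag always wins. This is a simplified sketch, not Robot's implementation: it compares tags case-insensitively but ignores tag patterns and other options, and the portfolio and tag names are made up.

```python
from typing import Dict, Iterable, List, Set


def _normalize(tags: Iterable[str]) -> Set[str]:
    # Simplification: Robot also matches tags without regard to case.
    return {tag.lower() for tag in tags}


def select_tests(tests: Dict[str, Set[str]],
                 include: Iterable[str] = (),
                 exclude: Iterable[str] = ()) -> List[str]:
    """Return the names of tests selected by include/exclude tag filters."""
    included, excluded = _normalize(include), _normalize(exclude)
    selected = []
    for name, tags in tests.items():
        tags = _normalize(tags)
        if included and not (tags & included):
            continue  # an include list is given and no tag matches it
        if tags & excluded:
            continue  # exclusion always wins over inclusion
        selected.append(name)
    return selected


# Hypothetical portfolio: test name -> tags
PORTFOLIO = {
    "Login": {"smoke"},
    "Report export": {"regression"},
    "Search": {"smoke", "not_ready"},
    "Audit trail": set(),
}
```

With this portfolio, the smoke job (include smoke, exclude not_ready) would select only Login, while the full job (exclude not_ready) would select Login, Report export and Audit trail.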

Which editor for Robot Framework test cases

The Robot Framework home page lists a number of plugins for editing Robot Framework test cases, along with Robot’s own editor, RIDE. Here is some feedback based on my experience with some of those tools.

RIDE was my first choice four years ago when I started using Robot. We were a team of 20 quality engineers working on Windows machines. Some of us had a technical background and others were more on the banking business side (we were producing financial software). RIDE turned out to be a good choice, especially for the non-technical people, as it hid part of the test case grammar: Suite Setup, Tags, Library etc. are all input boxes in a GUI. The ability to launch test cases from RIDE was also very handy and kept us away from the command line (which most of the team never used). Overall we had a very good experience with RIDE and I would recommend it in a similar context.

TextMate (with its Robot bundle) was the editor I switched to when I moved to a Mac. There were two motivations for moving away from RIDE. The first one is that RIDE is not very slick on the Mac (several issues have been open for a while), and even installing it is a bit complicated because of wxPython and Python version conflicts. The second motivation was that I joined a more technical company where manipulating the source code of the tests was much more frequent (e.g. editing a test case on a remote VM via SSH with vi, merging changes made by another tester…). So I no longer wanted a GUI layer hiding the source code of the tests. I chose TextMate because it was free, lightweight and worked out of the box with Robot and SVN. After some time, though, I started to miss keyword completion and quick access to the keywords’ source code.

PyCharm is the editor I have been using for a couple of weeks now. This time the switch was motivated by some limitations of TextMate (see above) and also by looking over the shoulders of my colleagues, who were getting happier and happier with PyCharm. JetBrains’ IDE seems to be gaining momentum in the Python community, as I hear and read more and more about it. There are currently two competing plugins for Robot Framework: Robot Plugin and Intellibot. Both provide syntax highlighting, code completion and jump to source, with some small differences. The best thing is to try them both and see which one fits best.

A side note on this editor topic: when I moved to PyCharm, syntax checking became one level stricter than in my previous editor, and I was bothered by the fact that all my TXT files were being analysed by the plugin (making my non-Robot TXT files hard to read). So I changed the extension of my Robot test cases and libraries from .txt to .robot. This way I can configure Robot Plugin to affect only my Robot files and not all regular text files.
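For a portfolio with many files, the rename can be scripted. Here is a minimal Python sketch under one stated assumption: a .txt file is treated as Robot data when it contains a Robot table header such as “*** Settings ***” or “*** Test Cases ***”. The helper name and this heuristic are hypothetical, and the script does not update Resource imports that still point to the old .txt names; those must be fixed separately.

```python
import re
from pathlib import Path
from typing import List

# Heuristic (assumption): Robot data files contain a table header line
# such as "*** Settings ***", "*** Variables ***", "*** Test Cases ***"
# or "*** Keywords ***" (singular forms accepted too).
ROBOT_HEADER = re.compile(
    r"^\*+\s*(settings?|variables?|test cases?|keywords?)\s*\*+",
    re.IGNORECASE | re.MULTILINE,
)


def rename_robot_txt_files(root: Path) -> List[Path]:
    """Rename Robot .txt files under `root` to .robot and return the new paths."""
    renamed = []
    # sorted() materializes the list before any rename happens.
    for path in sorted(root.rglob("*.txt")):
        if ROBOT_HEADER.search(path.read_text(encoding="utf-8")):
            target = path.with_suffix(".robot")
            path.rename(target)
            renamed.append(target)
    return renamed
```

Plain notes and other non-Robot text files are left untouched, which is the whole point of the switch.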

Feedback on JFTL 2014

JFTL is a conference organised by the CFTL, an association that offers software testing certifications. This year, for the 6th edition, 500 people had registered for the day. The introduction slides told us that attendees were evenly split between management and practitioners. Here is a quick summary of the sessions I was able to attend.

An interesting presentation on managing performance tests on an RATP project. The SUT is a web app for bus drivers. Classic methodology with a touch of Agile. The presenters shared their wish to run performance tests continuously, but admitted they had a lot of trouble doing so (to run performance tests, the SUT must first be functionally correct, hence the need to wait for the end of the sprint or for the next sprint). On the tooling side: Gatling for the developers and NeoLoad for the testers.

SmartTesting presented its Zest solution. It is an online tool for writing functional tests. With this tool, you progressively build a DSL (a set of action words) that you want to use in your test scenarios. The platform helps with writing (suggesting actions as you type) and with refactoring (renaming actions, creating actions for recurring patterns). If you want to automate these tests, you can export them as XML, which you then “only” have to translate into the automated testing language/framework of your choice. I remain rather puzzled by the choice not to offer a native way to execute the tests directly, as is possible with Cucumber, FitNesse and Robot Framework.

Pages Jaunes shared their experience of using MaTeLo (by All4Tec) to test some features of the pagesjaunes.fr site. MaTeLo offers Model Based Testing. You can import your requirements, describe the states of your application and generate diagrams/flows using various algorithms. Once the test scenarios are generated, they can be automated with Selenium. Here again, I am not really convinced by the tool. No comment on the requirement/test case mapping, which seems a bit heavy to me (but understandable in large organisations with business owners (MOA), delivery teams (MOE) and other contractors…). However, automatically generating dozens or hundreds of test scenarios that are then “automatically” exported to Java/Selenium goes against automation best practices, which recommend automating business behaviour “below the UI” as much as possible (see this article for example).

Finally, a rather laboured presentation on test data generation. An interesting topic a priori, but when I start hearing about a “test generation unit”, it smells of extreme fragmentation and specialisation of teams. I would have liked a less lecture-style, more concrete presentation.

Overall, a not very technical conference on a very technical subject. From that point of view, HP’s presentation of the new version of HP ALM/QC, which promises to automate all tests and find all bugs (as well as solve world hunger), seemed to captivate the audience, even though it could have drawn quite a few sarcastic remarks :-)

Also worth noting is the large space taken by IT services companies (SSII) in many presentations. End clients never presented alone, but were always accompanied by their SSII. Hence a very polite dialogue between the client, who has a business view of the problem, and the SSII, whose message is “we will solve all your problems”. Testers who get their hands dirty were conspicuously absent. Maybe because they were kept in India…

An interesting day despite these few reservations. Thanks to the organisers and presenters!

bash: grep: command not found

It took me a year to understand why my grep every so often did not work:

[MBP]$ ps -ef | grep openidm
-bash: grep: command not found

The explanation lies in the fact that I was typing this a bit too fast. When I typed the “space” after the “pipe” (which on my Mac keyboard is shift+option+L), the option key was still pressed, so I ended up typing option+space instead of space. On a Mac, option+space types a non-breaking space, which the shell does not treat as a word separator, so the invisible character became part of the command name. I got this hint from this thread: http://hintsforums.macworld.com/showthread.php?p=644491
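The mechanism can be reproduced in a few lines of Python. This is a simplified, hypothetical model of how the shell splits a pipeline into commands: it only handles the pipe and ASCII spaces/tabs as separators (no quoting, no other shell syntax), which is enough to show why the non-breaking space sticks to “grep”.

```python
import re
from typing import List

NBSP = "\u00a0"  # non-breaking space, typed by option+space on a Mac


def command_names(line: str) -> List[str]:
    """Return the command word of each stage of a simple pipeline.

    Like the shell, only ASCII spaces and tabs separate words here; a
    non-breaking space is an ordinary character and sticks to the next word.
    """
    names = []
    for stage in line.split("|"):
        words = [word for word in re.split(r"[ \t]+", stage) if word]
        if words:
            names.append(words[0])
    return names


def fix_nbsp(line: str) -> str:
    """Replace non-breaking spaces with regular spaces."""
    return line.replace(NBSP, " ")
```

Fed with “ps -ef |” followed by a non-breaking space and “grep openidm”, command_names returns “ps” and a second command made of the non-breaking space plus “grep”, which is exactly the command bash could not find; after fix_nbsp it returns “ps” and “grep”.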

One solution would be to carefully release the option key before pressing space… but a more convenient one can be found here:
http://earthwithsun.com/questions/78245/how-to-disable-the-option-space-key-combination-for-non-breaking-spaces
I chose the iTerm2 configuration:
“I use iTerm2 for most of my work and the mapping can be added in the “Keys” preference pane, by adding a new key combination in Preferences -> Keys -> the plus button. Note when adding the key make sure to put a single space in the lower box as shown.” => works great

Mystery explained and problem solved!

Randomizing test execution order

Many testing frameworks offer optional randomization of test execution order. For example:
– Robot Framework with --randomize all
– RSpec with --order random

I consider this option very useful and use it by default for all the automated test portfolios I run. The advantages I see are:

– We detect ordering dependencies as soon as possible. If we always execute tests A and B in the same order, test B could work only because test A left the system in a state that test B relies on. If one day we invert the order (for example by renaming the tests, if the order depends on alphabetical order), then the suite will fail and it will take us some time to understand the problem (because tests A and B may have been written months or years ago). This also happens if we insert a new test between A and B or refactor test A or B. If we run the tests in random order all the time, we will detect this issue very early.

– We might detect bugs in the SUT that appear only in some specific sequence of actions, which a random test order could hit by luck. The problem then is how to reproduce the bug we just bumped into. In RSpec, the random order can be made predictable via seeding (--seed). JUnit had a randomization problem when Java 7 came out, because reflection stopped returning test methods in a stable order; after some thought, it settled on a deterministic but not predictable order. There is no such thing for Robot Framework, so we have to reproduce manually the test order that caused the failure.
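Seeding is what makes a random order reproducible. Here is a minimal Python sketch of the idea for a home-grown runner (the function name and test names are hypothetical): draw a fresh seed for every run, print it in the log, and replay the exact same order later by passing that seed back in.

```python
import random
from typing import List, Optional, Sequence, Tuple


def randomized_order(tests: Sequence[str],
                     seed: Optional[int] = None) -> Tuple[int, List[str]]:
    """Shuffle test names, returning the seed so the exact order can be replayed."""
    if seed is None:
        seed = random.randrange(2**32)  # new order on every run
    order = list(tests)
    random.Random(seed).shuffle(order)  # dedicated generator: same seed, same order
    return seed, order
```

Using a dedicated random.Random instance, rather than the module-level generator, keeps the order independent of any other randomness consumed elsewhere in the run.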

– We won’t always run the same tests first. When we read a test report, we usually start at the top and analyse errors one by one. This helps us avoid analysing the “Access”, “Audit” and “Authentication” tests first all the time…

One could argue that a fixed test execution order is useful to run some smoke/sanity tests first, and then the rest of the portfolio. In that case, I think it is better to split this into two different jobs: a first “smoke test” job that runs quickly (5-10 minutes) and a “full test” job that can take several hours. In Robot Framework, this can be easily achieved using tags.

Another reason to push for a fixed test execution order could be performance optimisation: test A prepares the system for test B, and when test B ends, the system is ready for test C. One of the reasons this is a bad pattern is that you won’t be able to run only test B or only test C! If the setup of a given test lives in the previous test, then you are doomed to always run the full portfolio. This is simply not bearable. When a full portfolio run detects a couple of failed tests, we want to be able to run those tests once more to double-check that they fail, and then start analysing the problem.
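A simple way to hunt for this hidden coupling is to run every test alone against a fresh system state and list those that fail. The Python sketch below models the “system” as a plain dictionary and the tests as functions; all names are hypothetical, and in a real portfolio the fresh state would come from a proper setup.

```python
from typing import Callable, Dict, List

State = Dict[str, list]
Check = Callable[[State], None]


def failing_in_isolation(checks: Dict[str, Check],
                         fresh_state: Callable[[], State]) -> List[str]:
    """Run every check alone against a fresh system state and list those that fail."""
    failures = []
    for name, check in checks.items():
        try:
            check(fresh_state())
        except AssertionError:
            failures.append(name)
    return failures


def check_create_user(state: State) -> None:
    state.setdefault("users", []).append("alice")
    assert "alice" in state["users"]


def check_delete_user(state: State) -> None:
    # Hidden dependency: this only passes if check_create_user ran just before.
    assert "alice" in state.get("users", []), "no user to delete"
    state["users"].remove("alice")
```

Run in the usual order on a shared state, both checks pass; run in isolation, check_delete_user fails, revealing that its setup actually lives in the previous test.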

We could also introduce randomness in the tests themselves, but this is another topic… for a future post!

Live-coding session of Robot Framework on Mac

Back in November, I gave a talk at SoftShake. As mentioned in my feedback post, I included some live coding in the session. For the record, I’d like to share the set of tools I used to be comfortable doing this Robot Framework live coding on my Mac.

1) Screen resolution: once plugged into the projector, the available resolution suddenly becomes very low. So it is important to rehearse the session at this low resolution, otherwise it will be very disturbing on the actual day. If you don’t have access to a projector, choosing the largest display setting in System Preferences should give a resolution close to the projector’s.

2) Window manager: I use Spectacle to quickly rearrange windows. It has keyboard shortcuts to display windows on the full screen, half of it or a quarter of it. Once you get used to it, you can arrange four windows on the screen in a couple of seconds.

3) Terminal: I use iTerm2. To quickly get four small shell windows, I place them on the screen using Spectacle and save the window arrangement in iTerm2. Then I can associate a keyboard shortcut with this arrangement to restore it very quickly during the presentation.

4) Text editor: I use TextMate 2 (pre-built version here) with the Robot Framework bundle. The bundle offers some keyboard shortcuts, like *s⇥, which inserts the *** settings *** header, and nice color highlighting. There is no keyword completion though…

5) Live execution of the tests: I wanted an effect similar to the Infinitest plugin, which runs the tests in the IDE each time a change is made to the source code. So I installed fswatch to run a script every time the content of a directory is modified (i.e. every time my test is updated) and created a small launch_robot.sh script that is launched on such an event (the script clears the terminal and then launches Robot).
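fswatch relies on native file system events. For readers without it, here is a minimal polling alternative in pure Python: it takes a snapshot of modification times and calls a callback when a file is added, removed or modified. This is a sketch, not how fswatch works internally; the function names are hypothetical, and polling can miss two edits that land within the file system's timestamp resolution.

```python
import os
import time
from typing import Callable, Dict, Optional


def snapshot(directory: str) -> Dict[str, float]:
    """Map every file below `directory` to its last modification time."""
    mtimes: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(directory):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                mtimes[path] = os.path.getmtime(path)
            except FileNotFoundError:
                pass  # deleted between listing and stat (e.g. an editor temp file)
    return mtimes


def watch(directory: str, on_change: Callable[[], None],
          interval: float = 0.5, max_checks: Optional[int] = None) -> None:
    """Poll `directory` and call `on_change` whenever its files change.

    `max_checks` bounds the loop (handy for tests); None means watch forever.
    """
    previous = snapshot(directory)
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval)
        checks += 1
        current = snapshot(directory)
        if current != previous:
            on_change()
            previous = current
```

For example, after importing subprocess, calling watch on the test directory with a callback that runs ./launch_robot.sh through subprocess.run would rerun the tests on every save.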

So I chose to display four windows during the live coding:
1) top-left: live-execution of Robot Framework
2) top-right: source code of the Test modified live
3) bottom-left: Software Under Test. Jenkins for this session (either file structure or the UI)
4) bottom-right: web browser with the Robot Framework library documentation, to show the keywords I used in my code.

During the session, the test regularly went from green to red when I added a not-yet-created keyword or used a keyword from a library I had forgotten to import. So the idea was to save the file as often as possible to see the result change color immediately.

An idea for a future session could be to do the full BDD-TDD test-code-refactor cycle:
– show the specification of a feature to develop
– write the failing functional test with Robot
– write the failing unit test in Java with any unit test framework
– write the minimal code to get the unit test green
– refactor the code and the test
– launch the functional test and update the code to make it green if it is still red

This could even be performed by two people: one QA and one dev… We’ll see if I ever try this!

How to solve intermittent issues and get more robust automated tests

In the last episode of their podcast, Trish Khoo and Bruce McLeod discussed how to solve intermittent issues in automated test suites. Trish also gave a presentation on this topic at a Selenium Conference. In both media, Trish and Bruce go over different topics:

1) test design
2) logging
3) tolerance
4) have stable systems
5) some tips (prefer “wait for” to “sleep” for example)

I recommend both presentations. Here are some notes on those topics, with some comments on how I approach them with Robot Framework.

On test design: automated tests are code, so all the principles of clean code should be applied to tests: small functions, single responsibility, DRY, no side effects. There is a specific chapter on clean tests in Uncle Bob’s “Clean Code”, where he discusses readability, the use of a DSL, a single assert per test and the F.I.R.S.T. principles. In Robot Framework, creating keywords is so quick and easy that I tend to create as many as needed until I obtain test cases that are 10 lines long at most, with clearly identified given-when-then sections.

On logging: in Robot Framework, I have a keyword that I launch in the teardown of every single test and that checks the ${TEST_STATUS} variable (set by the Robot engine). If the test failed, I create a backup folder where I save a lot of data on the current state of the SUT (log files, audit files, configuration files, database content etc.).
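Such a keyword can be written in Python, since Robot exposes the functions of a Python library module as keywords. Here is a minimal sketch, assuming it is called from a test teardown with the test name, the test status (PASS or FAIL, from ${TEST NAME} and ${TEST STATUS}) and a list of SUT paths to save; the function name, folder naming and arguments are hypothetical, not the keyword I actually use.

```python
import re
import shutil
import time
from pathlib import Path
from typing import Iterable, Optional


def backup_sut_state_on_failure(test_name: str, test_status: str,
                                sources: Iterable[str],
                                backup_root: str) -> Optional[Path]:
    """Copy SUT files/directories into a per-test backup folder when the test failed.

    Returns the backup folder, or None when the test did not fail.
    """
    if test_status != "FAIL":  # e.g. ${TEST STATUS} is PASS: nothing to save
        return None
    safe_name = re.sub(r"\W+", "_", test_name).strip("_") or "test"
    target = Path(backup_root) / f"{safe_name}-{time.strftime('%Y%m%d-%H%M%S')}"
    target.mkdir(parents=True, exist_ok=True)
    for source in map(Path, sources):
        if source.is_dir():
            shutil.copytree(source, target / source.name, dirs_exist_ok=True)  # Python 3.8+
        elif source.is_file():
            shutil.copy2(source, target / source.name)
        # Missing paths are skipped: a teardown should not fail because a log is absent.
    return target
```

Skipping missing paths silently is a deliberate choice: a backup step that fails in the teardown would hide the original test failure behind a secondary error.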

On tolerance: in my previous company we were doing financial computations, and from version to version, the results of some functions could differ very slightly due to changes in algorithms or compilers. So we agreed with the Product Management team on acceptable thresholds and created smart comparison keywords taking three arguments: expected result, actual result, threshold. That made the tests way more stable.
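A comparison keyword of that kind can be sketched in a few lines of Python. This version assumes an absolute threshold (a relative one would divide the difference by the expected value instead); the name should_be_close is hypothetical, and the arguments go through float() because Robot Framework may hand them over as strings.

```python
def should_be_close(expected, actual, threshold) -> None:
    """Fail unless |expected - actual| <= threshold (absolute tolerance)."""
    expected, actual, threshold = float(expected), float(actual), float(threshold)
    if threshold < 0:
        raise ValueError("threshold must not be negative")
    difference = abs(expected - actual)
    if difference > threshold:
        raise AssertionError(
            f"expected {expected}, got {actual}: "
            f"difference {difference:g} exceeds threshold {threshold:g}"
        )
```

Raising AssertionError with the actual difference in the message makes the Robot log explain by how much a result drifted, not just that it did.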

On tips: “wait for” rather than “sleep” is one I use all the time. First, because most fixed-time sleeps should be banned from tests for performance reasons (see my post on performance of tests), and second, because a sleep might be too long or too short on some systems. Another keyword/pattern I use in Robot Framework is “Wait Until Keyword Succeeds”. This is “wait for” applied to any keyword. When there are timing issues in the tests, this keyword can be very, very handy.
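Robot's Wait Until Keyword Succeeds takes a retry timeout (or count), a retry interval and the keyword to run. The same “wait for” idea can be sketched in plain Python for custom libraries; the function name and parameters below are hypothetical, and this is a simplified model rather than Robot's implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_succeeds(action: Callable[[], T], timeout: float, interval: float) -> T:
    """Retry `action` every `interval` seconds until it returns without raising.

    Re-raises the last error once another attempt would go past `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return action()
        except Exception:
            if time.monotonic() + interval > deadline:
                raise  # give up: the last failure is the most useful one to report
            time.sleep(interval)
```

Unlike a fixed sleep, this returns as soon as the condition holds, so a fast system is not slowed down and a slow one gets up to the full timeout.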

Finally, setup and teardown are a keystone of stable tests. They should be extracted from the tests and should leave the system in a state where any other test can be run afterwards. This does not mean leaving the system in the same state as when the test started (that could be very time-consuming depending on the SUT and the actions performed by the tests), but it should be in a “known state” in which we know the following tests will be able to run.

All that being said, I have a couple of flaky tests, so I’d better apply all those rules quickly…

Robot Framework Test Automation – The Book

Packt Publishing recently released a new book, “Robot Framework Test Automation” by Sumit Bisht. The book is a quick and easy 100-page read that can be useful to those who find the user guide a bit too dry.

The first chapter walks through the installation steps but fails to give a clear picture of the Robot Framework ecosystem: a diagram shows a “testing tool” component as the one that actually targets the SUT, while Robot Framework sits on the user side as a coordinator. This is just one way to use Robot (a way described in more detail later, in the sections about Robot+Selenium and Robot+Sikuli). But Robot can also test the SUT directly through its native libraries or custom libraries.

The second chapter is quite clear about the different file formats and the organisation of the test portfolio. The third chapter covers the actual creation of test cases with some details on syntax and libraries, but it lacks examples going from the simplest to more complex cases. Chapter four discusses Robot’s association with other testing tools (as mentioned before) and finally the last chapter helps with generating standard and custom reports.

A topic that is not covered is who the “ideal” Robot Framework user is. Chapter three implies that “developer and stakeholders” will collaborate in writing the tests. My experience is rather that a tool like Robot Framework is a very good fit for a QA/testers team that is distinct from dev and stakeholders. For such a team it makes sense to have a specific tool/DSL for testing rather than coding the functional/acceptance tests in the language of the product.

Overall, though the book is a good read, it fails to be a real “missing manual” compared to what the User Guide already offers. The book might have benefited from an alternative approach with many more examples, like a cookbook, and more experience from the field. Anyway, thanks to Sumit and Packt for the effort in sharing some knowledge about Robot Framework!

Agile Grenoble 2013

Last week I went to the Agile Grenoble conference. The whole day was of very good quality. I was lucky with my choice of sessions and split my day between good discussions and great talks.

My menu was:
– Kanban by Romain Couturier
– Testing and refactoring legacy code by Sandro Mancuso
– Functional Programming by Neal Ford
– TDD by Michael Borde
– Angular + Jersey by Laurent Leseigneur and Olivier Cuenot

All of them were very interesting, but I was really impressed by Sandro’s performance.
His slides can be seen here (with the video on the very last slide):

That is really the kind of session I am looking for:
1) just a couple of main messages (“test from the shortest branch, refactor from the deepest one”) in 5 minutes
2) live coding with some best practices, tips and personal opinions
It was also very motivating to become more fluent with an IDE, as part of the performance was the advanced use of Eclipse, which for a Java developer is a real asset.

Before the afternoon keynote, the organizers found a great way to read all 12 principles of the Agile Manifesto to the full audience of the conference. Strangely enough, I think it was the first time in 5 editions that I heard the 12 principles read in full at the conference. Every person in the room had received a cup with 3 sentences from the manifesto. When our sentences were read, we sat down. After the 12, a group of 10 people remained standing, and they had to go on stage to read the sentence on their cup. Those were fake sentences like “company result is the primary measure of progress”. Very clever game!

This year I was purely a visitor: I was neither organizing nor presenting. So thanks to all the contributors to this great day, and I hope to help out one way or another next year!