Lessons Learned: Improving Product Quality via Improved Continuous Integration (CI)
Here’s how we “sold” QA automation to developers and product people, and improved our release cycle.
- It’s better to have versatile teams than specialized ones
- We iteratively progress towards “release anything anytime”
- Tips for prioritizing your automation efforts
- How to fight flaky tests
Chances are, you’ve heard something like this before: “Our team is too small to write UI tests, and there’s no time for that now. We’ll catch up later.” Or, “We’ll cover everything with UI/end-to-end tests once the product is ready/finished.” I know I have, and I’m probably guilty of saying something similar a few times over my career.
Showmax grew really quickly, going from an idea to a substantial SVOD platform in essentially no time at all. Even as we outgrew our first apartment-slash-office, the pace stayed very quick, and the idea of “slowing down” to write some UI automation simply never came up.
We got to the point, however, at which continuing to postpone automated testing just wasn’t something we could do anymore. The few UI tests we had were flaky, took a lot of time, and did not really test that much. So, of course, no one really ran them. Adding new UI tests was nearly impossible without making them hard-wired to the current state (not desired), and the stability was still lousy (hint: Test-Driven Development with UI tests in mind makes perfect sense, too).
On top of all of this, our build times were long and there were multiple steps needed to get the application into the wild.
Put simply, the biggest issue we had back then was speed of delivery. The ability to deliver quickly depends heavily on code quality and test coverage (UI tests included) of the application to be delivered, and a functioning Continuous Integration (CI) should be your friend, helping with both of these crucial things. However, it is not really functioning if its state is as described above.
How to solve it
There is no simple “ten steps to do this and that” guide. You won’t need any specific technology or methodology to achieve it, either.
What is essential, though, is the belief that it will help you go further and that it really makes sense to invest. If you’re heading in this direction because you want or need that 100% coverage, forget it. If you want to show off your flashing test farm, that’s not the best goal either. You mostly need dedication, trust, encouragement, and - most importantly - the people capable of providing these things.
We decided to focus on automation to improve our CI to ease the burden on our Quality Assurance (QA) department, but also (of course) to improve the general quality of our product. But again, every great idea needs the resources to execute it.
> Dedicate a person (or people!) as responsible for your QA engineering
This is no part-time job - especially at the beginning. There is a lot of work to be done, and it requires not just a dedicated person but also a leadership team that supports people doing the “invisible” work that brings no immediate, obvious value to your customers (your employees are also customers, but that’s a topic for later).
> Make this person part of the team
If you build a specialized automation/QA team, or delegate the work on your CI to your Ops team, it’s all but certain that you will just create silos. QA will always, subconsciously or otherwise, stand in the way of DEV delivery. The refrains are (almost) always the same: QA is blocking us from this or that; the Ops team’s infrastructure is broken again; someone slowed down the build again.
> Ensure that the team understands the benefits of your initiative
Everyone understands that faster builds are good. However, increased test coverage usually means longer build times. Before impeding the workflow with new steps in the process (e.g. slow UI tests; new “obstacles” in the pipeline), make sure your team is aware of the benefits these new things bring: shorter release cycles, increased confidence in quality, the ability to refactor without fear of breaking something unexpected, and so on.
Your CI is a crucial part of the business - make it reliable.
> Keep your CI stable
Say you want to include your UI tests in the pipeline. If there are some flaky tests, or if some builds randomly fail, there is a high risk that the team will start ignoring the failures and treat them as the normal state of the build. This noise in the test results can actually overshadow a “properly” failing build, which will then slip through unnoticed as just another flaky result.
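One way to separate flaky tests from “properly” failing ones is to look for tests that both passed and failed on the same commit. The following is a minimal sketch of that idea, not the tooling we used; the `find_flaky_tests` helper, the tuple layout, and the test names are all hypothetical, and it assumes you can export run results from your CI in some similar form.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return the names of tests with mixed results on identical code.

    `runs` is an iterable of (commit_sha, test_name, passed) tuples.
    This record shape is an assumption made for the example.
    """
    outcomes = defaultdict(set)
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)

    # Same commit, same test, both a pass and a fail: the code did not
    # change, so the test itself (or its environment) is unstable.
    flaky = {test for (_, test), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


# Hypothetical run history, for illustration only.
history = [
    ("abc123", "test_login", True),
    ("abc123", "test_login", False),   # mixed result on the same commit
    ("abc123", "test_playback", True),
    ("def456", "test_playback", True),
    ("def456", "test_search", False),  # fails every time: a real failure
    ("def456", "test_search", False),
]

print(find_flaky_tests(history))
```

Here only `test_login` is reported. `test_search` fails consistently, so it is a genuine failure that deserves attention rather than a retry.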
> Improve your process incrementally
It’s much better to start slow and low, and even step back from the original coverage. Exclude from your CI all tests that are flaky, slow, or of questionable benefit. Create a ticket for each troublesome test to rewrite the test and/or the respective part of your application, and make sure you prioritize it before you move on with new feature requests.
In general, each new step adds some friction to the existing setup. Make sure you do not add too much friction with a single addition, as it might block the delivery, annoy the team, or both.
> Increase your certainty
Once your builds are stable, you can add new checks or tools to improve the coverage. Add a linter or other static code analysis as a build requirement, increase the thresholds in your quality metrics, increase test coverage, add new test environments, or similar.
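A build requirement of this kind can be as simple as comparing reported metrics against minimum values and failing the build on any violation. The sketch below is illustrative only; the metric names, the threshold values, and the `check_quality_gate` helper are assumptions, not part of any specific CI product.

```python
def check_quality_gate(metrics, thresholds):
    """Return a list of violations; an empty list means the gate passes.

    Both arguments map a metric name to a number. Every threshold is
    treated as a minimum the build has to reach.
    """
    violations = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A missing metric is treated as a failure, so a broken
            # reporting step cannot silently pass the gate.
            violations.append(f"{name}: no value reported")
        elif value < minimum:
            violations.append(f"{name}: {value} is below required {minimum}")
    return violations


# Hypothetical thresholds and build results, for illustration only.
thresholds = {"line_coverage_percent": 60.0, "passing_ui_tests": 90}
metrics = {"line_coverage_percent": 58.5, "passing_ui_tests": 90}

violations = check_quality_gate(metrics, thresholds)
for violation in violations:
    print("Quality gate violation:", violation)
```

In a CI step you would exit with a non-zero status when `violations` is not empty. Raising the values in `thresholds` in small steps, once the builds are stable, keeps the added friction low.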
Scale up patiently
When you feel confident enough with the stability and speed of your CI, try to level it up by scaling your current solution. While the small additional steps mentioned above usually add some new restrictions to the process, scaling your solution will keep you alert so you can avoid slowing your process.
It will lead to an iterative process of finding bottlenecks or other possibilities for improvements while carefully expanding your coverage. The more tests (and builds) you run, the more you want to keep them efficient and quick. There’s also a political element here, as this can act as a good justification for the overall initiative with product and leadership teams.
> Increase your build frequency
Increasing your build frequency will expose the superfluous or tedious parts of your flow, inspiring those now heavily involved to get rid of them. It will force everybody to think twice about what really needs to be executed with each run.
> Save UI tests for your core features
People are good at creative work, machines at repetitive work. Let your people explore new issues instead of validating the ones that are already known - leave those to your CI. Cover only the parts that need to be covered, not those that could be covered. Go the 80:20 way - 20% coverage is a perfect solution when covering the essentials.
> Expand your test scenarios
Once confident with the setup, add an extra layer of difficulty with sub-optimal conditions. Investigate those not-so-happy paths by adding slow test devices or running in an unstable environment. This will further enhance the stability of your tests and product.
Does it work?
It definitely did for us.
In the beginning, we did a full regression QA for the release candidate (RC) builds. The time between the creation of an RC build and the actual release was hard to predict and very long, usually due to rebuilds. We wanted to improve the speed of delivery by having our master branch releasable instantly.
To achieve this, we decided to test each feature branch before it was merged. This greatly increased the amount of QA work, as there are usually dozens of “feature branches” merged into each release. But, thanks to the improved build times and - most importantly - the automation of the regression test suite, we could keep up with it.
To back it up with some numbers - our QA cycle (from RC build to publishing the build in Google Play) was shortened by 30%. We achieved even better results when looking at the number of hotfixes/rebuilds of the RC before the release - those went down by 45%.
To put that in perspective, we went from one RC build in two weeks to several dozen builds a week, hand-in-hand with a reduced number of hotfixes of released builds. Our UI tests went from 20 (which never ran regularly) to 90 executed with each commit, and build times were reduced to less than half of their original duration.
Next, we want to scale our testing devices. Currently, we run automated tests on two versions of emulated devices and two real devices (one phone and one tablet). This might be considered good enough, but we want to cover the devices most used by our customers.
Wrapping it up
Do the math! Working on your CI is often invisible work. Have some metrics ready and regularly check your numbers. They will guide you on your way, as well as cover your back when you get questioned about improved quality or agility.
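The math itself can stay simple. As a minimal sketch, the hypothetical `percent_change` helper below compares a metric before and after an improvement. The sample values are invented for illustration and are not our real measurements; they are only chosen so the results line up with the kind of percentages quoted earlier.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent.

    Negative values mean the metric went down, which is the goal for
    things like QA cycle length or the number of hotfixes.
    """
    if before == 0:
        raise ValueError("cannot compute a relative change from zero")
    return (after - before) * 100.0 / before


# Invented sample values, for illustration only.
qa_cycle_days = {"before": 10, "after": 7}
rc_rebuilds = {"before": 20, "after": 11}

qa_cycle_change = percent_change(qa_cycle_days["before"], qa_cycle_days["after"])
rebuild_change = percent_change(rc_rebuilds["before"], rc_rebuilds["after"])

print(f"QA cycle length: {qa_cycle_change:+.0f}%")
print(f"RC rebuilds:     {rebuild_change:+.0f}%")
```

Recording the raw “before” values early matters more than the formula: without a baseline, there is nothing to compare against when the questions come.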
Watch your step! This can be a never-ending process and, especially at the beginning, there are a lot of different directions you can take. Although the vision of CI/CD might be quite clear (e.g. a big fat green DEPLOY button), this vision is more like a lighthouse in the distance.
You will get distracted on your way, be it by your product needs, by new nice-to-haves (both good and crazy) that emerge, or by original assumptions that may very well be outdated. Regularly check your direction and try to measure the impact so you don’t waste your time striving for that single-click perfection.