Nov 21, 2023

Integration Tests

Foreword

We all know writing tests can be a bit of a time sink and, well, a tad dull. But trust us: they're your code's best buds, sniffing out bugs and keeping your work top-notch. In this post, we'll talk about crafting efficient tests that catch bugs quickly. Let's get into the nitty-gritty of software testing!

Catalogue API

In our CMS team, the most critical parts are the API services, especially our lovely Catalogue API, which directly impacts end users and can easily affect the platform's usability. Catalogue is the source of nearly all our content, including movies, series, sports events, promoted rows, etc. It supports a wide variety of clients, has a continuously growing number of API versions, and receives a high number of requests every second. On top of all that, the Catalogue API uses Elasticsearch indices with a constantly changing mapping as its data source. In other words, it is a perfect environment where bugs can find shelter.

Unit tests

Back in 2021, we had only unit tests for Catalogue. Even though unit tests are great, we realised they alone are not enough to prevent stressful surprises during production deploys, especially after big cross-service changes. Unit tests typically work in a limited context on a small section of a program (known as the "unit"). They also rely heavily on mocks, which brings the extra maintenance cost of keeping the mocks synchronised with the code after every refactor or function rename.

Let's go higher with integration tests

The main concern is that we can't include all the crucial Catalogue parts in the testing loop with unit tests alone. We need something larger or, if you like, higher level. The idea behind integration tests is to send a request to the Catalogue API, trigger the entire call stack, hit Elasticsearch, and get a response. Another benefit is that integration tests care very little about the code itself. As a result, we can rename, refactor, and move packages around without the burden of syncing tests. Mocks are not welcome here!

POC: Not so easy

We can configure our CI to spin up Elasticsearch when integration tests require it. We can use pytest to set up a testing application, provide requested fixtures, create Elastic indices, and fill them with data. In turn, falcon's test client can send requests. Sounds straightforward, right? Well, not really. 
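
The basic wiring itself is the easy part. A minimal sketch of it, assuming a hypothetical create_app() factory and a local Elasticsearch started by CI (all names and connection details here are placeholders, not our actual code):

import pytest
from elasticsearch import Elasticsearch
from falcon import testing


@pytest.fixture(scope="session")
def es():
    # CI is expected to start Elasticsearch and expose it locally for the test run.
    client = Elasticsearch("http://localhost:9200")
    yield client
    client.close()


@pytest.fixture(scope="session")
def client_website(es):
    # Build the Catalogue app the same way production does, pointed at the test
    # cluster, and wrap it in falcon's in-process test client.
    from catalogue.app import create_app  # hypothetical import path

    return testing.TestClient(create_app(es_client=es))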

We rapidly sketched a testing environment, including the test app setup, Elastic connection and indices creation, various fixtures, and other related tools. The challenge is accurately recreating that constantly changing mapping we have already mentioned. Indices and the mapping are defined by a fabled (and feared) cron job called ES-init, which takes normalized data from our CMS DB, denormalizes it, and indexes it into Elastic. It is a totally different environment.

So how can we get information about the indices? In the first POC, we simply dumped the mapping data into a JSON file and saved it in our integration-test Python package. It worked! We got proof that we could fit all these puzzle pieces together. Now we can write something like this:

@use_data(
    marketing_promos_data=[
        {
            "title": {"eng": "Coming soon Call to action"},
            "promo_type": PROMO_TYPE_SPORT_CALL_TO_ACTION,
        },
        {
            "title": {"eng": "Coming soon Editorial"},
            "promo_type": PROMO_TYPE_EDITORIAL,
        },
    ]
)
def test_no_call_to_action_promos_for_older_clients(
    client_website, main_territories, marketing_promos
):
    # Get all promos. Default API version is 138.
    result = client_website.simulate_get("/catalogue/promos")
    assert len(result.json) == 2
    # Do not return Call-to-Action promos to older clients.
    result = client_website.simulate_get(
        "/catalogue/promos", headers={"Showmax-Int-Api-Major": "130"}
    )
    assert len(result.json) == 1
    assert result.json[0]["title"] == "Coming soon Editorial"

Yes, it's beautiful! It has zero mocks and tests a lot (remember that the request goes through the entire call stack). More importantly, it's easy to read and write; just look at it.

Fixtures

Let's write some code, and you will see how easily we can request the required data for a test.

def test_something(
    client_android,
    assetflat,
    main_territories
):
    # Some testing is happening here.
    ...

The body of this test can now use a ready-made Android client, one asset, and all main territories. The asset and main-territory documents are indexed in the corresponding indices. All fixtures have reasonable defaults; for example, the assetflat object is a movie with valid license windows for all countries.
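
Under the hood, a fixture such as assetflats can read per-test overrides from the @use_data decorator, merge them with its defaults, and index the resulting documents into Elasticsearch. A rough sketch of the idea, assuming @use_data is simply a pytest mark (the defaults, index name, and client calls are illustrative, not our actual code):

import copy

import pytest

use_data = pytest.mark.use_data  # the decorator could be a thin pytest mark

ASSETFLAT_DEFAULTS = {
    "uuid": "default-uuid",
    "type": "movie",
    # ... valid license windows for all countries, etc.
}


@pytest.fixture
def assetflats(request, es):
    # Pull per-test overrides from the @use_data mark and merge them with the defaults.
    marker = request.node.get_closest_marker("use_data")
    overrides = (marker.kwargs if marker else {}).get("assetflats_data", [{}])
    docs = [{**copy.deepcopy(ASSETFLAT_DEFAULTS), **item} for item in overrides]
    for doc in docs:
        es.index(index="assetflat", id=doc["uuid"], document=doc)
    es.indices.refresh(index="assetflat")
    return docs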

Imagine we want to ensure that the /by_id endpoint filters out content which doesn't have a license for a particular country. Easy!

@use_data(
    assetflats_data=[
        {"uuid": "uuid-1"},
        {"uuid": "uuid-2"},
        {"uuid": "uuid-3"},
        {
            "uuid": "uuid-4",
            "license_windows": [
                {
                    "start": choices.DATETIME_PASSED,
                    "end": choices.DATETIME_FUTURE,
                    "countries": ["ZA"],
                    "subscription_statuses": [
                        "sports_full",
                    ],
                    "type": "SVOD",
                    "uuid": "uuid-4-lw_za",
                    "is_downloadable": True,
                }
            ],
        },
    ]
)
def test_filter_by_content_country(
    client_android,
    assetflats,
    main_territories,
    main_languages,
    categories,
    main_ratings,
):
    result = client_android.simulate_get(
        "/catalogue/by_id",
        params={
            "id[]": "uuid-1,uuid-2,uuid-3,uuid-4", 
            "content_country": "KE"
        },
    )
    assert result.status == HTTP_200
    assert result.json["count"] == 3

We can request multiple objects and overwrite only the parts we want. In our case, asset uuid-4 has a valid license window only for South Africa. Other assets have valid license windows for all territories. Similarly, we can overwrite any fixture we want. For example, we've created a helper which allows us to generate hundreds of sports events with any date and time range.

Again, look at the test above – it's self-explanatory.
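
That sports-events helper might look roughly like this (a hypothetical sketch; the function name and field names are invented for illustration):

from datetime import datetime, timedelta


def generate_sport_events(count, start, interval=timedelta(hours=2)):
    """Build `count` sport-event documents spread over a date/time range."""
    events = []
    for i in range(count):
        event_start = start + i * interval
        events.append(
            {
                "uuid": f"sport-event-{i}",
                "type": "sport_event",
                "start_time": event_start.isoformat(),
                "end_time": (event_start + interval).isoformat(),
            }
        )
    return events


# Usage: @use_data(assetflats_data=generate_sport_events(200, datetime(2023, 6, 1)))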

Monorepo and automation

By the time we finished writing our first integration tests, several developers had already changed the Elasticsearch schema in CMS multiple times. As a result, our Catalogue tests, with fixtures based on a hardcoded "stale" mapping, didn't make sense anymore. Moreover, such tests would not catch a bug introduced by new mapping changes in CMS. This has happened several times during production deployments: changes in Service A may implicitly affect Service B.

We could ask developers to dump the Elasticsearch mapping and update a JSON file inside the integration-tests package every time it changes, but that would be yet another easily forgotten manual step. Is there a way to automate this?

Yes: having our code in a monorepo gives us this flexibility. We can access the files of other services and packages. So we introduced a test for CMS which fails CI if the mappings are not synced with the integration-tests package. Simple and efficient. You can't merge your pull request if the integration-tests package has an outdated mapping. Period.
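
A rough sketch of what that CMS-side check could look like, assuming the current mapping can be built from the CMS definitions and the dumped copy lives as a JSON file inside the integration-tests package (paths and helper names are illustrative):

import json
from pathlib import Path

from cms.search.mappings import build_current_es_mapping  # hypothetical CMS-side helper

# Hypothetical location of the dumped mapping inside the integration-tests package.
DUMPED_MAPPING = Path("packages/integration-tests/es_mapping.json")


def test_es_mapping_is_synced_with_integration_tests():
    current_mapping = build_current_es_mapping()
    dumped_mapping = json.loads(DUMPED_MAPPING.read_text())
    assert current_mapping == dumped_mapping, (
        "Elasticsearch mapping changed; re-run the sync command below "
        "and commit the updated dump."
    )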

The synchronisation itself is done by another CMS command:

python manage.py es_sync_test_mapping
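
Sketched out, such a management command could look roughly like this (the mapping source and target path are placeholders, not our actual implementation):

import json
from pathlib import Path

from django.core.management.base import BaseCommand

from cms.search.mappings import build_current_es_mapping  # hypothetical CMS-side helper


class Command(BaseCommand):
    help = "Dump the current Elasticsearch mapping into the integration-tests package."

    def handle(self, *args, **options):
        mapping = build_current_es_mapping()
        target = Path("packages/integration-tests/es_mapping.json")  # hypothetical path
        target.write_text(json.dumps(mapping, indent=2, sort_keys=True))
        self.stdout.write(self.style.SUCCESS(f"Mapping written to {target}"))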

What does all this give us? Elasticsearch indices are created and populated with data by the CMS service but are read by the Catalogue API. If even a tiny Elasticsearch mapping change introduced by the CMS service is not fully tested on the Catalogue API, we can end up with a bug in production. Performing these tests manually is nearly impossible, since we would have to cover a lot of cases (multiple endpoints, different clients and API versions), and unit tests just aren't adequate for this. Integration tests solve this problem and act as glue between the different services.

It’s never enough

The monorepo not only allows us to access the files of other services, it also gives us the luxury of installing packages from a path. If we change one of our packages, we don't need to rebuild it and push it to PyPI, and there's no need to update the poetry.lock files in all dependent services (this topic might be a separate blog post). We simply change the code of the Python package, and those changes are reflected in the services during the next build.
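
In practice, a dependent service simply points at the package's path in its pyproject.toml, roughly like this (the relative paths below are illustrative):

[tool.poetry.dependencies]
es-models = { path = "../../packages/es-models", develop = true }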

During CI, we know which services or packages were changed and, based on this information, we run tests for those. For example, if a pull request includes changes to the CMS service and the es-models package (another Python package), we run tests for CMS and es-models respectively.

Could we extend this logic to also run tests for services which depend on the changed Python packages? Absolutely! All Python dependencies are listed in a pyproject.toml file. If a changed Python package is referenced in a service's pyproject.toml, we add this service to the loop.
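
A simplified sketch of that lookup, assuming a flat services/ directory layout and Python 3.11+ for tomllib (the layout and names are illustrative):

import tomllib
from pathlib import Path


def services_depending_on(changed_packages, services_dir=Path("services")):
    """Return the names of services whose pyproject.toml references a changed package."""
    affected = set()
    for pyproject in services_dir.glob("*/pyproject.toml"):
        with pyproject.open("rb") as fh:
            deps = tomllib.load(fh)["tool"]["poetry"]["dependencies"]
        if changed_packages & deps.keys():
            affected.add(pyproject.parent.name)
    return affected


# e.g. services_depending_on({"es-models"}) might return {"catalogue", "playback"}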

How is this related to integration tests? Imagine we want to add one field to the index where all assets are stored. The changes will look something like this:

  • [CMS service] Elasticsearch mapping change
  • [integration-tests package] Elasticsearch mapping update
  • [es-models] Reflecting mapping update in our python wrapper for Elasticsearch

When such a pull request is created, the CI will run the following tests:

  • Run tests for CMS service
  • Run tests for integration-tests python package
  • Run tests for es-models python package

In this sequence, integration tests ensure that Catalogue, Catalogue-proxy, and Playback will not be broken by changes introduced in CMS.

Conclusion

In conclusion, while unit tests are a valuable tool in software development, they are rarely sufficient to ensure the seamless integration of the various components of a big project. This is where it pays off to invest time in integration tests, which let us test the interactions between services and identify issues that only show up in a more "real-world" scenario.
