ICAEW.com works better with JavaScript enabled.
Exclusive

Asynchronous API calls and the Companies House Document API – part 1

Author:

Published: 17 May 2024

Exclusive content
Access to our exclusive resources is for specific groups of students, subscribers and members.
In a previous article, Sam explored how to connect to Companies House through leveraging their API and Python. One of the most common questions he received following the release of that article was how to manage the documents API. In this two-part series, Sam demonstrates how to set up the API effectively, first explaining the important concept of asynchronous API calls.

When thinking about the content for this article, it struck me that whilst my previous article gave what I think is a good example of using the requests library in Python, this may not necessarily be the best approach in your environment as it is synchronous in nature (more on this below).

Before using the Companies House Documents API, it is first necessary to get to grips with the difference between synchronous and asynchronous API calls. If you understand what I mean by this, then feel free to move on to the implementation of an asynchronous client for obtaining company profiles from Companies House – the second part of this article.

For this article, I am going to make an assumption that you have a basic understanding of Python, and I will also utilise code and approaches as discussed in my previous article.

All code can be found on the async-documents branch of the repo on GitHub.

Synchronous vs asynchronous

What do I mean by these terms?

Synchronous – each operation depends on the completion of the one before it before moving on, i.e. each operation blocks the next.

Asynchronous – allows for operations to “pause” whilst waiting on a result giving the impression of running tasks concurrently with each other, i.e. it is non-blocking in nature.

Why is this relevant to my previous article? Well, that is because the requests package is blocking, i.e. it is synchronous.

Now consider a use case for the Companies House API where maybe we want to scrape a lot of data from companies we engage with, maybe to see if there’s been any updates to the PSC register for example. If we have hundreds of these then in the synchronous approach each request will need to finish prior to the next one starting despite each request being entirely different and not dependent on the outcome of one another.

So, in the above scenario, using an asynchronous approach will allow multiple requests to run without waiting for the result of one request to execute.

I always found the switch from synchronous to asynchronous uncomfortable to begin with, as synchronous just feels much more logical, and I find the explanation here brilliant for those that are curious about the concept and how it generally works.

An example

I always feel it’s better to run an example to explain how this works, and to do this, we use Python’s built in asyncio package, the foundation from which we can, and other packages do, implement asynchronous capabilities.

Screenshot of async coding

The code above is hopefully quite simple to understand and does the following:

  • sync_counter is a function that will sleep for s seconds c times in a synchronous fashion, i.e. waiting for the previous sleep operation to complete.
  • async_counter_task creates a coroutine (see below) for sleeping s seconds using the asyncio.sleep method.
  • async_counter aggregates this coroutine c times using the gather method, to allow each coroutine to be scheduled and run on the event loop (see below).

A coroutine can be thought of as a task that can be suspended and resumed, in essence freeing up execution for other tasks whilst idle or waiting for a result, i.e. in an asynchronous fashion. Simply calling a coroutine will not actually execute it, but instead create a coroutine object.

To execute these coroutines, we need an event loop, which essentially handles the execution of all coroutines and managing when they suspend and resume. This is where the asyncio.run() method comes in, as this starts an event loop upon which to execute coroutines that are passed to it. Think of the event loop as the programmatic version of a conductor of a grand orchestra, controlling how instruments harmonise with each other.

Based on the definitions I gave earlier, we should at least have an expectation that the sync_counter function should run in roughly 5 seconds, as we are waiting 1 second 5 times. The output shows how using asyncio can generate performance gains for certain tasks (see below):

Screenshot of Sync and Async counter

As we can see the asynchronous version ran all 5 tasks in 1 second as they were run without blocking each other whilst the function slept.

Asynchronous, multithreading and multiprocessing often come up when talking about concurrency, and I’m not going to delve into any more detail on this topic here as there is plenty of material on this – an example overview can be seen here for those interested.

In general, we would choose asynchronous programming with a large number of I/O bound tasks, i.e. when the time to execute is primarily dependent on waiting for inputs or outputs, for example waiting for a response from a HTTP request sent to a web server via an API – hopefully you can see why this is relevant!

Creating our asynchronous client

Whilst there are a couple of different Python packages we can leverage, the one I am going to use is aiohttp which is an async-only package.

To begin with I’m going to create a minimal example of an asynchronous HTTP client and then compare the performance to the synchronous client from my original article, whilst also borrowing some of the class attributes as well.

My minimal example will involve obtaining the company profile, so first I add the get_company_profile method to our CompaniesHouseAPI class that runs synchronously using the requests package:

Screenshot of get_company_profile coding

Hopefully the above is self-explanatory given the information provided in my previous article, and is based off the API specification for this endpoint.

Using the same sort of structure in our asyncio example above, we create our coroutines to send these requests using the aiohttp.ClientSession object which is described as best practice here.

Screenshot of aiohttp.ClientSession coding

The above operates as follows:

  • fetch_async_task creates a coroutine to be executed that makes a GET request to the specified URL and returns the JSON response.
  • fetch_async aggregates this coroutine a fixed number of times equal to the iterations and also passes in any headers to the ClientSession object, which we require in order to authenticate to Companies House – again we use the asyncio.gather method for doing this.
  • fetch_sync uses our CompaniesHouseAPI class to make a get call to the company profile endpoint a fixed number of times equal to iterations.

To see the performance of both the synchronous and asynchronous approaches, we want to establish our parameters, run a benchmark for a single request, and then compare to the performance of both scenarios. I have done this like so:

Screenshot of the comparison between synchronous and asynchronous

I’ve included comments where relevant to describe the steps taken. I ran this 3 times, as given the nature of HTTP requests, the response times are likely to vary, and these are the results:

Benchmark performance results between the synchronous and asynchronous versions

Despite the third benchmark being wildly different, the asynchronous approach shows a clear sign of performance improvement of over 95%, with each request not being blocked by one another.

In part 2, Sam will explore how to apply this to using the Companies House Documents API.

About the author

Sam is a Chartered Accountant and Auditor with a keen interest in technology. He has previously managed an audit portfolio whilst leading an Innovations team building internal analytical tools for audit and providing automation and analytics to clients.

He now works at Circit, as a Product Manager for their Verified Transactions, Verified Insights and Verified Analytics modules.

Feel free to reach out to Sam at sam.bonser@circit.io.

Open AddCPD icon

Add Verified CPD Activity

Introducing AddCPD, a new way to record your CPD activities!

Log in to start using the AddCPD tool. Available only to ICAEW members.

Add this page to your CPD activity

Step 1 of 3
Download recorded
Download not recorded

Please download the related document if you wish to add this activity to your record

What time are you claiming for this activity?
Mandatory fields

Add this page to your CPD activity

Step 2 of 3
Mandatory field

Add activity to my record

Step 3 of 3
Mandatory field

Activity added

An error has occurred
Please try again

If the problem persists please contact our helpline on +44 (0)1908 248 250