In a previous article, Edward Franklin, founder of TechFranklin, explored the approach to take when getting started with programming as an accountant. The first article covered how to work with CSV formatted data and this following article will cover how to take advantage of JSON, a format commonly encountered with APIs.
You can read the first part here – ‘How to get started with programming as an accountant | ICAEW’.
JSON stands for ‘JavaScript Object Notation’ and it has become the standard format that APIs use to exchange data. Therefore, in order to benefit from APIs, the ability to manipulate JSON formatted data is essential.
For example, in the ICAEW article on interacting with the Companies House API you can see that the author is manipulating JSON. And, if you go to Xero’s API documentation you will see JSON throughout.
What is the difference between CSV and JSON?
CSV formats data in a tabular structure where columns narrate the data, and each grouping is reflected in a row. By comparison JSON structures data in key value pairs and each grouping is enclosed within curly brackets.
Example of CSV vs JSON
For example, we can transform the CSV data from my last article into JSON.
Which in a plain text format is:
When converted into JSON format this becomes:
This transformation from CSV to JSON can be done in Python using the Pandas library with the code below.
What is the point of JSON?
Save for the fact you have to use JSON in order to interact with APIs, the merits of JSON over CSV might not be immediately obvious and detailed analysis is beyond what is needed for this article.
However, at a high level the CSV format is severely constrained because it is restricted to the two dimensions of rows and columns.
By comparison JSON is much more flexible and allows for multiple dimensions through a hierarchy structure. So, you can nest key value pairs, which facilitates more complex and real-world data schemas.
Another example of CSV vs JSON
Suppose we extended the Price and Quantity data to reflect individual transactions and their associated locations. For each price, we may have multiple transactions; in JSON this would be reflected as follows:
By comparison in order to reflect this data in CSV we forced to create multiple repetitious columns as follows:
By organising the data in the CSV format, it creates an unpredictable data schema since the number of columns depends on the number of transactions, which cannot be known in advance. In this case the CSV no longer conforms to a data structure known as a flat file.
This unpredictable schema makes interacting with CSV in this instance on a recurring basis inherently unreliable since you cannot know in advance the coordinates of all of the data and therefore writing Excel functions which reliably capture all the data is highly problematic.
Alternatively, the data could be organised by repeating the price '5' over multiple rows for each transaction date. However, this ‘vertical’ format can be inefficient and repetitive for large and multidimensional data sets.
Hopefully, this example demonstrates just one of the ways JSON can be superior to CSV.
This example also indirectly demonstrates one of the powers of programming over spreadsheets since spreadsheets are inherently curtailed by a 2-dimensional grid. By comparison, the abstraction of code means an essentially infinite number of dimensions can be handled.
Storing JSON and CSV
If you’d like to learn more about storing data in the CSV and JSON format I suggest you read the community’s article on datalakes.
XML vs JSON
In your career you may have come across data in XML format. XML is similar to JSON but predates it, so it is still commonly used in legacy applications and some APIs, for example HMRC uses XML for RTI Payroll Submissions. However, JSON is more lightweight, so is faster for computer programs to read and write and is easier to parse.
Next steps
If you have become familiar with Python and DataFrames arising from my previous article I suggest you seek to reperform the analysis but converting the CSV data into JSON.
You will need to perform another transformation which converts the JSON into a Python Dictionary (this is Python’s key-value data structure equivalent to JSON) as follows:
Once you have the hang of this you should be ready to interact with API data using Python.