ICAEW.com works better with JavaScript enabled.
Exclusive

How to get started with programming as an accountant Part 2- CSV vs JSON

Author:

Published: 05 Aug 2024

Exclusive content
Access to our exclusive resources is for specific groups of students, subscribers and members.

In a previous article, Edward Franklin, founder of TechFranklin, explored the approach to take when getting started with programming as an accountant. The first article covered how to work with CSV formatted data and this following article will cover how to take advantage of JSON, a format commonly encountered with APIs.

You can read the first part here – ‘How to get started with programming as an accountant | ICAEW’.

JSON stands for ‘JavaScript Object Notation’ and it has become the standard format that APIs use to exchange data. Therefore, in order to benefit from APIs, the ability to manipulate JSON formatted data is essential.

For example, in the ICAEW article on interacting with the Companies House API you can see that the author is manipulating JSON. And, if you go to Xero’s API documentation you will see JSON throughout.

What is the difference between CSV and JSON?

CSV formats data in a tabular structure where columns narrate the data, and each grouping is reflected in a row. By comparison JSON structures data in key value pairs and each grouping is enclosed within curly brackets.

Example of CSV vs JSON

For example, we can transform the CSV data from my last article into JSON.

Screenshot of CSV data

Which in a plain text format is:

Screenshot of plain text format

When converted into JSON format this becomes:

Screenshot of JSON format

This transformation from CSV to JSON can be done in Python using the Pandas library with the code below.

Screenshot of Python using the Pandas library

What is the point of JSON?

Save for the fact you have to use JSON in order to interact with APIs, the merits of JSON over CSV might not be immediately obvious and detailed analysis is beyond what is needed for this article.

However, at a high level the CSV format is severely constrained because it is restricted to the two dimensions of rows and columns.

By comparison JSON is much more flexible and allows for multiple dimensions through a hierarchy structure. So, you can nest key value pairs, which facilitates more complex and real-world data schemas.

Another example of CSV vs JSON

Suppose we extended the Price and Quantity data to reflect individual transactions and their associated locations. For each price, we may have multiple transactions; in JSON this would be reflected as follows:

Screenshot of a JSON example

By comparison in order to reflect this data in CSV we forced to create multiple repetitious columns as follows:

Screenshot of a CSV example

By organising the data in the CSV format, it creates an unpredictable data schema since the number of columns depends on the number of transactions, which cannot be known in advance. In this case the CSV no longer conforms to a data structure known as a flat file.

This unpredictable schema makes interacting with CSV in this instance on a recurring basis inherently unreliable since you cannot know in advance the coordinates of all of the data and therefore writing Excel functions which reliably capture all the data is highly problematic.

Alternatively, the data could be organised by repeating the price '5' over multiple rows for each transaction date. However, this ‘vertical’ format can be inefficient and repetitive for large and multidimensional data sets.


Screenshot of a CSV example

Hopefully, this example demonstrates just one of the ways JSON can be superior to CSV.

This example also indirectly demonstrates one of the powers of programming over spreadsheets since spreadsheets are inherently curtailed by a 2-dimensional grid. By comparison, the abstraction of code means an essentially infinite number of dimensions can be handled.

Storing JSON and CSV

If you’d like to learn more about storing data in the CSV and JSON format I suggest you read the community’s article on datalakes.

XML vs JSON

In your career you may have come across data in XML format. XML is similar to JSON but predates it, so it is still commonly used in legacy applications and some APIs, for example HMRC uses XML for RTI Payroll Submissions. However, JSON is more lightweight, so is faster for computer programs to read and write and is easier to parse.

Next steps

If you have become familiar with Python and DataFrames arising from my previous article I suggest you seek to reperform the analysis but converting the CSV data into JSON.

You will need to perform another transformation which converts the JSON into a Python Dictionary (this is Python’s key-value data structure equivalent to JSON) as follows:

Screenshot of a Python example

Once you have the hang of this you should be ready to interact with API data using Python.

Open AddCPD icon

Add Verified CPD Activity

Introducing AddCPD, a new way to record your CPD activities!

Log in to start using the AddCPD tool. Available only to ICAEW members.

Add this page to your CPD activity

Step 1 of 3
Download recorded
Download not recorded

Please download the related document if you wish to add this activity to your record

What time are you claiming for this activity?
Mandatory fields

Add this page to your CPD activity

Step 2 of 3
Mandatory field

Add activity to my record

Step 3 of 3
Mandatory field

Activity added

An error has occurred
Please try again

If the problem persists please contact our helpline on +44 (0)1908 248 250