
Diving deeper into the data for combating modern slavery

Author: ICAEW Insights

Published: 26 Jun 2024

The challenges around understanding what the data does or doesn’t tell us are a major factor in making AI work effectively. In the latest in his series, David Wray looks at getting data right and how causal AI might help offer decision-useful options.

When first embarking on an AI project, the biggest question is about the quantity, type and quality of data. This question is a tricky one to answer. It all depends on the use case that the AI will be solving. 

For instance, if the AI is being trained for facial recognition purposes, training must include different lighting conditions, profiles (front, side, high and low, for instance), single faces, faces within a crowd, ethnicity, gender and age, among other attributes. Without facial diversity, the AI will innately contain biases because of its limited learning. It will recognise some facial characteristics with a high degree of accuracy, but struggle with others. 

In previous entries in this series, we looked at the objectives of our AI project to tackle modern slavery, and the various ways to tackle bias and hallucinations. In the context of modern slavery, the data issue is complex. Modern slavery occurs for a variety of reasons, in various ways, and is inherently concealed by its perpetrators. This means that data is often scarce, disparate, incomplete and in non-digital forms. 

When data elements are grouped together – as is typical when problem solving – the grouping is known as a dataset. The datasets needed for iEARTHS are numerous, precisely because the issue is complex and layered, its forms are varied and the task is specific. 

For instance, when considering the datasets needed to develop the economic model used to show the P&L impact of a given action or inaction, we had to normalise average country wage calculations. Individual countries’ calculation methods or standards of living often differ. 
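To make the normalisation step concrete, here is a minimal sketch of one common approach: converting local-currency average wages into a single reference unit using purchasing-power-style conversion factors. The function, country codes and figures are all illustrative assumptions, not the actual iEARTHS method or data.

```python
# Hypothetical sketch: normalising average wages across countries so
# figures become comparable in one reference unit. The conversion
# factor is expressed as local units per reference unit.

def normalise_wage(local_wage: float, conversion_factor: float) -> float:
    """Convert a local-currency average wage into reference units."""
    if conversion_factor <= 0:
        raise ValueError("conversion factor must be positive")
    return local_wage / conversion_factor

# Illustrative (made-up) inputs: country -> (avg monthly wage, factor)
raw = {
    "A": (1200.0, 1.0),   # reference country
    "B": (950.0, 0.5),    # lower nominal wage, cheaper local basket
}
normalised = {c: normalise_wage(w, f) for c, (w, f) in raw.items()}
```

Note how country B’s lower nominal wage translates into greater purchasing power once normalised – exactly the kind of nuance the AI must not miss.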

Why is this important? Imagine the AI doesn’t ‘understand’ nuanced wage differences. The costing determination for each recommendation could inadvertently lead to a user deciding to continue suboptimal labour practices in lower-cost jurisdictions where individuals are more likely to be vulnerable and exploited. This would be an example of the AI causing more harm, which is clearly not an acceptable outcome. The key consideration in this aspect of data is quality. 

Data quality can help lessen the need for quantity. If we return to our facial recognition example, training the AI with high resolution photos rather than a larger volume of low-resolution photos is more effective. 

In the context of modern slavery, all the data consumed by iEARTHS is curated and of high quality. We took the time to ensure datasets were validated by NGOs, economists, supply chain experts, behavioural scientists, survivors or supply chain investigators, among others. It is a painstaking process, but also an essential one to ensure a solid foundation. 

Data terminology also posed some interesting challenges. How something is defined can differ between actors within the ecosystem; even the same term can mean very different things to different parties. 

Normalising definitions, or linking them as interoperable, ensures a consistent understanding within the domain and its users. We addressed this challenge by using taxonomies and knowledge graph technology. 

Knowledge graphs generally include individually sourced datasets, often with unique structures. These datasets are in turn connected through schemas, identities and context. 

A schema provides a wireframe for the knowledge graph. The context distinguishes the setting within which the knowledge exists. In other words, this approach allows the iEARTHS AI to distinguish definitions or concepts that hold multiple meanings. Structuring and contextualising data is the starting point for effective AI learning. 
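The disambiguation idea above can be sketched very simply: the same term resolves to different concepts depending on the context it appears in. This toy lookup only illustrates the principle – the terms, contexts and definitions are invented for the example, and a real knowledge graph (built on triples, schemas and identities) is far richer.

```python
# Minimal sketch of context-aware term resolution, in the spirit of a
# knowledge graph: a (term, context) pair maps to a single concept,
# so one term can carry different meanings in different domains.
# All names here are illustrative assumptions.

graph = {
    ("bonded", "labour"): "debt-bondage indicator",
    ("bonded", "finance"): "goods held in a bonded warehouse",
}

def resolve(term: str, context: str) -> str:
    """Look up a term's meaning within a given context; fail loudly
    for unknown pairings rather than guessing."""
    try:
        return graph[(term, context)]
    except KeyError:
        raise KeyError(f"no definition for {term!r} in context {context!r}")
```

Failing loudly on unknown pairings matters here: silently defaulting to one meaning is precisely how definitional ambiguity becomes a hidden bias.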

Sourcing the data was also a consideration. The various sources are often correlated with trust, or a lack thereof. Some AI, such as GenAI, is trained on all publicly available data – good and bad – the effects of which we saw in the first miniseries article. 

The very nature of the problem that iEARTHS is tackling concerns human impacts, meaning the data it learns from must be of the highest quality, bias free and as complete as possible. 

Data gaps are addressed through targeted data acquisition – for example, from on-the-ground NGOs – or through advanced learning techniques such as data augmentation. It is important to recognise that the AI’s data processing is also constrained by computing power, so that needs to be managed as well.
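As a hedged illustration of what data augmentation means in its simplest form: a small numeric dataset can be expanded by adding noise-perturbed copies of each record, giving the model more examples to learn from. Real augmentation pipelines are far more sophisticated; this sketch (with made-up values) only conveys the idea.

```python
import random

# Toy data augmentation: expand a small dataset by appending
# noise-perturbed copies of each record. A fixed seed keeps the
# example reproducible.

def augment(records, copies=2, noise=0.05, seed=42):
    rng = random.Random(seed)
    out = list(records)  # keep the originals
    for value in records:
        for _ in range(copies):
            # each copy is jittered by up to +/- `noise` (fractional)
            out.append(value * (1 + rng.uniform(-noise, noise)))
    return out

sample = [10.0, 12.5, 9.8]
augmented = augment(sample)  # 3 originals plus 6 perturbed copies
```

The trade-off is that augmented records are synthetic: they stretch scarce data further, but they cannot substitute for genuinely new ground-truth observations.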

Finally, large language models (LLMs) are designed to understand and generate human-like text based on their learning from the vast amount of data used to train them. LLMs can infer context, translate into multiple languages, answer questions and more. 

They can be a useful technological tool, but they are often quite biased, as a recent scientific report demonstrated: LLMs inherit gender and racial biases from archival training data when it is not adapted to remove historical prejudice, racism or stereotypes. 

iEARTHS does not rely on LLMs or proxy approaches to AI training because of the bias findings in a recent Stanford University study. However, we are evaluating mechanisms for neutralising biases, allowing for possible future usage. 

Data is the foundation upon which any AI tool learns. Biases are built into, or prevented within, datasets, depending on whether the historical data is used as-is or adapted to reflect what we’d like to see in the future (such as an equitable and inclusive future). Sufficiency is relative to the problem the AI will be tasked with solving. Any AI project requires the user to understand the data’s limitations so they can be actively managed. 

In the next and final instalment of the miniseries, we will look at the intersection of ethics with AI, and how finance professionals can play a pivotal role in supporting businesses to transition towards sustainable operating models. 
