Understanding what the data does or doesn’t tell us is a major factor in making AI work effectively. In the latest in his series, David Wray looks at getting data right and how causal AI might aid in offering decision-useful options.

When first embarking on an AI project, the biggest question is about the quantity, type and quality of data. It is a tricky one to answer, because it all depends on the use case the AI will address.

For instance, if the AI is being trained for facial recognition purposes, training must include different lighting conditions, profiles (front, side, high and low, for instance), single faces, faces within a crowd, ethnicity, gender and age, among other attributes. Without facial diversity, the AI will innately contain biases because of its limited learning. It will recognise some facial characteristics with a high degree of accuracy, but struggle with others.
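One way to make the diversity point concrete is to count how many training samples exist for each combination of attributes and flag the combinations that are missing or thin. The sketch below is a minimal illustration only: the attribute names, the sample records and the `coverage_gaps` helper are all hypothetical, and a real training set would carry far more attributes and samples.

```python
from collections import Counter
from itertools import product

# Hypothetical metadata for a handful of training images (illustrative only).
samples = [
    {"lighting": "bright", "profile": "front"},
    {"lighting": "bright", "profile": "front"},
    {"lighting": "bright", "profile": "side"},
    {"lighting": "dim", "profile": "front"},
]


def coverage_gaps(samples, attributes, min_count=1):
    """Return attribute combinations that have fewer than min_count samples."""
    # Count how often each combination of attribute values appears.
    counts = Counter(tuple(s[a] for a in attributes) for s in samples)
    # Collect the distinct values seen for each attribute.
    values = [sorted({s[a] for s in samples}) for a in attributes]
    # Check every possible combination, including ones never observed.
    return {
        combo: counts[combo]
        for combo in product(*values)
        if counts[combo] < min_count
    }


gaps = coverage_gaps(samples, ["lighting", "profile"])
print(gaps)
```

In this toy set, no image combines dim lighting with a side profile, so that combination is reported as a gap. Raising `min_count` turns the same check into a test for under-representation rather than outright absence.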

In previous entries in this series, we looked at the objectives of our AI project to tackle modern slavery, and the various ways to tackle bias and hallucinations. In the context of modern slavery, the data issue is complex. Modern slavery occurs for a variety of reasons, in various ways, and is inherently concealed by its perpetrators. This means that data is often scarce, disparate, incomplete and in non-digital forms.

A group of data elements brought together to solve a problem is known as a dataset. The datasets needed for iEARTHS are numerous, precisely because the issue is complex, has many layers, is varied and the task is specific.

For instance, when considering the datasets needed to develop the economic model used to show the P&L impact of a given action or inaction, we had to normalise average country wage calculations, because individual countries’ calculation methods and standards of living often differ.
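To illustrate what normalising wage figures can involve, the sketch below puts wages reported on different bases onto a single annual, purchasing-power-adjusted scale. This is an assumption for illustration, not a description of the iEARTHS method: the article does not say which adjustments were used, and the `WageRecord` type, the `ppp_factor` field and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WageRecord:
    country: str
    amount: float      # average wage in local currency, as reported
    period: str        # reporting basis: "monthly" or "annual"
    ppp_factor: float  # local currency units per international dollar (made up)


PERIODS_PER_YEAR = {"monthly": 12, "annual": 1}


def normalise(record: WageRecord) -> float:
    """Return the annual wage in purchasing-power-adjusted international dollars."""
    annual_local = record.amount * PERIODS_PER_YEAR[record.period]
    return annual_local / record.ppp_factor


# Illustrative figures only; not real country data.
records = [
    WageRecord("Country A", 3000.0, "monthly", 1.0),
    WageRecord("Country B", 240000.0, "annual", 20.0),
]

normalised = {r.country: normalise(r) for r in records}
print(normalised)
```

Before normalisation, the raw figures (3,000 and 240,000) are not comparable at all. Afterwards, both are stated on the same basis, which is the precondition for any sensible cost comparison between jurisdictions.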

Why is this important? Imagine the AI doesn’t ‘understand’ nuanced wage differences. The costing determination for each recommendation could inadvertently lead to a user deciding to continue suboptimal labour practices in lower-cost jurisdictions where individuals are more likely to be vulnerable and exploited. This would be an example of the AI causing more harm, which is clearly not an acceptable outcome. The key consideration in this aspect of data is quality.

Data quality can help lessen the need for quantity. If we return to our facial recognition example, training the AI with high-resolution photos is more effective than training it with a larger volume of low-resolution photos.

In the context of modern slavery, all the data consumed by iEARTHS is curated and of high quality. We took the time to ensure datasets were validated by NGOs, economists, supply chain experts, behavioural scientists, survivors and supply chain investigators, among others. It is a painstaking process, but also an essential one to ensure a solid foundation.

Data terminology also posed some interesting challenges. How something is defined can differ between actors within the ecosystem. Even the same term can mean very different things to different actors.

Normalising definitions, or linking them as interoperable, ensures a consistent understanding within the domain and its users. We addressed this challenge by using taxonomies and knowledge graph technology.

Knowledge graphs generally include individually sourced datasets, often with unique structures. These datasets are in turn connected through schemas, identities and context.

A schema provides a wireframe for the knowledge graph. The context distinguishes the setting within which the knowledge exists. In other words, this approach allows the iEARTHS AI to distinguish definitions or concepts that hold multiple meanings. Structuring and contextualising data is the starting point for effective AI learning.
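The idea of a schema plus context can be sketched with a very small in-memory graph. This is a simplified illustration rather than the iEARTHS implementation: the `KnowledgeGraph` class, its predicates, the context names and the example definitions are all hypothetical, and a production system would normally use dedicated knowledge graph tooling.

```python
class KnowledgeGraph:
    """Tiny store of (subject, predicate, object) statements tagged with a context."""

    # The schema acts as the wireframe: only these predicates are allowed.
    SCHEMA = {"defined_as", "same_as"}

    def __init__(self):
        self._facts = {}

    def add(self, subject, predicate, obj, context):
        """Record a statement, rejecting predicates the schema does not know."""
        if predicate not in self.SCHEMA:
            raise ValueError(f"predicate not in schema: {predicate}")
        self._facts.setdefault((subject, predicate), []).append((obj, context))

    def lookup(self, subject, predicate, context):
        """Return the objects recorded for a subject and predicate in one context."""
        facts = self._facts.get((subject, predicate), [])
        return [obj for obj, ctx in facts if ctx == context]


kg = KnowledgeGraph()
# The same term carries a different meaning in each (made-up) source context.
kg.add("wage", "defined_as", "gross pay before deductions", context="payroll_data")
kg.add("wage", "defined_as", "take-home pay", context="worker_survey")

print(kg.lookup("wage", "defined_as", "payroll_data"))
print(kg.lookup("wage", "defined_as", "worker_survey"))
```

Because each statement carries its context, a query for "wage" returns the definition that applies in the setting being asked about, instead of merging two meanings that only look the same.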

Sourcing the data was also a consideration. The various sources are often correlated with trust, or a lack thereof. Some AI systems, such as GenAI, use all publicly available data, good and bad, the effects of which we saw in the first article of this miniseries.

The problem that iEARTHS is tackling concerns human impacts, so the data it learns from must be of the highest quality, bias free and as complete as possible.

Data gaps are addressed through targeted data acquisition, such as from on-the-ground NGOs, or through advanced learning techniques, such as data augmentation. It is important to recognise that data processing by the AI is also constrained by computing power, so that needs to be managed as well.

Finally, large language models (LLMs) are designed to understand and generate human-like text based on their learning from the vast amount of data used to train them. LLMs can infer context, translate into multiple languages, answer questions and more.

They can be a useful technological tool, but they are often quite biased, as seen in a recent scientific report. The report demonstrates the gender and racial biases that LLMs inherit from archival data when that data is not adapted to eliminate historical prejudice, racism or stereotypes.

iEARTHS does not rely on LLMs or proxy approaches to AI training because of the bias findings in a recent Stanford University study. However, we are evaluating mechanisms for neutralising biases, allowing for possible future usage.

Data is the foundation upon which any AI tool learns. Biases are built into, or prevented within, datasets, depending on whether the historical data is used as-is or adapted to reflect what we’d like to see in the future (such as an equitable and inclusive future). How much data is sufficient depends on the problem the AI will be tasked with solving. Any AI project requires the user to understand the data’s limitations so they can be actively managed.

In the next and final instalment of the miniseries, we will look at the intersection of ethics with AI, and how finance professionals can play a pivotal role in supporting businesses to transition towards sustainable operating models.

Read the whole series...

How we applied AI to combat modern slavery
10 Jun 2024
The iEARTHS project is a causal AI system designed to tackle modern slavery. David Wray, ICAEW member and one of its creators, explains the considerations that went into its development.

Combating modern slavery: when Gen AI isn't enough
20 Jun 2024
Following his introduction to the iEARTHS artificial intelligence (AI) solution, David Wray discusses why causal AI was chosen over generative AI and how the team tackled bias and hallucinations.

Diving deeper into the data for combating modern slavery
26 Jun 2024

The role of ethics within AI to combat modern slavery
05 Jul 2024
We complete our iEARTHS miniseries journey through an ethical lens. The ethical risks and challenges are real and require thoughtful reflection to protect what really matters: vulnerable individuals.
