When Stuart Cobbe, Principal Consultant at The Analytical Accountant, wanted to determine the abilities and limits of ChatGPT, he decided to give it a test. Having worked with artificial intelligence (AI) and machine learning for eight years, both for MindBridge and Moore Kingston Smith, he was familiar with the concept of benchmarking AIs, which usually involves some kind of measure against human performance.
“Being an ACA, I have deep familiarity with ICAEW’s exams,” he says. “So for me, it was the easiest way to perform the same idea with a more specific use case in mind.”
Knowing that ChatGPT needs to work with text-based questions, he opted to put it through the ICAEW assurance assessment paper. The version in place at the time, dubbed GPT-3.5, failed with a score of 42%. While this seems like a fairly respectable score for AI, it uncovered some major issues with the bot.
“It suffered quite a bit from these hallucination problems,” explains Cobbe. “When it’s not sure, it has a tendency to make things up.”
In particular, it really struggled with areas where auditors and accountants have a narrow definition of terms, such as ‘observation of assets’. It also had a tendency to waffle on and go beyond the bounds of the question in its responses, and struggled to understand the layout of multiple-choice questions that have several components.
When GPT-4 was released, Cobbe tried the assessment again. This time, it passed with 78% – a significant improvement on last time, with better understanding and reduced rambling.
“The hallucination problem was there, but it was less prevalent,” he says. “In a way, that makes it more of an issue. The hallucination problem didn’t exist as much because it was getting answers correct in a more consistent way, but when it was wrong, it was still being confident about the way it was wrong.
“It was better at interpreting the structure of the question and following the chain of reasoning required. It still struggled with certain terminology where we as auditors have kind of a narrow definition, but I believe that problem is very solvable.”
This demonstrates how quickly generative AI models such as ChatGPT are improving, but also the issues that are still present within them. The accountancy sector is rapidly adopting AI and machine learning within its various firms and functions, and available options are increasing – Google has just announced its generative AI, Bard.
With reports from members about recommendations for nonexistent books and Excel formulae conjured from thin air, it’s clear that accountants need to be more aware of generative AI’s uses and limitations.
“Members are clearly very excited about the opportunities that tools like ChatGPT can bring to their organisations and the way they work,” says Ian Pay, ICAEW’s Head of Data Analytics and Tech. “With new developments, products and integrations being announced on an almost daily basis, the rate of change is quite breathtaking. But ChatGPT is imperfect.”
Part of the issue is the confidence with which AIs can present incorrect data, which can lead people to trust what they are being given. You should consider it, says Cobbe, “like an ill-informed but confident junior.”
There is huge potential for generative AI technology in the accounting profession. For example, it could be useful to aid in ‘know your client’ research, searching for structured and unstructured information about a potential client to highlight any red flags.
It can also be applied within analytics platforms to make it easier to pull together the insights you want – you merely have to ask it a question. With some work, it could also be used to aid in navigating methodology and standards, helping accountants to find the answers to queries within the guidance.
However, any use of generative AI should be approached with caution and critical thinking. Members should engage their professional scepticism when using AIs, says Pay. “While GPT-4 cites its sources now (unlike GPT-3), those sources should still be checked and verified, and the output not taken at face value. Using it on client or customer-facing activities does open up a lot of risk.”
It’s important to pay extra attention to any topics that are specific to accounting, audit, standards and regulations, particularly where the UK regulatory landscape differs from other major markets. While ChatGPT appears to have accounting content within its training corpus, it has not been fine-tuned for specific accountancy use cases.
Appropriate prompting and structuring are also required to get usable answers. A question might be read in a number of contexts, so it’s important to be specific.
“I think people need to be careful about data privacy, security and IP,” adds Cobbe. “You have to be aware that you are consciously providing information to a third-party business with which you have the very basic terms and conditions in place that are written in their favour.”
Cobbe recommends that accountants look at various generative AIs when considering what to use, as they are better at some tasks than others. Perplexity AI, for example, provides relevant sources for its answers and is better at expressing uncertainty, which makes it good for research purposes.
ICAEW is currently looking at the role tools such as ChatGPT might play in the profession. It potentially impacts the entire professional journey, from students upwards, Pay explains. “It's quite a substantial undertaking and not something that will necessarily happen overnight, especially while the landscape continues to evolve. There is a regulatory element to bear in mind, too, as well as the integrity of the ACA, which is so important to preserve.”
Cobbe believes that generative AI language models will fundamentally change the profession in the next five to 10 years. “Any task that involves early-stage content creation, or summarisation, will be susceptible to automation or acceleration through these kinds of tools. So much of what we do as accountants and auditors can be classed as content creation, so the scope is quite broad.”