Excel hits the headlines
It's not often that Excel features in the national news headlines and, when it does, it is usually for all the wrong reasons. Over the weekend of 3rd/4th October, the government's coronavirus dashboard flagged a problem with the reported figures of positive Covid-19 tests. Due to an unspecified error, nearly 16,000 positive cases had been omitted from the reported figures and. more seriously, had not been transferred from Public Health England (PHE) to the NHS Test and Trace system to initiate the process of notifying those with a positive test and identifying their contacts. As many commentators pointed out, this was a potentially life-threatening problem. Delays in the notification of a positive test and in the notification of contacts, could allow increased transmission of the virus.
On Monday 5th the results of investigations into the cause of the issue started to be reported, culminating in an analysis of the issue by Leo Kelion, a BBC technology desk editor, entitled:
"Excel: Why using Microsoft's tool caused Covid-19 results to be lost"
Leaving aside the exact details of this particular problem, the reactions to the revelations highlighted some very important issues around the use of Excel, as well as attitudes to the use of spreadsheets in general. Any major news story that implicates spreadsheets in a disastrous IT error leads to an understandable debate on the suitability of spreadsheets for certain, or indeed any, applications. In addition, it provides an ideal opportunity for those with a particular agenda to promote their own alternatives to spreadsheets.
The anti-spreadsheet backlash poses several dangers. Often 'spreadsheet errors' are errors that occurred in a process that happens to include spreadsheets. This doesn't mean that the spreadsheets actually caused those errors. In some cases, the errors were caused by a much wider failing that could well have happened regardless of whether or not the process involved the use of spreadsheets at all. In fact, the PHE catastrophe might well be an example of this. Data was transferred from one system to another. As part of that process, large files were truncated. The published reason was that, at some point in the process, the data was created as, or converted to, Excel workbooks and those workbooks were in the pre-2007 .xls format, which can only cope with 65,536 rows rather than the 1,048,576 available in the current, .xlsx. format. To describe this as an 'Excel' error is harsh. In his own analysis Patrick O'Beirne, a key contributor to the European Spreadsheet Risk Interest Group (EuSpRIG), showed that it would require the use of the Visual Basic for Applications (VBA) programming language to allow Excel to truncate the file without displaying an error message. The real failure here was the failure to include a relatively simple test to ensure that the same number of records left one system as arrived in the other. Such a test could have significantly reduced the chances of the error occurring whatever process was used.
Arguing over the nature of the error is not some idle attempt to defend the use of any one type of application, it is crucially important in preventing similar errors occurring in the future. If the error occurred because of the use of spreadsheets and could not have occurred had any other type of application been involved, then it would be right to see the eradication of Excel as the correct solution. On the other hand, if the same error might have occurred whatever type of application was used, then eliminating spreadsheets would still leave us open to the same problem recurring. For this reason, we do need to take care when resorting to the catchy and obvious, but potentially misleading, 'spreadsheet error' headline.
Coincidentally, a Tech News article had addressed exactly this issue towards the end of August in response to another 'spreadsheet error' story: When is a spreadsheet error not a spreadsheet error… This time the error involved issues with the design of a hospital.
Guilty pleasure
This is not to absolve Excel from responsibility for all the errors in which it is involved. EuSpRIG has a whole section of its website devoted to 'public reports of spreadsheet errors'. Some of these undoubtedly happened because of the nature of spreadsheets. The key question centres on whether there is something about spreadsheets that makes them particularly error prone, or perhaps this is better expressed as whether there is something about the way in which spreadsheets are used that explains the incidence of errors.
It is certainly true that spreadsheets lack enforced structure. For example, in most database applications a record is defined as being of a particular type, and the database will prevent the entry of data that falls outside that defined type. Put simply, in a database you won't be allowed to enter text characters in a record with a number data type. In contrast, by default, you can enter any sort of data that you like in any of the 17 billion plus cells in each Excel worksheet. It is this flexibility of spreadsheets that explains why they are so appealing to users but, at the same time, so likely to be subject to mistakes. Most applications have some sort of barrier to their use: few people without knowledge or experience of using a database would launch into the creation of a complex database system. Even if they tried to do so, it would soon become apparent that there were important concepts they needed to learn before they could get very far.
Excel really has no such barriers to use. The only underlying concept that you really have to understand is typing characters in cells. A cursory knowledge of playing battleships, or even just noughts and crosses, is more than sufficient for you to start work on your spreadsheet solution. The ability to use it so easily to generate some sort of (apparently) useful result, with no existing knowledge or expertise, also means that there is no requirement for anyone to understand how to use a spreadsheet properly in order to be able to use a spreadsheet.
What makes matters worse is that there is also no financial barrier to using a spreadsheet for most people. Free spreadsheets are readily available and, in the work and educational environment at least, nearly all users will have access to some sort of office suite that will include a spreadsheet at no additional cost.
Compounding these issues is the fact that lots of people find using spreadsheets to be one of the more enjoyable parts of their job. Spreadsheets are undeniably clever and let you achieve impressive results with what can sometimes seems like comparatively little effort. They also satisfy the yearning for the exercise of an element of creativity and intellectual achievement in jobs that might more usually be constrained by process and regulation. It is telling that when a spreadsheet horror story breaks there is often a flood of support for spreadsheets, and an outpouring of the sort of unconditional love that is more usually reserved for favourite sporting teams.
Once more unto the breach
So, there are aspects of the way in which spreadsheets work that might make errors more likely, but there is also the issue of the tasks that they are required to perform. Often, spreadsheets fill the gaps that more specialist systems leave. This means that spreadsheets are pressed into use when things stop being straightforward and they therefore must cope with more complicated processes and calculations. Increased complexity generally creates increased risk. In addition, because spreadsheets are used to handle problems that existing systems are unable to handle, there is often no obvious reference point to help identify whether the solution they produce is actually correct.
It is also the case that spreadsheets can seem to be the only viable option. The BBC article referred to earlier includes a quote from Professor Jon Crowcroft from the University of Cambridge
"Excel was always meant for people mucking around with a bunch of data for their small company to see what it looked like,"
"And then when you need to do something more serious, you build something bespoke that works - there's dozens of other things you could do."
Many people at the mercy of internal IT procedures might question the throwaway suggestion that you just 'build something bespoke that works'. The timescales and costs involved with initiating the creation of a bespoke solution for 'something more serious' can appear prohibitive. Experience would also suggest that bespoke solutions themselves are far from immune to the occasional error.
If you have a problem to solve, the immediate availability of spreadsheets can seem very attractive, particularly when compared to the requirements and internal controls associated with commissioning a bespoke solution.
Natural selection
Having said all that, there is still a very important question hanging over the use of Excel. If spreadsheets do, for whatever reason, result in so many significant errors, why are they still such an important part of so many business-critical, and indeed life-critical, processes? This question becomes even more significant when we take into account the frequently quoted estimate that about 90% of all spreadsheets contain material errors. To quote from something I wrote nearly 9 years ago:
"If someone used a tool that 9 times out of 10 made them look like a complete idiot in front of their colleagues and bosses they’d stop using it pretty quickly. Even if they didn’t stop of their own volition, their continuation in their current position would be unlikely."
In that article I came to the conclusion that it was wrong to think of spreadsheets in isolation, and that instead we should allow for the fact that many spreadsheets are used to support a wider process of decision-making, with human judgement playing its role in the interpretation of the results.
Whether this explanation fully explains the 90% paradox is doubtful and more research that helps reconcile the continued role of spreadsheets with their apparent vulnerability to error is vital.
Are catastrophic errors really the main issue anyway?
Going back to the PHE crisis that sparked the recent discussion of spreadsheet errors, in this case the possible effects went beyond financial loss to become a threat to health and even life, but in many of the publicised cases it was the size of a financial loss that dominated the headline. If it's financial loss that we are concerned with, then it is likely that the loss caused by identified errors is a tiny fraction of the overall amount wasted through inappropriate and inefficient use of spreadsheets.
Returning to the possibility that the prevalence of spreadsheet errors is at least partially countered by human involvement in the process, our nine-year-old article concluded with the worrying thought that:
"We might be concentrating on the wrong problem where spreadsheets are concerned. The real cost to us, our organisations and the global economy [might] not be the high-profile catastrophic spreadsheet errors. Instead, it could be the enormous amount of time spent compensating for the structural weakness of spreadsheets and for the widespread lack of proper education as to their efficient and safe use."
Eradication or education? Just do something
One of the aspects of Professor Crowcroft's condemnation of Excel that worried me the most was that it was so dismissive. Obviously, giving an instant response to a journalist might not allow for the sort of proper evidential analysis that we might expect from a Cambridge professor, but to characterise the PHE issue as some sort of inexplicable aberration is to completely misrepresent the extent to which organisations of all types continue to rely on spreadsheets as part of mission-critical processes.
When people rush to defend Excel against the latest horror headline, they often highlight the importance of proper training to promote the more efficient and safer use of spreadsheets. Perhaps the key question that we need to address is whether proper training and education can make spreadsheets safe enough to use, or whether we really do need to stop using them for a range of inappropriate applications or to try and eradicate them altogether. What we absolutely can't afford to do is to pretend spreadsheets and spreadsheet education don't matter and just to continue to express amazement that people use spreadsheets in the way that they do.
In accordance with its role in promoting business efficiency through not only the qualifications of its members but also through the millions of individuals and businesses that rely on its members' advice, the ICAEW has delivered three significant projects to promote the safe and efficient use of spreadsheets: Twenty Principles of Good Spreadsheet Practice, The Spreadsheet Competency Framework and the Financial Modelling Guide. Full details can be found at www.icaew.exceltl.com