ICAEW.com works better with JavaScript enabled.
Exclusive

Harness the potential of Power BI to cleanse and transform data Q&A

Author:

Published: 16 Dec 2024

Exclusive content
Access to our exclusive resources is for specific groups of students, subscribers and members.
Following on from the ‘Harness the potential of Power BI to solve data challenges’ webinar, Rishi Sapra, Chartered Accountant turned Microsoft Most Valuable Professional (MVP), returns to address more of the key questions that came out of the webinar on using Power BI in conjunction with Power Query to cleanse and transform data in this follow up article.

Rishi Sapra also covers more on using Power BI for financial reporting in the first part of this Q&A series.

Can Power BI be used to cleanse data?

Yes, Power BI can be used to clean data, specifically using the Power Query component that was the focus of the demos in the webinar. This includes the ability to replace values, fill in blanks, split text, deal with duplicates and much more! This is by far one of the most valuable components for finance professionals as it automates this cleaning process; each time a Power BI report is refreshed it pulls in all the data and applies the cleaning/transformation steps automatically. Using Power Query for data cleaning/shaping can also be centralised using Dataflows which is the online version of Power Query in the Power BI service.

Alternatively, if the data is coming from a location which has its own logic layer (e.g. SQL Server), where possible it may be preferable to clean the data there as the cleaned data is then available for all downstream activities including Power BI. One of the Power BI product Managers at Microsoft (Matthew Roche) has come up with a Maxim for data transformation which states that “Data should be transformed as far upstream as possible and as far downstream as necessary”!

Is the formatting and data "cleansing" logic stored in the same way as you would get with a SQL program logic table?

Yes - though most of the data cleansing/transformation steps in Power Query are applied through the UI, this logic is stored in functional code in a similar fashion to aSQL script. Power Query uses M language, and you can see and edit the code that has been written against each query by using the Advanced editor feature in Power Query. Writing and editing M language directly, while daunting at first, can give more advanced logic than is available purely through the user interface – it’s explored in this article for the Excel Community.

When connecting to a SQL database, you can even write your data cleaning/transformation logic directly as a native SQL query which could be alternative to using Power Query/M. However, there are still some advantages to learning the basics of the M language and building your logic in Power Query rather than SQL. This article by Gorilla BI compares the two!

In Power Query there were some examples of lookups (merges) between tables. Formatting is often a problem with lookups, does Power BI overcome this?

Unfortunately, when attempting lookups Power BI is no less sensitive to differing data types than many other applications. Ensuring that the data is cleaned (e.g. set as an appropriate data type with things like trailing/leading spaces removed) is an important step before joining or merging with other datasets and this is something that is made very easy with Power BI (or Excel!) using the Power Query component. The ribbon contains options to easily set a data type, trim text, extract out text using certain conditions (e.g. before or after a character or phrase) or deal with duplicates. These steps are automatically applied in sequence on each refresh, so any data cleaning steps should be carried out before lookups/joins so that the matches are more accurate.

Can you write out from Power BI as well as read? Can you include a data input aspect to a report i.e. for someone to see a report and then add some further information that could then be used centrally?

It is possible to write out from Power BI though this requires integration with other tools (Microsoft or third-party) and therefore potentially additional licence costs and increased complexity, so the use case for this should be carefully considered to ensure it is the right solution.

You can build an input form in PowerApps and embed this into a Power BI report, or can use third party custom visuals/platforms that support write-back to either source systems/files or an intermediary database. These third party tools include Power On, Acterys and Inforiver, and they all include workflows/an audit trail to enable write-back to a centralised location for purposes such as budgeting/forecasting in a structured manner. Alternatively, the Zebra BI visuals support annotations and are very popular with finance teams for variance analysis and for allowing end users to add their own analysis/comments directly on a visual.

How much of the audit trail/background is available to review (i.e. from an audit perspective to confirm completeness and accuracy of sources)?

Power BI is not designed to be a database that is the source of truth for your financial data, nor is it an operational system that has fixed auditable processes and workflows. In this way it is closer to Excel as a "self-service tool", but it does have more governance and control compared to Excel.

As a strong starting point, the data ingestion and transformation logic in Power Query can act as an audit trail to assist in validation of the data, however there is no logging of any changes to said logic.

The list of sharing/distribution and access activities carried out in the last 30 days in the Power BI service are available via an API, or through standard admin reports provided by Microsoft, but this isn't going to show where someone has changed logic or visualisations. It is possible to make updates to the logic/data model programmatically as a structured method of version control but this method of updating is difficult to enforce from a purely technical perspective - this is why Governance has to come from people and processes rather than just the technology!

In terms of the data, this is overwritten and will just reflect what is in the data source each time the model is refreshed. It is possible to store snapshots of dataflow outputs in a data lake and you could potentially build a process to check these outputs against a source system but I imagine it would be more robust to just set up the extraction/transformation logic as required and have proper governance and controls over who is able to edit that logic.

In the webinar, the end report showed a waterfall chart, but it looks like the waterfalls don't balance. Does Power BI not flag this as an error?

I believe the waterfall chart in the solution file does "balance" (i.e. the difference between the current/prior month values are explained by the differences that are broken down) but in case it doesn't, this isn't something that Power BI would flag out of the box. It is possible to have an output in Power Query that performs some sort of data reconciliation/data quality checks which you could then display in a report page to highlight anomalies or data quality checks.

Is Copilot able to do the data transformation (the extraction, transformation, loading) for us as well?

Yes, Power BI is just one of the Fabric Copilots - in Fabric there is a Copilot for notebooks which help you write Python (Pyspark) code to shape/transform data, a Copilot for Dataflows (which use the same Power Query UI and engine that I showed in Power BI desktop) for data ingestion/transformation/loading, as well as Copilots for SQL Warehouse and Real time Intelligence (Kuosto) if you are using these capabilities. To be clear, the Fabric Copilots aren’t necessarily doing the transformation for you, but they are helping to design the transformation steps for you to execute. Asking Copilot tools to perform transformation steps can be risky as it may not do exactly the same thing every time, although as Microsoft introduces the concept of AI Agents this is likely to improve over time. All these Copilots are automatically included as part of any Fabric capacity which is F64 SKU or higher. See this resource by Microsoft for more details on the Copilots. You can also refer to this webinar from the community to learn more about Microsoft Fabric.

Be part of our communities

Communities
Sustainability & Climate Change polaroid
Sustainability and Climate Change Community

Inspiration, insights and collective ambition for professionals delivering on sustainability and acting on climate change.

Communities
Small Practitioners polaroid
Ideal for small practitioners

Explore invaluable resources, content and networking opportunities exclusive to ICAEW members in practice.

Communities
Polaroids on pinboard
Explore our communities

Information, insights, guidance and networking opportunities on a range of industry sectors and at various stages throughout your career.

Open AddCPD icon

Add Verified CPD Activity

Introducing AddCPD, a new way to record your CPD activities!

Log in to start using the AddCPD tool. Available only to ICAEW members.

Add this page to your CPD activity

Step 1 of 3
Download recorded
Download not recorded

Please download the related document if you wish to add this activity to your record

What time are you claiming for this activity?
Mandatory fields

Add this page to your CPD activity

Step 2 of 3
Mandatory field

Add activity to my record

Step 3 of 3
Mandatory field

Activity added

An error has occurred
Please try again

If the problem persists please contact our helpline on +44 (0)1908 248 250