Query Folding in Power Query isn’t an entirely new concept but when I first heard of it, I thought “What is Query Folding”? There are probably a number of you out there who think the same, so let me explain it.
When carrying out transformations (e.g. Sort) in Power Query, it is possible that some will be sent back to the source. In other words, Power Query doesn’t attempt to sort millions of records within the application itself, but will use the source to undertake the work. This can drastically improve performance within Power Query.
Examples
I will now demonstrate some basic Query Folding techniques in Power Query. Until recently, it wasn’t easy to know if a transformation was performing Query Folding. This blog primarily talks you through monitoring outside of Power Query, but I will also touch upon how this is now possible to do within the application.
External Monitoring Tool
The monitoring tool used in this example if SQL Server Profiler. For a simple video walkthrough on how to use Profiler, click here.
1. Start a profiler session, ensuring you connect to the server where the AdventureWorksDW2014 Database resides – available to download here.
2. Now we can connect to the database in Power Query and more specifically to the dbo.FactProductInventory table.
3. I will perform three Power Query transformations:
a. Filter Rows
i. ProductTotal > £50000
b. Group By
i. DimProduct.EnglishProductName
ii. SUM(Unit Cost) As ProductTotal
c. Transform > Trim
i. DimProduct.Status
NOTE: This blog does not show you how to carry out the Power Query transformations but the workbook can be provided on request.
4. The transformed query looks like the below:
5. Now let’s take a look.
Points of Note:
1. The Filter Rows Power Query transformation on TotalUnitCost is implemented as a WHERE clause in SQL Server.
2. The Group By also becomes a T-SQL GROUP BY. With the SUM function also being applied to find the total value for the EnglishProductName Group By.
3. The T-SQL functions LTRIM/RTRIM are combined together on the Status column, replicating the functionality of the Trim feature in Power Query.
Within Power Query
Rather than show you this feature myself, I will point you to a great article by Chris Webb, which describes how you can easily check for Query Folding within your Power Query workbook.
In summary, there is an out of the box M function called GetMetaData that can be nested within your transformations. This will tell you the source of your query and if it has been query folded.
Supported Transformations
Here are some other transformation types in Power Query that support Query Folding:
– Filtering (on rows or columns)
– Joins
– Aggregates and GROUP BY
– Pivot and unpivot
– Numeric calculations
– Simple transformations, such as UPPER
Current Limitations
Although there is still no official list (from Microsoft) of the sources that support Query Folding, here are some of the ones currently known:
– Relational sources (SQL Server, Oracle).
– OData sources (e.g. Azure Marketplace)
– Active Directory
– Exchange
– HDFS, Folder.Files and Folder.Contents
This means all a lot of other data sources do not support Query Folding. For example, a flat file does not support Keep Top 5 Rows in Power Query, whereas SQL Server would use the TOP function or apply a filter in a WHERE clause. Whilst the lack of supported sources can be seen as a limitation, I would argue that any form of Query Folding, even if it were just SQL Server, is a big plus point.
You can deliberately prevent query folding, although the only time I see a benefit is if you are connecting to a server running at full capacity or during a large ETL. There are far more instances when Power Query will not apply query folding, because of limitations to the sources functionality. Check out Ken Verbeek’s blog if you would like know more on this.
Conclusion
This article is just scratching the surface of how beneficial Query Folding in Power Query can be. The key takeaway is remembering to put the transformation steps (that can be folded back to source) at the beginning. Steps that cannot be folded should be applied as late as possible. If this is adhered to, you will see huge performance benefits – especially on large datasets.
If anyone out there would like to share other data sources that use Query Folding or just have a general interest, feel free to comment below.
Meet the Team – Catherine Sachdev, Marketing Assistant
Next up we’re introducing you to Catherine Sachdev. Catherine joined us just over a year
Jan
Data Lineage with Azure Purview
I wrote an introductory Purview blog post previously, where I explored what the tool is
Jan
The Next Era of Retail: How Technology is driving change in a COVID-19 World
The retail sector is of great importance and accounts for almost 5% of GDP and
Jan
Meet the Team – Alex Kordbacheh, Junior Consultant
It’s time for another Meet the Team blog! This time we’re introducing you to Alex
Dec
Use cases for Recursive CTEs
Introductions Recursive CTEs are a way to reference a query over and over again, until
Dec
Azure Sentinel is named a ‘Leader’ in the Forrester Research Wave Report
Microsoft have recently announced that they have been named a Leader by Forrester Research in
Dec
Getting Started with Azure Purview
Azure Purview (a.k.a Data Catalog gen 2) has been released into preview and is currently
2 Comments
Dec
An Introduction to ApexSQL Complete – Integration with SSMS
We all know that the idea of add-ins is to make our lives easier. In
Dec