Query Folding in Power Query isn’t an entirely new concept but when I first heard of it, I thought “What is Query Folding”? There are probably a number of you out there who think the same, so let me explain it.
When carrying out transformations (e.g. Sort) in Power Query, it is possible that some will be sent back to the source. In other words, Power Query doesn’t attempt to sort millions of records within the application itself, but will use the source to undertake the work. This can drastically improve performance within Power Query.
Examples
I will now demonstrate some basic Query Folding techniques in Power Query. Until recently, it wasn’t easy to know if a transformation was performing Query Folding. This blog primarily talks you through monitoring outside of Power Query, but I will also touch upon how this is now possible to do within the application.
External Monitoring Tool
The monitoring tool used in this example if SQL Server Profiler. For a simple video walkthrough on how to use Profiler, click here.
1. Start a profiler session, ensuring you connect to the server where the AdventureWorksDW2014 Database resides – available to download here.
2. Now we can connect to the database in Power Query and more specifically to the dbo.FactProductInventory table.
3. I will perform three Power Query transformations:
a. Filter Rows
i. ProductTotal > £50000
b. Group By
i. DimProduct.EnglishProductName
ii. SUM(Unit Cost) As ProductTotal
c. Transform > Trim
i. DimProduct.Status
NOTE: This blog does not show you how to carry out the Power Query transformations but the workbook can be provided on request.
4. The transformed query looks like the below:
5. Now let’s take a look.
Points of Note:
1. The Filter Rows Power Query transformation on TotalUnitCost is implemented as a WHERE clause in SQL Server.
2. The Group By also becomes a T-SQL GROUP BY. With the SUM function also being applied to find the total value for the EnglishProductName Group By.
3. The T-SQL functions LTRIM/RTRIM are combined together on the Status column, replicating the functionality of the Trim feature in Power Query.
Within Power Query
Rather than show you this feature myself, I will point you to a great article by Chris Webb, which describes how you can easily check for Query Folding within your Power Query workbook.
In summary, there is an out of the box M function called GetMetaData that can be nested within your transformations. This will tell you the source of your query and if it has been query folded.
Supported Transformations
Here are some other transformation types in Power Query that support Query Folding:
– Filtering (on rows or columns)
– Joins
– Aggregates and GROUP BY
– Pivot and unpivot
– Numeric calculations
– Simple transformations, such as UPPER
Current Limitations
Although there is still no official list (from Microsoft) of the sources that support Query Folding, here are some of the ones currently known:
– Relational sources (SQL Server, Oracle).
– OData sources (e.g. Azure Marketplace)
– Active Directory
– Exchange
– HDFS, Folder.Files and Folder.Contents
This means all a lot of other data sources do not support Query Folding. For example, a flat file does not support Keep Top 5 Rows in Power Query, whereas SQL Server would use the TOP function or apply a filter in a WHERE clause. Whilst the lack of supported sources can be seen as a limitation, I would argue that any form of Query Folding, even if it were just SQL Server, is a big plus point.
You can deliberately prevent query folding, although the only time I see a benefit is if you are connecting to a server running at full capacity or during a large ETL. There are far more instances when Power Query will not apply query folding, because of limitations to the sources functionality. Check out Ken Verbeek’s blog if you would like know more on this.
Conclusion
This article is just scratching the surface of how beneficial Query Folding in Power Query can be. The key takeaway is remembering to put the transformation steps (that can be folded back to source) at the beginning. Steps that cannot be folded should be applied as late as possible. If this is adhered to, you will see huge performance benefits – especially on large datasets.
If anyone out there would like to share other data sources that use Query Folding or just have a general interest, feel free to comment below.
How Artificial Intelligence and Data Add Value to Businesses
Knowledge is power. And the data that you collect in the course of your business
May
Databricks Vs Synapse Spark Pools – What, When and Where?
Databricks or Synapse seems to be the question on everyone’s lips, whether its people asking
1 Comment
May
Power BI to Power AI – Part 2
This post is the second part of a blog series on the AI features of
Apr
Geospatial Sample architecture overview
The first blog ‘Part 1 – Introduction to Geospatial data’ gave an overview into geospatial
Apr
Data Lakehouses for Dummies
When we are thinking about data platforms, there are many different services and architectures that
Apr
Enable Smart Facility Management with Azure Digital Twins
Before I started writing this blog, I went to Google and searched for the keywords
Apr
Migrating On-Prem SSIS workload to Azure
Goal of this blog There can be scenario where organization wants to migrate there existing
Mar
Send B2B data with Azure Logic Apps and Enterprise Integration Pack
After creating an integration account that has partners and agreements, we are ready to create
Mar