In a previous post we saw how formulas can solve a partial match with conditions. Now I will solve it using Power Query because my friend Kunle challenged me to do so 🙂
Requirements
We have 10000 rows of data like this:
Filter values:
Search words and State can be changed at any time.
Requirements:
- Count rows: State = “California”, “lamp” or “lamps” found within ProductName
- BONUS: Display all rows that qualify
Why Power Query?
Filters or formulas could solve this. Why Power Query?
Reapplying filters is tedious. On large data-sets formulas can be slow to calculate.
Power Query does the work in the background! Just refresh after changing filter values.
I once re-shingled a roof. I used a hammer but the pros use a hammer drill! That’s Power Query! Practice using the hammer drill.
Power Query Solution
Download my Excel file.. Here’s my YouTube video.
Solution Steps:
- 0:00 Requirements
- 0:24 Load Data into Power Query load 3 tables
- 1:24 Requirement Adjustment ‘lamp’ or ‘lamps’ NOT ‘clamps’
- 1:56 Prep text wrap with spaces, text to lowercase
- 3:34 Merge tables filters 10000 rows to selected state
- 4:36 Add SearchWord repeat search words beside rows
- 5:50 Search words found? function Text.PositionOf
- 6:39 Filter out rows if search words not found remove rows
- 7:26 Export rows to Excel sheet
- 7:44 Create Row Count query group by query counts rows
- 9:11 Export Row Count query to Excel sheet
Key Points
It’s important to point out a few things in my solution.
Join vs Cartesian Product
I used a merge (inner join) to connect tables ‘ProductData’ and ‘SelectState’. This filters ‘ProductData’ to those rows having the selected state. We select a single State at a time.
I used a CARTESIAN PRODUCT (cross join) to connect the search words! A cartesian product means there is no join! We get all combinations between tables ProductData and SearchWord. If one of the two tables is short then it’s fine. Our search word table only has 2 rows (‘lamp’, ‘lamps’). If both tables have 1000s of rows creating all combinations can crash your computer!
We start with 10000 rows but reduce it due to state selection. Then we double the remaining rows as we have two search words. Finally we filter out rows that don’t contain either search word.
Text.PositionOf
Power Query has menu options to clean & rearrange data but we still need functions. Power Query’s formula language is called M. Some functions are similar to Excel while others are different. Instead of Search or Find we use function Text.PositionOf. Read Microsoft’s M reference!
Advanced Editor
I like to see the individual steps on the right hand side of a query. Selecting them to walk through the query is helpful. However, I recommend looking at the M code that is generated.
Click ‘Advanced Editor’ and you’ll see M code like this below. It gets easier to read the more time you spend studying it!
let
Source = Table.NestedJoin(SelectState,{“Select State”},ProductData,{“State”},”ProductData”,JoinKind.Inner),
#”Expanded ProductData” = Table.ExpandTableColumn(Source, “ProductData”, {“Product Name”, “State”, “Product2”}, {“ProductData.Product Name”, “ProductData.State”, “ProductData.Product2”}),
#”Added Custom” = Table.AddColumn(#”Expanded ProductData”, “SearchWord”, each SearchWord),
#”Expanded SearchWord” = Table.ExpandTableColumn(#”Added Custom”, “SearchWord”, {“Search2”}, {“Search2”}),
#”Added Custom1″ = Table.AddColumn(#”Expanded SearchWord”, “FOUND?”, each Text.PositionOf([ProductData.Product2],[Search2])),
#”Filtered Rows” = Table.SelectRows(#”Added Custom1″, each [#”FOUND?”] <> -1),
#”Removed Columns” = Table.RemoveColumns(#”Filtered Rows”,{“ProductData.State”, “ProductData.Product2”, “FOUND?”}),
#”Renamed Columns” = Table.RenameColumns(#”Removed Columns”,{{“ProductData.Product Name”, “Product Name”}})
in
#”Renamed Columns”
Your Solution
Please share your thoughts or solution in the comments section below. I’ve learned Power Query from many different people including the readers of my blog 🙂
A special thanks to Kunle for challenging me to create a Power Query solution!
About Me
My name is Kevin Lehrbass. I live in Markham Ontario Canada.
This is my personal blog about Microsoft Excel.
Hi Kevin,
Great Solution.
Initially was wondering why the use of Text.PositionOf() function and NOT Text.Contains() function.
On checking the documentation (https://docs.microsoft.com/en-us/powerquery-m/text-positionof) , I discovered that Text.PositionOf() function with occurrence and comparer options is more robust in this instance. 👍
Thanks for sharing !
Hi Kunle,
I’m sure that Text.Contains() function is also very helpful. Thanks for pointing that out and thanks for reading!
Cheers,
Kevin
Pingback: Partial Match Count with a Condition | My Spreadsheet Lab