Incremental refresh in #PowerBI with data from BigQuery

Back in February 2020, incremental refresh in Power BI became generally available (link) and is supported in Power BI Pro as well.

I have not had a chance to use this a lot yet, but in a recent project for a client we were working with a large dataset of 500+ million rows of historical search data stored in BigQuery (yes, Google BigQuery 🙂).

One of the prerequisites for incremental refresh to work is that your Power Query query can be folded – i.e. translated by Power Query into a native query understood by the source system, so the filtering runs at the source.

You will find the documentation here.

But let me take you through the steps if you want to do this with a query against Google BigQuery.

The data

I am using the flight dataset that you are guided through creating when you create a new project in BigQuery.

It contains 7.95 GB of data and 70,588,485 rows with 10 years of flight data from January 2002 until December 2012. (To work with incremental refresh, I will add 8 years to all dates.)

Setting up incremental refresh for this dataset

Open a new pbix file and select Get data in the ribbon – filter the connectors by big to find the Google BigQuery connector.

You will then be prompted to sign in to BigQuery and then presented with the projects you have access to.

When you select the tables or views you want to import – hit “Transform data”

As the BigQuery connector also supports DirectQuery, you will be prompted whether you want to use Import or DirectQuery mode – in this case we want to import the data, so I select Import.

This will open the Power Query editor and display all the columns and the data for the first 1000 records

Native Query greyed out – Why ?

If you have worked with query folding before you probably know that you can view the native query constructed via the shortcut menu item “View Native Query”.

But when we use the BigQuery connector it is greyed out – but let’s check whether it does fold or not.

You can see the query constructed in the Query History pane in the Google Cloud Console for BigQuery.

And the query folds nicely – at least until we do Power Query transformations.

Let’s set up date ranges for incremental refresh

In order to do an incremental refresh we need to create two parameters in our PBIX – called RangeStart and RangeEnd.

Remember the parameters must be Date/Time Type.

In our flights query – we then add a filter to the date column.

But we can’t really select the parameters

This is because the date column is of datatype Date, not Date/Time.

So, let us change the date column's datatype to Date/Time.

The rows in the table should be filtered like this

REMEMBER that only one of the filters may include “equal to” – otherwise you risk that two partitions include the same period.
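The reason only one boundary may be inclusive can be sketched in Python – a small illustration only, assuming the common pattern where RangeStart is inclusive and RangeEnd is exclusive (the actual filtering of course happens in Power Query/BigQuery):

```python
from datetime import datetime

def in_partition(row_date, range_start, range_end):
    # Half-open interval: RangeStart inclusive, RangeEnd exclusive.
    return range_start <= row_date < range_end

# Two adjacent monthly partitions as incremental refresh would create them.
jan = (datetime(2020, 1, 1), datetime(2020, 2, 1))
feb = (datetime(2020, 2, 1), datetime(2020, 3, 1))

boundary = datetime(2020, 2, 1)
assert not in_partition(boundary, *jan)  # not counted in January...
assert in_partition(boundary, *feb)      # ...only in February - no double counting
```

If both filters used “equal to”, the boundary row would satisfy both partitions and be loaded twice.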

“View Native Query” is still greyed out.

But when we check the query history in BigQuery we can see that it is nicely folded back to the engine

Let us build the simple data model and set up the incremental refresh

Clicking Close & Apply in the editor starts the import of the filtered data.

After importing the data, I create a simple measure that calculates the number of records in the flights table.

And we can start visualizing the data even more using the other fields in the dataset

If we publish the dataset now, the only data we will ever get in the model is for January 2003 (2011) – but we want more data, so we have to set up incremental refresh so that when the dataset is refreshed in the service it contains all the data for the period we want in the model.

!!!! – As the refresh policy uses the date/time value of the current date, I have modified the data by adding 8 years to all the records so the data appears current. (And even that transformation is folded back to the source.)

I then enable incremental refresh for the flights table by selecting Incremental refresh in the shortcut menu.

This opens the following dialog.

Don’t worry about the warning at the top – this is due to “View Native Query” being greyed out, as the connector apparently doesn’t report folding information back to Power Query – but as you have seen previously, we do know that the query is folded.

I set the model to store 8 years of data based on the column “date”, and to only refresh rows within the last month.

When you click Apply All you cannot really see that anything happened – but if you open the dialog again you will see it contains the settings you have set.

Publishing the model to the service

When the model is published, the dataset will, before the first refresh, only contain the data we imported in the pbix.

We need to set the credentials for the data source in the service.

And set a time of day where you want to refresh the dataset.

We can activate the first refresh by clicking the refresh button next to the dataset name

Or you can republish the pbix file to the service – this will also activate a refresh. This refresh didn’t happen with the first publish as you need to specify the credentials to BigQuery.

Successful refresh

After the refresh has finished, we can see the total number of rows has increased to 56,007,003.

And the period included is eight years plus the current year, and as we selected 1 month of refresh, the dataset will include all the dates in May 2020. In your case you might only have data until Today(), and that will then be included in your dataset.

The first refresh took almost 20 minutes.

But when we hit refresh again –

The refresh only took 1 min and 47 seconds as it only had to update the data for May 2020.

No premium

All of this is done without a Premium capacity – 👍

With a Premium capacity you get an XMLA endpoint with which you can do even more – that is perhaps worthy of another blog post, so leave a comment if you want me to write about it.

Power On !

Update – check out this follow-up post from Daniel Lennartsson – https://www.linkedin.com/pulse/partitioned-tables-bigquery-power-bi-incremental-daniel-lennartsson/?trackingId=maXDCJvRSgaan6uAZHlBqg%3D%3D – describing how to use partitioned tables in BigQuery to limit the cost.

DAX Time intelligence and the 29th of February – #PowerBI

Yesterday I visited a client and was asked: how do the time intelligence functions handle the fact that February has 29 days in 2020?

Well – in fact there were a few surprises depending on what you select from your date table.

Let’s look at some examples – I will use the following Internet Sales Amount for the years 2011–2013 from the Adventure Works database, where February 2012 has 29 days.

As you can see, we have the year 2012 where we have 29 days.

SAMEPERIODLASTYEAR()

In order to calculate Internet Sales Amount LY – I use the following

Internet Sales Amount LY = CALCULATE([Internet Sales Amount];SAMEPERIODLASTYEAR(DimDate[Date]))

Which works fine

But notice the behavior if we put dates or days numbers on the rows

SURPRISE – Internet Sales Amount LY will show the value for the 28th of February 2011 instead of a blank value as you perhaps would expect

If you select year 2013 we will see this

The 29th of February 2012 will “disappear”, but the total for February will include the number.

DATEADD() – last year

If we use the function DATEADD instead – it will work exactly the same way.

IAS LY = CALCULATE([Internet Sales Amount];DATEADD(DimDate[Date];-1;YEAR))

DATEADD() – same day last year

If you want to compare against the same Saturday (the 29th of February 2020 is a Saturday) last year – which is the 2nd of March – we can do this by using the same DATEADD function but with different parameters.

IAS LY same weekday = CALCULATE([Internet Sales Amount];DATEADD(DimDate[Date];-364;DAY))

This will compare against the same day 52 weeks ago (52 * 7 = 364), thereby giving us the value from the 29th of February 2012 on the 27th of February 2013.
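This arithmetic is easy to sanity-check outside DAX; a small Python illustration (not part of the model):

```python
from datetime import date, timedelta

# 52 weeks = 364 days, so the weekday is always preserved.
saturday_2020 = date(2020, 2, 29)
same_weekday_ly = saturday_2020 - timedelta(days=364)

assert same_weekday_ly == date(2019, 3, 2)
assert same_weekday_ly.weekday() == saturday_2020.weekday()  # both Saturdays

# And the 27th of February 2013 maps back to the 29th of February 2012.
assert date(2013, 2, 27) - timedelta(days=364) == date(2012, 2, 29)
```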

DATESMTD()

Now what about the function DATESMTD()

ISA MTD = CALCULATE([Internet Sales Amount];DATESMTD(DimDate[Date]))

ISA MTD LY = CALCULATE([Internet Sales Amount LY];DATESMTD(DimDate[Date]))

These functions will calculate the running total for the month for the given day number

Notice that ISA MTD works fine in 2012 for the 29th, while the LY measure shows the same result for the 28th and 29th in 2012 – and in 2013 it will, for the 28th, actually show the sum of both the 28th and the 29th.
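One way to reason about this: when the previous year's February is shorter, the LY window's end day is clamped to the last day of that month, so the 28th and 29th end up with identical LY windows. A small Python sketch of that clamping (an illustration, not the DAX engine):

```python
import calendar
from datetime import date

def mtd_ly_end(d):
    # The LY month-to-date window ends on the same day number one year
    # earlier, clamped to the length of that month.
    prev_year = d.year - 1
    last_day = calendar.monthrange(prev_year, d.month)[1]
    return date(prev_year, d.month, min(d.day, last_day))

assert mtd_ly_end(date(2012, 2, 28)) == date(2011, 2, 28)
assert mtd_ly_end(date(2012, 2, 29)) == date(2011, 2, 28)  # same LY window as the 28th
```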

Conclusion

You might find that some users find it difficult to understand how the calculations work when they look at dates instead of month totals – especially in the case where they get the same value for LY on both the 28th and 29th in 2012/2020.

If you compare across years on calendar dates, I find the result that SAMEPERIODLASTYEAR() returns makes better sense than leaving it empty/blank – but what do you or your users think? Let me know in the comments.

Hope you find this little walkthrough useful.

And remember to ALWAYS use a date table in your model if you do time intelligence calculations in DAX.

#PowerBI – A quicker way to set Summarize to none for all your fields in your date dimension (or others)

When designing your model you will in most cases need a date dimension, and one of the things you should always do is set the summarization of the number columns in your date table to none instead of the default Sum – to avoid sums appearing when you insert, for instance, Calendar Year in your visuals.

So set the summarization to Don’t summarize to avoid this.

But you have to do this for all the number columns in your date table –

But if you use the Model view instead, we can multi-select columns and set the summarization for all of them at once.

In the Model view we select None to avoid summarization of number fields – and voilà, all fields in my date table are set to not summarize.

Hope this can help you as well.

#PowerQuery – Control the expand columns so it includes new columns

Imagine a scenario where a column in your Power Query contains a table with a set of columns, and you know that at some point more columns will be added.

In the case above I know that we at some point will add more columns in the merged ProductAttributes table.

How can we make this dynamic using Power Query?

When we click the icon for expanding the table, we might just select this and move on

But notice the formula created in the formula bar.

It says:

= Table.ExpandTableColumn(#"Merged Queries", "ProductAttributes", {"Brand"}, {"Brand"})

This means that even though we might add new columns to the ProductAttributes table, it will still only be Brand that is expanded – and only that column.

The bolded arguments are two lists containing the column names to expand and the new names of the columns – the last argument is optional, so we can actually skip it if we want to keep the original names – https://docs.microsoft.com/en-us/powerquery-m/table-expandtablecolumn

Now by changing the formula to this

= Table.ExpandTableColumn(#"Merged Queries", "ProductAttributes", List.RemoveItems(Table.ColumnNames(#"Merged Queries"[ProductAttributes]{0}), {"ProductKey"})
)

We can make the table dynamically expand when adding new columns in the table ProductAttributes

We get the new column included as well

The magic part is this formula:

Table.ColumnNames(#"Merged Queries"[ProductAttributes]{0})

It returns a list of column names from the step before the expansion (note I use the step name and the column name) – and I use {0} to extract the column names from the first row only; otherwise the formula will fail.

But we cannot have the same column name twice (i.e. ProductKey needs to go away), so we need to use the List.RemoveItems function:

List.RemoveItems(Table.ColumnNames(#"Merged Queries"[ProductAttributes]{0}), {"ProductKey"})

Thereby removing the ProductKey Item in the list

And this means that when we get more columns in the table “ProductAttributes” table they will automatically be included in the expanded columns
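As a hypothetical illustration of the same idea outside M, here is dynamic expansion in Python – the column names are read from the data itself, so new fields flow through automatically (the data and names below are made up):

```python
def expand_column(rows, column, exclude=()):
    """Expand a nested record column dynamically, mirroring
    Table.ExpandTableColumn + Table.ColumnNames in Power Query.
    New keys in the nested records appear automatically;
    keys listed in `exclude` are dropped (like List.RemoveItems)."""
    # Read column names from the first row's nested record ({0} in M).
    names = [k for k in rows[0][column] if k not in exclude]
    return [
        {**{k: v for k, v in row.items() if k != column},
         **{k: row[column].get(k) for k in names}}
        for row in rows
    ]

rows = [
    {"ProductKey": 1, "ProductAttributes": {"ProductKey": 1, "Brand": "Contoso", "Color": "Red"}},
    {"ProductKey": 2, "ProductAttributes": {"ProductKey": 2, "Brand": "Litware", "Color": "Blue"}},
]
expanded = expand_column(rows, "ProductAttributes", exclude={"ProductKey"})
assert expanded[0] == {"ProductKey": 1, "Brand": "Contoso", "Color": "Red"}
```

If a "Color" column is added to the nested table later, it is expanded without touching the code – exactly the behavior the modified M formula gives us.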

Hope this can help make your Power Query queries even more dynamic.

Here is an example file – Link

Power Query On !

Adding keyboard shortcuts to your favourite buttons in #powerbi

If you find yourself clicking the same button in the Power BI Desktop you might find this little tip useful.

When you right click a button in the ribbon – you can add it to the Quick Access Toolbar.

This will add the button at the top (unless you have positioned it below the ribbon)

The button in the quick access toolbar can now be activated using a shortcut key combination.

When you press ALT the buttons will be labelled with numbers

So, pressing ALT + 7 will in my case open the Query Editor.

You can use the same technique in the Power Query Editor – in my case, pressing ALT + 2 there opens the Advanced Editor.

Hope this can help you too and do things even faster in Power BI with fewer clicks.

#powerbi Report to browse and watch sessions from #mbas 2019 using the #powerapp visual

Unfortunately I wasn’t able to participate in the Microsoft Business Application Summit this year – but luckily we can watch all the session via https://community.powerbi.com/t5/MBAS-Gallery/bd-p/MBAS_Gallery

But that would mean I had to leave Power BI Desktop in order to search and watch the videos – and the website also made it hard to see the distribution of sessions between categories.

So, I created this –

And to watch the video from within Power BI I created a drill-through page where I used the PowerApp visual to show the video.

As none of the Microsoft standard visuals can play videos from within a report – I created a power App to show the video based on the selected video link.

If you want to embed this huge resource of learning material in your own tenant you can download the elements from here

The Desktop file – link

The Power App – link

If you are interested in learning how I scraped the web page for all the relevant data – check out these functions to extract data from pages using CSS query capabilities in the power query function Html.Table

Highly inspired by this blog post by Chris Webb – https://blog.crossjoin.co.uk/2018/08/30/power-bi-extract-urls-web-page/

Move your pbix datamodel to Azure Analysis services – #powerbi

After the Azure Analysis Services web designer was discontinued per March 1, 2019 – link – there is no official tool for moving a PBIX data model to Azure Analysis Services. But by using a few different tools we do have ways of doing it anyway.

Step 1 – DAX Studio

Open your PBIX file in the Power BI Desktop and then open the DAX Studio (link to download) and connect DAX studio to the PBI model

In the status bar you will see the local port of the Analysis model running on your machine

Step 2 – SQL Server Management Studio

Connect to Analysis Services using the server address from Step 1

This will give you a connection to the model

Now right click the Database and choose to script the Database to a new Query Editor window (or the clipboard)

Step 3 – Connect to Azure Analysis Services

Use the SQL Server Management Studio to connect to Azure Analysis Services

Select to run a New XMLA Query on the server

And paste the query created in Step 2 in the new Query window – You can specify a model name in the highlighted area

Run the Query – and after a few seconds you should get this result – that the query has completed successfully.

And after a refresh of the server object you should see the newly scripted data model

Step 4 – Finish the move

Now depending on the data sources in your model you need to setup the necessary connections and gateway.

And use your favourite Analysis Services tool to modify the model – for instance Tabular Editor ( link )

You will have to go through all your data sources and Tables to make sure the connections and M-Scripts are functioning.

Tip

Before you script your PBIX data model – turn off the Time intelligence in your current file – otherwise you could get a lot of extra date/time tables in your model

E-mail the selected record in #powerbi with #powerapps and #flow and include a CSV File with the records

One of my clients called me the other day and asked whether it was possible to export the order that was selected on the current report page, as she wanted to send the information to another user. I explained the export data feature in the visual's action menu, but she didn't want to download a file, locate it, switch to Outlook, click new mail, type the correct e-mail address and attach the file – that was not very Power-like, too much clicky-clicky – because all the data was actually available when she had filtered the report for that particular record: the e-mail address she wanted to mail the data to and, of course, the data she saw on the screen.

Hmm… Let’s see how we can use the PowerPlatform stack to solve this requirement.

A Power BI report to use as the demo

Based on the AdventureWorks database I built a report to select a Customer Key and filter by a specific SalesOrderNumber.

So, when the user has selected a sales order we should be able to send the information about the current sales order to either the customer or an alternate e-mail address.

Putting PowerApps to work

Well, we do have a visual to integrate a PowerApp in our report, and even though it's in preview and there are some hiccups, we can make it work.

But in order to insert it we do need to do this in a browser so let’s publish the report.

After I publish the report to the service, we can switch to edit mode and insert the PowerApp visual in the service.

This will introduce to a start screen with instructions on how to get started.

The important part is to include the fields from the data model you want to have access to in the PowerApp – this is done by adding the fields to the “PowerApps data” field well in the visualizations pane.

As soon as you add the first field the PowerApp visual will change and let you either choose an app or create a new one.

I added all these fields as I want to be able to access them in PowerApps

You have to use either the Chrome or Edge browser to do this – and keep calm: you may experience difficulties selecting the PowerApp visual with the mouse if you deselect it – try using Tab to cycle through the visuals instead.

When all the fields are added we are ready to create a new app – well, depending on the browser behaviour – if it doesn't work in Edge, try Chrome 🙂 – so click New app and a new tab with PowerApps will appear.

Notice that a Gallery control is inserted and it's linked to "'PowerBIIntegration'.Data" – which is the filtered records from Power BI.

If I modify the Gallery Title field to show SalesOrderNumber instead – you can see the magic.

The gallery shows the same SalesOrderNumber as is selected in Power BI.

Let’s modify the PowerApp

We don’t need the gallery control – so I delete this and insert a couple of other controls

So, I insert

  • Label to show sales order number
  • Text box to enter an email address
  • Check box to select whether to include a csv file
  • A mail icon for mail using office365
  • A mail icon for mail using flow and include the csv file
  • HTML text control to construct the mail body

One of the first things I got trapped by was showing the SalesOrderNumber in the label control.

The PowerBIIntegration object is a table of records, so you have to refer to a specific record/row to get the information from a field.

Therefore, use the FILTER function or in this case the FIRST function to get the value you need.

Let’s construct a message body

We can do this by constructing a HTML text with this formula

"Dear " & First([@PowerBIIntegration].Data).EmailAddress & "<br><br>" &
"<b> Sales Order Number: </b>" & First([@PowerBIIntegration].Data).SalesOrderNumber & "<br>" &
"Products: " &
"<table width='100%' border='1' cellpadding='5' style='border:1px solid black; border-collapse:collapse'>" &
"<tr style='background-color:#efefef'>
<th>Model</th> <th> Sales Amount </th>
</tr><tr>" &
Concat(PowerBIIntegration.Data, "<td>" & ModelName & "</td> " & "<td>" & 'Sales Amount' & "</td>", "</tr><tr>") &
"</table>"

This will give us this table in the HTMLText control – you can expand the table to your needs.

Let’s send an email

In order to send an email without a CSV file we can use the built-in Office 365 Outlook data source via View and Data sources.

After adding this we can add a formula to the OnSelect property of the mail icon.

Test the function by clicking the run button

And hit the envelope – and

We get a notification

And after a while an email

Does it work from PowerBI – yes

You have to give the app a name and save it (and publish it) – then switch back to your Power BI report.

And your PowerApp visual will now show the newly created app in the visual – and when we select another customer and sales order – we see the app values changes.

And when clicking the send mail icon, we get a notification and an email.

Now the tricky part – include a CSV file

In order to do this, we need help from another member of the PowerPlatform family – Microsoft Flow – this is because we cannot include files or create files using the Office 365 Outlook connector.

This can all be done using the flow – and this is the part that took me the longest time to solve.

First let’s connect a flow to the second envelope on the PowerApp screen –

You can either create a new flow or use an existing.

As this blog post is a bit long, I will show you what the final flow looks like and take you through the steps/actions in the “CreateCSVFile” flow.

When I add the flow, you see that the run statement has 4 arguments

These are added in the flow via the “Ask in PowerApps” in the different steps.

Let’s look at the flow

The flow has 5 steps

The first step “PowerApps” has no settings but builds the connection to the PowerApp “click”

Next step is creating a variable called csvdataasjson.

And the argument value contains a string value that is provided by the call from PowerApps.

This will actually be a JSON string containing the records we want in our CSV file.

The next step “Parse JSON” takes the value of the variable and uses the Parse JSON data operation to parse the JSON string into an object.

Remember to update the Schema if you change what you send to the flow via PowerApps.

The next step “Create CSV table” takes the output from “Parse JSON” and can convert the JSON to a CSV table

I have used the advanced options to show the headers.
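Outside of Flow, the Parse JSON → Create CSV table chain boils down to something like this Python sketch (hypothetical field names; in the real flow, the JSON string comes from PowerApps):

```python
import csv
import io
import json

# A JSON string as it might arrive from PowerApps (made-up records).
payload = json.dumps([
    {"ModelName": "Mountain-200", "SalesAmount": 2319.99},
    {"ModelName": "Road-150", "SalesAmount": 3578.27},
])

records = json.loads(payload)  # the "Parse JSON" step

# The "Create CSV table" step, with headers shown.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=records[0].keys())
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

assert csv_text.splitlines()[0] == "ModelName,SalesAmount"
assert len(csv_text.splitlines()) == 3  # header + two order lines
```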

The final step uses the “Send an email (V2)” action to send the e-mail.

The To, Subject and Body argument uses the “Ask in PowerApps” to create arguments we can use when we run the flow in PowerApps

Call the flow from PowerApps with arguments

In PowerApp we can now activate the flow with arguments from the data using the OnSelect action on the second Mail icon

Notice that the arguments of the Run statement match the PowerApps variable names in the Flow.

The initialize_value argument contains a formula to construct a JSON string – if you need to include more fields from the Power BI dataset, you simply extend the string construction.

If you check a flow run you can see the result in the Value box

Now save and republish the app and lets see if it works via PowerBI.

Let’s mail a CSV file

– A tip when moving back and forth between PowerApps and Power BI – switch between report pages to refresh the PowerApp visual; this updates the app better than a refresh of the page.

When clicking the second envelope we will activate the flow via PowerApps

And after a few seconds I receive an email

Including a CSV file with the Orderlines

To summarize

By using the power of each element of the PowerPlatform we can send an email with the selected record/s in PowerBI – but it might be a bit advanced for the citizen developer.

  • Power BI to find the data
  • PowerApps to mail a simple e-mail with the information
  • Flow to include a CSV file in the mail

Please vote for this idea if you want to make it easier to create the JSON string in PowerApps – https://powerusers.microsoft.com/t5/PowerApps-Ideas/Add-JSON-stringify-or-something-similar-for-debugging/idc-p/268352#M25884

And let me know if you have questions or feedback to my method – POWER ON !

Guide – How to import data from Eurostat directly into #PowerBI

I follow EU Eurostat on twitter (link – https://twitter.com/EU_Eurostat ) and often see a lot of interesting facts and infographics like this one.

And I have for a long time wanted to see if I could use the webservices that Eurostat also provides (link – https://ec.europa.eu/eurostat/data/web-services) to import the data directly into Power BI.

So here is a guide on how you can do it – and the example will try to extract the data for the Orange production Infographic.

There are a LOT of different datasets in Eurostat and this guide should work on most of them – you just need to find the dataset (https://ec.europa.eu/eurostat/data/database) in the catalogue.

Construct the Query

The REST request we need to construct is defined like this

So, we need to find the datasetCode and specify the filters.

You can find the dataset code by browsing the data catalogue – and the dataset code is stated at the end.

If you need specific items, you need to specify the codes of those items in the request – and the Data Explorer is a good way to find these codes.

Using the codes we have found we can now use the Query builder to construct our query (click on the picture to try it out)

So after entering the dataset code we get the option to specify the filter and select measures under strucpro

Notice that I have selected Yes to Exclude EU aggregates

The query option will vary from dataset to dataset but the principles are the same.

Clicking the “Generate query filter” will give you

And you can copy the dataset code to clipboard

apro_cpsh1?filterNonGeo=1&precision=1&crops=T1000&strucpro=AR&strucpro=PR_HU_EU&time=2014&time=2015&time=2016&time=2017

Now we have the filter part and this can of course be parametrized in your Power Query.

And we must add

http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/

before so the full web query is

http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/apro_cpsh1?filterNonGeo=1&precision=1&crops=T1000&strucpro=AR&strucpro=PR_HU_EU&time=2014&time=2015&time=2016&time=2017

In order to get the data into Power BI we choose get data and select the Web connector

And press OK

This will return the following to the query editor

Convert the JSON response to tables in Power Query

We get a JSON response returned from the Web query and this has to be transformed into a table – and in order to get that we need to understand what and how the data is returned.

When we use the query in the browser you can see that it's not a normally structured JSON document.

And what we want is a table with the following fields – Country, Year, Area, Production

I start by renaming the query to “QueryResult” and disable the load – and the following query will use this as a reference

So let's create the Geo dimension

By clicking the Record on the dimension field

Drill down to geo

Down to category

And to label

Turn it into a table

And then add an index column starting from 0 and rename the columns

I then do the same for the other dimensions and end up with the following queries

Getting the values

Now in order to get the values we need to do a little more

The web query returns one long list of values when converted into a table.

Values contains all the “values” in the grid.

Each cell in the table is referenced by a running index going from 0 up to (Facts * Geo * Time) - 1.

So when we have 2 facts, 38 countries and 4 years, we get 2 * 38 * 4 = 304 – this should be the number of rows in our Values table.

But when we look at the Values before we do anything we have only 282 rows.

The missing values are because cells with missing values (represented by a : or :z) are located in the Status field in the QueryResult.

So we have to add the missing rows from this Status – this gives us the missing 22 cells (304-282)

And we then convert these Values to null

In our Values query we want to append these rows to our table – we can do this by modifying the step expression to

= Record.ToTable(value) & MissingRows

And we then rename the Name column to Index and change the data type to an integer.

The index column is our number reference to each cell in the table/matrix from EuroStat.

Next step is to calculate the keys for time, geo and Facts.

To calculate the TimeKey we add a column

and take the Index modulo the number of time periods.

This gives a step with the following expression

= Table.AddColumn(#"Added Index", "TimeKey", each Number.Mod([Index], 4), type number)

And after clicking OK we can make it more dynamic by replacing the 4 with the number of rows in the Time table:

= Table.AddColumn(#"Added Index", "TimeKey", each Number.Mod([Index], Table.RowCount(Time)), type number)

And now we have the TimeKey for each row.

To add the FactKey we add another calculated column

= Table.AddColumn(#"Inserted Modulo", "FactKey", each Number.IntegerDivide([Index], Table.RowCount(Geo)*Table.RowCount(Time)), Int64.Type)

This will give us the FactKey, and we can see it shifts when we reach row 152 – time count (4) * geo count (38) = 152.

The final key column we need is the GeoKey:

= Table.AddColumn(#"Inserted Integer-Division", "GeoKey", each Number.IntegerDivide([Index], Table.RowCount(Time) ) - [FactKey]*Table.RowCount(Geo), Int64.Type)
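If you want to sanity-check the three key formulas, here is the same arithmetic in Python, using the counts from this dataset (an illustration of the M expressions above, not code in the report):

```python
n_facts, n_geo, n_time = 2, 38, 4
assert n_facts * n_geo * n_time == 304  # expected row count in the Values table

def keys_for(index, n_geo, n_time):
    # Mirrors the M formulas: the running index enumerates
    # time fastest, then geo, then fact.
    time_key = index % n_time                     # Number.Mod
    fact_key = index // (n_geo * n_time)          # Number.IntegerDivide
    geo_key = index // n_time - fact_key * n_geo
    return fact_key, geo_key, time_key

assert keys_for(0, n_geo, n_time) == (0, 0, 0)
assert keys_for(151, n_geo, n_time) == (0, 37, 3)  # last cell of the first fact
assert keys_for(152, n_geo, n_time) == (1, 0, 0)   # FactKey shifts at row 152
```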

And we are now ready to load the data into our data model.

The data is ready to model

After closing the query editor we get the tables returned to Power BI Desktop and we can finalize the datamodel

And create the relationships between our tables.

And create some measures

And start to visualize the Oranges in Europe

One interesting fact is that the Orange trees are highly productive in Albania.

And the cool part is

When the 2018 figures are available in Eurostat – we just modify our query to

http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/apro_cpsh1?filterNonGeo=1&precision=1&crops=T1000&strucpro=AR&strucpro=PR_HU_EU&time=2014&time=2015&time=2016&time=2017&time=2018

and refresh our model.

What do you think?

You can download a copy of the pbix file – here.

If you liked it, give the post a like or a comment – let me know what you think – hope you find it useful.

And at least take a spin around Eurostat to see what interesting facts you can find in the many datasets that are provided to you for free.

Creating a Power Query function browser – #powerbi

One of the best functions in Power Query (M) is for sure the function =#shared that inside the Query editor lists all the functions in M and in your current pbix file.

Picture 1 Result of =#shared

But sometimes you do want to check out the documentation online, and then you have to leave Power BI Desktop – so I thought: would it be possible to use Power Query to scrape the documentation (link) and create a Power BI report in which I could browse the documentation?

And because the documentation is well structured, it turned out to be quite easy.

Let’s start by getting all the function categories

The start page of the Power Query M function reference contains a nice list of all the categories

So to begin with, we activate a new query with data from the web.

And by using the new feature “Add tables using examples” we get a lot of help

Just type in the first 3-4 elements in the “In this section” list and we get a nice result

Click on the edit button and we have a nice list of the categories in our Query editor

Now each of the categories points to a specific web page

And the address for all of them is the category name with hyphens instead of spaces, so I do a simple Replace Values and add a new column with the relative path.

And to get the function category URL I add a parameter called PQ Documentation URL that points to the relative path and that I can use in my other queries as well later.

And then add a column where I concatenate the path and the name of each categories relative path

And now we have a reference to all the pages of the Power Query function categories.

Let’s create a function to extract all the functions in a category

Now to create a table of all the functions I create a new query and reference the first Query I made

And the idea is then to create a custom function that returns a table of all the functions within a category

Each page contains tables of the functions.

And when creating functions I prefer to build a query that works first and then turn it into a function – so let's dig in. The navigator sometimes tries to help us a little bit too much, so I right-click the address and choose Edit.

And we get a nice structure of the html page.

We only want the tables and as you can see in the preview each table is actually a list of the functions

Expand the Custom1 column table

And voila – a nice list of all the List functions in this case is returned

The other columns can now be removed, and we are ready to turn this into a function

Open the advanced editor and modify the query from

To

And we can use this function as an added column in our functions query to get all the functions in the different categories.

So now we have all the categories and the functions within

Now to the hard part – get the description of each function into the model

Each of the function has a specific page – in this example the List.Accumulate

The address of each of these function pages is the name of the function https://docs.microsoft.com/en-us/powerquery-m/list-accumulate

With a hyphen instead of the . (dot) – so constructing the address is quite easy.
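As a small illustration (the report does this in M, not Python), the address construction is just a lowercase-and-replace:

```python
BASE = "https://docs.microsoft.com/en-us/powerquery-m/"

def doc_url(function_name):
    # "List.Accumulate" -> "list-accumulate"
    return BASE + function_name.lower().replace(".", "-")

assert doc_url("List.Accumulate") == BASE + "list-accumulate"
```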

But I only wanted to extract the data within the black square.

By looking at different pages I found out that all of the information I wanted was located in a specific named element <pre>

So on each page I have to locate that tag and then get all the HTML from there to the element </main> that marks the end of the area.

Now when we use the Web query interface we get a lot of help but in this case we do not want the help

So after clicking Edit in the navigator menu

We get this – and this is because the Web.Contents is wrapped in the Web.Page function

So by modifying it to Text.FromBinary instead, we get the HTML returned as text.

Now we can click the Fx to create a new step to find the <pre> tag

And another step where we still refer to Source and find the position of </main>

Now as the final step – we extract the html from the source by using the Custom1 and Custom2 value with some offset

And we have the HTML from the page that we wanted.

And we can turn it into a function

Finally I reference the previous query and create the query with all the syntax and example text

And we are ready to create the data model

Now click close and apply and create a datamodel with the following relationships

I created a measure to provide a default URL to the documentation if a category isn't selected.

And set the data category of the different URL columns to Web URL.

And the rest is slicers and presentations

To present the HTML extracted from the function page I use the custom visual “HTML Viewer”

And the final report looks like this

And now I can hit refresh and have an updated Power Query Documentation report

Example file and comments

I hope you find this useful and if so please like or comment if you have ideas or feedback.

Link to the pbix file is – here

Or if you just want to browse the report – https://app.powerbi.com/view?r=eyJrIjoiMDczMGE5NjMtMjA4Zi00NmE4LTlmODAtZDRhMWQzMTk3ZmUzIiwidCI6ImUwN2Y3YzBiLTk5NWItNDg0NS1iZWJhLTQxNmYwZDYxZTFlMSIsImMiOjh9