Incremental refresh in #PowerBI with data from BigQuery

Back in February 2020 the Incremental refresh in Power BI became GA (link) and supported in Power BI Pro as well.

I have not had a chance to use this a lot yet but in a recent project for a client we were working with a large dataset 500+ mill. rows of historical search data stored in Big Query (yes Google BigQuery 🙂 ).

One of the prerequisites for Incremental Refresh to work is that your Power Query can be query folded – ie. sent back to the source system as a native query from Power Query so all can be folded into a query understood by the source.

You will find the documentation here.

But let me take you through the steps if you want to do this with a query against Google BigQuery.

The data

I am using the flight dataset that you are guided through creating when you create a new project in BigQuery.

It contains 7,95 gb of data and 70.588.485 rows with 10 years of flight data from jan 2002 until dec 2012. (To work with incremental refresh I will add 8 years to all dates)

Setting up incremental refresh for this dataset

Open a new pbix file and select Get data in the ribbon – filter the connectors by big to find the Google BigQuery connector.

You will then be prompted to signin to BigQuery and then presented by the projects you have access to.

When you select the tables or views you want to import – hit “Transform data”

As the BigQuery connector also supports direct query you will be prompted whether you want to use import or direct Query mode – in this case we want to import the data – so I select Import.

This will open the Power Query editor and display all the columns and the data for the first 1000 records

Native Query greyed out – Why ?

If you have worked with query folding before you properly know that you can view the native query constructed via the shortcut menu “View Native Query”

But when we use the BigQuery connector it is greyed out – but let’s check whether it does fold or not.

You can see the query constructed in Query History pane in Google Console for BigQuery

And the query folds nicely – at least until we do Power Query transformations.

Let’s setup DateRanges for incremental refresh

In order to do a incremental refresh we need to create two parameters in our PBIX – called RangeStart and RangeEnd

Remember the parameters must be Date/Time Type.

In our flights query – we then add a filter to the date column.

But we can’t really select the parameters

This is because the date column is the datatype – Date and not Date/Time.

So, let us change the date columns datatype to Date/Time

The rows in the table should be filtered like this

REMEMBER that only one of the filters must have the “equal to” – otherwise you risk two partitions include the same period.

The Native query is still grey.

But when we check the query history in BigQuery we can see that it is nicely folded back to the engine

Let us build the simple datamodel and setup the incremental refresh

Clicking the Close & Apply in the editor we start the import of the filtered information

After importing the data, I create a simple measure that calculates the number of records in the flights table.

And we can start visualizing the data even more using the other fields in the dataset

If we publish the dataset now – the only data we will ever get in model is for January 2003 (2011) – but we want more data so we have to setup incremental refresh so when the data is refreshed in the service it will contain all the data for the period we want in the model.

!!!! – As the refresh uses the date/time value of the current date I have modified the data by adding 8 years to all the records so it (And even that is folded back)

I then select Incremental refresh for the flights table by selecting Incremental refresh in the shortcut menu.

This opens the following dialog.

Don’t worry about the warning in the top – this is due to “native query” being greyed out as the connector apparently doesn’t return the information to Power Query – but as you have seen previously we do know that the query is folded.

I set the model to contain

8 years of date in the column “date” and the model should only refresh the latest rows within the last month.

When you click Apply All you cannot really see nothing happened – but if you open the dialog again you will see it contains the settings you have set.

Publishing the model to the service

When the model is published it

The dataset will before refresh the first time will only contain the data, we have imported in the pbix

We need to set the credentials in the data

And set a time of day where you want to refresh the dataset.

We can activate the first refresh by clicking the refresh button next to the dataset name

Or you can republish the pbix file to the service – this will also activate a refresh. This refresh didn’t happen with the first publish as you need to specify the credentials to BigQuery.

Successful refresh

After the refresh has finished, we can see the total number of rows has increased to 56.007.003.

And the period included is eight years plus current year and as we selected 1 month of update the dataset will include all the dates in may 2020. In your case you might only have data until Today() and this will then be included in your dataset.

The first refresh took almost 20 minutes.

But when we hit refresh again –

The refresh only took 1 min and 47 seconds as it only had to update the data for May 2020.

No premium

All of this is done without a Premium capacity – 👍

With a premium capacity you get a XMLA endpoint with which you can do even more – this is perhaps another blogpost worthy – so leave a comment if you want me to write a post about this.

Power On !

Update – check out this follow up post from Daniel Lennartson – https://www.linkedin.com/pulse/partitioned-tables-bigquery-power-bi-incremental-daniel-lennartsson/?trackingId=maXDCJvRSgaan6uAZHlBqg%3D%3D – describing how to use partitioned table to limit the cost.

#PowerQuery – Using SQL scalar value functions in PowerQuery

Currently I am working with a project where we extract data from a SQL server – some of the business logic is built into scalar value functions (documentation).

Now the magic of PowerQuery enables us to reuse these functions within PowerQuery and Query Folding is supported – more about this at the end of this post.

Let’s step through how to do this

I will use the famous AdventureWorksDW2014 database.

In my database I have added a function called IsoweekwithYear

This function uses a date a parameter and returns an Integer representing the year and the week. In this example it returns 202018 as 28 april 2020 is in iso week 18 of 2020.

In the Power BI Desktop we can now connect to the SQL server

And in this example I pick the DimDate table from the database

And if we scroll down you will also see the functions, we have in the database

Notice that the data preview pane will show the function and the parameters it can take.

Then I click Transform Data and the Power Query Editor will appear

By default it will Invoke the function and in this case it will return Null

Notice the Applied Steps in the Query Settings pane

If you remove the last step where the function is invoked

You will see it as a function that has a parameter called @DATE which is optional.

Now let’s use the function

In the Query DimDate we can the call the function – as we now we need to call the function using the column FullDateAlternateKey as the parameter I select this column first and then click Invoke Custom Function via the Add Column section in the ribbon.

We can then choose the Function Query IsoweekWithYear and use the column name as the parameter value

This will add a new column at the end of my query

Containing the ISOWeek – yuppieee 🙂

Is Queryfolding supported ?

It sure is

As we can see as we can select “View Native Query” and see that the SQL statement is added.

This means that the server does the work instead of the M-engine.

What’s next

I must admit that I wasn’t aware of this possibility before – and I feel like it opens up a lot of scenarios to reuse already built and tested functions that your database developers have built.

Let me know if you find this useful as well by leaving a comment or give the post a like

Happy querying !!!

#PowerQuery – Easily Copy all queries from a PBIX to Excel and vice versa

Today I discovered a way to easily copy all queries (including the groups) from a PBIX file to Excel without entering the power query editor in Excel.

In my case I needed to copy the data returned by a query and find it some much easier to get the data using Excel instead of generating a csv file from a visual in Power BI Desktop and importing that into Excel.

My PBIX contains some queries to extract weather data from different stations in Sweden.

These are grouped to create a logic structure of the queries.

Select all queries and copy them.

Now open a blank workbook and click the Queries & connections in the Data tab in the ribbon

And the Queries & Connections panels opens

Now right click in the panel

And select Paste – and the queries, parameters and load settings for the queries are also included.

For the table you want to load you now need to specify whether you want to load to the table to the datamodel or to a worksheet – default will be worksheet.

If you collapse all your groups in the panel and select all of them you can also copy them

And paste them into the Power Query editor in the Power BI Desktop

And after paste

Hope you this can help you too and stay Queryious

#PowerQuery – Avoid an extra step by not clicking OK

Just wanted to share a small tip in PowerQuery where an extra step is added to your queries even though you might not want it.

In this example I have imported the FactInternetSales from the AdventureWorks database and have filtered it with orderdatekey after a specific date.

Now I just want to check the different product keys and I click the dropdown next to the field name.

If you do not specify a filter but just click OK (the default button) you will get an extra step inserted in the Applied Steps

= Table.SelectRows(#”Filtered Rows”, each true)

Which just means that all rows should be returned from the previous step.

If you press Cancel instead the step won’t be added.

So, keep your queries nice and clean and click Cancel instead of OK and stay querious 🙂

PS – and remember to rename the steps to keep them as explanatory as possible.

DAX Time intelligence and the 29th of February – #PowerBI

Yesterday I visited a client and was asked – how do the time intelligence functions handle the fact that February has 29 days in 2020.

Well – in fact there was a few surprises depending on what you select from you date table.

Let’s look as some examples – I will use the following Internet Sales Amount from the years 2011-2013 from Adventure Work Database where we in February 2012 have 29 days.

As you can see, we have the year 2012 where we have 29 days.

SAMEPERIODLASTYEAR()

In order to calculate Internet Sales Amount LY – I use the following

Internet Sales Amount LY = CALCULATE([Internet Sales Amount];SAMEPERIODLASTYEAR(DimDate[Date]))

Which works fine

But notice the behavior if we put dates or days numbers on the rows

SURPRISE – Internet Sales Amount LY will show the value for the 28th of February 2011 instead of a blank value as you perhaps would expect

If you select year 2013 we will see this

The 29 of feb 2012 will “disappear” but the total for February will include the number.

DATEADD() – last year

If we use the function DATEADD instead – it will work exactly the same way.

IAS LY = CALCULATE([Internet Sales Amount];DATEADD(DimDate[Date];-1;YEAR))

DATEADD() – same day last year

If you want to compare the same Saturday (the 29th of feb 2020 is a Saturday) last year – which is the 2nd of march we can do this by using the same DATEADD function but with different parameters

IAS LY same weekday = CALCULATE([Internet Sales Amount];DATEADD(DimDate[Date];-364;DAY))

This will compare the same day 52 weeks ago (52 * 7 = 364) and there by giving us the value from the 29th of feb 2012 on the 27th of feb 2013.

DATESMTD()

Now what about the function DATESMTD()

ISA MTD = CALCULATE([Internet Sales Amount];DATESMTD(DimDate[Date]))

ISA MTD LY = CALCULATE([Internet Sales Amount LY];DATESMTD(DimDate[Date]))

These functions will calculate the running total for the month for the given day number

Notice that the ISA MTD works fine in 2012 for the 29th and the LY measure will show the same result for the 28th and 29th in 2012 – and in 2013 it will actually for the 28th show the sum of both the 28 and 29th 

Conclusion

You might find that some users find it difficult to understand how the calculations works when the look at dates instead of month totals especially in the case where they will get the same value for LY on both the 28 and 29th in 2012/2020.

If you compare cross years on calendar dates I find the result that SAMEPERIODLASTYEAR() returns makes better sense than leaving it empty/blank but what do you or your users think. Let me know in the comments.

Hope you find this little walkthrough useful.

And remember to ALWAYS use a datetable in your model if you do time intelligence calculations in DAX.

#PowerBI – Copy a measure in the desktop – Why it doesn’t always work – mystery solved

Well – creating measures in the Power BI desktop is usually doing a lot of copying and pasting of measures.

And unfortunately, we haven’t yet gotten a Copy measure in the shortcut menu when we right click a measure

Here is an idea you can vote for if you would find it useful as well – https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/13219620-duplicate-measure-and-format-copy

So we end up copying the formula from text in the formula bar

And click new measure and Paste it into the formula bar

 

But 8 of 10 times nothing is pasted (at least when I select) – WHY ???

 

And now I finally took me the time to find out why –

Notice the colour of the selection in the picture above compare to this

If the colour is darker blue you can copy and paste but if its lighter blue – you can’t – and this is properly because you use your mouse to select the formula and if you release the mouse outside of the formula bar area – the selection will be light blue and when you then press CTRL + C to copy the formula text isn’t copied to the clipboard.

Solution

So, either make sure you don’t move your pointer outside the area or use CTRL + A to select all and then press CTRL + C to copy the formula.

 

Hope this can save you a couple of clicks and please vote for the idea – https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/13219620-duplicate-measure-and-format-copy – this also includes the idea that the formatting of the value should be included.

 

Power On!

 

 

 

 

 

#PowerBI – A quicker way to set Summarize to none for all your fields in your date dimension (or others)

When designing your model you will in most cases need a date dimension and one of the things you always should do is set the Summarization of the number columns in your table to none instead of the default Sum – to avoid this when you insert for instance Calendar year in your visuals

So set the summarization to Don’t summarize to avoid this.

But you have to do this for all the number columns in your date table – –

but if you use the Model view instead –

We can multi select columns and set it for all at once

In the model view we have to select None to avoid summarization of number fields – and voila all fields in my date table is set to don’t summarize

Hope this can help you as well.