Back in February 2020, Incremental refresh in Power BI became generally available (link) and is now supported in Power BI Pro as well.
I have not had much chance to use it yet, but in a recent project for a client we were working with a large dataset – 500+ million rows of historical search data stored in BigQuery (yes, Google BigQuery 🙂 ).
One of the prerequisites for incremental refresh to work is that your Power Query query can be folded – i.e. sent back to the source system as a native query, so that all the steps can be folded into a query understood by the source.
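To illustrate what query folding means in practice, here is a minimal sketch (the step and column names are hypothetical, not from the flight dataset):

```m
// Hypothetical example: a simple filter step in Power Query.
// When the connector supports folding, this step is NOT evaluated
// locally – it is translated into the source system's query language.
FilteredRows = Table.SelectRows(Source, each [year] = 2011)
```

When this folds, BigQuery receives something along the lines of `SELECT ... FROM flights WHERE year = 2011`, so the filtering happens at the source instead of after importing all rows into Power BI.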
You will find the documentation here.
But let me take you through the steps if you want to do this with a query against Google BigQuery.
I am using the flight dataset that you are guided through creating when you create a new project in BigQuery.
It contains 7.95 GB of data and 70,588,485 rows with 10 years of flight data from January 2002 until December 2012. (To work with incremental refresh I will add 8 years to all dates.)
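The date-shifting step can be sketched like this (a minimal example, assuming the column is called "date" and is of type Date):

```m
// Shift every date 8 years forward so the data falls close enough to
// the current date for the incremental refresh window to pick it up.
// This transformation also folds back to BigQuery.
ShiftedDates = Table.TransformColumns(
    Source,
    {{"date", each Date.AddYears(_, 8), type date}}
)
```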
Setting up incremental refresh for this dataset
Open a new pbix file and select Get data in the ribbon – filter the connectors by big to find the Google BigQuery connector.
You will then be prompted to sign in to BigQuery and then presented with the projects you have access to.
When you have selected the tables or views you want to import, hit “Transform data”.
As the BigQuery connector also supports DirectQuery, you will be prompted to choose between Import and DirectQuery mode – in this case we want to import the data, so I select Import.
This will open the Power Query editor and display all the columns and the data for the first 1000 records.
Native Query greyed out – why?
If you have worked with query folding before, you probably know that you can view the constructed native query via the shortcut menu item “View Native Query”.
But when we use the BigQuery connector it is greyed out – so let’s check whether it folds or not.
You can see the constructed query in the Query History pane in the Google Cloud Console for BigQuery.
And the query folds nicely – at least until we do Power Query transformations.
Let’s set up date ranges for incremental refresh
In order to do an incremental refresh, we need to create two parameters in our PBIX – called RangeStart and RangeEnd.
Remember that the parameters must be of type Date/Time.
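In Power BI each parameter is stored as its own small M query. A sketch of what the two parameters look like behind the scenes (the #datetime values are just placeholders – Power BI manages the actual values during refresh):

```m
// RangeStart – must be DateTime, not Date, for incremental refresh
#datetime(2011, 1, 1, 0, 0, 0)
    meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]

// RangeEnd – defined the same way as a separate parameter query
#datetime(2011, 2, 1, 0, 0, 0)
    meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]
```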
In our flights query – we then add a filter to the date column.
But we can’t really select the parameters
This is because the date column has the data type Date, not Date/Time.
So, let us change the date column’s data type to Date/Time.
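The type change is a single step (assuming the column is called "date" and the previous step is called Source):

```m
// Convert the "date" column from Date to Date/Time so it can be
// compared against the RangeStart/RangeEnd Date/Time parameters
ChangedType = Table.TransformColumnTypes(Source, {{"date", type datetime}})
```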
The rows in the table should be filtered like this
REMEMBER that only one of the filters must include “equal to” – otherwise you risk two partitions including the same period.
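As a sketch, the filter step should use an inclusive comparison on one boundary and a strict comparison on the other (step and column names assumed):

```m
// >= on RangeStart and < on RangeEnd: each row lands in exactly one
// partition, so adjacent partitions never overlap
FilteredRows = Table.SelectRows(
    ChangedType,
    each [date] >= RangeStart and [date] < RangeEnd
)
```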
The “View Native Query” option is still greyed out.
But when we check the query history in BigQuery we can see that it is nicely folded back to the engine
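The folded query in the BigQuery history will look roughly like this (an illustrative sketch only – the connector generates its own SQL, and the project, dataset, and table names here are made up):

```sql
-- Illustrative only: the date filter pushed down to BigQuery
select *
from `my-project.my_dataset.flights`
where `date` >= timestamp('2011-01-01 00:00:00')
  and `date` < timestamp('2011-02-01 00:00:00')
```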
Let us build a simple data model and set up the incremental refresh
Clicking Close & Apply in the editor starts the import of the filtered data.
After importing the data, I create a simple measure that calculates the number of records in the flights table.
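The measure is a one-liner in DAX (the table name flights is assumed here):

```dax
Number of Records = COUNTROWS ( flights )
```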
And we can start visualizing the data even more using the other fields in the dataset
If we publish the dataset now, the only data we will ever get in the model is January 2003 (2011) – but we want more data, so we have to set up incremental refresh so that when the dataset is refreshed in the service it will contain all the data for the period we want in the model.
!!!! – As the refresh window is based on the date/time value of the current date, I have modified the data by adding 8 years to all the records so the dates fall within the refresh period (and even that transformation is folded back).
I then enable incremental refresh for the flights table by selecting Incremental refresh in the shortcut menu.
This opens the following dialog.
Don’t worry about the warning at the top – it appears because “View Native Query” is greyed out, as the connector apparently doesn’t return that information to Power Query – but as you have seen previously, we know that the query is folded.
I set the model to store 8 years of data based on the column “date”, and to only refresh the rows within the last month.
When you click Apply All, nothing visibly happens – but if you open the dialog again, you will see that it contains the settings you chose.
Publishing the model to the service
When the model is published, the dataset will – before the first refresh – only contain the data we imported in the pbix.
We need to set the credentials in the dataset settings.
And set a time of day at which you want the dataset refreshed.
We can trigger the first refresh by clicking the refresh button next to the dataset name.
Or you can republish the pbix file to the service – this will also trigger a refresh. (This refresh didn’t happen on the first publish because the BigQuery credentials hadn’t been specified yet.)
After the refresh has finished, we can see the total number of rows has increased to 56,007,003.
And the period included is eight years plus the current year, and as we selected 1 month of refresh, the dataset will include all the dates in May 2020. In your case you might only have data until Today(), and that data will then be included in your dataset.
The first refresh took almost 20 minutes.
But when we hit refresh again –
The refresh took only 1 minute and 47 seconds, as it only had to update the data for May 2020.
All of this is done without a Premium capacity – 👍
With a Premium capacity you get an XMLA endpoint, with which you can do even more – that is perhaps worthy of another blog post, so leave a comment if you want me to write about it.
Power On !
Update – check out this follow-up post from Daniel Lennartsson – https://www.linkedin.com/pulse/partitioned-tables-bigquery-power-bi-incremental-daniel-lennartsson/?trackingId=maXDCJvRSgaan6uAZHlBqg%3D%3D – describing how to use partitioned tables to limit the cost.