Incremental refresh in #PowerBI with data from BigQuery

Back in February 2020 the Incremental refresh in Power BI became GA (link) and supported in Power BI Pro as well.

I have not had a chance to use this a lot yet but in a recent project for a client we were working with a large dataset 500+ mill. rows of historical search data stored in Big Query (yes Google BigQuery 🙂 ).

One of the prerequisites for Incremental Refresh to work is that your Power Query can be query folded – ie. sent back to the source system as a native query from Power Query so all can be folded into a query understood by the source.

You will find the documentation here.

But let me take you through the steps if you want to do this with a query against Google BigQuery.

The data

I am using the flight dataset that you are guided through creating when you create a new project in BigQuery.

It contains 7,95 gb of data and 70.588.485 rows with 10 years of flight data from jan 2002 until dec 2012. (To work with incremental refresh I will add 8 years to all dates)

Setting up incremental refresh for this dataset

Open a new pbix file and select Get data in the ribbon – filter the connectors by big to find the Google BigQuery connector.

You will then be prompted to signin to BigQuery and then presented by the projects you have access to.

When you select the tables or views you want to import – hit “Transform data”

As the BigQuery connector also supports direct query you will be prompted whether you want to use import or direct Query mode – in this case we want to import the data – so I select Import.

This will open the Power Query editor and display all the columns and the data for the first 1000 records

Native Query greyed out – Why ?

If you have worked with query folding before you properly know that you can view the native query constructed via the shortcut menu “View Native Query”

But when we use the BigQuery connector it is greyed out – but let’s check whether it does fold or not.

You can see the query constructed in Query History pane in Google Console for BigQuery

And the query folds nicely – at least until we do Power Query transformations.

Let’s setup DateRanges for incremental refresh

In order to do a incremental refresh we need to create two parameters in our PBIX – called RangeStart and RangeEnd

Remember the parameters must be Date/Time Type.

In our flights query – we then add a filter to the date column.

But we can’t really select the parameters

This is because the date column is the datatype – Date and not Date/Time.

So, let us change the date columns datatype to Date/Time

The rows in the table should be filtered like this

REMEMBER that only one of the filters must have the “equal to” – otherwise you risk two partitions include the same period.

The Native query is still grey.

But when we check the query history in BigQuery we can see that it is nicely folded back to the engine

Let us build the simple datamodel and setup the incremental refresh

Clicking the Close & Apply in the editor we start the import of the filtered information

After importing the data, I create a simple measure that calculates the number of records in the flights table.

And we can start visualizing the data even more using the other fields in the dataset

If we publish the dataset now – the only data we will ever get in model is for January 2003 (2011) – but we want more data so we have to setup incremental refresh so when the data is refreshed in the service it will contain all the data for the period we want in the model.

!!!! – As the refresh uses the date/time value of the current date I have modified the data by adding 8 years to all the records so it (And even that is folded back)

I then select Incremental refresh for the flights table by selecting Incremental refresh in the shortcut menu.

This opens the following dialog.

Don’t worry about the warning in the top – this is due to “native query” being greyed out as the connector apparently doesn’t return the information to Power Query – but as you have seen previously we do know that the query is folded.

I set the model to contain

8 years of date in the column “date” and the model should only refresh the latest rows within the last month.

When you click Apply All you cannot really see nothing happened – but if you open the dialog again you will see it contains the settings you have set.

Publishing the model to the service

When the model is published it

The dataset will before refresh the first time will only contain the data, we have imported in the pbix

We need to set the credentials in the data

And set a time of day where you want to refresh the dataset.

We can activate the first refresh by clicking the refresh button next to the dataset name

Or you can republish the pbix file to the service – this will also activate a refresh. This refresh didn’t happen with the first publish as you need to specify the credentials to BigQuery.

Successful refresh

After the refresh has finished, we can see the total number of rows has increased to 56.007.003.

And the period included is eight years plus current year and as we selected 1 month of update the dataset will include all the dates in may 2020. In your case you might only have data until Today() and this will then be included in your dataset.

The first refresh took almost 20 minutes.

But when we hit refresh again –

The refresh only took 1 min and 47 seconds as it only had to update the data for May 2020.

No premium

All of this is done without a Premium capacity – 👍

With a premium capacity you get a XMLA endpoint with which you can do even more – this is perhaps another blogpost worthy – so leave a comment if you want me to write a post about this.

Power On !

Update – check out this follow up post from Daniel Lennartson – https://www.linkedin.com/pulse/partitioned-tables-bigquery-power-bi-incremental-daniel-lennartsson/?trackingId=maXDCJvRSgaan6uAZHlBqg%3D%3D – describing how to use partitioned table to limit the cost.

10 thoughts on “Incremental refresh in #PowerBI with data from BigQuery

  1. Hi Erik,
    This is really useful to set up an incremental refresh. I have data imported from 10 different tables from big query which are appended to 5 and unpivoted in Power BI(around 50,00,000 rows before unpivot). When I try to refresh the data, it takes around 8hrs on the desktop version and it fails on power BI service because of timeout. Can you suggest me on how to make the refresh faster? Will it be fast if I unpivot and append on bigquery server instead of power BI?

    1. Hi Swetha – an unpivot with 50 mio rows will be slow – very slow and is not recommendable in Power Query – so first of all rethink why you need to unpivot and if needed do it in BQ.

      1. Retreiving big data out of Bigquery through billable, SQL query is slow, by nature. Incremental refresh will help to reduce daily refresh time, but won’t help each time you have to perform a full refresh. Keep this in mind ! Aggregate as soon as possible, limit the number of columns, so that size of data transfered from BQ to Power BI gets lower.

  2. I have dates as my column headers and to use them in visuals, I have to unpivot the data which is increasing the data from 50 million rows to more than 30 billion rows and the data load is still slow. These rows will also increase drastically with daily refresh. Refresh is very very slow and timesout on service. Does Power BI have some limitations in terms of rows or the data refresh time from bigquery? Is there some workaround for this?

    1. hi Swetha – this sounds like you need a date table instead – dates as column headers shouldnt be needed. you are welcome to create a dummy dataset in an excel file plus the visual you want and i will create a mockup for you

  3. Hi Erik great post. Have you ever encountered measures not calculating correctly once BigQuery Incremental refresh is made. The weird thing is the data is in powerbi web if I cut table by date and Measure but the total is off. When I publish and refresh in Desktop measures calculate correctly

  4. Pingback: Power BI Workspace Roadmap, DirectQuery dataflows and more… (Roundup | June 1, 2020) – Datalytical

  5. Hi Erik,
    Just want to know if the incremental refresh from BigQuery cost anything in GCP or in Power BI.
    Thanks in advance.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s