
Power BI Datamart Vs. Dataflow Vs. Dataset

Blog | January 2, 2023 | By Sushmitha M

What is Power BI?

Power BI is an interactive data visualization software product developed by Microsoft with a primary focus on business intelligence. It is part of the Microsoft Power Platform. Power BI is a collection of software services, apps, and connectors that work together to turn unrelated sources of data into coherent, visually immersive, and interactive insights. Data may be read directly from a database, a webpage, or structured files such as spreadsheets, CSV, XML, and JSON.

What is Dataflow?

Power BI Dataflow is the data transformation component in Power BI. It is a Power Query process that runs in the cloud, independent of Power BI reports and datasets, and stores the data in Azure Data Lake Storage (or Dataverse). Dataflow is not limited to Power BI; it can be created and used in other services, such as Power Platform (Power Apps). A Dataflow gives you both the transformation engine of Power Query and a storage option, making it a re-usable ETL (Extract, Transform, Load) component.

What is Dataset?

Power BI Dataset is the object that contains the connection to the data source, the data tables, the data itself, the relationships between tables, and DAX calculations. A Power BI Dataset is usually hidden from the Power BI Desktop view but can be easily seen in the Power BI service. A Power BI Dataset is commonly used when sharing a model between multiple visualization objects (such as multiple Power BI reports, paginated reports, and even Excel reports). The ability to use a shared Dataset gives you a re-usable modeling layer in Power BI.

What is Datamart?

Power BI Datamart is a recently added component to the Power BI ecosystem. Power BI Datamart is a combination of Dataflow, an Azure SQL Database (acting like a data warehouse), and Dataset. Power BI Datamart also comes with a unified editor in the Power BI Service. Power BI Datamart is more like a container around other components of Power BI (Dataflow, Dataset, and Azure SQL Database).

Features of Dataflow:

1. Dataflow is the Power Query Component

Dataflow is the Power Query component; it decouples the Power Query logic and code from the Power BI file so that it can be used in multiple files. We can get data from many different data sources, do the transformations using Power Query Online to get the data in the shape we want, set a scheduled refresh for it, and load it into a storage option (Azure Data Lake Storage or Dataverse).
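As a minimal sketch, a dataflow entity holds a Power Query (M) script like the one below. The file path and the column names (OrderDate, Amount, Region) are hypothetical; the functions are standard Power Query.

```powerquery
// Hypothetical dataflow entity: extract a CSV, promote headers,
// set column types, and filter out non-positive amounts.
let
    Source = Csv.Document(
        File.Contents("C:\data\sales.csv"),
        [Delimiter = ",", Encoding = 65001]
    ),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    Typed = Table.TransformColumnTypes(
        Promoted,
        {{"OrderDate", type date}, {"Amount", type number}, {"Region", type text}}
    ),
    Filtered = Table.SelectRows(Typed, each [Amount] > 0)
in
    Filtered
```

Because this script lives in the dataflow rather than in a PBIX file, any number of datasets can consume its output without duplicating the transformation logic.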

2. Dataflow is not just for Power BI

Dataflow is the only component that can be created outside Power BI. We do not need a business intelligence or data analysis requirement to use a Dataflow. Sometimes we may need a Dataflow just for data integration; for example, we may want to get data from some source systems, transform it, and store it in data storage for other applications to use. A Dataflow in Power BI might be used for data analysis purposes, but we can also create dataflows in Power Platform under the Power Apps portal.

3. Dataflow is the ETL Layer

Dataflow is the data transformation layer in your Power BI implementation. The terminology for this layer is ETL (Extract, Transform, Load). It extracts data from data sources, transforms the data, and loads it into storage structured according to the Common Data Model (CDM).

4. Dataflow Feeds Data into the Dataset

The result of dataflow will be fed into a dataset for further modeling; a dataflow by itself is not a visualization-ready component.

5. Dataflow Accesses the Data Source Directly

Unless we use a linked entity or computed entity, a Dataflow usually gets data directly from the data source.

6. Dataflow Developer Needs Power Query Skills

One of the reasons to use dataflows and shared datasets is to decouple the layers, so that multiple developers can build the Power BI solution at the same time. In such an environment, the skillset needed for a Dataflow developer is all about Power Query and how to build a star schema. No DAX or visualization skills are required for a Dataflow developer.

7. Users of Dataflow are Data Modelers

A Dataflow's result is intended for data modelers. It is not a good approach to give the output of a Dataflow directly to report visualizers, because the data still has to be loaded into a model with proper relationships and calculations added to it.

8. Dataflow solves the problem of having multiple versions of the same table in different PBIX files.

Using the Dataflow, you reduce the need to copy and paste your Power Query script into other files. You can re-use a table in multiple files.

Features of Dataset:

1. Dataset is a Re-usable Layer for DAX Calculations and Relationships

Using a shared dataset, we can re-use the DAX calculations and relationships created for one model in other Power BI files. If we want to re-use the DAX measures, calculated columns and tables, hierarchies, field-level formatting, and relationships defined in a model across multiple files, a shared Dataset makes that possible. Multiple reports can connect to it and re-use the data model.
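As a sketch of what gets re-used, here are the kinds of DAX definitions a shared dataset might hold. The table and column names (Sales, Amount, 'Date'[Date]) are hypothetical; every report that connects live to the dataset picks these measures up for free.

```dax
-- Hypothetical measures defined once in the shared dataset and
-- re-used by every connected report.
Total Sales = SUM ( Sales[Amount] )

Sales YoY % =
VAR PriorYear =
    CALCULATE ( [Total Sales], SAMEPERIODLASTYEAR ( 'Date'[Date] ) )
RETURN
    DIVIDE ( [Total Sales] - PriorYear, PriorYear )
```

If these measures lived in individual PBIX files instead, every change would have to be copied by hand into each report.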

2. Why not DirectQuery from the source instead of Dataset?

Power BI Dataset uses an in-memory storage engine for the data. In-memory storage ensures the best performance in reports and visualizations, as interaction in the report will be the fastest. It also brings a powerful calculation language, DAX, to help with analytical requirements and calculations. So even if we already have a data warehouse, using a Dataset is highly recommended.

3. Dataset is the Modeling Layer

Dataset is the layer of all the calculations and modeling. It will get data from the Dataflow (or other sources) and build an in-memory data model using the Power BI (Analysis Services) engine.

4. Dataset Feeds Data into Visualizations

Because a dataset is an in-memory model built and ready for visualization, its result is usually used directly to build visualizations.

5. Dataset Can Access the Data from the Dataflow

Although a Dataset can get data directly from a data source, it is a best practice for a shared Dataset to get its data from Dataflows. This supports a multi-developer implementation of Power BI.

6. Dataset Developer Needs DAX and Modeling Skills

The Dataset developer, on the other hand, needs to know everything about relationships in Power BI and calculations using DAX. Although the Dataset developer may also know Power Query and visualization, those are not his or her primary skills.

7. Users of Dataset are Report Visualizers

The result of a dataset is ready for report visualizers. They can simply make a live connection to the shared Dataset and build their visualizations from it.

8. Dataset solves the problem of having multiple versions of the same DAX code in different PBIX files.

Using a shared dataset, you can have multiple reports using the same calculations and data model without duplicating the code.

Dataflow Vs. Dataset

Dataflow and Dataset are not replacements for each other; they complement each other.

What about Datamart?

Power BI Datamart is more like a container of components than a single object itself. When we create a Power BI Datamart, we are creating a Dataflow, an Azure SQL Database, and a Dataset. This means that a Datamart already includes all the benefits mentioned above for the Dataflow and the Dataset. It also has an extra component: storing data in the Azure SQL Database. After processing by the Dataflow, the data is loaded into the Azure SQL Database. Some call this a data warehouse, and some may even call it a data mart. But Power BI Datamart includes all three components: Dataflow, Azure SQL Database, and Dataset.

Power BI Datamart components

Power BI Datamart also brings a unified web UI to build and manage all these components. With the appearance of the Datamart, the multi-developer architecture of Power BI changes accordingly.

Is Dataflow getting replaced by Datamart?

No. Certainly not. Dataflow is a component by itself. As mentioned earlier, we can build and use a Dataflow without needing a BI solution. Datamart is normally useful when we are building a BI solution. We may just want to integrate some tables into storage and re-use them in other applications; in that case, the Dataflow by itself can be used.

Another use case for a standalone Dataflow: even if we have a Datamart, we may still want to implement multiple layers of Dataflows for staging and transformation. Multiple layers of Dataflows are an effective technique for data source isolation and for re-using Dataflow components.

Is Dataset getting replaced by Datamart?

The answer to this question is also no. Although it is now easier to create a Dataset from the unified UI of the Datamart, that does not remove the need for the Dataset as a separate component of its own. There are plenty of use cases for a standalone Dataset. Imagine we are implementing an architecture in which the data transformation is done using another service (such as Azure Data Factory) and the data warehouse is in Azure Synapse. We can still use Power BI Datasets to build the data model and calculations on top of that without building a full Datamart.

Another use case: even if you use a Power BI Datamart, you may still create chained Datasets on top of your existing Dataset. These chained Datasets use DirectQuery against a Power BI Dataset (which in this case is part of a Datamart) but can have other data sources combined into them. Chained Datasets are a very useful method of working with self-service data analysts in an enterprise organization.

Which one should I use? Datamart, Dataflow, or Dataset?

Now it is time for the million-dollar question: which one of these components should you use? Each of them comes with benefits, as you saw above. Let's answer that through a scenario.

Subhas is a BI developer who is building a new BI solution. His implementation includes stages such as getting data from the source, doing the transformations, loading the data into a data warehouse, writing calculations in the model, and visualization. Power BI Datamart enables him to build most of that in one unified structure, while keeping a multi-layered approach that can be easily maintained in the future.

After some time, Subhas realizes that the data transformation side of his implementation is heavy. He wants the data transformation to be isolated from the data source so that just in case the source changes, his solution still works with minimal changes required. So, he uses more Dataflows as staging and transformation Dataflows in his architecture.

Subhas has some other colleagues who are business analysts in other departments. They want to use the Dataset that Subhas provided as the main source but add more data sources into that and some calculations too. They create chained Datasets on top of that.

The above scenario uses all three of these components in one architecture. Choosing which components you need will mainly be based on what re-usable components you have and where you want to use them.

About the Author
Associate BI Analyst with experience in Data Visualization and analysis. Working on tools like Power BI and Tableau.
Sushmitha M | Associate BI Analyst | USEReady