Introduction
In modern data engineering, organizations deal with multiple data sources such as Excel files, SQL databases, APIs, and cloud storage systems. The challenge is not just collecting the data, but transforming and storing it efficiently for analytics.
Microsoft Fabric simplifies this process using Dataflow Gen2, a low-code data ingestion and transformation tool.
In this blog, we walk through how to ingest data from a source such as a file, database, or API, and store it in a Lakehouse using Dataflow Gen2.
What is Dataflow Gen2?
Dataflow Gen2 is a cloud-based data transformation and ingestion tool inside Microsoft Fabric. It is built on Power Query Online and enables users to connect to multiple data sources, apply transformations, and load data into a Lakehouse or Warehouse.
It is ideal for both business users and data engineers who want a scalable, low-code ingestion solution.
What is a Lakehouse?
A Lakehouse in Microsoft Fabric is a unified storage architecture that combines the flexibility of a Data Lake with the performance of a Data Warehouse.
It supports structured and semi-structured data, Delta tables, SQL querying, notebook integration, and Power BI connectivity.
When data is loaded into a Lakehouse via Dataflow Gen2, it is stored in Delta format, ensuring reliability and performance.
Step 1: Create a Lakehouse
1. Open Microsoft Fabric
2. Navigate to your Workspace
3. Click New → Lakehouse
4. Provide a name and create it
This Lakehouse will act as your storage destination.
Step 2: Create a Dataflow Gen2
1. In the same workspace, click New → Dataflow Gen2
2. Provide a name
3. The Power Query Online editor will open
This is where data transformation happens.
Step 3: Connect to a Data Source
Dataflow Gen2 supports a wide range of sources including Excel files, CSV files, SQL Server, Azure SQL Database, SharePoint, and Web APIs.
Select the required source, configure authentication, and preview the dataset.
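Conceptually, connecting to a source and previewing it amounts to reading the raw records and inspecting the first few rows. The Python sketch below uses the standard csv module and a hypothetical inline CSV sample to mirror that step; in Dataflow Gen2 the connector and preview pane do this for you, no code required:

```python
import csv
import io

# Hypothetical inline sample standing in for a connected source;
# a real connector would point at a file, database, or API instead.
raw = io.StringIO(
    "OrderID,Amount,Region\n"
    "1,250,West\n"
    "2,90,East\n"
    "3,410,West\n"
)

# Read the records, as the connector does when it previews the dataset
reader = csv.DictReader(raw)
rows = list(reader)

print(rows[0])    # inspect the first record
print(len(rows))  # inspect the row count
```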
Step 4: Transform the Data
Using Power Query Online, you can remove unnecessary columns, rename columns, change data types, split or merge columns, add conditional logic, and filter rows.
These transformations ensure clean and structured data before loading.
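To make these transformations concrete, here is a minimal Python sketch, with hypothetical column names, that mirrors what the Power Query steps do: dropping a column, renaming a column, changing a data type, adding a conditional column, and filtering rows. In Dataflow Gen2 each of these is a clickable step in the editor, not code:

```python
# Hypothetical raw rows, as they might arrive from a source
rows = [
    {"OrderID": "1", "Amt": "250", "Region": "West", "Temp": "x"},
    {"OrderID": "2", "Amt": "90",  "Region": "East", "Temp": "y"},
    {"OrderID": "3", "Amt": "410", "Region": "West", "Temp": "z"},
]

def transform(row):
    # Remove the unnecessary 'Temp' column
    row = {k: v for k, v in row.items() if k != "Temp"}
    # Rename 'Amt' to 'Amount'
    row["Amount"] = row.pop("Amt")
    # Change the data type from text to integer
    row["Amount"] = int(row["Amount"])
    # Add a conditional column based on the amount
    row["Size"] = "Large" if row["Amount"] >= 200 else "Small"
    return row

# Filter rows, keeping only the 'West' region, then apply the steps
clean = [transform(r) for r in rows if r["Region"] == "West"]
```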
Step 5: Configure Destination
1. Click Add Destination
2. Select Lakehouse
3. Choose your Lakehouse
4. Choose a new or an existing table, and set the update method to Append or Replace
5. Map the columns
6. Save settings
If no destination is configured, the dataflow will still run its transformations, but the results will not be written to the Lakehouse.
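The difference between the table update methods can be sketched in a few lines of Python (hypothetical rows; in Fabric, the Lakehouse engine performs these writes on Delta tables):

```python
# Hypothetical table already stored in the Lakehouse
table = [{"OrderID": 1}, {"OrderID": 2}]
new_rows = [{"OrderID": 3}]

def load(table, new_rows, mode):
    # 'Replace' overwrites the table; 'Append' adds to it
    if mode == "Replace":
        return list(new_rows)
    if mode == "Append":
        return table + list(new_rows)
    raise ValueError(f"Unknown mode: {mode}")

appended = load(table, new_rows, "Append")   # existing rows plus the new one
replaced = load(table, new_rows, "Replace")  # only the new row remains
```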
Step 6: Publish and Refresh
1. Click Save
2. Click Publish
3. Run Refresh
The data will now be stored inside the Lakehouse as a Delta table.
What Happens Behind the Scenes?
When Dataflow Gen2 loads data into the Lakehouse:
– Data is stored in Delta format
– ACID transactions are enabled
– Schema enforcement is applied
– Metadata is maintained
– The data becomes queryable through the Lakehouse's SQL analytics endpoint
This ensures reliability, scalability, and performance for enterprise workloads.
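Schema enforcement, for example, means incoming rows must match the declared table schema before they are written. The Python sketch below illustrates the idea with a hypothetical schema; the actual enforcement is performed by the Delta engine, not by user code:

```python
# Hypothetical declared schema for a Delta table
schema = {"OrderID": int, "Amount": int, "Region": str}

def enforce(schema, row):
    # Reject rows whose columns do not match the declared schema
    if set(row) != set(schema):
        raise ValueError("Column mismatch")
    # Reject rows whose values have the wrong type
    for col, typ in schema.items():
        if not isinstance(row[col], typ):
            raise TypeError(f"{col} must be {typ.__name__}")
    return row

# A conforming row passes through unchanged
good = enforce(schema, {"OrderID": 1, "Amount": 250, "Region": "West"})
```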
Conclusion
Dataflow Gen2 in Microsoft Fabric provides a powerful yet simple way to ingest data from any source and store it inside a Lakehouse.
With little to no code, organizations can connect to multiple sources, transform data, store it in scalable Delta tables, and automate refresh workflows.
Mastering Dataflow Gen2 is essential for building modern data platforms in Microsoft Fabric.
