How to Ingest Data from Any Source and Store It in a Lakehouse Using Dataflow Gen2 (Microsoft Fabric) 

Introduction 

In modern data engineering, organizations deal with multiple data sources such as Excel files, SQL databases, APIs, and cloud storage systems. The challenge is not just collecting the data, but transforming and storing it efficiently for analytics. 
 
Microsoft Fabric simplifies this process using Dataflow Gen2, a low-code data ingestion and transformation tool. 
 
In this blog, we walk through how to ingest data from any source and store it in a Lakehouse using Dataflow Gen2. 

What is Dataflow Gen2? 

Dataflow Gen2 is a cloud-based data transformation and ingestion tool inside Microsoft Fabric. It is built on Power Query Online and enables users to connect to multiple data sources, apply transformations, and load data into a Lakehouse or Warehouse. 
 
It is ideal for both business users and data engineers who want a scalable, low-code ingestion solution. 

What is a Lakehouse? 

A Lakehouse in Microsoft Fabric is a unified storage architecture that combines the flexibility of a Data Lake with the performance of a Data Warehouse. 
 
It supports structured and semi-structured data, Delta tables, SQL querying, notebook integration, and Power BI connectivity. 
 
When data is loaded into a Lakehouse via Dataflow Gen2, it is stored in Delta format, ensuring reliability and performance. 

Step 1: Create a Lakehouse 

1. Open Microsoft Fabric 
2. Navigate to your Workspace 
3. Click New → Lakehouse 
4. Provide a name and create it 
 
This Lakehouse will act as your storage destination. 
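If you prefer to script this step, the Fabric REST API also exposes lakehouse creation. The sketch below only builds the request rather than sending it; the endpoint shape, the placeholder workspace ID, and the Azure AD bearer token are assumptions you should verify against the current Fabric REST API reference.

```python
import json
import urllib.request

def build_create_lakehouse_request(workspace_id: str, name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a Fabric REST request that creates a Lakehouse."""
    url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/lakehouses"
    body = json.dumps({"displayName": name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Hypothetical IDs; replace with your workspace ID and a real token before sending.
req = build_create_lakehouse_request("<workspace-id>", "SalesLakehouse", "<aad-token>")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) would create the item, assuming the token has the right Fabric permissions.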
 

Step 2: Create a Dataflow Gen2 

1. In the same workspace, click New → Dataflow Gen2 
2. Provide a name 
3. The Power Query Online editor will open 
 
This is where data transformation happens. 
 
 

Step 3: Connect to a Data Source 

Dataflow Gen2 supports a wide range of sources including Excel files, CSV files, SQL Server, Azure SQL Database, SharePoint, and Web APIs. 
 
Select the required source, configure authentication, and preview the dataset. 
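Conceptually, the preview step is just the tool reading a sample of the source and inferring columns. The sketch below mimics that with a hypothetical in-memory CSV (the column names and rows are invented for illustration):

```python
import csv
import io

# Hypothetical CSV sample standing in for a real connected source.
raw = """order_id,region,amount
1001,West,250.00
1002,East,125.50
1003,West,310.75
"""

rows = list(csv.DictReader(io.StringIO(raw)))
print(f"{len(rows)} rows, columns: {list(rows[0].keys())}")
```

In Dataflow Gen2 this inspection happens visually in the Power Query Online preview pane, but the idea is the same: confirm the shape of the data before transforming it.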
 
 

Step 4: Transform the Data 

Using Power Query Online, you can remove unnecessary columns, rename columns, change data types, split or merge columns, add conditional logic, and filter rows. 
 
These transformations ensure clean and structured data before loading. 
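In the editor these steps are applied through the ribbon, but each one maps to a simple row operation. The following Python sketch is only a conceptual stand-in for those Power Query steps, using invented column names:

```python
# Conceptual stand-in for Power Query steps, applied to plain Python dicts.
orders = [
    {"order_id": "1001", "region": "West", "amount": "250.00", "internal_flag": "x"},
    {"order_id": "1002", "region": "East", "amount": "125.50", "internal_flag": "y"},
    {"order_id": "1003", "region": "West", "amount": "310.75", "internal_flag": "z"},
]

cleaned = []
for row in orders:
    row = dict(row)
    row.pop("internal_flag")                  # remove unnecessary columns
    row["sales_region"] = row.pop("region")   # rename columns
    row["amount"] = float(row["amount"])      # change data types
    if row["amount"] > 150:                   # filter rows
        cleaned.append(row)

print(cleaned)
```

Power Query records each of these as a named applied step, which makes the transformation pipeline auditable and repeatable on every refresh.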

Step 5: Configure Destination 

1. Click Add Destination 
2. Select Lakehouse 
3. Choose your Lakehouse 
4. Choose a new or existing table, then select the update method (Append or Replace) 
5. Map the columns 
6. Save settings 
 
Without destination configuration, the data will not be stored. 
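The column-mapping step can be pictured as a simple source-to-destination dictionary. This is an illustrative sketch with invented column names, and in it unmapped source columns are simply dropped:

```python
# Hypothetical source-to-destination column mapping, as configured
# in the destination settings of the dataflow.
column_map = {
    "order_id": "OrderId",
    "sales_region": "SalesRegion",
    "amount": "Amount",
}

source_row = {"order_id": 1001, "sales_region": "West", "amount": 250.0, "debug": True}

# Keep only mapped columns, written under their destination names.
destination_row = {dest: source_row[src] for src, dest in column_map.items()}
print(destination_row)
```

Checking this mapping carefully before saving avoids type-mismatch or missing-column errors at refresh time.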
 
 

Step 6: Publish and Refresh 

1. Click Save 
2. Click Publish 
3. Run Refresh 
 
The data will now be stored inside the Lakehouse as a Delta table. 
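Refreshes can also be triggered programmatically through the Fabric on-demand job API rather than the UI. The sketch below only constructs the request; the endpoint shape is based on Fabric's item job API, and the `jobType` value, IDs, and token are placeholders to verify against the official documentation.

```python
import urllib.request

def build_refresh_request(workspace_id: str, item_id: str, token: str, job_type: str) -> urllib.request.Request:
    """Build (but do not send) an on-demand job request for a Fabric item."""
    url = (
        f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
        f"/items/{item_id}/jobs/instances?jobType={job_type}"
    )
    return urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Hypothetical IDs and job type; confirm the correct jobType for Dataflow Gen2.
req = build_refresh_request("<workspace-id>", "<dataflow-id>", "<aad-token>", "<job-type>")
print(req.full_url)
```

Combined with a scheduler, this is how the "automate workflows" part of the story is usually implemented.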

What Happens Behind the Scenes? 

When Dataflow Gen2 loads data into the Lakehouse: 
– Data is stored in Delta format 
– ACID transactions are enabled 
– Schema enforcement is applied 
– Metadata is maintained 
– A SQL analytics endpoint becomes available automatically 
 
This ensures reliability, scalability, and performance for enterprise workloads. 
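The ACID guarantees come from Delta's transaction log: every commit is a JSON file of actions written atomically under `_delta_log`. The entry below is a simplified, hand-written example (real commits carry many more fields), parsed just to show the shape:

```python
import json

# Simplified sketch of one commit entry from a Delta table's _delta_log.
# Real entries include more fields; these are the recognizable ones.
commit = """
{"commitInfo": {"timestamp": 1700000000000, "operation": "WRITE"}}
{"metaData": {"id": "abc-123", "schemaString": "{\\"type\\":\\"struct\\"}"}}
{"add": {"path": "part-00000.parquet", "size": 1024, "dataChange": true}}
"""

actions = [json.loads(line) for line in commit.strip().splitlines()]
operations = [next(iter(action)) for action in actions]
print(operations)  # → ['commitInfo', 'metaData', 'add']
```

Because readers only see data referenced by committed log entries, a refresh either lands completely or not at all, which is what makes the tables reliable for downstream SQL and Power BI workloads.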

Conclusion 

Dataflow Gen2 in Microsoft Fabric provides a powerful yet simple way to ingest data from any source and store it inside a Lakehouse. 
 
With little to no code, organizations can connect to multiple sources, transform data, store it in scalable Delta tables, and automate workflows. 
 
Mastering Dataflow Gen2 is essential for building modern data platforms in Microsoft Fabric. 


Addend Analytics is a Microsoft Gold Partner based in Mumbai, India, with a branch office in the U.S.

Addend has successfully implemented 100+ Microsoft Power BI and Business Central projects for 100+ clients across sectors like Financial Services, Banking, Insurance, Retail, Sales, Manufacturing, Real Estate, Logistics, and Healthcare, in the US, Europe (including Switzerland), and Australia.

Get a free consultation now by emailing us or contacting us.