Azure Data Factory
Azure Data Factory (ADF) is a cloud-based data integration service provided by Microsoft as part of the Microsoft Azure ecosystem. It allows you to create, schedule, manage, and monitor workflows that move and transform data between different sources and destinations.
Scenario
In the process of loading data into the data warehouse, the dimension tables are loaded before the fact table. When loading fact data from a CSV or Excel file, a situation may arise where the file contains values instead of IDs. These IDs are typically system-generated and lack business context. In our fact table structure, we prefer storing the IDs from the associated dimension table rather than the provided values.
The blog demonstrates how to handle this scenario in a data flow, showcasing how to reference the foreign key value in our data flow when ingesting data.
Structure of Dimension Tables
- Month (‘month’,’month_id’)
- Spend_type(‘spend_type’,’spend_type_id’)
- Customers(‘FullName’,’FirstName’,’LastName’,’CustomerId’)
Structure of Excel/CSV file
- Monthly_avg_csv(‘FirstName’,’LastName’,’month’,’spend_type’,’ Avg Amount Spent’)
Structure of Fact Table
- Avg_Monthly_Spends(‘avg_monthly_spend_id’, ‘CustomerID’ ,’spent_type_id’, ‘month_id’, ‘Avg Amount Spent’)
Steps to Handling Value-to-ID Mapping for Fact Tables
- Create the respective facts and dimensions table and insert sample values
- Create the sample CSV file with values present in the dimension table.
- Upload the file to the data lake storage
- Create a link service Azure blob storage and a dataset with CSV file format support
- Create a link service Azure SQL DB and data set to SQL DB
- Create a pipeline in Azure Data Factory and add data flow activity.
- Next click on the + icon to create a new data flow
- Create a source in data flow to get data from CSV file by providing an appropriate dataset and configuring source options according to requirement
- Create a data flow source to retrieve information from the dimension table within the SQL Database, aiming to obtain corresponding IDs.
- Select the “+” icon to incorporate a lookup transformation. Within the lookup stream, choose the ‘month’ field. Then, within the lookup condition, specify the business key, for instance, the actual value of the month, utilizing the dynamic editor using the byName() function and provide column name
- Do a similar for getting the ID value from the other dimension table.
- Add sink transformation and keep mapping as auto mapping and in the dataset select the sql db and configure it with Avg_Monthly_Spends table.