Maintenance Pipeline for Incremental Data Load using CDC

Change Data Capture ie CDC captures the data that is being inserted, updated, deleted from a table in database.

This is done by enabling CDC on database and then on the concerned table.

Here we are going to understand once CDC is enabled how can we maintain the pipeline so that it does not miss out on any record. For this purpose we will create a Maintenance Pipeline.

Once the CDC p is done the important thing that needs to be taken care of is of its Maintenance, by default the retention period of data to be inserted, deleted , modified to the target is of 3 days ie 4320 minutes. So, if data is not updated, inserted or deleted in the target within these 3 days then those records will go missing.

In order to avoid such cases of missing records, we create a pipeline for it. Wherein we capture such missing records and insert, update, or delete the records based on the operation.

Steps for creating a Maintenance Pipeline

Create a new pipeline in the Azure Data factory

Give a name to the pipeline as Maintenance Pipeline

Then add a Look Up activity in the canvas of the pipeline

Lookup Activity

GetMaxLsn : So this Look up Activity  will fetch the Maximum LSN that is the record that was inserted, updated or deleted recently for that purpose we need to create a schema util

and a table name util.load_table

syntax to create util.load_table

Maintenance Pipeline for Incremental Data Load using CDC - 1

Output of util.load_table

Maintenance Pipeline for Incremental Data Load using CDC - 2

Settings in LookUp Activity

These are the configurations to be used in the lookup activity to get the MaxLSN from the util table.

Maintenance Pipeline for Incremental Data Load using CDC - 3

Data set for look up activity

Maintenance Pipeline for Incremental Data Load using CDC - 4

Then the next we have to create a copy activity on Azure Portal

Below is the setting for copy activity

In this copy activity we will fetch the records that we want to store in the Target table for that we need to write a query as given below

Source:

Maintenance Pipeline for Incremental Data Load using CDC - 5

This is the query we need to write in order to fetch the latest record which has been inserted, deleted  or updated

This query will only fetch data where the operation is 1 for deleted, 2 for inserted and 4 for update after since operation = 3 is record before modification so we do not want to create redundancy so we will not have records where operation =3

Maintenance Pipeline for Incremental Data Load using CDC - 6

Sink

Here we will specify the target table where we want to store the data.

Maintenance Pipeline for Incremental Data Load using CDC - 7

Next ,we need to create a Type Table in our Target Dataset

This is like a temporary table which will help us in storing the data in the target table.

Maintenance Pipeline for Incremental Data Load using CDC - 8

Then we need to create a stored procedure in Target Database that we will perform Insert, Update, Delete based on the operations present in the __$operation column present in cdc.dbo_sales_CT

If __$operation are following values

__$operation  =  1 then delete

__$operation  = 2 then insert

__$operation = 3 then this signifies update before

__$operation = 4 this is update after

In case of update there are two records one is the original record another is the updated record so we need to use the record with __$operation = 4 and not use __$operation =3 record

Maintenance Pipeline for Incremental Data Load using CDC - 9

Then in order to test if our maintenance pipeline is working or not we change the retention period of cleanup of cdc in on premises database

Here we have set it to 5 mins so all the updates , insert, deletes which are performed gets cleaned up so we run the maintenance pipeline to capture the records.

EXECUTE sys.sp_cdc_change_job

    @job_type = N’cleanup’,    @retention = 5;

Addend Analytics is a Microsoft Gold Partner based in Mumbai, India, and a branch office in the U.S.

Addend has successfully implemented 100+ Microsoft Power BI and Business Central projects for 100+ clients across sectors like Financial Services, Banking, Insurance, Retail, Sales, Manufacturing, Real estate, Logistics, and Healthcare in countries like the US, Europe, Switzerland, and Australia.

Get a free consultation now by emailing us or contacting us.