Incremental Load With Azure Data Factory (ADF)
Last updated on 18th Dec 2024

Incremental load in Azure Data Factory (ADF) transfers only new or changed records from an on-premises SQL Server to an Azure SQL Database, rather than copying the entire table on every run.

What is a Watermark?
A watermark is a column in the source table that holds the last-updated timestamp or an incrementing key. It identifies the new or updated records that need to be loaded into the target data store. Using a watermark makes data loading efficient: it reduces the amount of data transferred and processed, and the smaller loads in turn make the data pipelines faster.
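As a minimal sketch, the delta query driven by a watermark looks like the following. It assumes a source table `dbo.student` with an `updateDate` watermark column (the table and column names are illustrative, matching the example used later in this article):

```sql
-- Fetch only the rows that changed since the last recorded watermark.
-- @lastWatermark holds the value saved after the previous pipeline run;
-- @currentWatermark is the new high-water mark captured at the start of this run.
SELECT studentId, studentName, updateDate
FROM dbo.student
WHERE updateDate > @lastWatermark
  AND updateDate <= @currentWatermark;
```

Bounding the query on both sides keeps a run deterministic: rows inserted while the copy is in progress are picked up by the next run instead of being partially copied.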
Workflow for Using a Watermark:
Let's understand the step-by-step implementation process for the incremental loading of data.
Step 1: Create and Populate the Table On-Premises
First, create a database on your local SQL Server. Inside that database, create a table named 'dbo.student', insert three records into it, and verify that the data is correct. This data will later be copied into a table named 'Student' in an Azure SQL Database, with the 'updateDate' column of the source table serving as the watermark that determines which records need to be copied or updated.
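A sketch of the on-premises setup follows. The column list is an assumption for illustration; only `studentId` and `updateDate` are named in the article, so adapt the rest to your schema:

```sql
-- On-premises SQL Server: create the source table and seed it with three rows.
CREATE TABLE dbo.student (
    studentId   INT IDENTITY(1,1) PRIMARY KEY,
    studentName VARCHAR(100) NOT NULL,             -- illustrative column
    updateDate  DATETIME NOT NULL DEFAULT GETDATE() -- watermark column
);

INSERT INTO dbo.student (studentName)
VALUES ('Alice'), ('Bob'), ('Carol');

-- Verify that the three records are present.
SELECT studentId, studentName, updateDate FROM dbo.student;
```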
Step 2: Create and Populate the Tables in Azure
In this step, create an Azure SQL Database through the Azure portal, then connect to it with SSMS. After connecting, create a table named Student with the same structure as the Student table on the on-premises SQL Server. Its studentId column is not an IDENTITY column; it stores the studentId values coming from the source table.
Next, create another table called stgStudent with the same structure as Student. It serves as a staging table before data is loaded into the Student table. Finally, create a table named WaterMark to record the current watermark value for each source table being loaded incrementally.
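The three Azure-side tables described above can be sketched as follows, again assuming the illustrative `studentName` column from the earlier sketch:

```sql
-- Azure SQL Database: target table, staging table, and watermark table.

CREATE TABLE dbo.Student (
    studentId   INT PRIMARY KEY,      -- NOT an IDENTITY; holds source studentId values
    studentName VARCHAR(100) NOT NULL,
    updateDate  DATETIME NOT NULL
);

-- Staging table: same structure as Student, loaded first on each run.
CREATE TABLE dbo.stgStudent (
    studentId   INT,
    studentName VARCHAR(100),
    updateDate  DATETIME
);

-- One watermark row per source table being loaded incrementally.
CREATE TABLE dbo.WaterMark (
    TableName      VARCHAR(100) PRIMARY KEY,
    WatermarkValue DATETIME NOT NULL
);

-- Seed with an early date so the first run copies everything.
INSERT INTO dbo.WaterMark (TableName, WatermarkValue)
VALUES ('dbo.student', '1900-01-01');
```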
Step 3: Create a Self-Hosted Integration Runtime
Next, create an ADF resource from the Azure portal. Open the ADF resource, go to the Manage tab, and create a new self-hosted integration runtime. The Integration Runtime (IR) is the compute infrastructure ADF uses for data flows, data movement, and SSIS package execution. A self-hosted IR is required to move data from the on-premises SQL Server to Azure SQL.
Step 4: Create the Azure Integration Runtime
An Azure Integration Runtime (IR) is required for copying data between cloud data stores. For this, you can accept the default options and set up the runtime with the name azureIR2.
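With both runtimes in place, the pipeline built on top of them typically issues the queries sketched below: two Lookup activities read the old and new watermark, a Copy activity lands the delta into the staging table, and a final step merges staged rows and advances the watermark. This is a hedged sketch of the SQL involved, not the exact ADF expressions (in ADF the parameters are wired up through activity outputs rather than T-SQL variables):

```sql
-- 1. Lookup: read the old watermark for the source table.
SELECT WatermarkValue
FROM dbo.WaterMark
WHERE TableName = 'dbo.student';

-- 2. Lookup: compute the new watermark from the source table.
SELECT MAX(updateDate) AS NewWatermark
FROM dbo.student;

-- 3. After the Copy activity lands the delta into dbo.stgStudent,
--    merge staged rows into the target table.
MERGE dbo.Student AS tgt
USING dbo.stgStudent AS src
    ON tgt.studentId = src.studentId
WHEN MATCHED THEN
    UPDATE SET tgt.studentName = src.studentName,
               tgt.updateDate  = src.updateDate
WHEN NOT MATCHED THEN
    INSERT (studentId, studentName, updateDate)
    VALUES (src.studentId, src.studentName, src.updateDate);

-- 4. Advance the stored watermark so the next run starts from here.
UPDATE dbo.WaterMark
SET WatermarkValue = @NewWatermark
WHERE TableName = 'dbo.student';
```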