Copying Multiple Files in a SharePoint Online Folder to Azure data lake using Azure Data Factory

  • Published
  • Posted in Azure
  • Updated
  • 4 mins read
  • Tagged as

To copy file from SharePoint Online requires Azure Active Directory/ Service Principal authentication and SharePoint API to retrieve files.

Step 1 – Register SharePoint Application and grant permissions.

A) Register Azure Active Directory Application

1)  On the Azure Portal go to Azure Active Directory App registration

2)  Click on New Registration and enter your App name.

3)  Go to “Certificates and Secrets”, Create New Client Secret and set it expire 1year/2year/Never.

B) Grant the SharePoint Site Permission to your registered App (need site owner permission on Sharepoint)

1) Open SharePoint online site link for ex. https://addendanalytics.sharepoint.com/sites/debootcamp/_layouts/15/appinv.aspx (replace this URL with your URL).

2) Now fill the required details as follows :

App id – your registered app id
Title – any title you want to give
App domain – localhost

Permission Request XML –

Click here.

Finally trust it.

Step 2 – Create an ADF Pipeline to get the data from SharePoint Online to Azure data lake Storage.

A) Create a Web activity to get the access token.

Fill the following details under the web activity as follows:

URL: https://accounts.accesscontrol.windows.net/[Tenant-ID]/tokens/OAuth/2

Replace your [Tenant-ID] in the URL
for ex:

URL: https://accounts.accesscontrol.windows.net/6f28e5b8-67fe-4207-a048-cc17b8e13499/tokens/OAuth/2

Method: POST

Headers:

Content-Type: application/x-www-form-urlencoded

Body:

grant_type=client_credentials&client_id=[Client-ID]@[Tenant-ID]&client_secret=[Client-Secret]&resource=00000003-0000-0ff1-ce00-000000000000/[Tenant-Name].sharepoint.com@[Tenant-ID]

[Client-ID]: Mention your client Id.

[Client-Secret]: Mention your client secret you got under certificates and secrets.

[Tenant-ID]: Mention your tenant Id.

[Tenant-Name]: Mention your tenant name.    

Debug run to check if the activity succeeds and also check the activity output to see if it returns the access token.

B)  Create another Web Activity to get the list of files present in the SharePoint Online Folder  

Fill the following details under the Web activity as follows:

URL: https://{site_url}/_api/web/GetFolderByServerRelativeUrl(‘/Folder Name’)/Files

Replace your site_url with your URL and Folder Name with the required folder name.
 

Method: GET

Headers:

Authorization: @{concat(‘Bearer ‘, activity(‘GetBearerToken’).output.access_token)}

Accept: application/json

Debug run to see if the activity succeeds and check it shows the list of files under the folder in the output.

From the below output we can see that we have successfully retrieved the list of files present in the SharePoint Online Folder.

C) Create a For Each activity with inner copy data activity to loop the list of relative file names.

Fill the details as follows:

Items: @activity(‘WebActivity2Name’).output.value

Replace WebActivity2Name with the name of your Web activity 2.

D) Create a Copy Activity inside the For each activity

1) Start with Creating a new dataset.

Take HTTP Connector and Binary datatype.

Then Create an HTTP Linked Service

Now Configure HTTP Linked Service as follows:

Base URL:

https://<SiteUrl>/_api/web/GetFileByServerRelativeUrl   (‘@{linkedService().FileName}’)/$value

Authentication Type: Anonymous           

2) Now configure Copy activity HTTP Source

Under dataset properties :

Name: Relative URL

Value: @{item().ServerRelativeUrl}

Request Method : GET

Additional Header:

@{concat(‘Authorization: Bearer ‘,activity(‘GetBearerToken’).output.access_token)}

3) Configure Linked Service properties:

Name: FileName (Any other name you can give)

Value: @dataset().RelativeURL

4) Create copy sink as follows :

You should have already had a data lake storage account in your resource group.

Create a new dataset of type data lake storage and select the datatype as binary.

Then Create a linked service for the dataset.

Now go to your storage account and create a container for receiving the files.

Create a dedicated folder under the container to receive files.

Now open the sink dataset and mention the File Path where you want to receive the files.

Now you can either add a trigger to the pipeline or you can manually debug it.

You can see the successful pipeline run as follows:

After the successful pipeline run, we can go to our data lake storage and check under our desired SharePoint container that we have received the files.

Conclusion:

By following the above steps, you can successfully get the files from your SharePoint Online folder to your desired location (Azure data lake Storage, Azure Blob Storage).

Aneesh
Addend Analytics 

Addend Analytics is a Microsoft Gold Partner based in Mumbai, India, and a branch office in the U.S.

Addend has successfully implemented 100+ Microsoft Power BI and Business Central projects for 100+ clients across sectors like Financial Services, Banking, Insurance, Retail, Sales, Manufacturing, Real estate, Logistics, and Healthcare in countries like the US, Europe, Switzerland, and Australia.

Get a free consultation now by emailing us or contacting us.