9 minute read

Overview

When starting with Azure data factory, one of the trickier concept to get your head around is setting up CI/CD.

In this guide, I want to simplify and elaborate on a very basic setup to get you off the ground. There is so much more you can do, once you realize that at the very core this is all about just moving your well tested artifacts from a lower environment to production.

When I say basic setup, I mean starting off with just two environment:

  • Dev
  • Prod

basic-cicd

Resources

You will need following resources to test a working setup at your end:

  1. Azure subscription - to create azure data factory.
  2. Azure DevOps account - to host your code repo and set up devops release pipelines

Create Azure Data Factory

Create two Azure data factories, representing the two environments - dev & prod. Here is a sample template of how you can go about creating it. create-adf

As a best practice, always create related resources in the same resource group. Always set up a naming convention and stick to it for consistency, little time spend in setting this up will pay huge dividends in long run.

Create code repo in Azure DevOps

  1. Sign in to Azure DevOps
  2. Optional Steps
    • Before you can create a project/repo, go to Organization setting > Azure Active Directory and connect to your default directory.
    • To make sure you are in the right org directory, click on your account name towards the top right hand side corner and click on Switch directory, make sure Default Directory is selected.
  3. Create a new project.
  4. Create and initialize a new code repo.
  5. Within your code repo, you can choose to create a directory structure that suits your needs. In my case I created a directory data_pipelines at the root. ado-repo-struc

Configuration

Git configuration in dev data factory

Why do we need Git configuration in dev data factory only ?

Because usually in your dev environment, more than one developer will be working to develop pipelines. If you have git integration, every developer can work independently on a copy (feature branch) of your main/master branch and merge code back to main/master branch when ready. This also allows code to be securely saved in the repo without the risk of losing your work or overwriting someone else’s work.

To setup Git configuration on an existing data factory, go to Manage > Git configuration

git-config

Git configuration is not required in Prod Data factory, to move code from Dev to Prod we will use Azure DevOps Release pipeline.

Deployment process

  1. Open Author panel in your Dev data factory (after completing Git configuration).
  2. You can choose a working branch from the top menu bar, when you are starting select the option New branch ado-new-branch
  3. You can then do your development in this branch, complete testing and when ready create a pull request to merge the code from your working branch to main/master branch.
  4. Once approved and merged, you can open Dev data factory again, select main/master branch and click on publish button. This will push your code to a special branch within your repo adf_publish.

ADO Release Pipeline

This is the final step, which will push the code from adf_publish branch in your repo to Prod data factory.

  1. Open Azure DevOps repo and create a new release ado-new-release
  2. Add an artifact
    • Select Empty job
    • Give a meaningful name to the Stage
    • Add artifact as show in the snapshot below release-config-1
  3. Create Task
    • Configure Agent Job agent-job
    • Add a task of type ARM template deployment to Agent job arm-template-deployment
    • Set up Azure Resource Manager Connection azure-res-mgr-conn
    • Override template parameters - this is where we will specify the name of our production data factory. override
  4. You can now create a New release and select option Deploy > Deploy multiple to start deployment.
  5. Once successfully completed, you can open your prod data factory to verify deployment.

Deployment error

One common error that you can encounter is not having free parallelism grant. As of writing this article you have to request it using a Microsoft form

Error message: No hosted parallelism has been purchased or granted. To request a free parallelism grant, please fill out the following form https://aka.ms/azpipelines-parallelism-request

adf-pipeline-error

Comments