Setting up Code Repository for Azure Data Factory v2

In this blog post I want to quickly go through one of the useful capabilities that Microsoft provided with version 2 of Azure Data Factory.

As a developer I always want to use a code repository to keep all my changes, manage tasks and branches, share the code with a team and simply… keep it in a safe place. That has worked very well for many years for all kinds of application code as well as database projects (SSDT). But how do you use a code repository for tools that expose their UI via a browser, as ADFv2 does? Exporting ARM templates and committing the copies manually to a code repository is not an efficient, repeatable or error-free process, after all.

Attach project to Git repository

We can attach a code repository to Azure Data Factory v2. You can do that for an existing factory or for a new one. Let’s assume that you already have a Git repository. At the time of writing, Microsoft supports only Git in Azure DevOps (formerly VSTS); GitHub repositories are to be enabled in the GA version of the factory.

You can start from two places:
1) Main dashboard – “Overview”. Click the “Set up Code Repository” button (first from the right).
2) Go to design mode by selecting the “Author” button and click “Data Factory” > “Set up Code Repository” (top-left corner).

ADFv2 – Overview dashboard

Set up Code Repository from Author section

When we choose “Azure DevOps Git” as the Repository Type, the list of accounts available to you is filled in automatically:

Repository Settings – Azure DevOps Account step

Then, in the “Project Name” field, you will see all the projects inside the account you have just selected. In this case I’ve chosen “DataServices”:

Repository Settings – Project name step

Next you must select the Git repository name. In this step you can create a new repository or use an existing one. In this case, I used an existing repository called “RepoTest1”. Having done that, point “Collaboration branch” to the branch of your Azure repository that is used for publishing. By default it is “master”, but you can change it if you want to publish resources from another branch.
In “Root folder” you can put the path which will be used to locate all resources of your Azure Data Factory v2, i.e. pipelines, datasets, connections, etc. Leave it as is, or specify a folder if you have more components/parts in the project’s repository.

[picture with all fields completed]
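
If you prefer to see the form as data, the fields above can be sketched as a small Python structure. This is only an illustration of what the form collects: the class and field names are my own labels for the UI fields (not an official API), the account name is a placeholder, and “/” is only an assumed default for the root folder. The remaining values are the ones used in this post.

```python
from dataclasses import dataclass, asdict


@dataclass
class RepoSettings:
    """Hypothetical container mirroring the fields of the Repository Settings form."""
    repository_type: str
    account: str
    project_name: str
    repository_name: str
    collaboration_branch: str = "master"  # default mentioned in the post
    root_folder: str = "/"                # assumed default, adjust as needed

    def validate(self) -> None:
        # In this sketch every field is treated as required.
        for name, value in asdict(self).items():
            if not value.strip():
                raise ValueError(f"'{name}' must not be empty")
        # The root folder is a path inside the repository.
        if not self.root_folder.startswith("/"):
            raise ValueError("'root_folder' should start with '/'")


settings = RepoSettings(
    repository_type="Azure DevOps Git",
    account="my-devops-account",  # placeholder, not from the post
    project_name="DataServices",
    repository_name="RepoTest1",
)
settings.validate()
print(asdict(settings))
```

Keeping the values in one place like this is handy when you configure several factories and want the same collaboration branch and root folder in each of them.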

Import existing Data Factory resources to repository

This option allows you to make an initial commit of your current ADF into the selected branch in the repository. Select the option when your repository has just been created or the target folder is empty. If you leave it unselected, you will start with an empty project (synced to the repo) while your existing project is left unassigned.

Leaving the option selected, you can choose which branch will be the target of the import:

  • Use Collaboration – the “Collaboration branch” will be used as the target (“master” in this case)
  • Create new – create a new branch
  • Use Existing – select the existing branch you want to use as the target

In my case I will create a new branch, “InitAdfCode”:

Repository Settings – all complete
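
The three import options boil down to a simple rule for picking the target branch. Here is a minimal sketch of that rule in Python. The function name and its parameters are hypothetical and only restate the behaviour described above; this is not how ADF implements it.

```python
def resolve_import_branch(option: str, collaboration_branch: str, branch_name: str = "") -> str:
    """Return the branch that would receive the initial commit (illustrative only)."""
    if option == "Use Collaboration":
        # The collaboration branch itself is the target.
        return collaboration_branch
    if option in ("Create new", "Use Existing"):
        # Both options need a branch name supplied by the user.
        if not branch_name:
            raise ValueError(f"'{option}' needs a branch name")
        return branch_name
    raise ValueError(f"Unknown option: {option}")


print(resolve_import_branch("Use Collaboration", "master"))
print(resolve_import_branch("Create new", "master", "InitAdfCode"))
```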

Once you click the “Save” button, a new branch is created and one commit containing all resources (as JSON files) is made. The following pictures show the newly created branch and the structure of folders.

Newly-created branch

Structure of folders. Familiar? Certainly.
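
Once the branch is cloned locally, you can quickly check what landed in the repository. The sketch below counts the JSON files in each top-level folder under the root folder. It assumes only what was said above, that resources are stored as JSON files grouped into folders; the function name is my own.

```python
from collections import Counter
from pathlib import Path


def count_resources(root: Path) -> Counter:
    """Count *.json files per top-level folder under the ADF root folder."""
    counts = Counter()
    # "*/*.json" matches JSON files exactly one folder below the root.
    for json_file in root.glob("*/*.json"):
        counts[json_file.parent.name] += 1
    return counts
```

To use it, call `count_resources(Path("path/to/your/clone"))`, where the path is a placeholder for your local clone, and print the result.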

The last step is to pick the working branch. As I have just created a new one, I want to use it:

Once you confirm that, you will see that the UI is connected to the selected branch in the Git repository:

At any time you can change the working branch by selecting the required one from the list.

Making changes

Whenever you change any object in ADF v2, the Save button becomes enabled. Once you click Save, all changes made so far are committed (pushed) to the related folder in the Git repository. The only exception to that rule is deleting objects: that change is pushed to the repo automatically:

Azure DevOps – history of changes pushed by ADF v2

Publish

We are able to publish an edited ADF, which generates a publish branch containing ARM templates that can be used for deployment purposes. Beforehand you must merge your changes from the working branch to master (or whichever branch is set as the collaboration branch).
To achieve that, use the Create pull request action [Alt+P] (top-left corner), which does nothing but redirect you (in a new browser tab) to the New Pull Request page of the project in Azure DevOps. The default target is the “collaboration branch” (i.e. master), which is the only branch you can publish from. I’m not going to cover pull requests themselves in this post at all.
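
Since the publish branch holds ARM templates, which are plain JSON, it is easy to check what a template contains before you deploy it. The following is a minimal sketch, not part of ADF itself: the function name is hypothetical and the sample template is made up purely for demonstration. It relies only on the standard ARM template layout, a top-level “resources” array whose items carry a “type”.

```python
import json
from collections import Counter


def summarize_arm_template(template_text: str) -> Counter:
    """Count the resources in an ARM template by their 'type' value."""
    template = json.loads(template_text)
    return Counter(r.get("type", "<unknown>") for r in template.get("resources", []))


# A tiny made-up template used only to demonstrate the function.
sample = json.dumps({
    "contentVersion": "1.0.0.0",
    "resources": [
        {"type": "Microsoft.DataFactory/factories/pipelines", "name": "example1"},
        {"type": "Microsoft.DataFactory/factories/datasets", "name": "example2"},
        {"type": "Microsoft.DataFactory/factories/datasets", "name": "example3"},
    ],
})

for resource_type, count in sorted(summarize_arm_template(sample).items()):
    print(f"{resource_type}: {count}")
```

With a real template you would read the file from your clone of the publish branch and pass its text to the function instead of `sample`.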

Switch the assigned repo to a different one

You might have noticed that you can have only one Git repository assigned. That is completely fine (you can juggle between branches). But what if you make a mistake or just change your mind and need to re-assign from one Git repo to another? You can still do that, but there is only one place where that action is available: the Overview dashboard. Basically, the operation consists of removing Git and assigning it again, so first you must do the following:

Detach / remove Git (repo)

Removing the Git repo from Azure Data Factory is possible only in one place: the Overview dashboard. In the top-right corner you will see (or not; I didn’t the first time) a small Git icon:

Clicking the icon gives you access to information about the attached Git repository. At the bottom you’ll find the “Remove Git” button. Once that is done, you can repeat the assigning process from the beginning.

Thanks for reading!

About author

Kamil Nowinski

Blogger, speaker. Data Platform MVP, MCSE. Senior Data Engineer & data geek. Member of Data Community Poland, co-organizer of SQLDay, Happy husband & father.

2 Comments

  1. Sachin
    December 13, 12:52

    The list of Azure DevOps accounts is not being populated for me. Can you tell me where the problem could be?

    • Kamil Nowinski
      December 13, 23:41

      Hi Sachin. I had this issue at the beginning on my private Azure account. It was resolved once I connected my Azure DevOps Services (formerly Team Services) to AAD (Azure Active Directory); after that, the Azure DevOps account should be available on your list. Let me know whether that helped.
