SCD type 1 & type 2 in MERGE statement

SCD type 1 & type 2 in MERGE statement

Merge for SCD with more than two conditions

In my previous article, I have explained what does the SCD and described the most popular types of Slowly Changing Dimensions. I also mentioned that for one process, one table, you can specify more than one method. In this article, I would like to show Type 1 and Type 2 for a table, which is probably the most common conjunction.

SCD Type 1 & 2 in one permanent house

Let’s create an example tables: [stg].[CustomerPhone] (source) and [dim].[CustomerPhone] (destination).
I’m planning to implement different behaviour for selected columns:

  • Telephone – always UPDATE this column, but I’m not interested in keeping history of the values. It is SCD Type 1 thus.
  • Address – INSERT new row once the value change in this column. Means keeping history, means SCD Type 2.
  • StartDate, EndDate – we need these columns to provide point in time for SCD Type 2.
  • ScdVersion – optional column.

Processing SCD through MERGE statement

To achieve the goal, I will use one of my favourite method – MERGE. This T-SQL statement was introduced to SQL Server 2008. In one command you can make either UPDATE, INSERT and DELETE. Considering that you can use it in conjunction with OUTPUT – makes the very powerful alternative for SSIS.

Nevertheless, MERGE does not belong to pleasant for writing. The syntax is complex and very easy to make a mistake. Not mention how to use it correctly to implement established logic. And then, SCD Merge Wizard comes with help. I will use this application to create full MERGE statement for above table.

In short, I would like to run the following script:

and whole logic should be applied. Only we need to have is Stored Procedure which contains “magic” MERGE. Let’s create the statement in an easy way. Using…

SCD Merge Wizard

You can download this free, open-source application from GitHub. When you install and run it – you’ll see the main window:

Click “Next” in order to go to the first step: set Source Connection. Use “Edit” button to establish the connection to source database:

Then select a source table like below:

Next step is Target Connection. If your server is the same as a source – just use “Copy from the Source Connection” option. Then, the only difference here will be a destination table.

Click “Next” to go to “User Variables” step, which contains certain pre-defined values we will use later on.

Step: Transformations

Yeah! This is the chamber of the sorcerer. At the beginning, the app maps all columns by names. Your role is to set up the transformation rules.


Here we prepare all magic, I mean… business logic:

  • Each MERGE must have a column key: set “Business Key” for column [Id]
  • Set “SCD1” for columns [Name] and [Telephone] as we want to update these fields every time.
  • Set “SCD2” for column [Address] as we want to create a new row in dimension table once the value change.

Next window is Business Key Check. Merge is very sensitive to duplicate values for column(s) defined as “Business Key”. They must be unique in the source, otherwise, it leads to unexpected behaviour or error. It’s totally understandable. In this step, you can check your source data with only one click.

Step: Options

Options are in the next step where you can establish the behaviour of MERGE and other parts of the script. The description of each option you can find on the bottom part of the window. For our purposes set the options as presented below:

Last step: Query

Congrats! Your first MERGE statement using SCD Merge Wizard has been prepared:

Now, copy the body (T-SQL) and enclose it into Stored Procedure:

Let’s get the party started!

It’s high time to test our brand-new procedure. Let’s insert the first row into staging and check if the row will be copied to target dimension.

So far, so good.
New customer added to staging should also appear in the target table.

New records always have min and max date, as we requested.

Test: SCD Type 1 (Update in place)

Now, we will test change for column [Telephone], which has been configured as SCD1. Update on this columns should cause a simple update on destination row, only for that column, nothing else. Let’s check out:

Great. A new value for telephone can be seen, no other columns were changed.

Test: SCD Type 2 (Update+Insert)

Next test is to update [Address] column which has been configured as SCD2. That means we want to keep history changes and must create new row holding old version as well.

In above example we experienced following actions:

  1. New record has been created with [StartDate] as Current Time – GETDATE()
  2. The previous record has been marked as ended at Current Time (in normal scenario, I suggest to use real time of the change from OLTP, when available)

Summary

As you noticed – you can very quickly develop the business requirements doing the loading process into Data Warehouse’s dimension. Furthermore, you are able to save to a file whole mapping, connections and all options. Need to change the logic? Simply, open saved project, made the changes and replace procedure.
At any time you can review the Stored Procedure and even maintenance of that code is much easier comparing to SSIS.

SCD Merge Wizard on GitHublast release
Facebook profile

Give me a shout if you have any concerns or questions.

Thanks!

Previous Last week reading (2018-01-28)
Next Last week reading (2018-02-04)

About author

Kamil Nowinski
Kamil Nowinski 84 posts

Blogger, speaker. Data Platform MVP, MCSE. Senior Data Engineer & data geek. Member of Data Community Poland, co-organizer of SQLDay, Happy husband & father.

View all posts by this author →

You might also like

Data Warehousing 1Comments

Slowly Changing Dimensions (SCD)

In today’s article I’d would like to focus on Slowly changing dimension, aka SCD. Dimension is a word excerpted from data warehousing as such. Not without a reason SCD is used

Data Warehousing 1Comments

Auditing in Azure SQL Data Warehouse

The first article in a series on “Security Intelligence in Azure PaaS” inspired me to write something on auditing in Azure SQL Data Warehouse. To put it simply for anyone

DataCommunity 0 Comments

BI Connected by SII – conference in Wrocław

Good day SQL folks! The conference season has begun. You can find very interesting conferences every week in almost every place in the globe. Of course you do not need

4 Comments

  1. DD
    July 17, 10:35 Reply

    After I downloaded the git folder how do I open the wizard?

    • Kamil Nowinski
      July 17, 22:56 Reply

      If you downloaded source code – there is a project file in “SCD-Merge-Wizard” folder. Open it in Visual Studio. Alternatively, you can download MSI file and install the tool. Let me know if you have any other questions.

  2. Persistent
    November 28, 20:14 Reply

    Great stuff! The MERGE statement is really useful, but a hassle to use, so a generator is a fantastic idea.

    Just had a chance to play with your sample. However I had errors in the MergeDimCustomer sproc when copying it from the web page (line 99 among other things), but following your sample does produce a correct result 🙂

    Could you elaborate on the effect of the various settings on the Options pane?

    • Kamil Nowinski
      November 28, 21:06 Reply

      Thank you for your kind words. Much appreciated!
      In terms of a bug in script: one of recently installed WordPress’ plugin had to screw it up as the code has not been changed since post publishing. I applied better plugin to present T-SQL code in this post and everything should be fine now. Thanks for being vigilant!
      Option pane – great idea as I suppose that some of those options could not have to be obvious. Stay tuned.
      Thank you for reading. I hope that the post helped you.

Leave a Reply

14 − 6 =

Protected with IP Blacklist CloudIP Blacklist Cloud