SCD type 1 & type 2 in MERGE statement

SCD type 1 & type 2 in MERGE statement

Merge for SCD with more than two conditions

In my previous article, I have explained what does the SCD and described the most popular types of Slowly Changing Dimensions. I also mentioned that for one process, one table, you can specify more than one method. In this article, I would like to show Type 1 and Type 2 for a table, which is probably the most common conjunction.

SCD Type 1 & 2 in one permanent house

Let’s create an example tables: [stg].[CustomerPhone] (source) and [dim].[CustomerPhone] (destination).
I’m planning to implement different behaviour for selected columns:

  • Telephone – always UPDATE this column, but I’m not interested in keeping history of the values. It is SCD Type 1 thus.
  • Address – INSERT new row once the value change in this column. Means keeping history, means SCD Type 2.
  • StartDate, EndDate – we need these columns to provide point in time for SCD Type 2.
  • ScdVersion – optional column.
CREATE SCHEMA dim AUTHORIZATION dbo;
GO
CREATE SCHEMA stg AUTHORIZATION dbo;
GO

CREATE TABLE dim.CustomerPhone
(
    [Id]		INT NOT NULL ,
    [Name]		VARCHAR(50) NULL ,
    [Telephone] VARCHAR(50) NULL ,  --For Update in place (without history) = SCD Type 1
    [Address]	VARCHAR(50) NULL ,  --For INSERT, so keep history = SCD Type 2
    [StartDate] DATETIME NOT NULL , --Date attribute for SCD Type 2
    [EndDate]	DATETIME NOT NULL , --Date attribute for SCD Type 2
    [ScdVersion] INT NULL			--version attribute for SCD Type 2
);

CREATE TABLE stg.CustomerPhone
(
    [Id]		INT NOT NULL ,
    [Name]		VARCHAR(50) NULL ,
    [Telephone] VARCHAR(50) NULL ,
    [Address]	VARCHAR(50) NULL ,
);
GO

Processing SCD through MERGE statement

To achieve the goal, I will use one of my favourite method – MERGE. This T-SQL statement was introduced to SQL Server 2008. In one command you can make either UPDATE, INSERT and DELETE. Considering that you can use it in conjunction with OUTPUT – makes the very powerful alternative for SSIS.

Nevertheless, MERGE does not belong to pleasant for writing. The syntax is complex and very easy to make a mistake. Not mention how to use it correctly to implement established logic. And then, SCD Merge Wizard comes with help. I will use this application to create full MERGE statement for above table.

In short, I would like to run the following script:

--Prepare test
TRUNCATE TABLE stg.CustomerPhone;
TRUNCATE TABLE dim.CustomerPhone;
GO

--TEST 0: Insert first row
INSERT INTO stg.CustomerPhone VALUES (1, 'ABC', '555-555', '5TH AVE');
SELECT * FROM stg.CustomerPhone;
EXEC dbo.MergeDimCustomer;
SELECT * FROM dim.CustomerPhone;
GO

and whole logic should be applied. Only we need to have is Stored Procedure which contains “magic” MERGE. Let’s create the statement in an easy way. Using…

SCD Merge Wizard

You can download this free, open-source application from GitHub. When you install and run it – you’ll see the main window:

Click “Next” in order to go to the first step: set Source Connection. Use “Edit” button to establish the connection to source database:

Then select a source table like below:

Next step is Target Connection. If your server is the same as a source – just use “Copy from the Source Connection” option. Then, the only difference here will be a destination table.

Click “Next” to go to “User Variables” step, which contains certain pre-defined values we will use later on.

Step: Transformations

Yeah! This is the chamber of the sorcerer. At the beginning, the app maps all columns by names. Your role is to set up the transformation rules.


Here we prepare all magic, I mean… business logic:

  • Each MERGE must have a column key: set “Business Key” for column [Id]
  • Set “SCD1” for columns [Name] and [Telephone] as we want to update these fields every time.
  • Set “SCD2” for column [Address] as we want to create a new row in dimension table once the value change.

Next window is Business Key Check. Merge is very sensitive to duplicate values for column(s) defined as “Business Key”. They must be unique in the source, otherwise, it leads to unexpected behaviour or error. It’s totally understandable. In this step, you can check your source data with only one click.

Step: Options

Options are in the next step where you can establish the behaviour of MERGE and other parts of the script. The description of each option you can find on the bottom part of the window. For our purposes set the options as presented below:

Last step: Query

Congrats! Your first MERGE statement using SCD Merge Wizard has been prepared:

Now, copy the body (T-SQL) and enclose it into Stored Procedure:

CREATE PROCEDURE dbo.MergeDimCustomer
AS
BEGIN
-- ==================================================
-- Slowly Changing Dimension script by SCD Merge Wizard
-- Author: Miljan Radovic
-- Official web site: https://github.com/SQLPlayer/SCD-Merge-Wizard/
-- Version: 4.2.1.0
-- Publish date: 08/11/2017 03:17:27
-- Script creation date: 28/01/2018 12:31:13
-- ==================================================

-- ==================================================
-- USER VARIABLES
-- ==================================================
DECLARE @CurrentDateTime datetime
DECLARE @MinDateTime datetime
DECLARE @MaxDateTime datetime

SELECT
	@CurrentDateTime = cast(getdate() as datetime),
	@MinDateTime = cast('1900-01-01' as datetime),
	@MaxDateTime = cast('9999-12-31' as datetime)


-- ==================================================
-- SCD1
-- ==================================================
MERGE [dim].[CustomerPhone] as [target]
USING
(
	SELECT
		[Address],
		[Id],
		[Name],
		[Telephone]
	FROM [stg].[CustomerPhone]
) as 
ON
(
	.[Id] = [target].[Id]
)

WHEN MATCHED AND
(
	([target].[EndDate] = @MaxDateTime OR ([target].[EndDate] IS NULL AND @MaxDateTime IS NULL))
)
AND
(
	(.[Name] <> [target].[Name] OR (.[Name] IS NULL AND [target].[Name] IS NOT NULL) OR (.[Name] IS NOT NULL AND [target].[Name] IS NULL)) OR
	(.[Telephone] <> [target].[Telephone] OR (.[Telephone] IS NULL AND [target].[Telephone] IS NOT NULL) OR (.[Telephone] IS NOT NULL AND [target].[Telephone] IS NULL))
)
AND
(
	(.[Address] = [target].[Address] OR (.[Address] IS NULL AND [target].[Address] IS NULL))
)
THEN UPDATE
SET
	[target].[Name] = .[Name],
	[target].[ScdVersion] = [target].[ScdVersion] + 1,
	[target].[Telephone] = .[Telephone]
;

-- ==================================================
-- SCD2
-- ==================================================
INSERT INTO [dim].[CustomerPhone]
(
	[Address],
	[EndDate],
	[Id],
	[Name],
	[ScdVersion],
	[StartDate],
	[Telephone]
)
SELECT
	[Address],
	[EndDate],
	[Id],
	[Name],
	[ScdVersion],
	[StartDate],
	[Telephone]
FROM
(
	MERGE [dim].[CustomerPhone] as [target]
	USING
	(
		SELECT
			[Address],
			[Id],
			[Name],
			[Telephone]
		FROM [stg].[CustomerPhone]
	) as 
	ON
	(
		.[Id] = [target].[Id]
	)
	WHEN NOT MATCHED BY TARGET
	THEN INSERT
	(
		[Address],
		[EndDate],
		[Id],
		[Name],
		[ScdVersion],
		[StartDate],
		[Telephone]
	)
	VALUES
	(
		[Address],
		@MaxDateTime,
		[Id],
		[Name],
		1,
		@MinDateTime,
		[Telephone]
	)
WHEN MATCHED AND
(
	([EndDate] = @MaxDateTime OR ([EndDate] IS NULL AND @MaxDateTime IS NULL))
)
AND
(
	([target].[Address] <> .[Address] OR ([target].[Address] IS NULL AND .[Address] IS NOT NULL) OR ([target].[Address] IS NOT NULL AND .[Address] IS NULL))
)
	THEN UPDATE
	SET
		[EndDate] = @CurrentDateTime
	OUTPUT
		$Action as [MERGE_ACTION_91025df2-1344-4e4e-84f7-e790d1f77d7c],
		.[Address] AS [Address],
		@MaxDateTime AS [EndDate],
		.[Id] AS [Id],
		.[Name] AS [Name],
		INSERTED.[ScdVersion] + 1 AS [ScdVersion],
		@CurrentDateTime AS [StartDate],
		.[Telephone] AS [Telephone]

) MERGE_OUTPUT
WHERE MERGE_OUTPUT.[MERGE_ACTION_91025df2-1344-4e4e-84f7-e790d1f77d7c] = 'UPDATE' 
	AND MERGE_OUTPUT.[Id] IS NOT NULL
;
END
GO

Let’s get the party started!

It’s high time to test our brand-new procedure. Let’s insert the first row into staging and check if the row will be copied to target dimension.

--TEST 0: Insert first row
TRUNCATE TABLE stg.CustomerPhone;
INSERT INTO stg.CustomerPhone VALUES (1, 'ABC', '555-555', '5TH AVE');
SELECT * FROM stg.CustomerPhone;
EXEC dbo.MergeDimCustomer;
SELECT * FROM dim.CustomerPhone;
GO

So far, so good.
New customer added to staging should also appear in the target table.

--TEST 1: MATCH doesn't happen using Id, then the record must be inserted
INSERT INTO stg.CustomerPhone VALUES (2, 'DEF', '666', '5TH AVE');
EXEC dbo.MergeDimCustomer;
SELECT * FROM dim.CustomerPhone;
GO

New records always have min and max date, as we requested.

Test: SCD Type 1 (Update in place)

Now, we will test change for column [Telephone], which has been configured as SCD1. Update on this columns should cause a simple update on destination row, only for that column, nothing else. Let’s check out:

--TEST 2: MATCH but only [Telephone] field has changed - UPDATE only
--Means SCD1
UPDATE stg.CustomerPhone
SET [Telephone] = '777'
WHERE Id = 2;
EXEC dbo.MergeDimCustomer;
SELECT * FROM dim.CustomerPhone;
GO

Great. A new value for telephone can be seen, no other columns were changed.

Test: SCD Type 2 (Update+Insert)

Next test is to update [Address] column which has been configured as SCD2. That means we want to keep history changes and must create new row holding old version as well.

--TEST 3: MATCH happens but only the [Address] field has changed I want to add a new record and update the start and end dates.
--means SCD Type 2
UPDATE stg.CustomerPhone
SET [Address] = 'Summerhill Way'
WHERE Id = 2;
EXEC dbo.MergeDimCustomer;
SELECT * FROM dim.CustomerPhone;
GO

In above example we experienced following actions:

  1. New record has been created with [StartDate] as Current Time – GETDATE()
  2. The previous record has been marked as ended at Current Time (in normal scenario, I suggest to use real time of the change from OLTP, when available)

Summary

As you noticed – you can very quickly develop the business requirements doing the loading process into Data Warehouse’s dimension. Furthermore, you are able to save to a file whole mapping, connections and all options. Need to change the logic? Simply, open saved project, made the changes and replace procedure.
At any time you can review the Stored Procedure and even maintenance of that code is much easier comparing to SSIS.

SCD Merge Wizard on GitHublast release
Facebook profile

Give me a shout if you have any concerns or questions.

Thanks!

Previous Last week reading (2018-01-28)
Next Last week reading (2018-02-04)

About author

Kamil Nowinski
Kamil Nowinski 39 posts

Data Architect, MCSE Data Platform, MS BI Developer, Member of PLSSUG, co-organizer of SQLDay, speaker. Happy husband & father. #SQLFamily member.

View all posts by this author →

You might also like

Data Warehousing 1Comments

Slowly Changing Dimensions (SCD)

In today’s article I’d would like to focus on Slowly changing dimension, aka SCD. Dimension is a word excerpted from data warehousing as such. Not without a reason SCD is used

0 Comments

No Comments Yet!

You can be first to comment this post!

Leave a Reply

35 + = 44

Protected with IP Blacklist CloudIP Blacklist Cloud