Slowly Changing Dimensions (SCD)

Slowly Changing Dimensions (SCD)

In today’s article I’d would like to focus on Slowly changing dimension, aka SCD. Dimension is a word excerpted from data warehousing as such. Not without a reason SCD is used very often in terms of Data Warehouse (DW) topics and can be use for audit purposes in OLTP systems.

The goal of the article is to review the types of Slowly Changing Dimension as there are few version of them.

Why do I need SCD?

The data is changing over time. You might not care about historical version of data in your OLTP database, but it might be business requirement in order to maintain all historical changes. In other words, you must be able to show, in a sales report, a correct regional office of a sales person, no matter when the person has changed the region. In that case we must keep the data before change and after the update.

How Transformations Work?

There are few types of SCD.

Depending of SCD type you can keep none, one or more historical changes of rows.
The idea is simple: every time the row is changing, a old version (before UPDATE or DELETE) of row will be kept in SCD table.
The implementation of these things is not always easy. Let’s have a look at examples.

As an example of source data I will use the following data row:

SCD Source data example

In the example above, CustomerId is a PrimaryKey in source (OLTP) database.
We introduce surrogate key (CustomerKey) in DW to avoid duplication of original CustomerId and to optimize performance, especially when source system uses character or complex keys.

SCD data in Data Warehouse example

SCD Type 0

This type doesn’t keep any changes of attributes. It keeps only the original value without any changes. It used mainly for attributes which can’t be changed, like BirthDate.

SCD Type 1: overwriting the history for an attribute

This method can not track the changes in data, and overwrites the old data with new. Once our customer change his address, new address would overwrite:

SCD Type 1

As you can see – it’s very simple to implement and maintain the process – it’s simple update. In other hands, the drawback of this approach is lack of row’s history in Data Warehouse.

SCD Type 2: adding new rows

The methodology is preserving the history by adding new rows and there is no limit on number of keeping changes.

SCD Type 2

Additionally, you can decide in which way you want to know older rows by maintain current flag column and/or UpdateDate:

SCD Type 2 with current flag

or version of row which will be incremented sequentially:

SCD Type 2 with version

SCD Type 3: additional historical columns

In this approach you can manage a limited amount of history by adding additional columns for tracking attributes.
Wanting to have previous and current city of customer you can implement columns: PreviousCity and CurrentCity. There will be only one row in Data Warehouse (DW) per source database therefore Type 3 has limited usability and is less popular than Type 1 and 2.

SCD Type 3

You must be careful if you want to use Type 3 of SCD. Make sure that Business Analyst and/or your Customer are aware of history’s limitations.

SCD Type 4: Separate historical table

This type uses separate tables for keeping the current records and historical changes. It worth to use when:
• Dimension is changing pretty often or
• Dimension is multi-million rows table

SCD Type 4 current table

SCD Type 4 – historical table

Margy Ross from Kimball Group calls such tables as “mini-dimension”.

SCD Type 6:

This approach is numbered as 6 as 1+2+3=6 as well 1x2x3=6. Therefore it’s a combination of these three types:
• Keep both previous and current values for attribute (as Type 3, look at columns: “HistoricCity”, “CurrentCity”)
• Overrides all records for current version of attribute (as Type 1, look at column “CurrentCity”)
• Keeps historical version of attribute (as Type 2, look as columns: “IsCurrent”, “StartDate”, “EndDate”)
Let’s have a look how the row is being changed and how its affect on DW. The first loading of the row:

SCD Type 6 (step 1)

Rows in DW’s Customer table following first customer relocation:

SCD Type 6 (step 2)

Rows in DW’s Customer table following second customer relocation:

SCD Type 6 (step 3)

Which Type should I choice?

As a professional consultant I reply professionally:

It does depend

Ask your business team, business analytics or your customer: What they need in data warehouse.
Maybe I didn’t say it loud enough, but in the same table you can apply different SCD Types to each column.
For instance: Type 2 to the City column, Type 1 to the BirthDate column. All in the same table.
Therefore ask them for every dimension:
• which attributes are constant,
• which attributes are variables?
• which attributes should be fully-tracked, which should keep only previous and current version?
• How many rows are in table right now / are expected in the next 10 years?
Keep these answers as a evidence. People, especially customers, love to change their mind. You know what I mean.
If you still not sure which one to be chosen, remember that:

Keeping a history of the changes are good for you

Simply because the requirements could change.
Beside of types 0 and 1, in my opinion, the most popular are Type 2 and Type 4.

How to load data into SCD?

Good question. But this topic is a completely different story.
Before I prepare next post about it – have a look on this tool:
SCD Merge Wizard
The tool helps you to create, manage and maintenance of mapping between source database (e.g. staging) and target database (Data Warehouse). Furthermore, with this approach you will use MERGE statement to perform loading operations using configured SCD Type for each column.
In further posts, I will explain how to apply different SCD Types in this tool.

Good luck with your loadings,
Kamil

 

Previous Last week reading (2018-01-07)
Next ASF 005: Marcin Szeliga interview

About author

Kamil Nowinski
Kamil Nowinski 47 posts

Data Architect, MCSE Data Platform, MS BI Developer, Member of PLSSUG, co-organizer of SQLDay, speaker. Happy husband & father. #SQLFamily member.

View all posts by this author →

You might also like

Data Warehousing 0 Comments

SCD type 1 & type 2 in MERGE statement

In my previous article, I have explained what does the SCD and described the most popular types of Slowly Changing Dimensions. I also mentioned that for one process, one table,

Big Data 1Comments

Auditing in Azure SQL Data Warehouse

The first article in a series on “Security Intelligence in Azure PaaS” inspired me to write something on auditing in Azure SQL Data Warehouse. To put it simply for anyone

1 Comment

Leave a Reply

+ 27 = 30

Protected with IP Blacklist CloudIP Blacklist Cloud