Tuesday, September 27, 2022

Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD

Slowly changing dimension (SCD) is a data warehousing concept coined by the amazing Ralph Kimball. The SCD concept deals with moving a specific set of data from one state to another. Imagine we have a human resources (HR) system; Stephen Jiang is a Sales Manager, managing 10 sales representatives in his team. The following screenshot shows the sample data:

Picture 1: Stephen Jiang is the sales manager of a team of 10 sales representatives

Today, Stephen Jiang got his promotion to the Vice President of Sales role, so his team has grown in size from 10 to 17. Stephen is the same person, but his role has now changed, as shown in the following image:

Picture 2: Stephen’s team after he was promoted to Vice President of Sales

Another example is when a customer’s address changes in a sales system. Again, the customer is the same, but their address is now different. From a data warehousing standpoint, we have different options for handling the data depending on the business requirements, leading us to different types of SCDs. It is important to note that the data changes in the transactional source systems (in our examples, the HR system or a sales system). We move and transform the data from the transactional systems via extract, transform, and load (ETL) processes and land it in a data warehouse, where the SCD concept kicks in. SCD is about how changes in the source systems are reflected in the data warehouse. These kinds of changes in the source system do not happen very often, hence the term slowly changing. Many SCD types have been developed over the years; covering them all is out of the scope of this post, but for your reference, we cover the first three types as follows.

SCD type zero (SCD 0)

With this type of SCD, we ignore all changes in a dimension. So, when a person’s residential address changes in the source system (an HR system, in our example), we do not change the landing dimension in our data warehouse. In other words, we ignore the changes within the data source. SCD 0 is also referred to as fixed dimensions.

SCD type 1 (SCD 1)

With an SCD 1 type, we overwrite the old data with the new. An excellent example of an SCD 1 type is when the business does not need the customer’s old address and only needs to keep the customer’s current address.
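To make the difference between SCD 0 and SCD 1 concrete, here is a minimal Python sketch. It is only an illustration, not how a real ETL tool does it: the dimension is a plain dictionary, and the customer ID 101, the addresses, and the function names are all made up for this example.

```python
# Hypothetical customer dimension, keyed by the source system's customer ID.
# The ID and both addresses are invented for illustration.

def apply_scd0(dimension, change):
    """SCD 0: ignore the change; the row stays as it was first loaded."""
    return dimension


def apply_scd1(dimension, change):
    """SCD 1: overwrite the old values with the new ones; no history is kept."""
    updated = dict(dimension)  # shallow copy, so the input is left untouched
    key = change["CustomerKey"]
    updated[key] = {**updated[key], **change}
    return updated


dimension = {101: {"CustomerKey": 101, "Address": "1 Old Street"}}
change = {"CustomerKey": 101, "Address": "2 New Road"}

scd0_result = apply_scd0(dimension, change)
scd1_result = apply_scd1(dimension, change)

print(scd0_result[101]["Address"])  # 1 Old Street
print(scd1_result[101]["Address"])  # 2 New Road
```

In both cases the dimension still holds exactly one row per customer; the only question is whether that row follows the source system or not.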

SCD type 2 (SCD 2)

With this type of SCD, we keep the history of data changes in the data warehouse, for example when the business needs to keep both the customer’s old and current addresses. In an SCD 2 scenario, we need to maintain history, so we insert a new row of data into the data warehouse whenever the data changes in the transactional system. Inserting a new row causes data duplication in the data warehouse, which means that we cannot use the CustomerKey column as the primary key of the dimension. Hence, we need to introduce a new set of columns, as follows:

  • A new key column that guarantees rows’ uniqueness in the Customers dimension. This new key column is simply an index representing each row of data stored in a data warehouse dimension. The new key is a so-called surrogate key. While the surrogate key guarantees each row in the dimension is unique, we still need to maintain the source system’s primary key. By definition, the source system’s primary keys are now called business keys or alternate keys in the data warehousing world.
  • A Start Date and an End Date column that represent the timeframe during which a row of data is in its current state.
  • Another column that shows the status of each row of data.

SCD 2 is the most common type of SCD.
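The following Python sketch shows one possible way an ETL step could apply an SCD 2 change using the columns listed above. It rests on several assumptions: the surrogate key column is called CustomerSK here, the old row is closed the day before the new one starts, expired rows get the status "Expired", and all sample values are invented. A real data warehouse may use different names and conventions.

```python
from datetime import date, timedelta


def apply_scd2_change(rows, customer_key, new_address, effective_date):
    """Expire the current row for customer_key and append a new current row."""
    # Assumption: the surrogate key is a simple running index.
    next_sk = max((row["CustomerSK"] for row in rows), default=0) + 1
    result = []
    for row in rows:
        if row["CustomerKey"] == customer_key and row["Status"] == "Current":
            # Close the old version the day before the new one takes effect.
            row = {
                **row,
                "EndDate": effective_date - timedelta(days=1),
                "Status": "Expired",  # hypothetical label for non-current rows
            }
        result.append(row)
    result.append({
        "CustomerSK": next_sk,        # surrogate key: unique per row
        "CustomerKey": customer_key,  # business key: repeats across versions
        "Address": new_address,
        "StartDate": effective_date,
        "EndDate": None,
        "Status": "Current",
    })
    return result


# Invented sample data: one customer with one current row.
customers = [
    {"CustomerSK": 1, "CustomerKey": 101, "Address": "1 Old Street",
     "StartDate": date(2020, 1, 1), "EndDate": None, "Status": "Current"},
]

customers = apply_scd2_change(customers, 101, "2 New Road", date(2022, 9, 27))
for row in customers:
    print(row)
```

After the change, the business key 101 appears twice, which is exactly why it can no longer serve as the primary key, while the surrogate key stays unique.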

Let’s revisit our earlier example when Stephen Jiang was promoted from Sales Manager to Vice President of Sales. The following screenshot shows the data before Stephen got the promotion:

Picture 3: The employee data before Stephen was promoted

The EmployeeKey column is the surrogate key of the dimension, and the EmployeeBusinessKey column is the business key (the primary key of the employee in the source system); the Start Date column shows the date Stephen Jiang started his job as North American Sales Manager, the End Date column has been left blank (null), and the Status column shows Current. Now, let’s take a look at the data after Stephen gets the promotion, which is illustrated in the following screenshot:

Picture 4: The employee data after Stephen gets promoted

As the above image shows, Stephen Jiang started his new role as Vice President of Sales on 13/10/2012 and finished his job as North American Sales Manager on 12/10/2012.
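Once both versions of Stephen’s row exist, the Start Date and End Date columns let us ask which version was in effect on a given day. Here is a small Python sketch of such a lookup. The EmployeeKey values and the two dates around the promotion come from the example above; the business key "EMP-001" and the first start date are placeholders I made up, as the screenshots are not reproduced here.

```python
from datetime import date

# Two versions of the same employee. "EMP-001" and the 2011 start date are
# hypothetical; the keys 272/277 and the October 2012 dates follow the example.
employees = [
    {"EmployeeKey": 272, "EmployeeBusinessKey": "EMP-001",
     "Title": "North American Sales Manager",
     "StartDate": date(2011, 1, 1), "EndDate": date(2012, 10, 12)},
    {"EmployeeKey": 277, "EmployeeBusinessKey": "EMP-001",
     "Title": "Vice President of Sales",
     "StartDate": date(2012, 10, 13), "EndDate": None},
]


def version_as_of(rows, business_key, as_of):
    """Return the row in effect for business_key on as_of, or None."""
    for row in rows:
        if row["EmployeeBusinessKey"] != business_key:
            continue
        started = row["StartDate"] <= as_of
        # A null End Date means the row is still open.
        not_ended = row["EndDate"] is None or as_of <= row["EndDate"]
        if started and not_ended:
            return row
    return None


before = version_as_of(employees, "EMP-001", date(2012, 10, 12))
after = version_as_of(employees, "EMP-001", date(2012, 10, 13))
print(before["EmployeeKey"], before["Title"])  # 272 North American Sales Manager
print(after["EmployeeKey"], after["Title"])    # 277 Vice President of Sales
```

The sketch treats both dates as inclusive, which matches the example where the old role ends one day before the new one starts.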

Let’s see what SCD 2 means when it comes to data modeling in Power BI. The first question is: can we implement SCD 2 directly in Power BI Desktop without having a data warehouse? To answer this question, we have to remember that we create a semantic layer when building a data model in Power BI. In a previous post, I explained the different components of a BI solution, including the semantic layer, but I repeat it here. The semantic layer, by definition, is a view of the source data (usually a data warehouse), optimised for reporting and analytical purposes. The semantic layer neither replaces the data warehouse nor is it another version of the data warehouse. So the answer is no, we cannot implement the SCD 2 functionality in Power BI. We either need a data warehouse, or the transactional system must have a mechanism to support maintaining the historical data, such as a temporal mechanism. A temporal mechanism is a feature that some relational database management systems, such as SQL Server, offer to provide information about the data stored in a table at any point in time instead of keeping the current data only. To learn more about temporal tables in SQL Server, see Microsoft’s documentation on temporal tables.

After we load the data into the data model in Power BI Desktop, we have all current and historical data in the dimension tables. Therefore, we have to be careful when dealing with SCDs. For instance, the following screenshot shows reseller sales for employees:

Picture 5: Reseller sales for employees without considering SCD

At first glance, the numbers seem to be correct. Well, they may be right; they may be wrong. It depends on what the business expects to see on a report. Look at Picture 4, which shows Stephen’s changes. Stephen had some sales values when he was the North American Sales Manager (EmployeeKey 272). But after his promotion (EmployeeKey 277), he is not selling anymore. We did not consider SCD when we created the preceding table, which means we include Stephen’s sales values (EmployeeKey 272). But is this what the business requires? Does the business expect to see all employees’ sales regardless of their status? For more clarity, let’s add the Status column to the table.

Picture 6: Reseller sales for employees and their status without considering SCD

What if the business needs to show sales values only for employees whose status is Current? In that case, we would have to factor the SCD into the equation and filter out Stephen’s sales values. Depending on the business requirements, we might need to add the Status column as a filter in the visualizations, while in other cases, we might need to modify the measures by adding the Start Date, End Date, and Status columns to filter the results. The following screenshot shows the results when we use visual filters to take out Stephen’s sales:
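The effect of such a Status filter can be sketched outside Power BI as well. The Python snippet below is not DAX and not how Power BI evaluates a visual filter; it only mimics the logic with plain lists. Stephen’s EmployeeKey values follow the example above, while "Employee A", key 281, the "Expired" label, and every sales amount are invented for illustration.

```python
from collections import defaultdict

# Dimension rows: Stephen appears twice (SCD 2); "Employee A" is hypothetical.
employees = [
    {"EmployeeKey": 272, "Name": "Stephen Jiang", "Status": "Expired"},
    {"EmployeeKey": 277, "Name": "Stephen Jiang", "Status": "Current"},
    {"EmployeeKey": 281, "Name": "Employee A", "Status": "Current"},
]

# Fact rows reference the surrogate key; all amounts are made up.
sales = [
    {"EmployeeKey": 272, "Amount": 300},
    {"EmployeeKey": 281, "Amount": 500},
    {"EmployeeKey": 281, "Amount": 200},
]


def sales_by_employee(sales, employees, current_only=False):
    """Total sales per employee name, optionally keeping Current rows only."""
    dimension = {row["EmployeeKey"]: row for row in employees}
    totals = defaultdict(int)
    for sale in sales:
        employee = dimension[sale["EmployeeKey"]]
        if current_only and employee["Status"] != "Current":
            continue  # skip sales tied to a historical version of the row
        totals[employee["Name"]] += sale["Amount"]
    return dict(totals)


all_sales = sales_by_employee(sales, employees)
current_sales = sales_by_employee(sales, employees, current_only=True)
print(all_sales)      # {'Stephen Jiang': 300, 'Employee A': 700}
print(current_sales)  # {'Employee A': 700}
```

With the filter on, Stephen drops out of the result entirely, because all of his sales belong to the historical row (EmployeeKey 272) and his current row (EmployeeKey 277) has none.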

Picture 7: Reseller sales for employees considering SCD

Dealing with SCDs is not always as simple as this. Sometimes, we need to make some changes to our data model.

So, does all of the above mean we cannot implement any types of SCDs in Power BI? The answer, as always, is “it depends.” In some scenarios, we can implement a solution similar to the SCD 1 functionality, which I explain in another blog post. But we are out of luck in implementing the SCD 2 functionality purely in Power BI.

Have you used SCDs in Power BI? I’m curious to know about the challenges you faced, so please share your thoughts in the comments section below.


