Thursday, November 24, 2016

Type 2 Slowly Changing Dimensions (SCD) with T-SQL using Merge statement

Changing value of Attribute in dimension table is called Slowly Changing dimensions in Data ware house (DWH).
To know it more please go to some web site in Internet. Today we are going to Implement SCD 2 by T-SQL.
We have taken a dimension table named ‘SCD_Emp’ and data source table named ‘Table_Dump

CREATE TABLE [dbo].[SCD_Emp](
       [Id] [int] NULL,
       [Name] [varchar](200) NULL,
       [Dept] [varchar](50) NULL,
       [StartDate] [date] NULL,
       [EndDate] [date] NULL,
       [IsActive] [bit] NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Table_Dump](
       [Id] [int] NULL,
       [EmpName] [varchar](200) NULL,
       [Dept] [varchar](50) NULL

) ON [PRIMARY]

Now we have inserted 1 Row in SCD_Emp and 2 Rows in Table_Dump.





From the above picture we can see that We have 1 row for ‘Pritam’ where Department is ‘AIM’, but in source table there are 2 rows. One for ‘Pritam’ with new department ‘DOD’ and other row is not exist in Dimension table ‘SCD_Emp’.


Now we have 3 Action point.
First: Insert the row with EmpName ‘Dinesh’ into SCD_Emp from Table_Dump because it is new Entry (Id not exist in SCD_Emp).
Second: Update the existing row for ‘Pritam’ with EndDate as Yesterday’s date and IsActive as ‘False’, because this record is no more valid since new department came for ‘Pritam’.

Third: Insert new Row for ‘Pritam’ with New department ‘DOD’, StartDate as today’s date and IsActive as ‘True’.

Let’s implement it using T-SQL Script:

Insert into [dbo].[SCD_Emp]

select SCDImplementation.Id,
SCDImplementation.EmpName,
SCDImplementation.Dept,
SCDImplementation.Startdate,
SCDImplementation.EndDate,
SCDImplementation.Isactive
from (
MERGE SCD_Emp DEST
                    USING Table_Dump SRC

                    ON ( DEST.Id = SRC.Id )
                    WHEN NOT MATCHED
                        THEN
              INSERT  VALUES
                ( SRC.Id ,
                  SRC.EmpName ,
                  SRC.Dept ,
                  Getdate() ,
                  '2099-12-31' ,
                  'True'
                )

                    WHEN MATCHED AND DEST.IsActive = 'True'
                        AND ( DEST.Dept <> SRC.Dept )
                        THEN
               UPDATE   SET
                DEST.IsActive = 'False' ,
                DEST.EndDate =Getdate()-1
                    OUTPUT
                        $Action Action_Out ,
                        SRC.Id ,
                        SRC.EmpName ,
                        SRC.Dept ,
                        Getdate() StartDate,
                        '12/31/2199' EndDate ,
                        'True' Isactive

) as SCDImplementation
where SCDImplementation.Action_Out='UPDATE';



Output:



How Does it work?
Here MERGE statement’s output option captured all the changes it makes and tags them in an output table.
This feature captures all Rows from Source table based on Marching ID in both tables.
  • If Id matched both the table and IsActive column is ‘True’ but other columns value having mismatch then, it will come under ‘Update’ Action Output.
  • If Id is not match in both tables, then the entire record will come under ‘Insert’ Action Output.
Let’s run Only MERGE statement:

MERGE SCD_Emp DEST
                    USING Table_Dump SRC

                    ON ( DEST.Id = SRC.Id )
                    WHEN NOT MATCHED
                        THEN
              INSERT  VALUES
                ( SRC.Id ,
                  SRC.EmpName ,
                  SRC.Dept ,
                  Getdate() ,
                  '2099-12-31' ,
                  'True'
                )

                    WHEN MATCHED AND DEST.IsActive = 'True'
                        AND ( DEST.Dept <> SRC.Dept )
                        THEN
               UPDATE   SET
                DEST.IsActive = 'False' ,
                DEST.EndDate =Getdate()-1
                    OUTPUT
                        $Action Action_Out ,
                        SRC.Id ,
                        SRC.EmpName ,
                        SRC.Dept ,
                        Getdate() StartDate,
                        '12/31/2199' EndDate ,

                        'True' Isactive;

Note: To run MERGE statement we have to give semi colon ‘;’ at the end of statement.

Output:

Now we have to insert those row, which only comes under ‘Update’ Action Output from MERGE statement into Dimension table (SCD_Emp).

Insert into [dbo].[SCD_Emp]
Select SCDImplementation.Id,
SCDImplementation.EmpName,
SCDImplementation.Dept,
SCDImplementation.Startdate,
SCDImplementation.EndDate,
SCDImplementation.Isactive
from (/***MARGE Statement***/) as SCDImplementation

where SCDImplementation.Action_Out='UPDATE'




We should not Insert row which comes under ‘Insert’ Action Output because we already written Insert query for that in 1st step of marge statement.


MERGE SCD_Emp DEST
                    USING Table_Dump SRC

                    ON ( DEST.Id = SRC.Id )
                    WHEN NOT MATCHED
                        THEN
              INSERT  VALUES
                ( SRC.Id ,
                  SRC.EmpName ,
                  SRC.Dept ,
                  Getdate() ,
                  '2099-12-31' ,
                  'True'
                )


No comments:

Post a Comment