SQL Server - Change Data Capture (CDC)
Change Data Capture is a mechanism built into SQL Server since 2008 that is intended to keep a history of changes to data in one or more tables. Indeed in some cases, you want to save all the steps (creation, modification, deletion) to arrive at the data of a table at a given time.
#Enable Change Data Capture
Change Data Capture is not enabled by default, so use the following command:
USE Sample
GO
EXEC sys.sp_cdc_enable_db
GO
You can check the good execution of the command via the following query:
SELECT [name], database_id, is_cdc_enabled
FROM sys.databases
You can also see that SQL Server adds some new tables to the database:
The names of the tables are explicit, but here is an example of the data contained in each of them:
cdc.captured_columns
: list of captured columnscdc.change_tables
: Tables whose capture is enabledcdc.ddl_history
: Contains all structure changes (DDL)cdc.index_columns
: Contains the indexes of the captured tablescdc.lsn_time_mapping
: Contains the link between the LSNs and the date.
#Enable Change Data Capture on a table
Once CDC is activated at the database level, it must be activated on the tables for which you want to follow the changes. For this we execute the following procedure:
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Employee',
@role_name = NULL
GO
If necessary you can filter the columns to track to reduce the amount of data saved:
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Employee',
@role_name = NULL,
@index_name = N'PK_Employee',
@captured_column_list = N'[Id],[FirstName],[LastName]'
GO
Once again you can check that CDC is activated:
SELECT [name], is_tracked_by_cdc
FROM sys.tables
You can also see that a new table has been created:
And 2 jobs:
And you can also use the following command to view the configuration of CDC:
EXEC sys.sp_cdc_help_change_data_capture
Now any changes in the Employee
table (insert, update, delete) will automatically add one or more rows to the table cdc.dbo_Employee_CT
, without any modification of the requests.
#Querying
Let's start by doing some operations on the table:
INSERT INTO Employee (FirstName, LastName, hid) VALUES ('John', 'Doe', '/1/')
UPDATE Employee
SET FirstName = 'Jane'
WHERE Id = 20
UPDATE Employee
SET hid = '/2/'
WHERE Id = 20
You can see that the changes are directly saved in raw form in the table created by CDC:
SELECT * FROM [Sample].[cdc].[dbo_Employee_CT]
In this table we find:
- The operation performed (__ $ operation)
- 1 = Delete
- 2 = Insert
- 3 = Value of the line before an update
- 4 = Value of the line after an Update
- A mask showing the columns that have been modified based on their ordinal
- 0x0F ⇒ column 1, 2, 3 and 4
- 0x02 ⇒ column 2
- 0x08 ⇒ column 4
- The last columns correspond to the values of the row in the table
In the example above, the 3 operations are:
- Inserting the line
- Update of the FirstName column from John to Jane
- Update the hid column from 0x58 to 0x68
SQL Server provides some functions to query this table more easily:
cdc.fn_cdc_get_all_changes_<capture_instance>(from_lsn, to_lsn, row_filter_options)
(doc): Returns all changes between 2 dates for all rowscdc.fn_cdc_get_net_changes_<capture_instance>(from_lsn, to_lsn, row_filter_options)
(doc): Returns only one change per row in the table. This change represents the final content of the line. For example if we make an insert and three update, the result will be the line resulting from these four operations.
These operations do not use the date format but LSNs. To obtain these we can use the following functions:
sys.fn_cdc_get_min_lsn('dbo_Employee')
sys.fn_cdc_get_max_lsn()
sys.fn_cdc_map_time_to_lsn('smallest greater than', GETDATE ())
sys.fn_cdc_map_time_to_lsn ('largest less than or equal', GETDATE ())
To illustrate here is an example listing the final changes to the table during the last 30 minutes:
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_Employee(
sys.fn_cdc_map_time_to_lsn('smallest greater than', DATEADD(minute, -30, GETDATE())),
sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE()),
'all')
Another allowing to list all the changes of the table since the beginning:
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Employee(
sys.fn_cdc_get_min_lsn('dbo_Employee'),
sys.fn_cdc_get_max_lsn(),
N'all');
Capturing data can quickly blow up the size of the database. This is why SQL Server adds the cdc. <database>_cleanup
job by default. This job deletes the contents of the tables regularly. This task is configured by default to run every day and keep the contents of the last 3 days. For this it uses the procedure sys.sp_cdc_cleanup_change_table
.
We took a tour of the Change Data Capture feature. As we have seen, this feature keeps the history of changes made to the data without having to code anything. Indeed a few lines of configuration are enough to get the desired result.
Do you have a question or a suggestion about this post? Contact me!