AWS Database Blog

Data masking and performance improvements in AWS DMS 3.5.4

We are excited to announce the availability of new features in AWS Database Migration Service (AWS DMS) replication engine version 3.5.4. This release includes two major enhancements: data masking for enhanced security and improved data validation performance.

In this post, we deep dive into these two features. Refer to the release notes to see a list of all the new features available in this version.

Data masking for enhanced security

A data masking capability was requested by our customers to enhance data security during migrations, enabling you to transform sensitive data at the column level during migration and helping you comply with data protection regulations like GDPR. With AWS DMS, you can now create copies of data that redacts information that you need to protect at a column level.

One of the biggest concerns for our customers during database migrations is the secure handling of sensitive information, such as account numbers, phone numbers, and email addresses. With AWS DMS 3.5.4, we have implemented three flexible data transformation rules:

  • Digits Mask
  • Digits Randomize
  • Hashing Mask

To illustrate these transformation rules, we migrate a table called “EMPLOYEES” from an Amazon RDS for Oracle instance to Amazon RDS for PostgreSQL instance. Complete the following steps:

  1. Use the following table DDL on your source (Oracle) instance to create the EMPLOYEES table:
CREATE TABLE EMPLOYEES (
    EMPLOYEE_ID NUMBER(6) PRIMARY KEY,
    FIRST_NAME VARCHAR2(50) NOT NULL,
    LAST_NAME VARCHAR2(50) NOT NULL,
    EMAIL VARCHAR2(100) UNIQUE,
    PHONE_NUMBER VARCHAR2(20),
    HIRE_DATE DATE NOT NULL,
    JOB_TITLE VARCHAR2(50),
    SALARY NUMBER(10,2),
    DEPARTMENT_ID NUMBER(4),
    MANAGER_ID NUMBER(6),
    ACCOUNT_NUMBER VARCHAR2(20),
    CREATED_DATE DATE DEFAULT SYSDATE
  
);
CREATE SEQUENCE emp_seq
    START WITH 1
    INCREMENT BY 1
    NOCACHE
    NOCYCLE;
  1. Insert the table EMPLOYEES with a few records.
INSERT INTO EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_TITLE, SALARY, DEPARTMENT_ID, MANAGER_ID,ACCOUNT_NUMBER)
VALUES (emp_seq.NEXTVAL, 'John', 'Smith', '[email protected]', '555-0101', DATE '2020-01-15', 'CEO', 150000, 10, NULL,'456-123-456-789');

INSERT INTO EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_TITLE, SALARY, DEPARTMENT_ID, MANAGER_ID,ACCOUNT_NUMBER)
VALUES (emp_seq.NEXTVAL, 'Sarah', 'Johnson', '[email protected]', '555-0102', DATE '2020-03-20', 'IT Director', 120000, 20, 1,'666-000-111-222');

INSERT INTO EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_TITLE, SALARY, DEPARTMENT_ID, MANAGER_ID,ACCOUNT_NUMBER)
VALUES (emp_seq.NEXTVAL, 'Michael', 'Brown', '[email protected]', '555-0103', DATE '2021-02-10', 'Software Engineer', 85000, 20, 2,'777-333-444-555');

INSERT INTO EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_TITLE, SALARY, DEPARTMENT_ID, MANAGER_ID,ACCOUNT_NUMBER)
VALUES (emp_seq.NEXTVAL, 'Emily', 'Davis', '[email protected]', '555-0104', DATE '2021-06-15', 'HR Manager', 75000, 30, 1,'899-987-654-321');

INSERT INTO EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_TITLE, SALARY, DEPARTMENT_ID, MANAGER_ID,ACCOUNT_NUMBER)
VALUES (emp_seq.NEXTVAL, 'David', 'Wilson', '[email protected]', '555-0105', DATE '2022-01-20', 'Software Engineer', 80000, 20, 2,'567-111-222-333');
  1. Create an AWS DMS task using the option “Migrate” or “Migrate and replicate”.
  1. Set up the AWS DMS task with the following table mapping rule JSON. We mask the column ACCOUNT_NUMBER with the character #, the column PHONE_NUMBER with random numbers, and the column EMAIL with a hash. We have also used transformation rules to convert everything in lowercase which is optional.
{
    "rules": [
        {
            "rule-type": "transformation",
            "rule-id": "171087779",
            "rule-name": "171087779",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "ADMIN",
                "table-name": "EMPLOYEES",
                "column-name": "ACCOUNT_NUMBER"
            },
            "rule-action": "data-masking-digits-mask",
            "value": "*",
            "old-value": null
        },
        {
            "rule-type": "transformation",
            "rule-id": "171057753",
            "rule-name": "171057753",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "ADMIN",
                "table-name": "EMPLOYEES",
                "column-name": "PHONE_NUMBER"
            },
            "rule-action": "data-masking-digits-randomize",
            "value": null,
            "old-value": null
        },
        {
            "rule-type": "transformation",
            "rule-id": "169940283",
            "rule-name": "169940283",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "ADMIN",
                "table-name": "EMPLOYEES",
                "column-name": "EMAIL"
            },
            "rule-action": "data-masking-hash-mask",
            "value": null,
            "old-value": null
        },
        {
            "rule-type": "transformation",
            "rule-id": "169926638",
            "rule-name": "169926638",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "%",
                "table-name": "%",
                "column-name": "%"
            },
            "rule-action": "convert-lowercase",
            "value": null,
            "old-value": null
        },
        {
            "rule-type": "transformation",
            "rule-id": "169918368",
            "rule-name": "169918368",
            "rule-target": "table",
            "object-locator": {
                "schema-name": "%",
                "table-name": "%"
            },
            "rule-action": "convert-lowercase",
            "value": null,
            "old-value": null
        },
        {
            "rule-type": "transformation",
            "rule-id": "169908300",
            "rule-name": "169908300",
            "rule-target": "schema",
            "object-locator": {
                "schema-name": "%"
            },
            "rule-action": "convert-lowercase",
            "value": null,
            "old-value": null
        },
        {
            "rule-type": "selection",
            "rule-id": "169895493",
            "rule-name": "169895493",
            "object-locator": {
                "schema-name": "ADMIN",
                "table-name": "EMPLOYEES"
            },
            "rule-action": "include",
            "filters": []
        }
    ]
}

In the following sample output, we can see the output on the PostgreSQL instance with the data masking:

dmsdb=> select employee_id,phone_number,email,account_number from admin.employees;
 employee_id | phone_number |                              email                               | account_number
-------------+--------------+------------------------------------------------------------------+-----------------
          20 | 685-9897     | FDC2A4ABC53872D0F934B5614DDC312DAA165895065BB00A5986849AADE8C322 | ***-***-***-***
          21 | 579-3441     | 3A4FA9FE0AA0A2B468EDF13A29A75C4E3A20650243143D834D7898D40AA0FA2F | ***-***-***-***
          22 | 156-9277     | 0985B1D142A4067E397DF5AB56B03E3BF4857FB1F229CB39B49CE06E46B7AA98 | ***-***-***-***
          23 | 238-5321     | D07EAB207F4F1366C1E35B35E33F7842FF3EEB2C80E47FDAEB0900B49EE77697 | ***-***-***-***
          24 | 536-1233     | 3D438AF13A839ACDC24FD0CE8EB8C8C45083B90A31DF3583A88131614086C3B9 | ***-***-***-***
(5 rows)

The following image shows the sample output on Oracle for comparison.

Oracle output for comparison

In the preceding example, we have shown how you can use the data masking capability to hide sensitive information. Refer to Using data masking to hide sensitive information for further information.

Enhanced data validation performance

Maintaining data integrity is crucial during any database migration, but it can often be a time-consuming and resource-intensive process. AWS DMS 3.5.4 addresses this challenge with the enhanced data validation feature, which uses innovative techniques like fast partition validation to streamline the validation process.

Some of the key benefits of enhanced data validation are:

  • Redistribution of resource usage from a replication instance to the AWS DMS source and target endpoint
  • Potential decrease of network usage
  • Efficient for wide tables without LOB data types

The enhanced data validation feature is now available for specific AWS DMS migration paths, including Oracle to PostgreSQL, SQL Server to PostgreSQL, Oracle to Oracle, and SQL Server to SQL Server. To use this feature, make sure your environment meets the prerequisites.

You can confirm AWS DMS is using enhanced data validation by reviewing the Amazon CloudWatch logs, which will show messages like the following:

2025-02-12T21:23:26 [VALIDATOR ]I: Fast validation of table 'dbo'.'customer' : partition : 178 (partition_validator.c:1001)

To quantify the performance improvements, we conducted benchmarking using HammerDB with the settings as shown in the following screenshot.

HammerDB settings

We created a full load and change data capture (CDC) task with validation disabled to migrate approximately 93 million records (15 GB in size) from an Amazon Relational Database Service (Amazon RDS) for SQL Server to Amazon Aurora PostgreSQL-Compatible Edition across a total of nine tables as a baseline.

We then ran two validation-only tasks: one on AWS DMS 3.5.3 and one on AWS DMS 3.5.4 using r6i.xlarge instances. To speed up validation, we increased PartitionSize to 100,000 and ThreadCount to 15:

"ValidationSettings": {
"PartitionSize": 100000,
"ThreadCount": 15,
"ValidationOnly": true
}

The following screenshots show resource consumption on the AWS DMS replication instance running on engine version 3.5.4.

CPU utilization on engine version 3.5.4

Task memory usage on engine version 3.5.4

The following screenshots show resource consumption on the AWS DMS replication instance running running on engine version 3.5.3.

CPU utilization on engine version 3.5.3

Task memory usage on engine version 3.5.3

We can see a 91% reduction in TaskMemoryUsage of a validation-only task when run on AWS DMS 3.5.4 as compared to AWS DMS 3.5.3, and a 95% reduction in the CPU utilization of the underlying AWS DMS replication instance. For customers who want to run a separate validation-only task, you can use this feature and use the compute and memory of the AWS DMS replication instance in a more resourceful manner.

Conclusion

In this post, we discussed the transformation rules for data masking and enhanced data validation in AWS DMS 3.5.4. By implementing data masking features, you can ensure sensitive information remains protected throughout your database migration journey. With enhanced data validation feature, you have all the benefits of running validation with less resource consumption on DMS replication instance. Try these features and let us know how it helped your use case in the comments section.


About the authors

Suchindranath HegdeSuchindranath Hegde is a Senior Data Migration Specialist Solutions Architect at Amazon Web Services. He works with our customers to provide guidance and technical assistance on data migration to the AWS Cloud using AWS DMS.

Mahesh KansaraMahesh Kansara is a database engineering manager at Amazon Web Services. He closely works with development and engineering teams to improve the migration and replication service. He also works with our customers to provide guidance and technical assistance on various database and analytical projects, helping them improving the value of their solutions when using AWS.

Leonid SlepukhinLeonid Slepukhin is a Senior Database Engineer with the Database Migration Service (DMS) team at Amazon Web Services. He works on developing core features for AWS DMS and specialize in helping both internal and external customers resolve complex database migration and replication challenges. His focus areas include enhancing DMS capabilities and providing technical expertise to ensure successful database migrations to AWS cloud.

Sridhar RamasubramanianSridhar Ramasubramanian is a database engineer with the AWS Database Migration Service team. He works on improving the DMS service to better suit the needs of AWS customers.