This Guidance demonstrates an automated approach for generating rule recommendations to match, link, and enhance related records using AWS Entity Resolution rule-based matching. It provides an AWS Glue notebook that streamlines the creation of effective matching rules. The notebook reads input data from Amazon S3, analyzes data quality, and uses a large language model (LLM) on Amazon Bedrock to produce customized rule recommendations, each accompanied by the reasoning behind it. The Guidance also uses a sampling approach to test the generated rules and resolve entities.
Architecture Diagram

Overview
This architecture diagram shows how to generate rule recommendations using an LLM hosted on Amazon Bedrock and an AWS Glue notebook, and how to use these rules in a rule-based matching workflow in AWS Entity Resolution.
Step 1
Load your input dataset (CSV/Parquet) in an Amazon Simple Storage Service (Amazon S3) bucket and use an AWS Glue Crawler to create an AWS Glue table within the AWS Glue Data Catalog.
Step 2
Create a schema mapping in AWS Entity Resolution using the AWS Glue table as the source.
Step 3
Run the notebook in AWS Glue, which uses the AWS Entity Resolution schema mapping to understand the shape of the data. The notebook reads the data from Amazon S3 and generates data quality metrics. It feeds these metrics to an LLM hosted on Amazon Bedrock. The LLM recommends rules to apply to an AWS Entity Resolution matching workflow for resolving entities.
Step 4
The recommended rules generated by the AWS Glue notebook are used to create a rule-based matching workflow within AWS Entity Resolution.
Step 5
An AWS Step Functions workflow orchestrates the execution of the rule-based matching workflow to process the incremental source data.
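The data quality step in Step 3 can be illustrated with a minimal sketch. It is not the Guidance notebook's code: the function names `profile_columns` and `build_prompt`, the two metrics (completeness and uniqueness), the sample columns, and the prompt wording are all assumptions chosen for illustration. The call that sends the prompt to a model on Amazon Bedrock is deliberately left out.

```python
import csv
import io
import json


def profile_columns(rows, columns):
    """Compute per-column completeness and uniqueness ratios (hypothetical metrics)."""
    total = len(rows)
    metrics = {}
    for col in columns:
        # Treat missing keys, None, and whitespace-only strings as missing values.
        values = [(row.get(col) or "").strip() for row in rows]
        present = [v for v in values if v]
        metrics[col] = {
            # Share of records that have a value in this column.
            "completeness": round(len(present) / total, 3) if total else 0.0,
            # Share of non-missing values that are distinct.
            "uniqueness": round(len(set(present)) / len(present), 3) if present else 0.0,
        }
    return metrics


def build_prompt(metrics):
    """Embed the metrics in a prompt asking for rules plus reasoning (wording is assumed)."""
    return (
        "You are helping configure rule-based matching in AWS Entity Resolution.\n"
        "Given these per-column data quality metrics, recommend matching rules "
        "and explain the reasoning for each rule.\n\n"
        + json.dumps(metrics, indent=2, sort_keys=True)
    )


# Tiny in-memory sample standing in for data read from Amazon S3.
SAMPLE_CSV = """email,phone,last_name
a@example.com,555-0100,Lee
b@example.com,,Lee
a@example.com,555-0100,Kim
,555-0101,Park
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
metrics = profile_columns(rows, ["email", "phone", "last_name"])
prompt = build_prompt(metrics)
print(prompt)
```

A column that is both highly complete and highly unique is a natural candidate for a matching key, which is the kind of signal the LLM can reason over when it proposes rules.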
Incremental rule-based workflow
This architecture diagram shows how to run an incremental rule-based matching workflow in AWS Entity Resolution using an AWS Step Functions workflow.
Step 1
Create a schedule in Amazon EventBridge to trigger Step Functions at a desired frequency.
Step 2
Step Functions triggers an AWS Glue extract, transform, load (ETL) job that pre-processes the incremental source data and prepares it for the AWS Entity Resolution rule-based matching workflow.
Step 3
An AWS Lambda function triggers the rule-based matching workflow in AWS Entity Resolution. The workflow reads the incremental data from the source Amazon S3 bucket and processes it.
Step 4
The Lambda function checks the status of the matching workflow running in AWS Entity Resolution until the job status changes to Completed.
Step 5
Upon completion, the AWS Entity Resolution matching workflow writes the output to an S3 output bucket.
Step 6
The AWS Glue post-processing ETL job reads the output from AWS Entity Resolution and writes it to an Amazon S3 table. The Amazon S3 table is chosen as the destination because it supports Atomicity, Consistency, Isolation, Durability (ACID) transactions.
Step 7
An AWS Entity Resolution incremental matching workflow can merge or split records, so a datastore that supports ACID transactions is an ideal choice to help ensure data integrity and consistency.
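The status check in Step 4 can be sketched as a small polling helper. This is an illustration under stated assumptions, not the Guidance's Lambda code: the helper name `wait_for_matching_job`, the polling interval, and the retry limit are hypothetical. The client is expected to expose `get_matching_job(workflowName=..., jobId=...)` in the style of the boto3 `entityresolution` client, and the terminal status values are assumed to be `SUCCEEDED` and `FAILED` (the text above refers to the success state as Completed). A stand-in client is used here so the sketch runs without AWS credentials.

```python
import time

# Assumed terminal job states; verify against the service documentation.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_matching_job(client, workflow_name, job_id,
                          poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll a matching job until it reaches a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        response = client.get_matching_job(workflowName=workflow_name, jobId=job_id)
        status = response["status"]
        if status in TERMINAL_STATUSES:
            return status
        # Wait before asking again so the API is not called in a tight loop.
        sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")


class FakeEntityResolutionClient:
    """Stand-in client that replays a fixed sequence of statuses (for illustration only)."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.calls = 0

    def get_matching_job(self, workflowName, jobId):
        self.calls += 1
        return {"jobId": jobId, "status": next(self._statuses)}


fake = FakeEntityResolutionClient(["QUEUED", "RUNNING", "SUCCEEDED"])
final_status = wait_for_matching_job(
    fake, "my-workflow", "job-123", sleep=lambda seconds: None
)
print(final_status)
```

Because a Lambda invocation has a bounded run time, a long-running job is often better served by having Step Functions call a single status check repeatedly with a wait between calls, rather than looping inside one invocation. The loop above only shows the control flow.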
Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
AWS Glue is a managed service that runs workloads and provides monitoring metrics for jobs. It offers fault tolerance with support for retries in case of failures. AWS Glue Crawler automates the discovery of data schemas. These features create a scalable, fault-tolerant system that provides insights into runtime metrics of jobs.
Security
AWS Identity and Access Management (IAM) policies are scoped down to the minimum permissions required for services to function properly. Data stored in Amazon S3 uses encryption at rest. These measures limit unauthorized access to resources and protect data integrity. By implementing tight access controls and encrypting data at rest, the Guidance enhances overall security posture and helps meet compliance requirements.
Reliability
As managed services, AWS Glue, AWS Entity Resolution, Amazon Bedrock, and Step Functions reduce the operational burden of maintaining reliability, allowing the system to recover from failures automatically. These services support retries for recovery from failures and integrate with Amazon CloudWatch to provide operational insights.
Performance Efficiency
AWS Glue offers a serverless architecture that scales compute resources up or down based on workload demands. It provides different instance types for users to choose based on their specific workload requirements. AWS Glue connects with other AWS services through AWS networking services and can run within a virtual private cloud (VPC). This flexibility in resource selection and automatic scaling helps ensure that the system can efficiently handle varying workload intensities.
Cost Optimization
This Guidance uses managed services that follow a pay-as-you-go pricing model, meaning you only pay for the resources you use. AWS Glue is serverless, providing scaling capabilities that help optimize costs. AWS Entity Resolution charges based on the volume of ingested data. Amazon S3 costs depend on data storage and access patterns. Step Functions charges based on the number of state transitions. This usage-based pricing across services helps ensure that costs align closely with actual resource consumption.
Sustainability
As a serverless service, AWS Glue only consumes resources when actively processing data. It offers features like data partitioning and compression, which reduce storage and compute resource requirements for data processing pipelines. Its automatic scaling based on workload helps optimize resource utilization and reduce energy consumption.
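The scoped-down IAM permissions described under Security can be illustrated with a short sketch. It builds a policy document for a Lambda function that only starts and polls one matching workflow. This is an assumption-laden example, not the policy shipped with the Guidance: the helper name `build_lambda_policy`, the Region, the account ID, and the workflow name are hypothetical, and the ARN format should be checked against the AWS Entity Resolution documentation before use.

```python
import json


def build_lambda_policy(region, account_id, workflow_name):
    """Return a least-privilege policy document for one matching workflow (illustrative)."""
    # Assumed ARN format for an AWS Entity Resolution matching workflow.
    workflow_arn = (
        f"arn:aws:entityresolution:{region}:{account_id}:"
        f"matchingworkflow/{workflow_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RunAndPollMatchingJob",
                "Effect": "Allow",
                # Only the two actions the function needs: start a job, read its status.
                "Action": [
                    "entityresolution:StartMatchingJob",
                    "entityresolution:GetMatchingJob",
                ],
                # Limit the permission to a single workflow instead of "*".
                "Resource": workflow_arn,
            }
        ],
    }


# Placeholder values; replace with your own Region, account ID, and workflow name.
policy = build_lambda_policy("us-east-1", "111122223333", "customer-matching")
print(json.dumps(policy, indent=2))
```

Naming explicit actions and a single resource ARN, rather than wildcards, is what keeps the policy at the minimum permissions the function requires.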
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.