This Guidance demonstrates an automated approach for generating rule recommendations to match, link, and enhance related records using AWS Entity Resolution rule-based matching. It showcases an AWS Glue notebook that streamlines the creation of matching rules. The Guidance reads input data from Amazon S3, performs data quality analysis, and uses a large language model (LLM) on Amazon Bedrock to produce customized rule recommendations. Each recommendation includes the reasoning behind the suggested rule. The Guidance also implements a sampling approach to test the generated rules and resolve entities.
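To make the data quality and rule-testing steps more concrete, here is a minimal Python sketch that runs offline. It is not the notebook's actual code: the function names (`profile_columns`, `sample_records`, `apply_rules`), the sample records, and the two rules are all hypothetical. The call to Amazon Bedrock is omitted. In a real run, a profile like the one below would presumably be included in the prompt, and the rules would come back from the model. The matching logic is a simplified stand-in for rule-based matching (exact agreement on every field of a rule), not a reproduction of the AWS Entity Resolution service.

```python
import random


def profile_columns(records):
    """Return completeness and uniqueness ratios for each column."""
    columns = sorted({key for rec in records for key in rec})
    profile = {}
    for col in columns:
        filled = [rec[col] for rec in records if rec.get(col) not in (None, "")]
        profile[col] = {
            "completeness": len(filled) / len(records) if records else 0.0,
            "uniqueness": len(set(filled)) / len(filled) if filled else 0.0,
        }
    return profile


def sample_records(records, size, seed=0):
    """Draw a reproducible random sample to test rules on."""
    return random.Random(seed).sample(records, min(size, len(records)))


def apply_rules(records, rules):
    """Group records that agree exactly on every field of any rule.

    Returns one group label per record; records sharing a label are
    treated as the same entity.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    for rule in rules:
        first_seen = {}
        for idx, rec in enumerate(records):
            key = tuple(str(rec.get(field) or "").strip().lower() for field in rule)
            if "" in key:
                continue  # skip records missing any field the rule needs
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx
    return [find(i) for i in range(len(records))]


# Hypothetical sample data and rules, for illustration only.
records = [
    {"id": "1", "name": "Ana Diaz", "email": "ana@example.com", "phone": "555-0100"},
    {"id": "2", "name": "Ana Diaz", "email": "ana@example.com", "phone": ""},
    {"id": "3", "name": "A. Diaz", "email": "", "phone": "555-0100"},
    {"id": "4", "name": "Bo Chen", "email": "bo@example.com", "phone": "555-0101"},
]
rules = [["name", "email"], ["phone"]]

profile = profile_columns(records)
groups = apply_rules(records, rules)
print(profile["email"])
print(groups)  # [0, 0, 0, 3]
```

In this sketch, records 1 and 2 are linked by the first rule and record 3 joins them through the phone rule, which shows why a column's completeness and uniqueness matter when choosing match keys.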

Please note: see the Disclaimer section at the end of this page.

Architecture Diagram

  • Overview: This architecture diagram shows how to generate rule recommendations using an LLM hosted on Amazon Bedrock and an AWS Glue notebook, and how to use these rules in a rule-based matching workflow in AWS Entity Resolution.

  • Incremental rule-based workflow: This architecture diagram shows how to run an incremental rule-based matching workflow in AWS Entity Resolution using an AWS Step Functions workflow.
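As a rough illustration of the incremental workflow, the following Python sketch builds an Amazon States Language definition that starts a matching job on an existing workflow and polls until it finishes. The state names, the workflow name, the polling interval, and the retry settings are assumptions, not values taken from this Guidance. The `aws-sdk:entityresolution` resource ARNs and the PascalCase `JobId` and `Status` fields follow the general Step Functions AWS SDK integration pattern and should be checked against the service documentation before use.

```python
import json

WORKFLOW_NAME = "example-matching-workflow"  # hypothetical workflow name

# Retry with exponential backoff on task failures (values are assumptions).
RETRY = [{
    "ErrorEquals": ["States.TaskFailed"],
    "IntervalSeconds": 5,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}]

definition = {
    "Comment": "Sketch: start an Entity Resolution matching job and poll it",
    "StartAt": "StartMatchingJob",
    "States": {
        "StartMatchingJob": {
            "Type": "Task",
            # Assumed SDK integration ARN; verify before deploying.
            "Resource": "arn:aws:states:::aws-sdk:entityresolution:startMatchingJob",
            "Parameters": {"WorkflowName": WORKFLOW_NAME},
            "ResultPath": "$.job",
            "Retry": RETRY,
            "Next": "WaitForJob",
        },
        "WaitForJob": {"Type": "Wait", "Seconds": 60, "Next": "GetMatchingJob"},
        "GetMatchingJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:entityresolution:getMatchingJob",
            "Parameters": {"WorkflowName": WORKFLOW_NAME, "JobId.$": "$.job.JobId"},
            "ResultPath": "$.jobStatus",
            "Retry": RETRY,
            "Next": "CheckStatus",
        },
        "CheckStatus": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.jobStatus.Status", "StringEquals": "SUCCEEDED", "Next": "Done"},
                {"Variable": "$.jobStatus.Status", "StringEquals": "FAILED", "Next": "JobFailed"},
            ],
            "Default": "WaitForJob",  # still queued or running: keep polling
        },
        "Done": {"Type": "Succeed"},
        "JobFailed": {"Type": "Fail", "Error": "MatchingJobFailed"},
    },
}


def missing_targets(defn):
    """Return transition targets that do not name a defined state."""
    states = defn["States"]
    targets = [defn["StartAt"]]
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.append(state[key])
        targets.extend(choice["Next"] for choice in state.get("Choices", []))
    return sorted(set(targets) - set(states))


print(missing_targets(definition))  # [] when every transition is valid
asl_json = json.dumps(definition, indent=2)  # text to pass to Step Functions
```

How new records arrive in Amazon S3 and what triggers each execution is outside this sketch. The small `missing_targets` check is only a local sanity test of the definition, not a substitute for validation by Step Functions.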

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • AWS Glue is a managed service that runs workloads and provides monitoring metrics for jobs. It offers fault tolerance with support for retries in case of failures. AWS Glue crawlers automate the discovery of data schemas. Together, these features create a scalable, fault-tolerant system that provides insight into job runtime metrics.

    Read the Operational Excellence whitepaper 
  • AWS Identity and Access Management (IAM) policies are scoped down to the minimum permissions required for services to function properly. Data stored in Amazon S3 uses encryption at rest. These measures limit unauthorized access to resources and protect data integrity. By implementing tight access controls and encrypting data at rest, the Guidance enhances overall security posture and helps meet compliance requirements.

    Read the Security whitepaper 
  • As managed services, AWS Glue, AWS Entity Resolution, Amazon Bedrock, and Step Functions reduce the operational burden of maintaining reliability, allowing the system to recover from failures automatically. These services support retries for recovery from failures and integrate with Amazon CloudWatch to provide operational insights.

    Read the Reliability whitepaper 
  • AWS Glue offers a serverless architecture that scales compute resources up or down based on workload demands. It provides different worker types so that users can choose based on their specific workload requirements. AWS Glue connects with other AWS services through AWS networking services and can run within a virtual private cloud (VPC). This flexibility in resource selection and automatic scaling helps ensure that the system can efficiently handle varying workload intensities.

    Read the Performance Efficiency whitepaper 
  • This Guidance uses managed services that follow a pay-as-you-go pricing model, meaning you only pay for the resources you use. AWS Glue is serverless, providing scaling capabilities that help optimize costs. AWS Entity Resolution charges based on the volume of ingested data. Amazon S3 costs depend on data storage and access patterns. Step Functions charges based on the number of state transitions. This usage-based pricing across services helps ensure that costs align closely with actual resource consumption.

    Read the Cost Optimization whitepaper 
  • As a serverless service, AWS Glue only consumes resources when actively processing data. It offers features like data partitioning and compression, which reduce storage and compute resource requirements for data processing pipelines. Automatic scaling based on workload helps optimize resource utilization and reduce energy consumption.

    Read the Sustainability whitepaper 
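
The Security pillar above mentions IAM policies scoped down to the minimum permissions required. The following Python sketch shows what such a policy document might look like for the services in this Guidance. The Region, account ID, bucket, prefix, model ID, and workflow name are placeholders, and the choice of actions is an assumption rather than the policy this Guidance actually ships. The helper `overly_broad` is a hypothetical local check.

```python
import json

# Placeholder values; replace with your own resources.
REGION = "us-east-1"
ACCOUNT_ID = "111122223333"
BUCKET = "example-input-bucket"
MODEL_ID = "example-model-id"
WORKFLOW = "example-matching-workflow"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadInputData",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/input/*"],
        },
        {
            "Sid": "InvokeOneModel",
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": [f"arn:aws:bedrock:{REGION}::foundation-model/{MODEL_ID}"],
        },
        {
            "Sid": "RunOneWorkflow",
            "Effect": "Allow",
            "Action": [
                "entityresolution:StartMatchingJob",
                "entityresolution:GetMatchingJob",
            ],
            "Resource": [
                f"arn:aws:entityresolution:{REGION}:{ACCOUNT_ID}:matchingworkflow/{WORKFLOW}"
            ],
        },
    ],
}


def overly_broad(doc):
    """Return the Sids of statements with a wildcard action or a bare '*' resource."""
    flagged = []
    for stmt in doc["Statement"]:
        wide_action = any(a == "*" or a.endswith(":*") for a in stmt["Action"])
        wide_resource = "*" in stmt["Resource"]
        if wide_action or wide_resource:
            flagged.append(stmt["Sid"])
    return flagged


print(overly_broad(policy))  # [] because every statement names specific actions and resources
policy_json = json.dumps(policy, indent=2)
```

Each statement names specific actions and a specific resource, which is the general idea behind least-privilege scoping. The exact permissions a deployment needs depend on how the notebook and workflow are configured.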

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
