Protection of Sensitive Data During the Development Process Using Amazon Aurora
The Challenge
AGE Technology faces the challenge of continuously delivering new features. The company releases a new version every two weeks, and each release undergoes rigorous testing of security, of the functionality common to all clients, and of the functionality customized for each client. These tests verify the effectiveness of each delivery and validate the migration scripts.
To facilitate this process and ensure that tests run against a realistic volume of data, the company uses a replica of the production databases in which all sensitive data has been anonymized for the testing environment.
Historically, this process was manual. It started with the restoration of a 5TB snapshot, which took more than 12 hours. SQL scripts were then executed directly against the restored database to remove confidential data. Only after all validations were complete was the database released to the testing environment, so the whole process could often take up to 3 business days. The environment also required a large amount of additional storage space, which increased costs.
The Solution
CloudDog proposed a two-phase strategy for AGE Technology.
In the first phase, the RDS MySQL Community database was migrated to Amazon Aurora MySQL. The goals were to improve the database's scalability, to enable read replicas that balance the load and increase fault tolerance, and to make use of fast database cloning, a feature exclusive to Amazon Aurora.
In the second phase, the manual preparation process of the testing database was automated using AWS Step Functions, which orchestrates the execution of Amazon Aurora cluster cloning APIs and creates a new instance in a dedicated VPC to perform the data cleaning process.
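The cloning step can be sketched with the AWS SDK for Python (boto3). This is a minimal illustration, not AGE Technology's actual workflow: in boto3, an Aurora clone is requested through `restore_db_cluster_to_point_in_time` with `RestoreType` set to `copy-on-write`, and because a new clone has no instances, one is added with `create_db_instance`. The function names, identifiers, subnet group, security group, and instance class are all hypothetical placeholders; in the real setup a Step Functions state machine would make the equivalent calls.

```python
from typing import Any, Dict, List


def build_clone_request(
    source_cluster_id: str,
    clone_cluster_id: str,
    subnet_group_name: str,
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Build the parameters for an Aurora clone of the source cluster."""
    return {
        "SourceDBClusterIdentifier": source_cluster_id,
        "DBClusterIdentifier": clone_cluster_id,
        # "copy-on-write" asks for a clone that shares storage with the source
        # instead of a full copy of the data.
        "RestoreType": "copy-on-write",
        "UseLatestRestorableTime": True,
        # Assumption: the subnet group and security groups belong to the
        # dedicated VPC where the data cleaning takes place.
        "DBSubnetGroupName": subnet_group_name,
        "VpcSecurityGroupIds": security_group_ids,
    }


def build_instance_request(
    clone_cluster_id: str, instance_id: str, instance_class: str
) -> Dict[str, Any]:
    """Build the parameters for the instance attached to the clone."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBClusterIdentifier": clone_cluster_id,
        "DBInstanceClass": instance_class,
        "Engine": "aurora-mysql",
    }


def clone_for_sanitization(
    rds_client: Any,
    clone_request: Dict[str, Any],
    instance_request: Dict[str, Any],
) -> str:
    """Create the clone cluster, then add an instance so it can be queried."""
    rds_client.restore_db_cluster_to_point_in_time(**clone_request)
    rds_client.create_db_instance(**instance_request)
    return instance_request["DBInstanceIdentifier"]


if __name__ == "__main__":
    import boto3  # imported here so the helpers above stay testable offline

    rds = boto3.client("rds")
    clone = build_clone_request(
        "prod-cluster",          # hypothetical identifiers throughout
        "sanitize-clone",
        "sanitize-subnets",
        ["sg-0123456789abcdef0"],
    )
    instance = build_instance_request(
        "sanitize-clone", "sanitize-clone-1", "db.r6g.large"
    )
    clone_for_sanitization(rds, clone, instance)
```

Keeping the request builders separate from the API calls lets the parameters be reviewed or unit-tested without touching an AWS account.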
Dozens of tasks are then executed on AWS Batch to remove all sensitive data, such as employee information, company data, and user records. When the tasks complete, access to the database is granted by creating a new instance in a VPC accessible only by the AWS testing account.
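As a rough sketch of how such cleaning tasks could be fanned out, the snippet below builds one AWS Batch job per table and submits each with boto3's `submit_job`. The source does not describe how the tasks are divided, so the one-job-per-table split, the table names, the job queue, the job definition, and the environment variable names are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def build_sanitization_jobs(
    tables: List[str],
    job_queue: str,
    job_definition: str,
    clone_endpoint: str,
) -> List[Dict[str, Any]]:
    """Build one Batch job request per table that needs cleaning."""
    jobs = []
    for table in tables:
        jobs.append(
            {
                "jobName": f"sanitize-{table}",
                "jobQueue": job_queue,
                "jobDefinition": job_definition,
                # Hypothetical contract: the container reads these variables
                # to decide which table to clean on which host.
                "containerOverrides": {
                    "environment": [
                        {"name": "TARGET_TABLE", "value": table},
                        {"name": "DB_HOST", "value": clone_endpoint},
                    ]
                },
            }
        )
    return jobs


def submit_all(batch_client: Any, jobs: List[Dict[str, Any]]) -> List[str]:
    """Submit every job and return the job IDs reported by AWS Batch."""
    return [batch_client.submit_job(**job)["jobId"] for job in jobs]


if __name__ == "__main__":
    import boto3  # imported here so the helpers above stay testable offline

    batch = boto3.client("batch")
    requests = build_sanitization_jobs(
        ["employees", "companies", "users"],   # hypothetical table names
        "sanitize-queue",                      # hypothetical job queue
        "sanitize-table",                      # hypothetical job definition
        "sanitize-clone.cluster-example.us-east-1.rds.amazonaws.com",
    )
    print(submit_all(batch, requests))
```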
Results
The preparation process of the testing database is scheduled to start every Sunday night and is carried out in isolation from the production environment, within a sandbox structure to ensure data security.
After the database sanitization process is completed, the testing environment is made available predictably every Monday morning, allowing the quality assurance team to conduct all tests before the release of each new version.
With Amazon Aurora's fast database cloning feature, the cloned cluster's storage initially points to the same data pages as the production cluster, so no data is copied when the clone is created. As the data cleaning process modifies data, new pages are written for the clone only (copy-on-write). In the end, the clone differs from production by only 750GB, or about 15% of the original 5TB. This significantly reduces storage costs, because additional storage is charged only for those 750GB rather than for a full copy of the database.
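The figures above can be checked with a short calculation. The sketch assumes decimal units (5TB = 5000GB) and uses a hypothetical helper name; it compares only storage volumes, since no prices are stated in the source.

```python
def clone_delta_fraction(source_gb: float, delta_gb: float) -> float:
    """Return the share of the source size that the clone stores on its own."""
    return delta_gb / source_gb


SOURCE_GB = 5000  # 5TB production database, assuming decimal units
DELTA_GB = 750    # data rewritten by the cleaning process

fraction = clone_delta_fraction(SOURCE_GB, DELTA_GB)
avoided_gb = SOURCE_GB - DELTA_GB  # storage a full copy would have added

print(f"Clone-specific storage: {fraction:.0%} of the source")  # 15%
print(f"Storage avoided versus a full copy: {avoided_gb}GB")    # 4250GB
```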