Creating a connection between an Amazon RDS instance and AWS Glue enables Glue to access data stored in the RDS database. This allows you to use Glue’s data catalog and ETL (Extract, Transform, Load) capabilities to process and analyze data from your RDS database instance.
First, launch an RDS instance with public access enabled.
Any database engine can be used for this. In this blog, the MySQL database engine is used.
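For readers who prefer scripting over the console, the launch step can be sketched with boto3. This is a minimal sketch, not the setup used in this blog: the instance class, storage size, and the function names `build_db_instance_params` and `launch_instance` are assumptions for illustration, and ‘test_sql’ is assumed to be the initial database name.

```python
def build_db_instance_params(identifier, username, password, db_name="test_sql"):
    """Return keyword arguments for rds.create_db_instance (MySQL, public access)."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": "db.t3.micro",  # assumed size, adjust as needed
        "AllocatedStorage": 20,            # GiB, assumed
        "MasterUsername": username,
        "MasterUserPassword": password,
        "DBName": db_name,
        "PubliclyAccessible": True,        # "public access" from the step above
    }


def launch_instance(params, region):
    """Create the instance; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder above works without boto3

    rds = boto3.client("rds", region_name=region)
    return rds.create_db_instance(**params)
```

Keeping the parameters in a plain dictionary makes it easy to review them before anything is created in the account.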
Once the database is launched, it will appear as follows in the RDS dashboard.
To create a connection, navigate to the AWS Glue Studio dashboard. In the left pane, click on “Connections” and then “Create Connection”.
Note: Make sure you are in the same region in which the database was created.
Provide a ‘Name’ for the connection, then select the ‘Connection Type’ and ‘Database Engine’, as shown in the figure below.
Under the ‘Connection Access’ section, the name of the DB instance will appear in the drop-down. Select it and provide the admin username and password, as shown below.
Then click ‘Create Connection’.
Note: The name of my database is ‘test_sql’.
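The same connection can also be described in code. The sketch below assumes a JDBC connection to MySQL on the default port 3306 and treats ‘test_sql’ as the database name; the endpoint, the credentials, and the helper names `build_connection_input` and `create_connection` are placeholders, not values from this blog.

```python
def build_connection_input(name, endpoint, username, password,
                           db_name="test_sql", port=3306):
    """Return the ConnectionInput structure for glue.create_connection."""
    jdbc_url = f"jdbc:mysql://{endpoint}:{port}/{db_name}"
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "USERNAME": username,
            "PASSWORD": password,
        },
    }


def create_connection(connection_input, region):
    """Create the connection; needs boto3 and AWS credentials at call time."""
    import boto3  # lazy import keeps the builder usable without boto3

    glue = boto3.client("glue", region_name=region)
    return glue.create_connection(ConnectionInput=connection_input)
```

The `region` argument should match the region of the database, as the note above points out.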
Once created, the connection will appear in the Connections section.
Select the connection, click on ‘Actions’, and edit the connection.
Scroll down to ‘Network Options’ and select the VPC, subnet, and security group in which the RDS instance is launched. Save the settings.
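The network values chosen here can be read from the RDS instance itself. The following is a rough sketch that takes one entry of `DBInstances` from an RDS `describe_db_instances` response and builds the `PhysicalConnectionRequirements` structure that Glue connections accept. Picking the first subnet of the DB subnet group is an assumption made for brevity, and the function names are hypothetical.

```python
def network_requirements(db_instance):
    """Build Glue PhysicalConnectionRequirements from one DBInstances entry."""
    # Assumption: the first subnet of the DB subnet group is good enough.
    subnet = db_instance["DBSubnetGroup"]["Subnets"][0]
    return {
        "SubnetId": subnet["SubnetIdentifier"],
        "AvailabilityZone": subnet["SubnetAvailabilityZone"]["Name"],
        "SecurityGroupIdList": [
            sg["VpcSecurityGroupId"] for sg in db_instance["VpcSecurityGroups"]
        ],
    }


def fetch_db_instance(identifier, region):
    """Look up the instance; needs boto3 and AWS credentials at call time."""
    import boto3  # lazy import keeps the builder usable without boto3

    rds = boto3.client("rds", region_name=region)
    response = rds.describe_db_instances(DBInstanceIdentifier=identifier)
    return response["DBInstances"][0]
```

The resulting dictionary would go under the `PhysicalConnectionRequirements` key of the `ConnectionInput` passed to `glue.update_connection`.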
Now navigate to the VPC dashboard, select ‘Endpoints’ from the left pane, and click on ‘Create endpoint’.
Enter the details as follows:
- Name tag : s3-endpoint
- Service category : AWS Services
- Service : enter ‘S3’ and select the item with Type “Gateway”
- VPC : select the VPC in which the RDS instance is launched
- Policy : ‘Full access’
Then create the endpoint.
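The endpoint settings listed above map onto the EC2 `create_vpc_endpoint` call roughly as follows. This is a sketch under assumptions: the VPC ID and route table IDs are placeholders you would look up yourself, the helper names are hypothetical, and no policy document is passed because the default policy attached in that case allows full access.

```python
def build_s3_endpoint_params(vpc_id, region, route_table_ids, name="s3-endpoint"):
    """Return keyword arguments for ec2.create_vpc_endpoint (S3 gateway endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
        # No PolicyDocument: the default endpoint policy allows full access.
        "TagSpecifications": [
            {
                "ResourceType": "vpc-endpoint",
                "Tags": [{"Key": "Name", "Value": name}],
            }
        ],
    }


def create_s3_endpoint(params, region):
    """Create the endpoint; needs boto3 and AWS credentials at call time."""
    import boto3  # lazy import keeps the builder usable without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(**params)
```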
Navigate to ‘NAT Gateways’ from the left pane and create one. Provide the details as follows:
- Name : ‘S3-RDS-Nat’
- Subnet : select a subnet from the drop-down (a public NAT gateway needs to sit in a public subnet)
- Connectivity type : “Public”
- Elastic IP : allocate an Elastic IP
Then create the NAT gateway.
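As a scripted counterpart to the NAT gateway settings above, the sketch below allocates an Elastic IP and then creates a public NAT gateway with boto3. The subnet ID is a placeholder and the helper names are hypothetical.

```python
def build_nat_gateway_params(subnet_id, allocation_id, name="S3-RDS-Nat"):
    """Return keyword arguments for ec2.create_nat_gateway (public NAT gateway)."""
    return {
        "SubnetId": subnet_id,
        "AllocationId": allocation_id,   # Elastic IP allocation to attach
        "ConnectivityType": "public",
        "TagSpecifications": [
            {
                "ResourceType": "natgateway",
                "Tags": [{"Key": "Name", "Value": name}],
            }
        ],
    }


def create_nat_gateway(subnet_id, region):
    """Allocate an Elastic IP and create the gateway; needs boto3 and credentials."""
    import boto3  # lazy import keeps the builder usable without boto3

    ec2 = boto3.client("ec2", region_name=region)
    allocation_id = ec2.allocate_address(Domain="vpc")["AllocationId"]
    params = build_nat_gateway_params(subnet_id, allocation_id)
    return ec2.create_nat_gateway(**params)["NatGateway"]["NatGatewayId"]
```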
Now, in the VPC dashboard, select ‘Endpoints’ from the left pane, select the endpoint created earlier, and click on its VPC, as shown in the figure.
Go to Route Tables.
Edit the routes, as shown in the figure.
Click on ‘Add Route’, set the Destination to ‘0.0.0.0/0’ and the Target to the NAT gateway created in the previous steps, and save the changes.
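The route added in this step corresponds to a single EC2 `create_route` call. A minimal sketch, assuming you already know the route table ID and the NAT gateway ID (both placeholders here, as are the helper names):

```python
def build_default_route_params(route_table_id, nat_gateway_id):
    """Return keyword arguments for ec2.create_route (default route via NAT)."""
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": "0.0.0.0/0",  # all IPv4 destinations
        "NatGatewayId": nat_gateway_id,
    }


def add_default_route(route_table_id, nat_gateway_id, region):
    """Add the route; needs boto3 and AWS credentials at call time."""
    import boto3  # lazy import keeps the builder usable without boto3

    ec2 = boto3.client("ec2", region_name=region)
    params = build_default_route_params(route_table_id, nat_gateway_id)
    return ec2.create_route(**params)
```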
Create an IAM role for AWS Glue and give it sufficient access (Admin access or RDS Full access, as per your requirement).
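For Glue to use the role, the role's trust policy has to let the Glue service assume it. The sketch below shows such a trust policy and attaches the AWS managed `AmazonRDSFullAccess` policy, following the "RDS Full access" option mentioned above. The function name and the role name you pass in are hypothetical, and a narrower policy may suit your requirement better.

```python
import json

# Trust policy that allows the AWS Glue service to assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def create_glue_role(role_name,
                     policy_arn="arn:aws:iam::aws:policy/AmazonRDSFullAccess"):
    """Create the role and attach one managed policy; needs boto3 and credentials."""
    import boto3  # lazy import keeps the policy constant usable without boto3

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
```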
Select the connection created and, under ‘Actions’, choose to test the connection.
Before running the test, select the IAM role created.
On a successful connection, a message will appear as shown in the figure.
We have successfully established the connection between AWS Glue and the RDS instance.