The AWS DataProcessing MCP server provides AI code assistants with comprehensive data processing tools and real-time pipeline visibility across AWS Glue, Amazon EMR (on EC2 and Serverless), and Amazon Athena. This integration equips large language models (LLMs) with essential data engineering capabilities and contextual awareness. AI code assistants can then streamline data processing workflows through intelligent guidance, from initial data discovery and cataloging through complex ETL pipeline orchestration and big data analytics optimization.

Integrating the DataProcessing MCP server into AI code assistants transforms data engineering workflows across all phases. It simplifies data catalog management with automated schema discovery and data quality validation. It streamlines ETL job creation with intelligent code generation and best-practice recommendations. It accelerates big data processing through automated EMR cluster provisioning and workload optimization. Finally, it enhances troubleshooting through intelligent debugging tools and operational insights. Together, these capabilities simplify complex data operations through natural language interactions in AI code assistants.
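Behind these natural language interactions, the AI code assistant acts as an MCP client. It invokes server tools by sending JSON-RPC 2.0 `tools/call` requests, as defined in the Model Context Protocol specification. The sketch below shows only the shape of such a request. The tool name `manage_aws_glue_databases` and the `{"operation": "list-databases"}` arguments are hypothetical placeholders, not confirmed names from this server.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call("manage_aws_glue_databases", {"operation": "list-databases"})
print(request)
```

In practice the assistant builds these requests itself from the tool schemas the server advertises. A user never writes them by hand.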
This server provides the following tools for AI assistants:
Manage Amazon EMR Serverless job runs for data processing workloads.
Add a new inline policy to an IAM role for data processing services.
Create a new IAM role for data processing services such as AWS Glue, Amazon EMR, and Amazon Athena.
Upload Python code directly to Amazon S3 buckets.
Manage AWS Glue Data Catalog databases: create, delete, get, list, update.
Manage AWS Glue Data Catalog tables: create, delete, get, list, update, search.
Manage AWS Glue Data Catalog connections for data sources.
Manage AWS Glue Data Catalog partitions: create, delete, get, list, update.
Manage AWS Glue Interactive Sessions for Spark and Ray workloads.
Execute and manage code statements within AWS Glue Interactive Sessions.
Orchestrate complex ETL activities through AWS Glue visual workflows.
Automate AWS Glue workflow and job execution with scheduled or event-based triggers.
Manage Amazon EMR clusters with comprehensive control over the cluster lifecycle.
Manage Amazon EMR on EC2 instance fleets and instance groups.
Manage Amazon EMR steps for processing data on EMR clusters.
Manage Amazon EMR Serverless applications with lifecycle control.
Execute and manage Amazon Athena SQL queries.
Manage saved SQL queries in Amazon Athena.
Manage Amazon Athena data catalogs, supporting multiple catalog types.
Manage Amazon Athena databases and tables for data discovery.
Manage Amazon Athena workgroups for query execution environments.
Manage AWS Glue Usage Profiles for resource allocation and cost management.
Manage AWS Glue Security Configurations for data encryption.
Manage AWS Glue catalog encryption settings.
Manage resource policies for AWS Glue catalogs, databases and tables.
Manage AWS Glue ETL jobs and job runs.
Manage AWS Glue crawlers to discover and catalog data sources.
Manage AWS Glue classifiers to determine data formats and schemas.
Manage AWS Glue crawler schedules and monitor performance metrics.
Get all policies attached to an IAM role.
Get all IAM roles that can be assumed by a specific AWS service.
List Amazon S3 buckets with usage statistics for data processing.
Analyze Amazon S3 bucket usage patterns for data processing services.
Manage AWS Glue Data Catalog operations including import.
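The IAM tools above create roles for data processing services and find roles a given service can assume. Both depend on the role's trust policy. The sketch below builds a standard `sts:AssumeRole` trust policy for one service principal. It then runs a simplified check of whether a trust policy names a given service. It is not how the server itself is implemented. The check only matches explicit `Allow` statements and does not evaluate `Deny` statements, `Condition` blocks, or wildcard principals.

```python
import json

# Service principals for the data processing services named above.
SERVICE_PRINCIPALS = {
    "glue": "glue.amazonaws.com",
    "emr": "elasticmapreduce.amazonaws.com",
    "emr-serverless": "emr-serverless.amazonaws.com",
}


def trust_policy_for(service_principal: str) -> dict:
    """Build a trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def _as_list(value) -> list:
    """IAM accepts either a single value or a list in many policy fields."""
    return value if isinstance(value, list) else [value]


def can_be_assumed_by(trust_policy: dict, service_principal: str) -> bool:
    """Simplified check: does an Allow statement grant sts:AssumeRole to the
    explicitly named service? Deny, Condition, and wildcards are not evaluated."""
    for statement in _as_list(trust_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action", [])):
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "*": not an explicit service principal
        if service_principal in _as_list(principal.get("Service", [])):
            return True
    return False


policy = trust_policy_for(SERVICE_PRINCIPALS["glue"])
print(json.dumps(policy, indent=2))
print(can_be_assumed_by(policy, "glue.amazonaws.com"))
```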
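Executing an Amazon Athena query, as the query tool above does, is asynchronous. A query is submitted, then its state moves through `QUEUED` and `RUNNING` until it reaches `SUCCEEDED`, `FAILED`, or `CANCELLED`. The polling loop below is a minimal sketch of waiting for that terminal state. The status lookup is passed in as a callable so the sketch runs without AWS access. With boto3, `get_state` could wrap `athena.get_query_execution(QueryExecutionId=...)["QueryExecution"]["Status"]["State"]`. The `wait_for_query` helper itself is hypothetical.

```python
import time
from typing import Callable

# Athena query states after which the state no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(
    get_state: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a query's state until it is terminal, or raise after `timeout` seconds."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError(f"query still {state} after {timeout} seconds")
        sleep(poll_interval)
        waited += poll_interval


# Simulated state sequence; sleep is stubbed out so the example runs instantly.
states = iter(["QUEUED", "RUNNING", "RUNNING", "SUCCEEDED"])
result = wait_for_query(lambda: next(states), sleep=lambda _: None)
print(result)  # prints SUCCEEDED
```

Injecting `sleep` keeps the loop testable. A production version would more likely use exponential backoff than a fixed interval.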
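AWS Glue Data Catalog partitions, which the partition tool above manages, are identified by an ordered list of values, one per partition key. Data in Amazon S3 is often laid out in Hive-style `key=value` folders. The sketch below extracts those values in partition-key order from such a prefix. The resulting list has the shape expected by the `Values` field of a Glue `PartitionInput`. The helper name, bucket, and keys are hypothetical.

```python
from urllib.parse import urlparse


def partition_values_from_path(s3_uri: str, partition_keys: list[str]) -> list[str]:
    """Extract partition values, ordered by `partition_keys`, from a Hive-style S3 prefix."""
    path = urlparse(s3_uri).path
    found = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep:  # only segments of the form key=value are partition folders
            found[key] = value
    missing = [k for k in partition_keys if k not in found]
    if missing:
        raise ValueError(f"path is missing partition keys: {missing}")
    return [found[k] for k in partition_keys]


# Hypothetical bucket and table layout.
values = partition_values_from_path("s3://my-bucket/sales/year=2024/month=07/", ["year", "month"])
print(values)
```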
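An Amazon EMR step, the unit of work managed by the step tool above, commonly runs a Spark application through `command-runner.jar`. The sketch below builds a step definition in the shape accepted by the `Steps` parameter of boto3's `emr.add_job_flow_steps`. The helper name, script URI, and arguments are hypothetical. Defaulting to `--deploy-mode cluster` is an assumption, not a requirement.

```python
# ActionOnFailure values accepted for EMR steps.
VALID_ACTIONS = {"TERMINATE_JOB_FLOW", "TERMINATE_CLUSTER", "CANCEL_AND_WAIT", "CONTINUE"}


def spark_step(name: str, script_uri: str, script_args=(), action_on_failure: str = "CONTINUE") -> dict:
    """Build an EMR step that runs `spark-submit` via command-runner.jar."""
    if action_on_failure not in VALID_ACTIONS:
        raise ValueError(f"unsupported ActionOnFailure: {action_on_failure}")
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *script_args],
        },
    }


# Hypothetical script location and arguments.
step = spark_step("daily-etl", "s3://my-bucket/scripts/etl.py", ["--date", "2024-07-01"])
print(step)
```

`CONTINUE` keeps the cluster running if the step fails. That is usually what you want on a long-running cluster shared by several steps.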
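Scheduled AWS Glue triggers, which the trigger tool above automates, use a six-field cron expression evaluated in UTC: minutes, hours, day-of-month, month, day-of-week, year. Either day-of-month or day-of-week must be `?`. The sketch below builds a trigger definition in the shape accepted by boto3's `glue.create_trigger` and validates those two cron rules. The helper name and job names are hypothetical. For example, `cron(15 12 * * ? *)` runs daily at 12:15 UTC.

```python
def scheduled_trigger(name: str, job_names: list, cron_fields: list, start_on_creation: bool = True) -> dict:
    """Build a SCHEDULED Glue trigger definition that starts the given jobs."""
    if len(cron_fields) != 6:
        raise ValueError("Glue cron needs six fields: minutes hours day-of-month month day-of-week year")
    day_of_month, day_of_week = cron_fields[2], cron_fields[4]
    if "?" not in (day_of_month, day_of_week):
        raise ValueError("one of day-of-month or day-of-week must be '?'")
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({' '.join(cron_fields)})",
        "Actions": [{"JobName": job} for job in job_names],
        "StartOnCreation": start_on_creation,
    }


# Hypothetical trigger and job names; runs daily at 12:15 UTC.
trigger = scheduled_trigger("nightly-etl", ["clean-sales", "load-sales"], ["15", "12", "*", "*", "?", "*"])
print(trigger["Schedule"])
```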