Deploying Java AI Services to AWS: ECS, Lambda, Bedrock & CloudFormation

Java AI services have deployment characteristics that differ from standard Spring Boot apps: they need more memory (JVM + model client libraries + concurrent LLM calls), longer timeouts (LLM calls can take 15+ seconds), and careful secrets management (API keys must never appear in environment variables visible in the ECS console). This final article in the Java & Spring AI series covers the complete production deployment on AWS — from container sizing to CloudFormation templates to AWS Bedrock as a managed alternative to third-party APIs.

Deployment Option Comparison

Option	Best For	Key Consideration
ECS Fargate	Long-running services, streaming responses, WebSockets	Always-on cost; no cold start; supports requests longer than 15 min
Lambda + SnapStart	Bursty, infrequent AI calls; async processing	15 min max timeout; SnapStart sharply reduces Java cold starts
AWS Bedrock	Replace Anthropic/OpenAI with AWS-managed models	IAM auth (no API keys); stays within AWS; data residency
EC2 + Auto Scaling	GPU inference or extremely high throughput	Overkill for API-based LLM calls; use only for self-hosted models

ECS Fargate: Container Sizing for Java AI

Java AI services need more memory than typical Spring Boot apps. The JVM baseline, Spring AI libraries, embedded vector client, and concurrent LLM responses in flight add up quickly:

Component	Memory
JVM base (Java 21, G1GC)	~200 MB
Spring Boot + Spring AI + dependencies	~350 MB
PGVector client + connection pool	~50 MB
Concurrent LLM responses in flight (10 × 8k tokens)	~400 MB
Headroom (GC, metaspace, thread stacks)	~200 MB
Recommended minimum	2 GB

Start with 1 vCPU / 2 GB for development. Scale to 2 vCPU / 4 GB for production handling 10+ concurrent AI requests.
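
The sizing arithmetic above can be checked with a few lines of Java. This is a minimal sketch: the class and method names are illustrative, the per-component figures are the estimates from the table, and the heap figure is the approximate default the JVM derives from the container limit and -XX:MaxRAMPercentage.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Back-of-envelope container sizing using the estimates from the table above. */
final class ContainerSizing {

    /** Sums per-component memory estimates, in MB. */
    static int totalMb(Map<String, Integer> componentsMb) {
        return componentsMb.values().stream().mapToInt(Integer::intValue).sum();
    }

    /** Approximate max heap for a given container limit and -XX:MaxRAMPercentage value. */
    static long maxHeapMb(long containerMb, double maxRamPercentage) {
        return Math.round(containerMb * maxRamPercentage / 100.0);
    }

    /** The article's estimates; replace them with measurements from your own service. */
    static Map<String, Integer> articleEstimates() {
        Map<String, Integer> estimates = new LinkedHashMap<>();
        estimates.put("jvmBase", 200);
        estimates.put("springBootAndSpringAi", 350);
        estimates.put("pgVectorClientAndPool", 50);
        estimates.put("inFlightLlmResponses", 400);
        estimates.put("headroom", 200);
        return estimates;
    }
}
```

The components sum to 1200 MB, which is why 2 GB is a comfortable minimum rather than a tight fit. With a 4096 MB task and MaxRAMPercentage=75.0, maxHeapMb returns 3072. Comparing that figure with Runtime.getRuntime().maxMemory() inside the running container is a quick way to confirm the flag was actually picked up.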

ECS Task Definition (CloudFormation)

# cloudformation/ai-service.yaml
AiServiceTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: ai-service
    NetworkMode: awsvpc
    RequiresCompatibilities: [FARGATE]
    Cpu: '2048'         # 2 vCPU
    Memory: '4096'       # 4 GB
    ExecutionRoleArn: !GetAtt EcsExecutionRole.Arn
    TaskRoleArn: !GetAtt AiServiceTaskRole.Arn
    ContainerDefinitions:
      - Name: ai-service
        Image: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/ai-service:${ImageTag}'
        PortMappings:
          - ContainerPort: 8080
        Environment:
          - Name: ENVIRONMENT
            Value: !Ref Environment
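          - Name: JAVA_OPTS
            # MaxRAMPercentage=75 gives a heap of about 3 GB in a 4 GB container.
            # JAVA_OPTS only takes effect if the image entrypoint passes it to `java`;
            # JAVA_TOOL_OPTIONS is read by the JVM directly.
            Value: "-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC -XX:+UseStringDeduplication"
        Secrets:
          # Resolved from Secrets Manager at task start and injected into the container;
          # the plaintext value is never stored in the task definition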
          - Name: ANTHROPIC_API_KEY
            ValueFrom: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:ai-service/anthropic-api-key'
          - Name: OPENAI_API_KEY
            ValueFrom: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:ai-service/openai-api-key'
        HealthCheck:
          Command: [CMD, curl, -f, "http://localhost:8080/actuator/health"]
          Interval: 30
          Timeout: 5
          Retries: 3
          StartPeriod: 60   # Java needs time to start; allow 60s before health checks
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref AiServiceLogGroup
            awslogs-region: !Ref AWS::Region
            awslogs-stream-prefix: ai-service
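
On the application side, the injected secrets can be validated once at startup so a misconfigured task fails fast instead of failing on the first LLM call. The following is a minimal sketch; RequiredSecrets and its methods are hypothetical helpers, not part of Spring AI or the AWS SDK.

```java
import java.util.Map;

/** Startup checks for secrets that ECS injects from Secrets Manager. */
final class RequiredSecrets {

    /** Returns the value of a required variable, or throws naming only the variable. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            // Never include the value itself in the message
            throw new IllegalStateException("Missing required secret: " + name);
        }
        return value;
    }

    /** Masks a secret for log output, keeping only the last four characters. */
    static String mask(String secret) {
        if (secret.length() <= 4) {
            return "****";
        }
        return "****" + secret.substring(secret.length() - 4);
    }
}
```

A typical call would be RequiredSecrets.require(System.getenv(), "ANTHROPIC_API_KEY") during application startup. Passing the environment as a Map keeps the check easy to unit test.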

Secrets Manager: Zero Plaintext API Keys

# Store the API key — run once during provisioning
aws secretsmanager create-secret \
  --name ai-service/anthropic-api-key \
  --secret-string "sk-ant-..." \
  --region us-east-1

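# IAM role that resolves the Secrets entries in the task definition (least privilege).
# ECS fetches these secrets with the task *execution* role, not the task role.
EcsExecutionRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal: {Service: ecs-tasks.amazonaws.com}
          Action: sts:AssumeRole
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
    Policies:
      - PolicyName: SecretsAccess
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action: [secretsmanager:GetSecretValue]
              Resource:
                - !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:ai-service/*'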

Why Secrets Manager, not SSM Parameter Store?

Both work, and access to either is logged in CloudTrail. Secrets Manager adds built-in rotation scheduling (driven by a Lambda function), staged secret versions, and cross-Region replication. For API keys that cost money if leaked, those features are usually worth the small additional cost ($0.40 per secret per month plus a small per-request charge, versus free for SSM Standard parameters).

Lambda + SnapStart for Infrequent AI Calls

Lambda's cold start problem (5–15 seconds for Java) is largely removed by SnapStart, which restores each new execution environment from a snapshot taken after initialization:

# Lambda function with SnapStart (CloudFormation)
# Role and Code are required properties, omitted here for brevity.
AiLambdaFunction:
  Type: AWS::Lambda::Function
  Properties:
    FunctionName: ai-classifier
    Runtime: java21
    Handler: com.example.ai.LambdaHandler::handleRequest
    MemorySize: 2048      # 2 GB for Java AI Lambda
    Timeout: 900          # 15 min max; API Gateway caps synchronous calls at 29 s by default
    SnapStart:
      ApplyOn: PublishedVersions  # applies only to published versions; invoke via a version or alias
    Environment:
      Variables:
        ENVIRONMENT: !Ref Environment
    VpcConfig:                   # if accessing RDS/ElastiCache
      SecurityGroupIds: [!Ref LambdaSecurityGroup]
      SubnetIds: !Ref PrivateSubnets

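// Lambda handler: implements the CRaC Resource hooks used by SnapStart
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import com.zaxxer.hikari.HikariDataSource;
import javax.sql.DataSource;
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ConfigurableApplicationContext;

@SpringBootApplication
public class LambdaHandler implements
        RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>,
        Resource {

    // Started once during init so the Spring context is captured in the snapshot
    private static final ConfigurableApplicationContext context =
            SpringApplication.run(LambdaHandler.class);

    public LambdaHandler() {
        // Register the instance Lambda actually uses. The CRaC context keeps only a
        // weak reference, so a throwaway instance could be garbage collected.
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> ctx) {
        // Evict pooled connections so no open sockets end up in the snapshot
        DataSource ds = context.getBean(DataSource.class);
        if (ds instanceof HikariDataSource hikari && hikari.getHikariPoolMXBean() != null) {
            hikari.getHikariPoolMXBean().softEvictConnections();
        }
    }

    @Override
    public void afterRestore(Context<? extends Resource> ctx) {
        // Nothing to do: the pool opens new connections on first use
    }

    @Override
    public APIGatewayProxyResponseEvent handleRequest(
            APIGatewayProxyRequestEvent event,
            // Fully qualified to avoid a clash with org.crac.Context
            com.amazonaws.services.lambda.runtime.Context lambdaContext) {
        var handler = context.getBean(AiRequestHandler.class);
        return handler.handle(event);
    }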
}
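
The close-before-snapshot, reopen-on-first-use pattern in the handler can be shown in isolation, without Spring or the CRaC library. This is only an illustrative sketch: LazyReconnect is a hypothetical helper, and in a real function closeBeforeCheckpoint would be called from a beforeCheckpoint hook like the one shown earlier.

```java
import java.util.function.Supplier;

/** Holds a connection-like resource that is closed before a snapshot and reopened lazily. */
final class LazyReconnect<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private T current;

    LazyReconnect(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the live resource, opening a new one if none exists (for example after a restore). */
    synchronized T get() {
        if (current == null) {
            current = factory.get();
        }
        return current;
    }

    /** Closes and forgets the resource so that no open socket is captured in the snapshot. */
    synchronized void closeBeforeCheckpoint() {
        if (current == null) {
            return;
        }
        try {
            current.close();
        } catch (Exception e) {
            throw new IllegalStateException("Failed to close resource before checkpoint", e);
        } finally {
            current = null;
        }
    }
}
```

Because get() reopens on demand, no afterRestore logic is needed, which mirrors the empty afterRestore method in the handler.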

AWS Bedrock: Managed Models Without Third-Party API Keys

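AWS Bedrock provides Claude (Anthropic), Llama (Meta), Titan (Amazon), and others via IAM authentication, so no Anthropic API key is required. Requests go to AWS service endpoints rather than a third-party API, and with a VPC interface endpoint for Bedrock they can stay off the public internet, which matters for data residency and compliance: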

# application.yml: Spring AI Bedrock Converse config (property names match the Converse starter)
spring:
  ai:
    bedrock:
      aws:
        region: us-east-1
        # No API key: uses the IAM role of the ECS task or the Lambda execution role
      converse:
        chat:
          options:
            # Check the Bedrock console for the exact model or inference profile ID in your Region
            model: anthropic.claude-sonnet-4-6-20251001-v2:0
            max-tokens: 2048
            temperature: 0.7


<!-- Named spring-ai-starter-model-bedrock-converse from Spring AI 1.0 GA onward -->
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-bedrock-converse-spring-boot-starter</artifactId>
</dependency>

# IAM policy for the ECS task role: grants Bedrock model invocation
BedrockPolicy:
  Type: AWS::IAM::ManagedPolicy
  Properties:
    Roles: [!Ref AiServiceTaskRole]
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Action:
            - bedrock:InvokeModel
            - bedrock:InvokeModelWithResponseStream
          Resource:
            - "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-*"
            - "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-*"
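
To see which models a policy like this one covers, the wildcard patterns can be approximated locally. The sketch below is a rough stand-in for IAM's resource matching ('*' matches any run of characters, '?' matches one), not a substitute for real policy evaluation. BedrockArnMatcher is a hypothetical helper, and the model IDs in the usage note are examples only.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Local approximation of IAM resource-pattern matching for Bedrock model ARNs. */
final class BedrockArnMatcher {

    /** Converts an IAM-style pattern ('*' = any run of characters, '?' = one character) to a regex. */
    static Pattern toRegex(String iamPattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : iamPattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True if the ARN matches at least one of the allowed resource patterns. */
    static boolean isAllowed(String arn, List<String> allowedPatterns) {
        return allowedPatterns.stream().anyMatch(p -> toRegex(p).matcher(arn).matches());
    }

    /** Builds a foundation-model ARN in the same shape the policy uses. */
    static String foundationModelArn(String region, String modelId) {
        return "arn:aws:bedrock:" + region + "::foundation-model/" + modelId;
    }
}
```

With the two patterns from the policy, an ARN built from us-east-1 and anthropic.claude-3-5-sonnet-20240620-v1:0 matches, while a meta.llama3-70b-instruct-v1:0 ARN does not. The same Claude model in eu-west-1 does not match either, because the Region is hard-coded in the policy. A policy that should also permit Llama or other Regions needs additional Resource entries.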

Auto-Scaling: AI-Aware Scaling Policy

CPU-based auto-scaling is a poor fit for AI workloads: threads spend most of their time waiting on the LLM API, so CPU stays low even when the service is saturated. Scale on a custom metric instead:

# Scale on pending AI requests (CloudWatch custom metric published by the service)
AiServiceScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: AiQueueDepthScaling
    PolicyType: TargetTrackingScaling
    # Assumption: a ScalableTarget for the ECS service is defined elsewhere under this name
    ScalingTargetId: !Ref AiServiceScalableTarget
    TargetTrackingScalingPolicyConfiguration:
      CustomizedMetricSpecification:
        Namespace: AiService/production
        MetricName: ai.requests.pending
        Statistic: Average
      TargetValue: 10.0    # keep the metric near 10; scales out when it runs above
      ScaleInCooldown: 300  # wait 5 min before scaling in (LLM calls are slow)
      ScaleOutCooldown: 60
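
The service itself has to produce the value behind ai.requests.pending. One way to do that, sketched here with the standard library only, is to count in-flight LLM calls. PendingRequestTracker is a hypothetical helper, and publishing the count to CloudWatch (for example through a Micrometer gauge) is assumed to happen elsewhere.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Counts LLM calls that have started but not yet finished. */
final class PendingRequestTracker {
    private final AtomicInteger pending = new AtomicInteger();

    /** Runs an LLM call while counting it as pending, even if the call throws. */
    <T> T track(Supplier<T> llmCall) {
        pending.incrementAndGet();
        try {
            return llmCall.get();
        } finally {
            pending.decrementAndGet();
        }
    }

    /** Current number of in-flight calls; the value a gauge would report. */
    int pending() {
        return pending.get();
    }
}
```

Wrapping each outbound call as tracker.track(() -> client.call(prompt)), where client stands in for whatever LLM client the service uses, keeps the counter accurate on both success and failure, because the decrement sits in a finally block.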

VPC Configuration for AI Services

# ECS tasks in private subnets; NAT Gateway for outbound LLM API calls
AiServiceSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: AI Service
    VpcId: !Ref VpcId
    SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 8080
        ToPort: 8080
        SourceSecurityGroupId: !Ref AlbSecurityGroupId
    SecurityGroupEgress:
      - IpProtocol: tcp    # HTTPS to Anthropic/OpenAI APIs
        FromPort: 443
        ToPort: 443
        CidrIp: 0.0.0.0/0
      - IpProtocol: tcp    # PostgreSQL
        FromPort: 5432
        ToPort: 5432
        DestinationSecurityGroupId: !Ref DatabaseSecurityGroupId
      - IpProtocol: tcp    # Redis
        FromPort: 6379
        ToPort: 6379
        DestinationSecurityGroupId: !Ref RedisSecurityGroupId

Production Deployment Checklist

Container memory: min 2 GB for Java AI; 4 GB for production with concurrent calls
JVM flags: -XX:MaxRAMPercentage=75.0 sizes the heap to the container limit and leaves room for non-heap memory; -XX:+UseG1GC for balanced pauses
Health check start period: set StartPeriod: 60; Java takes 30–45s to start
ALB idle timeout: must exceed max LLM latency (the default is 60s; set to 300s)
Secrets Manager: API keys as Secrets; never in environment variables visible in console
SnapStart: strongly recommended for Java Lambda; removes most of the 5–15s cold start
Bedrock consideration: if data residency or compliance matters, Bedrock keeps everything in AWS
Scale on custom metrics: CPU is a poor AI scaling signal; use pending request count
NAT Gateway: private subnets need NAT for outbound calls to Anthropic/OpenAI
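
The timeout item in the checklist also has a client-side half: the outbound LLM call should give up before the ALB drops the inbound connection, so the service can still return a proper error. Below is a minimal sketch using the JDK's java.net.http client. The class name, the 30-second margin, and the 5-second connect timeout are assumptions chosen for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Outbound HTTP timeouts chosen to stay inside the ALB idle timeout. */
final class LlmHttpTimeouts {
    /** Matches the ALB idle timeout suggested in the checklist. */
    static final Duration ALB_IDLE_TIMEOUT = Duration.ofSeconds(300);

    /** Leave a margin so the service can answer before the ALB closes the connection. */
    static final Duration REQUEST_TIMEOUT = ALB_IDLE_TIMEOUT.minusSeconds(30);

    /** Client with a short connect timeout; slow connects usually mean a network problem. */
    static HttpClient client() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
    }

    /** Builds a JSON POST request that gives up after REQUEST_TIMEOUT. */
    static HttpRequest request(URI endpoint, String jsonBody) {
        return HttpRequest.newBuilder(endpoint)
                .timeout(REQUEST_TIMEOUT)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

With these values an LLM call fails after 270 seconds, 30 seconds before the load balancer would close the connection. Spring AI and the vendor SDKs configure their own HTTP clients, so the same two numbers would need to be applied through whichever client the service actually uses.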

Series Complete: Your Java AI Stack

You've now covered the complete Java AI stack from first call to production deployment:

Spring AI — ChatClient, PromptTemplate, streaming, conversation memory
Anthropic SDK — direct API access, streaming, tool use, vision, prompt caching
RAG — PGVector, document ingestion, hybrid search, QuestionAnswerAdvisor
Function Calling — @Tool, FunctionCallback, role-based tool authorization
Microservices — service decomposition, Kafka async, Resilience4j, distributed rate limiting
AI Router — Spring Cloud Gateway, provider selection, Redis rate limiting, response caching
Security — OWASP LLM Top 10, prompt injection, PII scrubbing, token budgets
CI/CD — GitHub Actions, WireMock testing, Docker multi-stage, ECS blue/green
Observability — Micrometer metrics, cost tracking, circuit breaker health, Grafana dashboards
AWS Deployment — ECS Fargate, Lambda SnapStart, Bedrock, Secrets Manager, CloudFormation

Tools-Hut
