Deployment Option Comparison
| Option | Best For | Key Consideration |
|---|---|---|
| ECS Fargate | Long-running services, streaming responses, websockets | Always-on cost; no cold start; 15min+ request support |
| Lambda + SnapStart | Bursty, infrequent AI calls; async processing | 15min max timeout; SnapStart sharply reduces Java cold starts |
| AWS Bedrock | Replace direct Anthropic/OpenAI API calls with AWS-hosted models | IAM auth (no API keys); requests stay within AWS; data residency |
| EC2 + Auto Scaling | GPU inference or extremely high throughput | Overkill for API-based LLM calls; use only for self-hosted models |
ECS Fargate: Container Sizing for Java AI
Java AI services need more memory than typical Spring Boot apps. The JVM baseline, Spring AI libraries, embedded vector client, and concurrent LLM responses in flight add up quickly:
| Component | Memory |
|---|---|
| JVM base (Java 21, G1GC) | ~200 MB |
| Spring Boot + Spring AI + dependencies | ~350 MB |
| PGVector client + connection pool | ~50 MB |
| Concurrent LLM responses in flight (10 × 8k tokens) | ~400 MB |
| Headroom (GC, metaspace, thread stacks) | ~200 MB |
| Recommended minimum | 2 GB |
Start with 1 vCPU / 2 GB for development. Scale to 2 vCPU / 4 GB for production handling 10+ concurrent AI requests.
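The arithmetic behind these numbers can be checked with a small sketch. The class and method names below are illustrative, not part of any library, and the per-component figures are simply the estimates from the table above. It sums the table and shows how much heap `-XX:MaxRAMPercentage` would allow for a given container size.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ContainerMemoryBudget {

    /** Per-component estimates in MB, copied from the sizing table (estimates, not measurements). */
    static Map<String, Integer> tableEstimates() {
        Map<String, Integer> mb = new LinkedHashMap<>();
        mb.put("JVM base", 200);
        mb.put("Spring Boot + Spring AI", 350);
        mb.put("PGVector client + pool", 50);
        mb.put("LLM responses in flight", 400);
        mb.put("Headroom", 200);
        return mb;
    }

    /** Sums the per-component estimates. */
    static int totalMb(Map<String, Integer> componentsMb) {
        int total = 0;
        for (int mb : componentsMb.values()) {
            total += mb;
        }
        return total;
    }

    /** Max heap (MB) the JVM would allow for a container size and -XX:MaxRAMPercentage value. */
    static long maxHeapMb(long containerMb, double maxRamPercentage) {
        return (long) Math.floor(containerMb * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        System.out.println("Estimated footprint: " + totalMb(tableEstimates()) + " MB");
        System.out.println("Max heap in 2 GB container at 75%: " + maxHeapMb(2048, 75.0) + " MB");
        System.out.println("Max heap in 4 GB container at 75%: " + maxHeapMb(4096, 75.0) + " MB");
    }
}
```

The table's estimates add up to roughly 1.2 GB, which is why 2 GB is a comfortable floor rather than a tight fit. Whatever the heap limit leaves over is what metaspace, thread stacks, and direct buffers have to share.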
ECS Task Definition (CloudFormation)
# cloudformation/ai-service.yaml
AiServiceTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: ai-service
    NetworkMode: awsvpc
    RequiresCompatibilities: [FARGATE]
    Cpu: '2048'     # 2 vCPU
    Memory: '4096'  # 4 GB
    ExecutionRoleArn: !GetAtt EcsExecutionRole.Arn
    TaskRoleArn: !GetAtt AiServiceTaskRole.Arn
    ContainerDefinitions:
      - Name: ai-service
        Image: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/ai-service:${ImageTag}'
        PortMappings:
          - ContainerPort: 8080
        Environment:
          - Name: ENVIRONMENT
            Value: !Ref Environment
          - Name: JAVA_OPTS
            # MaxRAMPercentage=75 gives a 3 GB max heap in a 4 GB container.
            # JAVA_OPTS only takes effect if the image entrypoint passes it to java.
            Value: "-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC -XX:+UseStringDeduplication"
        Secrets:
          # Injected from Secrets Manager at task start; the plaintext value
          # never appears in the task definition
          - Name: ANTHROPIC_API_KEY
            ValueFrom: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:ai-service/anthropic-api-key'
          - Name: OPENAI_API_KEY
            ValueFrom: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:ai-service/openai-api-key'
        HealthCheck:
          # Requires curl to be present in the container image
          Command: [CMD, curl, -f, "http://localhost:8080/actuator/health"]
          Interval: 30
          Timeout: 5
          Retries: 3
          StartPeriod: 60  # Java needs time to start; failures in the first 60s are not counted
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref AiServiceLogGroup
            awslogs-region: !Ref AWS::Region
            awslogs-stream-prefix: ai-service

Secrets Manager: Zero Plaintext API Keys
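# Store the API key (run once during provisioning)
aws secretsmanager create-secret \
  --name ai-service/anthropic-api-key \
  --secret-string "sk-ant-..." \
  --region us-east-1

# IAM policy for the ECS task execution role (least privilege).
# ECS resolves the task definition's Secrets entries with the execution role,
# not the task role, so the permission belongs here.
EcsExecutionRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Statement:
        - Effect: Allow
          Principal: { Service: ecs-tasks.amazonaws.com }
          Action: sts:AssumeRole
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
    Policies:
      - PolicyName: SecretsAccess
        PolicyDocument:
          Statement:
            - Effect: Allow
              Action: [secretsmanager:GetSecretValue]
              Resource:
                - !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:ai-service/*'

Secrets Manager vs. SSM Parameter Store: both work, but Secrets Manager supports scheduled automatic rotation (for third-party API keys, through a rotation Lambda you supply) and keeps versions of each secret. For API keys that cost money if leaked, rotation scheduling and the CloudTrail audit trail are worth the small additional cost ($0.40/secret/month vs. free for SSM Standard parameters).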
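On the application side, ECS hands each `Secrets` entry to the container as an environment variable at start-up, so the service only has to read it and fail early if it is missing. The sketch below assumes that setup. `ApiKeys`, `require`, and `mask` are hypothetical helper names, not part of Spring AI or the AWS SDK.

```java
import java.util.Map;

public class ApiKeys {

    /** Returns the named secret from the given environment, or fails fast if it is absent or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required secret: " + name);
        }
        return value.trim();
    }

    /** Masks a secret for logging, keeping only the last four characters. */
    static String mask(String secret) {
        if (secret.length() <= 4) {
            return "****";
        }
        return "****" + secret.substring(secret.length() - 4);
    }

    public static void main(String[] args) {
        try {
            String key = require(System.getenv(), "ANTHROPIC_API_KEY");
            System.out.println("Anthropic key loaded: " + mask(key));
        } catch (IllegalStateException e) {
            // A real service would let this propagate so the task fails its health check
            System.err.println(e.getMessage());
        }
    }
}
```

Failing at start-up rather than on the first LLM call means a misconfigured secret shows up as a failed deployment instead of as runtime errors, and masking keeps the key out of CloudWatch logs.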
Lambda + SnapStart for Infrequent AI Calls
Java cold starts on Lambda (5–15 seconds for a Spring application) are largely removed by SnapStart, which restores each new execution environment from a snapshot of an already-initialized one:
# Lambda function with SnapStart (CloudFormation)
# Role and Code properties omitted for brevity
AiLambdaFunction:
  Type: AWS::Lambda::Function
  Properties:
    FunctionName: ai-classifier
    Runtime: java21
    Handler: com.example.ai.LambdaHandler::handleRequest
    MemorySize: 2048  # 2 GB for Java AI Lambda
    Timeout: 900      # 15 min max; LLM calls can be slow (API Gateway cuts synchronous calls off far sooner, 29s by default)
    SnapStart:
      ApplyOn: PublishedVersions  # snapshot is taken when a version is published; invoke that version or an alias
    Environment:
      Variables:
        ENVIRONMENT: !Ref Environment
    VpcConfig:  # if accessing RDS/ElastiCache
      SecurityGroupIds: [!Ref LambdaSecurityGroup]
      SubnetIds: !Ref PrivateSubnets

// Lambda handler: implement CRaC for SnapStart compatibility
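import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import com.zaxxer.hikari.HikariDataSource;
import javax.sql.DataSource;
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ConfigurableApplicationContext;

@SpringBootApplication
public class LambdaHandler implements
        RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>,
        Resource {

    // Started once during init, so the Spring context becomes part of the snapshot
    private static final ConfigurableApplicationContext context =
            SpringApplication.run(LambdaHandler.class);

    public LambdaHandler() {
        // Register the instance Lambda actually invokes rather than a throwaway one,
        // which could be garbage-collected before the checkpoint
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> ctx) {
        // Evict pooled connections before the snapshot is taken
        DataSource ds = context.getBean(DataSource.class);
        ((HikariDataSource) ds).getHikariPoolMXBean().softEvictConnections();
    }

    @Override
    public void afterRestore(Context<? extends Resource> ctx) {
        // Nothing to do: the pool opens fresh connections on first use
    }

    @Override
    public APIGatewayProxyResponseEvent handleRequest(
            APIGatewayProxyRequestEvent event,
            // Fully qualified because the simple name Context is taken by org.crac.Context
            com.amazonaws.services.lambda.runtime.Context lambdaContext) {
        var handler = context.getBean(AiRequestHandler.class);
        return handler.handle(event);
    }
}

AWS Bedrock: Managed Models Without Third-Party API Keys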
AWS Bedrock provides Claude (Anthropic), Llama (Meta), Titan (Amazon), and others via IAM authentication, so no Anthropic API key is required. Requests go to AWS endpoints rather than a third-party API (add a Bedrock VPC endpoint to keep them off the public internet), which matters for data residency and compliance:
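# application.yml: Spring AI Bedrock config
# Property names follow the Bedrock Converse starter and have changed between
# Spring AI releases; check them against the version you use.
spring:
  ai:
    bedrock:
      aws:
        region: us-east-1
        # No API key: uses the IAM role of the ECS task / Lambda execution role
      converse:
        chat:
          enabled: true
          options:
            model: anthropic.claude-sonnet-4-6-20251001-v2:0  # confirm the exact model ID in the Bedrock console
            max-tokens: 2048
            temperature: 0.7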
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-bedrock-converse-spring-boot-starter</artifactId>
</dependency>

# IAM policy for ECS task: grants Bedrock model invocation
BedrockPolicy:
  Type: AWS::IAM::ManagedPolicy
  Properties:
    PolicyDocument:
      Statement:
        - Effect: Allow
          Action:
            - bedrock:InvokeModel
            - bedrock:InvokeModelWithResponseStream
          Resource:
            - "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-*"
            - "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-*"

Auto-Scaling: AI-Aware Scaling Policy
Standard CPU-based auto-scaling is a poor fit for AI workloads: threads spend most of their time waiting on the LLM API, so CPU stays low even when the service is saturated. Scale on a custom metric instead, such as the number of pending AI requests.
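The service has to publish that metric itself. As a minimal sketch of the counting side, the hypothetical `PendingRequestTracker` below wraps each LLM call and keeps an in-flight count. Exporting the value to CloudWatch (for example through a Micrometer gauge) is assumed and not shown.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class PendingRequestTracker {

    private final AtomicInteger pending = new AtomicInteger();

    /** Number of LLM calls currently in flight; this is the value to export as the metric. */
    public int pending() {
        return pending.get();
    }

    /** Runs one LLM call and counts it as pending for its whole duration, even if it throws. */
    public <T> T track(Supplier<T> llmCall) {
        pending.incrementAndGet();
        try {
            return llmCall.get();
        } finally {
            pending.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        PendingRequestTracker tracker = new PendingRequestTracker();
        // Stand-in for a real LLM call; reports the count seen while the call is running
        String reply = tracker.track(() -> "pending during call: " + tracker.pending());
        System.out.println(reply);
        System.out.println("pending after call: " + tracker.pending());
    }
}
```

Because the decrement sits in a `finally` block, failed or timed-out calls do not leave the count inflated, which would otherwise keep the service scaled out indefinitely.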
# Scale on pending AI requests in the queue (CloudWatch custom metric)
AiServiceScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: AiQueueDepthScaling
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref AiServiceScalableTarget  # assumed name of a ScalableTarget defined elsewhere
    TargetTrackingScalingPolicyConfiguration:
      CustomizedMetricSpecification:
        Namespace: AiService/production
        MetricName: ai.requests.pending
        Statistic: Average
      TargetValue: 10.0      # keep average pending requests near 10; scale out above that
      ScaleInCooldown: 300   # 5 min before scaling in (LLM calls are slow)
      ScaleOutCooldown: 60

VPC Configuration for AI Services
# ECS tasks in private subnets; NAT Gateway for outbound LLM API calls
AiServiceSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: AI Service
    VpcId: !Ref VpcId
    SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 8080
        ToPort: 8080
        SourceSecurityGroupId: !Ref AlbSecurityGroupId
    SecurityGroupEgress:
      - IpProtocol: tcp  # HTTPS to Anthropic/OpenAI APIs
        FromPort: 443
        ToPort: 443
        CidrIp: 0.0.0.0/0
      - IpProtocol: tcp  # PostgreSQL
        FromPort: 5432
        ToPort: 5432
        DestinationSecurityGroupId: !Ref DatabaseSecurityGroupId
      - IpProtocol: tcp  # Redis
        FromPort: 6379
        ToPort: 6379
        DestinationSecurityGroupId: !Ref RedisSecurityGroupId

- Container memory: min 2 GB for Java AI; 4 GB for production with concurrent calls
- JVM flags: -XX:MaxRAMPercentage=75.0 caps the heap at 75% of container memory, which lowers the risk of OOM kills; -XX:+UseG1GC for balanced pauses
- Health check start period: set StartPeriod: 60; Java takes 30–45s to start
- Task timeout: ALB idle timeout must exceed max LLM latency (set to 300s)
- Secrets Manager: API keys as Secrets; never as plaintext environment variables visible in the console
- SnapStart: enable it for Java Lambda; removes most of the 5–15s cold start
- Bedrock consideration: if data residency or compliance matters, Bedrock keeps everything in AWS
- Scale on custom metrics: CPU is a poor AI scaling signal; use pending request count
- NAT Gateway: private subnets need NAT for outbound calls to Anthropic/OpenAI
Series Complete: Your Java AI Stack
You've now covered the complete Java AI stack from first call to production deployment:
- Spring AI — ChatClient, PromptTemplate, streaming, conversation memory
- Anthropic SDK — direct API access, streaming, tool use, vision, prompt caching
- RAG — PGVector, document ingestion, hybrid search, QuestionAnswerAdvisor
- Function Calling — @Tool, FunctionCallback, role-based tool authorization
- Microservices — service decomposition, Kafka async, Resilience4j, distributed rate limiting
- AI Router — Spring Cloud Gateway, provider selection, Redis rate limiting, response caching
- Security — OWASP LLM Top 10, prompt injection, PII scrubbing, token budgets
- CI/CD — GitHub Actions, WireMock testing, Docker multi-stage, ECS blue/green
- Observability — Micrometer metrics, cost tracking, circuit breaker health, Grafana dashboards
- AWS Deployment — ECS Fargate, Lambda SnapStart, Bedrock, Secrets Manager, CloudFormation