SDK vs Spring AI: When to Use Which
| Use Case | Recommendation | Reason |
|---|---|---|
| Spring Boot app with standard chat | Spring AI | Auto-config, DI, provider portability |
| New Anthropic feature (not in Spring AI yet) | Anthropic SDK | Direct API access, no wait for Spring AI release |
| Non-Spring Java app (CLI, batch, library) | Anthropic SDK | No Spring overhead |
| Custom streaming logic / SSE handling | Anthropic SDK | Full control over stream lifecycle |
| Complex tool use orchestration | Either | Spring AI supports @Tool; SDK gives raw access |
| Vision / multimodal inputs | Either | Both support it; Spring AI adds convenience |
Setup: Maven Dependency
<!-- pom.xml -->
<dependency>
    <groupId>com.anthropic</groupId>
    <artifactId>anthropic-java</artifactId>
    <version>0.8.0</version>
</dependency>
Client Initialization
The SDK client is thread-safe and should be a singleton — create it once and reuse it across your application:
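import com.anthropic.client.Anthropic;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AnthropicConfig {
    // One shared, thread-safe client bean for the whole application
    @Bean
    public Anthropic anthropicClient(
            @Value("${anthropic.api-key}") String apiKey) {
        return AnthropicOkHttpClient.builder()
            .apiKey(apiKey)
            .build();
    }
}
Basic Message: Synchronous Completion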
import com.anthropic.client.Anthropic;
import com.anthropic.models.*;
import org.springframework.stereotype.Service;

@Service
public class ClaudeService {
    private final Anthropic client;

    // Constructor injection of the singleton client bean
    public ClaudeService(Anthropic client) {
        this.client = client;
    }

    public String complete(String userMessage) {
        Message response = client.messages().create(
            MessageCreateParams.builder()
                .model(Model.CLAUDE_SONNET_4_6)
                .maxTokens(1024)
                .system("You are a helpful Java expert.")
                .addMessage(MessageParam.builder()
                    .role(MessageParam.Role.USER)
                    .content(userMessage)
                    .build())
                .build()
        );
        // Content can be text or tool_use blocks; return the first text block
        return response.content().stream()
            .filter(block -> block instanceof ContentBlock.Text)
            .map(block -> ((ContentBlock.Text) block).text())
            .findFirst()
            .orElseThrow();
    }
}
Multi-Turn Conversations
Claude's API is stateless — you must send the full conversation history with each request. Maintain the message list yourself:
@Service
public class ConversationService {
    private final Anthropic client;
    // One shared history: fine for a single-conversation demo. In a multi-user
    // app, keep a separate history per session or user.
    private final List<MessageParam> history = new ArrayList<>();

    public ConversationService(Anthropic client) {
        this.client = client;
    }

    // synchronized: ArrayList is not thread-safe and turns must stay in order
    public synchronized String chat(String userMessage) {
        // Add user turn to history
        history.add(MessageParam.builder()
            .role(MessageParam.Role.USER)
            .content(userMessage)
            .build());
        Message response = client.messages().create(
            MessageCreateParams.builder()
                .model(Model.CLAUDE_SONNET_4_6)
                .maxTokens(2048)
                .messages(history)
                .build()
        );
        String assistantText = extractText(response);
        // Add assistant turn to history for next call
        history.add(MessageParam.builder()
            .role(MessageParam.Role.ASSISTANT)
            .content(assistantText)
            .build());
        return assistantText;
    }

    // Concatenate all text blocks, skipping non-text content
    private String extractText(Message msg) {
        return msg.content().stream()
            .filter(b -> b instanceof ContentBlock.Text)
            .map(b -> ((ContentBlock.Text) b).text())
            .collect(Collectors.joining());
    }
}
Claude's context window is large (200k tokens for Sonnet) but not infinite. For long conversations, implement a sliding window: keep the system prompt plus the last N turns, or summarize older history. Sending 100 rounds of conversation on every request is expensive, because every earlier turn is billed as input again, and it eventually hits the limit. Spring AI's MessageChatMemoryAdvisor manages conversation memory for you, which is one reason to prefer Spring AI for conversational applications.
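The sliding-window idea can be sketched in plain Java, independent of the SDK. The `Turn` record and the `SlidingWindow.lastTurns` helper are hypothetical names introduced only for illustration, and the sketch trims by message count; a production version would more likely trim by token count.

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindow {
    /** Hypothetical stand-in for one conversation message. */
    public record Turn(String role, String text) {}

    /**
     * Returns at most maxTurns of the most recent turns. Leading non-user
     * turns are dropped so the window always starts with a user message.
     */
    public static List<Turn> lastTurns(List<Turn> history, int maxTurns) {
        int from = Math.max(0, history.size() - maxTurns);
        // Skip forward to the first user turn inside the window
        while (from < history.size() && !history.get(from).role().equals("user")) {
            from++;
        }
        return new ArrayList<>(history.subList(from, history.size()));
    }
}
```

The trimmed list would be what gets sent with each request, while the full history can still be kept elsewhere for summarization or auditing.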
Streaming: Real-Time Token Output
Streaming dramatically improves perceived latency. The SDK uses a lambda-based event handler:
import com.anthropic.models.RawMessageStreamEvent;

public void streamToOutput(String userMessage, PrintWriter output) {
    client.messages().createStreaming(
        MessageCreateParams.builder()
            .model(Model.CLAUDE_SONNET_4_6)
            .maxTokens(1024)
            .addMessage(MessageParam.builder()
                .role(MessageParam.Role.USER)
                .content(userMessage)
                .build())
            .build()
    ).subscribe(event -> {
        // ContentBlockDelta events carry token chunks
        if (event instanceof RawMessageStreamEvent.ContentBlockDelta delta) {
            if (delta.delta() instanceof ContentBlockDelta.TextDelta text) {
                output.write(text.text());
                output.flush();
            }
        }
    });
}
// Integrate with Spring MVC SSE endpoint:
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public SseEmitter stream(@RequestParam String message) {
    SseEmitter emitter = new SseEmitter();
    // Run off the request thread so the controller returns immediately
    executorService.submit(() -> {
        try {
            client.messages().createStreaming(
                MessageCreateParams.builder()
                    .model(Model.CLAUDE_SONNET_4_6)
                    .maxTokens(1024)
                    .addMessage(MessageParam.builder()
                        .role(MessageParam.Role.USER)
                        .content(message).build())
                    .build()
            ).subscribe(event -> {
                if (event instanceof RawMessageStreamEvent.ContentBlockDelta d
                        && d.delta() instanceof ContentBlockDelta.TextDelta t) {
                    try {
                        emitter.send(SseEmitter.event().data(t.text()));
                    } catch (IOException e) { emitter.completeWithError(e); }
                }
            });
            // Assumes subscribe(...) blocks until the stream ends. If it returns
            // immediately, complete the emitter from the stream's completion
            // callback instead, or the response closes before any tokens arrive.
            emitter.complete();
        } catch (Exception e) { emitter.completeWithError(e); }
    });
    return emitter;
}
Tool Use (Function Calling)
Tool use lets Claude request calls to your Java methods when it needs real-world data. Claude never executes code itself; your application does. The flow: you declare tools → Claude decides which to call → you execute the call → you send the result back → Claude uses it in its response.
import com.anthropic.models.Tool;
import com.fasterxml.jackson.databind.JsonNode;

public String answerWithTools(String question) {
    // Define the get_stock_price tool
    Tool stockTool = Tool.builder()
        .name("get_stock_price")
        .description("Get the current stock price for a ticker symbol")
        .inputSchema(Tool.InputSchema.builder()
            .type("object")
            .putProperty("ticker", JsonNode.class, Map.of(
                "type", "string",
                "description", "Stock ticker symbol, e.g. AAPL"
            ))
            .required(List.of("ticker"))
            .build())
        .build();
    // First request: Claude decides what tool to call
    Message response = client.messages().create(
        MessageCreateParams.builder()
            .model(Model.CLAUDE_SONNET_4_6)
            .maxTokens(1024)
            .tools(List.of(stockTool))
            .addMessage(MessageParam.builder()
                .role(MessageParam.Role.USER)
                .content(question).build())
            .build()
    );
    // If Claude wants to call a tool:
    if (response.stopReason() == StopReason.TOOL_USE) {
        // Handles only the first tool_use block. A response may contain several,
        // and each one needs its own tool result in the follow-up message.
        var toolUse = response.content().stream()
            .filter(b -> b instanceof ContentBlock.ToolUse)
            .map(b -> (ContentBlock.ToolUse) b)
            .findFirst().orElseThrow();
        // Execute the actual tool call in your Java code
        String ticker = toolUse.input().get("ticker").asText();
        String price = stockPriceService.getPrice(ticker);
        // Send tool result back to Claude, replaying the conversation so far:
        // user question, assistant tool_use turn, then the tool result
        Message finalResponse = client.messages().create(
            MessageCreateParams.builder()
                .model(Model.CLAUDE_SONNET_4_6)
                .maxTokens(1024)
                .tools(List.of(stockTool))
                .addMessage(MessageParam.builder()
                    .role(MessageParam.Role.USER)
                    .content(question).build())
                .addMessage(MessageParam.builder()
                    .role(MessageParam.Role.ASSISTANT)
                    .content(response.content()).build())
                .addMessage(MessageParam.builder()
                    .role(MessageParam.Role.USER)
                    .content(List.of(ToolResultContentBlock.builder()
                        .toolUseId(toolUse.id())
                        .content(price)
                        .build()))
                    .build())
                .build()
        );
        return extractText(finalResponse);
    }
    // No tool needed: Claude answered directly
    return extractText(response);
}
Article 4 in this series covers Spring AI's @Tool annotation, which eliminates most of this boilerplate while adding Spring bean integration.
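If you stay on the raw SDK, one way to keep the "execute the call" step manageable as the number of tools grows is a small registry that maps tool names to handlers. The sketch below is plain Java with no SDK types: `ToolRegistry`, `register`, and `dispatch` are hypothetical names, and the tool input is simplified to a `Map<String, String>` instead of the SDK's JSON input.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ToolRegistry {
    // Tool name -> handler that turns the tool input into a result string
    private final Map<String, Function<Map<String, String>, String>> handlers = new HashMap<>();

    public ToolRegistry register(String name, Function<Map<String, String>, String> handler) {
        handlers.put(name, handler);
        return this;
    }

    /** Runs the named tool; unknown tools and handler failures come back as error text. */
    public String dispatch(String name, Map<String, String> input) {
        Function<Map<String, String>, String> handler = handlers.get(name);
        if (handler == null) {
            return "error: unknown tool " + name;
        }
        try {
            return handler.apply(input);
        } catch (RuntimeException e) {
            return "error: " + e.getMessage();
        }
    }
}
```

Returning failures as text rather than throwing is a design choice here: the error string can be sent back as the tool result, which gives the model a chance to recover or explain the problem.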
Vision: Analyzing Images
Claude can analyze images sent as base64-encoded data or public URLs:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public String analyzeImage(Path imagePath, String question) throws IOException {
    byte[] imageBytes = Files.readAllBytes(imagePath);
    String base64 = Base64.getEncoder().encodeToString(imageBytes);
    Message response = client.messages().create(
        MessageCreateParams.builder()
            .model(Model.CLAUDE_SONNET_4_6)
            .maxTokens(1024)
            .addMessage(MessageParam.builder()
                .role(MessageParam.Role.USER)
                .content(List.of(
                    ImageContentBlock.builder()
                        .type("image")
                        .source(ImageSource.builder()
                            .type("base64")
                            // Must match the actual file format (e.g. image/png for PNG files)
                            .mediaType("image/jpeg")
                            .data(base64)
                            .build())
                        .build(),
                    TextContentBlock.builder()
                        .type("text")
                        .text(question)
                        .build()
                ))
                .build())
            .build()
    );
    return extractText(response);
}
Error Handling: What Can Go Wrong
The SDK throws typed exceptions. Handle them appropriately — some are retryable, others are permanent:
import com.anthropic.errors.*;

public String robustCall(String message) {
    try {
        return complete(message);
    } catch (RateLimitException e) {
        // 429: back off and retry; honor the Retry-After header if present
        long retryAfter = e.headers().getFirstValueAsLong("retry-after")
            .orElse(60);
        sleepSeconds(retryAfter);
        return complete(message); // or escalate to fallback model
    } catch (OverloadedException e) {
        // 529: Anthropic servers overloaded; retryable
        sleepSeconds(30);
        return complete(message);
    } catch (AuthenticationException e) {
        // 401: invalid API key; NOT retryable
        throw new ConfigurationException("Invalid Anthropic API key", e);
    } catch (BadRequestException e) {
        // 400: malformed request; NOT retryable; fix the code
        log.error("Bad request to Claude API: {}", e.message());
        throw e;
    } catch (InternalServerException e) {
        // 5xx: Anthropic server error; pause briefly, then retry once
        sleepSeconds(5);
        return complete(message);
    }
}
By default, the Anthropic Java SDK already retries rate-limit (429) and server (5xx) errors with exponential backoff; configure the retry count via AnthropicOkHttpClient.builder().maxRetries(3). Manual handling like the example above therefore runs on top of the built-in retries. You only need custom retry logic when implementing fallback to a different provider or a custom backoff strategy.
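For the custom-backoff case, a minimal sketch of exponential backoff with full jitter in plain Java might look like the following. `Backoff`, `capForAttempt`, and `retry` are hypothetical names, and the sketch treats every `RuntimeException` as retryable, which is an assumption; real code would retry only the exception types identified as retryable above.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

public class Backoff {
    /** Upper bound on the delay for attempt n (0-based): baseMillis * 2^n, capped at maxMillis. */
    public static long capForAttempt(int attempt, long baseMillis, long maxMillis) {
        long exp = baseMillis << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(exp, maxMillis);
    }

    /** Calls the supplier up to maxAttempts times, sleeping a random 0..cap ms between tries. */
    public static <T> T retry(Supplier<T> call, int maxAttempts, long baseMillis, long maxMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts - 1) break; // out of attempts: rethrow below
                sleepWithJitter(capForAttempt(attempt, baseMillis, maxMillis));
            }
        }
        throw last;
    }

    private static void sleepWithJitter(long capMillis) {
        try {
            // Full jitter: uniform random delay in [0, capMillis]
            Thread.sleep(ThreadLocalRandom.current().nextLong(capMillis + 1));
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", ie);
        }
    }
}
```

A call such as `Backoff.retry(() -> complete(message), 4, 500, 8_000)` would then wrap the earlier `complete` method. Randomizing the delay spreads out retries from many clients that failed at the same moment.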
Counting Tokens Before Sending
The /v1/messages/count_tokens endpoint lets you check token usage before making a billable API call — useful for enforcing per-user limits:
public int countTokens(String systemPrompt, String userMessage) {
    CountTokensResponse count = client.messages().countTokens(
        MessageCountTokensParams.builder()
            .model(Model.CLAUDE_SONNET_4_6)
            .system(systemPrompt)
            .addMessage(MessageParam.builder()
                .role(MessageParam.Role.USER)
                .content(userMessage)
                .build())
            .build()
    );
    return count.inputTokens();
}

// Guard against expensive requests
public String safeComplete(String system, String user) {
    int tokens = countTokens(system, user);
    if (tokens > 10_000) {
        throw new RequestTooLargeException(
            "Request is " + tokens + " tokens; limit is 10,000");
    }
    // Note: complete(user) from the earlier example uses its own fixed system
    // prompt. Pass `system` through in real code so the count matches the request.
    return complete(user);
}
Prompt Caching: Reduce Input Costs by Up to 90%
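For requests with a long, stable system prompt (documentation, a large codebase, a reference dataset), Anthropic's prompt caching bills the cached prefix on subsequent requests at 10% of the normal input token price. The request that writes the cache pays a premium on those tokens (25% for the default 5-minute cache), so caching pays off once the prefix is reused: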
import com.anthropic.models.CacheControlEphemeral;

Message response = client.messages().create(
    MessageCreateParams.builder()
        .model(Model.CLAUDE_SONNET_4_6)
        .maxTokens(1024)
        // The large, stable system prompt: mark for caching
        .system(List.of(
            SystemMessageParam.builder()
                .text(largeDocumentationContext) // could be 50k+ tokens
                .cacheControl(CacheControlEphemeral.builder().build())
                .build()
        ))
        // The per-request user question: NOT cached
        .addMessage(MessageParam.builder()
            .role(MessageParam.Role.USER)
            .content(userQuestion)
            .build())
        .build()
);
// Check cache performance in the response
Usage usage = response.usage();
System.out.println("Cache read tokens: " + usage.cacheReadInputTokens());
System.out.println("Cache write tokens: " + usage.cacheCreationInputTokens());
The cache lives for 5 minutes, and each cache hit refreshes that lifetime. Any request that reuses the same prefix within that window reads the cached version. For a chatbot with a 10k-token system prompt and 100 messages per minute, this can reduce input token costs by 80–90%, depending on how much of each request is the cached prefix.
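The savings figure can be checked with some rough arithmetic. The sketch below assumes cache reads cost 0.10× and cache writes 1.25× the base input price, works in "base-price token units" rather than real currency, and assumes a single cache write followed by cache hits for every later request. `CacheSavings` and its methods are hypothetical names.

```java
public class CacheSavings {
    static final double WRITE_MULTIPLIER = 1.25; // assumed: cache write costs 25% more than base input
    static final double READ_MULTIPLIER = 0.10;  // assumed: cache read costs 10% of base input

    /** Input cost for n requests with no caching: the full prefix is billed every time. */
    public static double uncached(long prefixTokens, long perRequestTokens, int requests) {
        return (double) (prefixTokens + perRequestTokens) * requests;
    }

    /** Same workload with one cache write followed by (requests - 1) cache reads. */
    public static double cached(long prefixTokens, long perRequestTokens, int requests) {
        double write = prefixTokens * WRITE_MULTIPLIER;
        double reads = prefixTokens * READ_MULTIPLIER * (requests - 1);
        return write + reads + (double) perRequestTokens * requests;
    }

    /** Fraction of input cost saved by caching, between 0 and 1 for reused prefixes. */
    public static double savingsFraction(long prefixTokens, long perRequestTokens, int requests) {
        double before = uncached(prefixTokens, perRequestTokens, requests);
        return 1.0 - cached(prefixTokens, perRequestTokens, requests) / before;
    }
}
```

With a 10,000-token prefix, 500 uncached tokens per request, and 100 requests inside the cache lifetime, `savingsFraction(10_000, 500, 100)` comes out at roughly 0.85. The longer the per-request portion is relative to the prefix, the smaller the saving.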
Async Operations with CompletableFuture
The Anthropic SDK's async client returns CompletableFuture, letting you call multiple models concurrently:
import com.anthropic.client.AnthropicAsync;

@Service
public class ModelComparisonService {
    private final AnthropicAsync asyncClient;

    public ModelComparisonService(AnthropicAsync asyncClient) {
        this.asyncClient = asyncClient;
    }

    public CompletableFuture<ModelComparison> compareModels(String prompt) {
        // Shared builder: each build() below snapshots the model set just before it
        var params = MessageCreateParams.builder()
            .maxTokens(1024)
            .addMessage(MessageParam.builder()
                .role(MessageParam.Role.USER)
                .content(prompt).build());
        CompletableFuture<String> sonnetFuture = asyncClient.messages()
            .create(params.model(Model.CLAUDE_SONNET_4_6).build())
            .thenApply(this::extractText);
        CompletableFuture<String> haikuFuture = asyncClient.messages()
            .create(params.model(Model.CLAUDE_HAIKU_4_5_20251001).build())
            .thenApply(this::extractText);
        // Both calls run concurrently; combine the results once both finish
        return sonnetFuture.thenCombine(haikuFuture, ModelComparison::new);
    }
    // extractText(Message) is the same helper shown in ConversationService
}
Use the Anthropic Java SDK when you need (1) features not yet in Spring AI, (2) a client for a non-Spring application, (3) maximum control over streaming, or (4) custom retry/fallback beyond Spring AI's defaults. Use Spring AI for everything else: it handles the boilerplate and integrates cleanly with the Spring ecosystem. Both are production-ready; the choice is about fit, not quality.