Calling Claude from Java: Anthropic SDK, Streaming & Tool Use

Spring AI gives you a clean abstraction over Claude, but sometimes you need the raw SDK — to use a brand-new Anthropic feature before Spring AI supports it, to implement custom streaming logic, or to call the API from a non-Spring Java application. The official Anthropic Java SDK gives you direct, full-featured access to the entire API. This article covers the SDK from first call to advanced tool use and vision inputs.

SDK vs Spring AI: When to Use Which

Use Case	Recommendation	Reason
Spring Boot app with standard chat	Spring AI	Auto-config, DI, provider portability
New Anthropic feature (not in Spring AI yet)	Anthropic SDK	Direct API access, no wait for Spring AI release
Non-Spring Java app (CLI, batch, library)	Anthropic SDK	No Spring overhead
Custom streaming logic / SSE handling	Anthropic SDK	Full control over stream lifecycle
Complex tool use orchestration	Either	Spring AI supports @Tool; SDK gives raw access
Vision / multimodal inputs	Either	Both support it; Spring AI adds convenience

Setup: Maven Dependency

<!-- pom.xml -->
<dependency>
  <groupId>com.anthropic</groupId>
  <artifactId>anthropic-java</artifactId>
  <version>0.8.0</version>
</dependency>

Client Initialization

The SDK client is thread-safe and should be a singleton — create it once and reuse it across your application:

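import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AnthropicConfig {

    // One shared, thread-safe client for the whole application
    @Bean
    public AnthropicClient anthropicClient(
            @Value("${anthropic.api-key}") String apiKey) {
        return AnthropicOkHttpClient.builder()
            .apiKey(apiKey)
            .build();
    }
}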

Basic Message: Synchronous Completion

import com.anthropic.client.AnthropicClient;
import com.anthropic.models.messages.*;
import org.springframework.stereotype.Service;

@Service
public class ClaudeService {

    private final AnthropicClient client;

    // Constructor injection of the singleton client bean
    public ClaudeService(AnthropicClient client) {
        this.client = client;
    }

    public String complete(String userMessage) {
        Message response = client.messages().create(
            MessageCreateParams.builder()
                .model(Model.CLAUDE_SONNET_4_6)
                .maxTokens(1024)
                .system("You are a helpful Java expert.")
                .addMessage(MessageParam.builder()
                    .role(MessageParam.Role.USER)
                    .content(userMessage)
                    .build())
                .build()
        );

        // Content can hold text or tool_use blocks; return the first text block
        return response.content().stream()
            .flatMap(block -> block.text().stream())
            .map(TextBlock::text)
            .findFirst()
            .orElseThrow();
    }
}

Multi-Turn Conversations

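Claude's API is stateless: you must send the full conversation history with each request, so you maintain the message list yourself. The example below keeps a single list inside the service to stay short. Because Spring services are singletons, a real application would store one history per conversation or user instead: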

@Service
public class ConversationService {

    private final AnthropicClient client;
    private final List<MessageParam> history = new ArrayList<>();

    public ConversationService(AnthropicClient client) {
        this.client = client;
    }

    public String chat(String userMessage) {
        // Add user turn to history
        history.add(MessageParam.builder()
            .role(MessageParam.Role.USER)
            .content(userMessage)
            .build());

        Message response = client.messages().create(
            MessageCreateParams.builder()
                .model(Model.CLAUDE_SONNET_4_6)
                .maxTokens(2048)
                .messages(history)
                .build()
        );

        String assistantText = extractText(response);

        // Add assistant turn to history for next call
        history.add(MessageParam.builder()
            .role(MessageParam.Role.ASSISTANT)
            .content(assistantText)
            .build());

        return assistantText;
    }

    private String extractText(Message msg) {
        // Concatenate every text block; non-text blocks (e.g. tool_use) are skipped
        return msg.content().stream()
            .flatMap(block -> block.text().stream())
            .map(TextBlock::text)
            .collect(Collectors.joining());
    }
}
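
One way to keep a separate history per conversation is a map keyed by conversation ID. The sketch below is plain Java and independent of the SDK; `ConversationStore`, `Turn`, and the string conversation ID are illustrative names, not part of any library. It stores immutable snapshots so that concurrent readers never see a half-updated list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ConversationStore {
    /** One turn of a conversation; role is "user" or "assistant". */
    record Turn(String role, String text) {}

    private final Map<String, List<Turn>> byConversation = new ConcurrentHashMap<>();

    /** Appends a turn to the given conversation, creating it on first use. */
    void append(String conversationId, String role, String text) {
        // compute() runs atomically per key, so concurrent appends cannot be lost
        byConversation.compute(conversationId, (id, turns) -> {
            List<Turn> updated = turns == null ? new ArrayList<>() : new ArrayList<>(turns);
            updated.add(new Turn(role, text));
            return List.copyOf(updated);
        });
    }

    /** Returns an immutable snapshot of the conversation so far. */
    List<Turn> history(String conversationId) {
        return byConversation.getOrDefault(conversationId, List.of());
    }
}
```

Copying the list on every append is simple and safe for chat-sized histories; a store with eviction (or an external cache) would be the next step for long-running services.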

Context Window Management

Claude's context window is large (200k tokens for Sonnet) but not infinite. For long conversations, implement a sliding window: keep the system prompt + last N turns, or summarize older history. Sending 100 rounds of conversation on every request is expensive and eventually hits the limit. Spring AI's MessageChatMemoryAdvisor handles this automatically — one reason to prefer Spring AI for conversational applications.
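
The sliding-window idea can be sketched in a few lines of plain Java. `SlidingWindow` and `Turn` are hypothetical names used only for illustration, and the window is counted in turns rather than tokens to keep the example short. The sketch also drops leading assistant turns, on the assumption that the trimmed conversation should still open with a user turn.

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindow {
    /** One turn of a conversation; role is "user" or "assistant". */
    record Turn(String role, String text) {}

    /**
     * Keeps at most maxTurns of the most recent turns, then skips forward
     * until the window starts with a user turn.
     */
    static List<Turn> trim(List<Turn> history, int maxTurns) {
        int from = Math.max(0, history.size() - maxTurns);
        while (from < history.size() && !history.get(from).role().equals("user")) {
            from++;
        }
        return new ArrayList<>(history.subList(from, history.size()));
    }
}
```

The system prompt is not part of the list, so it survives trimming unchanged. A token-based variant would replace the turn count with a running token total.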

Streaming: Real-Time Token Output

Streaming dramatically improves perceived latency. With the synchronous client, createStreaming returns a closeable stream of events that you iterate over:

import com.anthropic.core.http.StreamResponse;
import com.anthropic.models.messages.RawMessageStreamEvent;

public void streamToOutput(String userMessage, PrintWriter output) {
    MessageCreateParams params = MessageCreateParams.builder()
        .model(Model.CLAUDE_SONNET_4_6)
        .maxTokens(1024)
        .addMessage(MessageParam.builder()
            .role(MessageParam.Role.USER)
            .content(userMessage)
            .build())
        .build();

    // try-with-resources closes the HTTP connection when the stream ends or fails
    try (StreamResponse<RawMessageStreamEvent> events =
             client.messages().createStreaming(params)) {
        events.stream()
            // content_block_delta events carry the token chunks
            .flatMap(event -> event.contentBlockDelta().stream())
            .flatMap(deltaEvent -> deltaEvent.delta().text().stream())
            .forEach(textDelta -> {
                output.write(textDelta.text());
                output.flush();
            });
    }
}

// Integrate with Spring MVC SSE endpoint
// (uses com.anthropic.core.http.StreamResponse and java.util.Iterator):
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public SseEmitter stream(@RequestParam String message) {
    SseEmitter emitter = new SseEmitter();
    MessageCreateParams params = MessageCreateParams.builder()
        .model(Model.CLAUDE_SONNET_4_6)
        .maxTokens(1024)
        .addMessage(MessageParam.builder()
            .role(MessageParam.Role.USER)
            .content(message).build())
        .build();

    executorService.submit(() -> {
        try (StreamResponse<RawMessageStreamEvent> events =
                 client.messages().createStreaming(params)) {
            // An iterator (rather than forEach) lets send() throw IOException directly
            Iterator<String> chunks = events.stream()
                .flatMap(event -> event.contentBlockDelta().stream())
                .flatMap(deltaEvent -> deltaEvent.delta().text().stream())
                .map(textDelta -> textDelta.text())
                .iterator();
            while (chunks.hasNext()) {
                emitter.send(SseEmitter.event().data(chunks.next()));
            }
            emitter.complete();  // reached only if every chunk was sent
        } catch (Exception e) {
            emitter.completeWithError(e);
        }
    });
    return emitter;
}

Tool Use (Function Calling)

Tool use lets Claude call your Java methods when it needs real-world data. The flow: you declare tools → Claude decides which to call → you execute the call → you send the result back → Claude uses it in its response.

import com.anthropic.models.Tool;
import com.fasterxml.jackson.databind.JsonNode;

public String answerWithTools(String question) {
    // Define the get_stock_price tool
    Tool stockTool = Tool.builder()
        .name("get_stock_price")
        .description("Get the current stock price for a ticker symbol")
        .inputSchema(Tool.InputSchema.builder()
            .type("object")
            .putProperty("ticker", JsonNode.class, Map.of(
                "type", "string",
                "description", "Stock ticker symbol, e.g. AAPL"
            ))
            .required(List.of("ticker"))
            .build())
        .build();

    // First request: Claude decides what tool to call
    Message response = client.messages().create(
        MessageCreateParams.builder()
            .model(Model.CLAUDE_SONNET_4_6)
            .maxTokens(1024)
            .tools(List.of(stockTool))
            .addMessage(MessageParam.builder()
                .role(MessageParam.Role.USER)
                .content(question).build())
            .build()
    );

    // If Claude wants to call a tool:
    if (response.stopReason() == StopReason.TOOL_USE) {
        var toolUse = response.content().stream()
            .filter(b -> b instanceof ContentBlock.ToolUse)
            .map(b -> (ContentBlock.ToolUse) b)
            .findFirst().orElseThrow();

        // Execute the actual tool call in your Java code
        String ticker = toolUse.input().get("ticker").asText();
        String price = stockPriceService.getPrice(ticker);

        // Send tool result back to Claude
        Message finalResponse = client.messages().create(
            MessageCreateParams.builder()
                .model(Model.CLAUDE_SONNET_4_6)
                .maxTokens(1024)
                .tools(List.of(stockTool))
                .addMessage(MessageParam.builder()
                    .role(MessageParam.Role.USER)
                    .content(question).build())
                .addMessage(MessageParam.builder()
                    .role(MessageParam.Role.ASSISTANT)
                    .content(response.content()).build())
                .addMessage(MessageParam.builder()
                    .role(MessageParam.Role.USER)
                    .content(List.of(ToolResultContentBlock.builder()
                        .toolUseId(toolUse.id())
                        .content(price)
                        .build()))
                    .build())
                .build()
        );
        return extractText(finalResponse);
    }

    return extractText(response);
}

Article 4 in this series covers Spring AI's @Tool annotation, which eliminates most of this boilerplate while adding Spring bean integration.
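
When more than one tool is declared, the "execute the call" step usually becomes a lookup by tool name. The following is a minimal, SDK-independent sketch of such a dispatcher; `ToolDispatcher` and the flat string-to-string input map are simplifying assumptions, not part of the Anthropic SDK. Unknown tools and handler failures are turned into error text, on the assumption that the result is sent back to the model as the tool result either way.

```java
import java.util.Map;
import java.util.function.Function;

final class ToolDispatcher {
    private final Map<String, Function<Map<String, String>, String>> handlers;

    ToolDispatcher(Map<String, Function<Map<String, String>, String>> handlers) {
        this.handlers = Map.copyOf(handlers);
    }

    /** Runs the named tool and returns its result, or an error message as text. */
    String dispatch(String toolName, Map<String, String> input) {
        Function<Map<String, String>, String> handler = handlers.get(toolName);
        if (handler == null) {
            return "Error: unknown tool '" + toolName + "'";
        }
        try {
            return handler.apply(input);
        } catch (RuntimeException e) {
            // Report the failure as text instead of aborting the conversation
            return "Error: " + e.getMessage();
        }
    }
}
```

With a dispatcher like this, the request/response exchange can run in a loop: as long as the stop reason is tool use, execute every requested tool, append the results, and call the API again.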

Vision: Analyzing Images

Claude can analyze images sent as base64-encoded data or public URLs:

import java.util.Base64;
import java.nio.file.Files;

public String analyzeImage(Path imagePath, String question) throws IOException {
    byte[] imageBytes = Files.readAllBytes(imagePath);
    String base64 = Base64.getEncoder().encodeToString(imageBytes);

    Message response = client.messages().create(
        MessageCreateParams.builder()
            .model(Model.CLAUDE_SONNET_4_6)
            .maxTokens(1024)
            .addMessage(MessageParam.builder()
                .role(MessageParam.Role.USER)
                .content(List.of(
                    ImageContentBlock.builder()
                        .type("image")
                        .source(ImageSource.builder()
                            .type("base64")
                            .mediaType("image/jpeg")
                            .data(base64)
                            .build())
                        .build(),
                    TextContentBlock.builder()
                        .type("text")
                        .text(question)
                        .build()
                ))
                .build())
            .build()
    );

    return extractText(response);
}
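
The example above hard-codes image/jpeg, which is wrong for any other file type. A small sketch of detecting the media type from the file's leading bytes follows; `ImageMediaType` is a hypothetical helper, and it only recognizes JPEG, PNG, GIF, and WebP.

```java
import java.util.Optional;

final class ImageMediaType {
    /** Returns the media type for JPEG, PNG, GIF, or WebP data, or empty if unrecognized. */
    static Optional<String> detect(byte[] data) {
        if (matches(data, 0, 0xFF, 0xD8, 0xFF)) return Optional.of("image/jpeg");
        if (matches(data, 0, 0x89, 'P', 'N', 'G')) return Optional.of("image/png");
        if (matches(data, 0, 'G', 'I', 'F', '8')) return Optional.of("image/gif");
        // WebP is a RIFF container with "WEBP" at byte offset 8
        if (matches(data, 0, 'R', 'I', 'F', 'F') && matches(data, 8, 'W', 'E', 'B', 'P')) {
            return Optional.of("image/webp");
        }
        return Optional.empty();
    }

    /** True if data contains the given unsigned byte values starting at offset. */
    private static boolean matches(byte[] data, int offset, int... signature) {
        if (data.length < offset + signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[offset + i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }
}
```

In analyzeImage, the result of `ImageMediaType.detect(imageBytes)` would replace the literal media type, with an empty result rejected before any API call is made.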

Error Handling: What Can Go Wrong

The SDK throws typed exceptions. Handle them appropriately — some are retryable, others are permanent:

import com.anthropic.errors.*;

public String robustCall(String message) {
    try {
        return complete(message);
    } catch (RateLimitException e) {
        // 429: back off, then retry once; assumes Retry-After holds a number of seconds
        long retryAfter = e.headers().values("retry-after").stream()
            .findFirst()
            .map(Long::parseLong)
            .orElse(60L);
        sleepSeconds(retryAfter);
        return complete(message);  // or escalate to fallback model
    } catch (UnauthorizedException e) {
        // 401: invalid API key; NOT retryable
        throw new ConfigurationException("Invalid Anthropic API key", e);
    } catch (BadRequestException e) {
        // 400: malformed request; NOT retryable; fix the code
        log.error("Bad request to Claude API: {}", e.getMessage());
        throw e;
    } catch (InternalServerException e) {
        // 5xx, including 529 (servers overloaded): retryable after a pause
        sleepSeconds(30);
        return complete(message);
    }
}

The SDK includes automatic retries

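By default the Anthropic Java SDK retries connection errors, rate limits (429), and server errors (5xx) automatically with exponential backoff, up to two times. Change the limit with AnthropicOkHttpClient.builder().maxRetries(3). By the time one of these exceptions reaches your code, the automatic retries are already exhausted, so you only need custom retry logic when implementing fallback to a different provider or a custom backoff strategy.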
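
For the cases where custom backoff is warranted, here is a minimal sketch of a generic retry helper with exponential delays. It is plain Java and does not touch the SDK; `Backoff`, its parameters, and the injected `sleeper` are illustrative choices. The sleeper is passed in so that callers (and tests) decide how to wait.

```java
import java.util.function.LongConsumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class Backoff {
    /**
     * Calls action until it succeeds, the error is not retryable, or
     * maxAttempts is reached. Delays double each time: base, 2x base, 4x base.
     */
    static <T> T retry(Supplier<T> action, int maxAttempts, long baseDelayMs,
                       Predicate<RuntimeException> retryable, LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt == maxAttempts || !retryable.test(e)) {
                    throw e;  // give up: out of attempts or a permanent error
                }
                sleeper.accept(baseDelayMs << (attempt - 1));
            }
        }
    }
}
```

In production the `retryable` predicate would accept only transient errors (rate limits, server errors), and a random jitter added to each delay helps avoid many clients retrying in lockstep.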

Counting Tokens Before Sending

The /v1/messages/count_tokens endpoint lets you check token usage before making a billable API call — useful for enforcing per-user limits:

public long countTokens(String systemPrompt, String userMessage) {
    MessageTokensCount count = client.messages().countTokens(
        MessageCountTokensParams.builder()
            .model(Model.CLAUDE_SONNET_4_6)
            .system(systemPrompt)
            .addMessage(MessageParam.builder()
                .role(MessageParam.Role.USER)
                .content(userMessage)
                .build())
            .build()
    );
    return count.inputTokens();
}

// Guard against expensive requests
public String safeComplete(String system, String user) {
    long tokens = countTokens(system, user);
    if (tokens > 10_000) {
        throw new RequestTooLargeException(
            "Request is " + tokens + " tokens; limit is 10,000");
    }
    // The count is only meaningful if complete(...) sends the same system prompt
    return complete(user);
}
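
Per-user limits need a running total in addition to the per-request check. A minimal in-memory sketch follows; `TokenBudget` is a hypothetical class, the limit has no reset period, and a real deployment would likely keep the counters in a shared store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TokenBudget {
    private final long limitPerUser;
    private final Map<String, Long> used = new ConcurrentHashMap<>();

    TokenBudget(long limitPerUser) {
        this.limitPerUser = limitPerUser;
    }

    /** Reserves tokens for the user if the total stays within the limit. */
    boolean tryReserve(String userId, long tokens) {
        boolean[] granted = {false};
        // compute() is atomic per key, so two requests cannot both claim the last tokens
        used.compute(userId, (id, current) -> {
            long soFar = current == null ? 0L : current;
            if (soFar + tokens > limitPerUser) {
                return current;  // over budget: leave the total unchanged
            }
            granted[0] = true;
            return soFar + tokens;
        });
        return granted[0];
    }
}
```

The value returned by countTokens would be passed to `tryReserve` before the billable call, and the request rejected when it returns false.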

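Prompt Caching: Reduce Input Costs by up to 90%

For requests with a long, stable system prompt (documentation, a large codebase, a reference dataset), Anthropic's prompt caching bills the cached prefix at 10% of the normal input token price on subsequent requests. The first request, which writes the cache, costs 25% more than normal for that prefix: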

import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.TextBlockParam;

Message response = client.messages().create(
    MessageCreateParams.builder()
        .model(Model.CLAUDE_SONNET_4_6)
        .maxTokens(1024)
        // The large, stable system prompt: mark for caching
        .systemOfTextBlockParams(List.of(
            TextBlockParam.builder()
                .text(largeDocumentationContext)   // could be 50k+ tokens
                .cacheControl(CacheControlEphemeral.builder().build())
                .build()
        ))
        // The per-request user question — NOT cached
        .addMessage(MessageParam.builder()
            .role(MessageParam.Role.USER)
            .content(userQuestion)
            .build())
        .build()
);

// Check cache performance in the response
Usage usage = response.usage();
System.out.println("Cache read tokens: " + usage.cacheReadInputTokens());
System.out.println("Cache write tokens: " + usage.cacheCreationInputTokens());

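The cache lives for 5 minutes, and each cache hit resets that timer. Any request that starts with the same prefix within the window reads the cached version. For a chatbot with a 10k-token system prompt and 100 messages per minute, this reduces the input cost of the system prompt by close to 90%. The overall saving is lower, because the uncached user messages are still billed at the full rate.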
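
The arithmetic behind that figure can be worked through directly. The sketch below assumes a cache write costs 1.25x the base input price and a cache read 0.10x, and that all requests in the window share one cache entry; `CacheSavings` is an illustrative name.

```java
final class CacheSavings {
    // Multipliers relative to the base input price (assumed values)
    static final double WRITE_MULTIPLIER = 1.25;
    static final double READ_MULTIPLIER = 0.10;

    /**
     * Fraction of the cached prefix's input cost saved when the first of
     * `requests` requests writes the cache and the rest read it.
     */
    static double savedFraction(int requests) {
        if (requests < 1) {
            throw new IllegalArgumentException("requests must be at least 1");
        }
        double withCache = WRITE_MULTIPLIER + (requests - 1) * READ_MULTIPLIER;
        double withoutCache = requests;  // every request pays the full price
        return 1.0 - withCache / withoutCache;
    }
}
```

For 100 requests this gives 1 - (1.25 + 99 × 0.10) / 100 = 0.8885, or about 89%. For a single request the result is -0.25: caching a prefix that is never reused costs more than not caching it.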

Async Operations with CompletableFuture

The Anthropic SDK's async client returns CompletableFuture, letting you call multiple models concurrently:

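import com.anthropic.client.AnthropicClientAsync;

@Service
public class ModelComparisonService {

    private final AnthropicClientAsync asyncClient;

    public ModelComparisonService(AnthropicClientAsync asyncClient) {
        this.asyncClient = asyncClient;
    }
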
    public CompletableFuture<ModelComparison> compareModels(String prompt) {
        var params = MessageCreateParams.builder()
            .maxTokens(1024)
            .addMessage(MessageParam.builder()
                .role(MessageParam.Role.USER)
                .content(prompt).build());

        CompletableFuture<String> sonnetFuture = asyncClient.messages()
            .create(params.model(Model.CLAUDE_SONNET_4_6).build())
            .thenApply(this::extractText);

        CompletableFuture<String> haikuFuture = asyncClient.messages()
            .create(params.model(Model.CLAUDE_HAIKU_4_5_20251001).build())
            .thenApply(this::extractText);

        // Both calls run concurrently
        return CompletableFuture.allOf(sonnetFuture, haikuFuture)
            .thenApply(v -> new ModelComparison(
                sonnetFuture.join(),
                haikuFuture.join()
            ));
    }
}
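
With allOf, a failure in either call fails the combined future, and the other model's answer is lost. A sketch of isolating failures per call is shown below, using plain suppliers in place of real API calls; `ConcurrentCalls`, `Comparison`, and the 30-second timeout are assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class ConcurrentCalls {
    record Comparison(String first, String second) {}

    /** Runs both suppliers concurrently; each failure or timeout becomes a placeholder. */
    static CompletableFuture<Comparison> compare(Supplier<String> first, Supplier<String> second) {
        return guarded(first).thenCombine(guarded(second), Comparison::new);
    }

    private static CompletableFuture<String> guarded(Supplier<String> call) {
        return CompletableFuture.supplyAsync(call)
            .completeOnTimeout("[timed out]", 30, TimeUnit.SECONDS)
            .exceptionally(e -> "[failed: " + rootMessage(e) + "]");
    }

    /** supplyAsync wraps exceptions in CompletionException; unwrap one level if present. */
    private static String rootMessage(Throwable e) {
        return e.getCause() != null ? e.getCause().getMessage() : e.getMessage();
    }
}
```

Whether a partial comparison is useful depends on the caller; if both answers are required, letting the combined future fail (as allOf does) is the simpler choice.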

SDK vs Spring AI Summary

Use the Anthropic Java SDK when you need: (1) features not yet in Spring AI, (2) non-Spring applications, (3) maximum control over streaming, (4) custom retry/fallback beyond Spring AI's defaults. Use Spring AI for everything else — it handles the boilerplate and integrates cleanly with the Spring ecosystem. Both are production-ready; the choice is about fit, not quality.
