Spring AI: First LLM Integration in 20 Minutes

Spring AI 1.x brings first-class LLM support to the Spring ecosystem: a consistent abstraction over Claude, OpenAI, Mistral, Ollama, and a dozen other providers. The same ChatClient code works across all of them. This guide takes you from an empty Spring Initializr project to a streaming chat endpoint connected to both Claude and OpenAI — in under 20 minutes, with the production configuration patterns you'll actually need.

Why Spring AI Instead of the Provider SDK Directly?

You could call the OpenAI API directly with RestClient or use the Anthropic Java SDK. Spring AI adds value in three areas:

Provider portability: Swap Claude for GPT-4o by changing one property — no code change required
Spring ecosystem integration: Auto-configuration, Spring Security, Actuator metrics, and dependency injection work out of the box
Higher-level abstractions: RAG pipelines, memory management, vector stores, and tool calling are built in

The trade-off: you're on Spring AI's release cadence and abstraction layer. For cutting-edge provider features that land in the raw SDK before Spring AI, you may need to wait or drop down to the SDK directly. Article 2 in this series covers the raw SDK approach for those situations.

Project Setup: Dependencies

Start with Spring Boot 3.4+ and Java 17+. Spring AI uses a BOM (Bill of Materials) to manage transitive dependencies. In Maven:

<!-- pom.xml -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>1.0.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- Choose one or both providers (starter names as of Spring AI 1.0.0) -->
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-anthropic</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
  </dependency>
</dependencies>

Or with Gradle (Kotlin DSL):

// build.gradle.kts
// The dependencyManagement block requires the io.spring.dependency-management plugin
extra["springAiVersion"] = "1.0.0"

dependencyManagement {
  imports {
    mavenBom("org.springframework.ai:spring-ai-bom:${extra["springAiVersion"]}")
  }
}

dependencies {
  implementation("org.springframework.ai:spring-ai-starter-model-anthropic")
  implementation("org.springframework.ai:spring-ai-starter-model-openai")
}

Configuration: application.yml

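Spring AI reads API keys from standard Spring properties, so the usual ${...} placeholders can resolve them from the environment. Never hardcode keys. Use environment variables locally and a secrets manager such as AWS Secrets Manager in production: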

# application.yml
spring:
  ai:
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: claude-sonnet-4-6        # or claude-opus-4-8
          max-tokens: 2048
          temperature: 0.7
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          max-tokens: 2048
          temperature: 0.7

Never commit API keys

Use ANTHROPIC_API_KEY and OPENAI_API_KEY as environment variables. In local development, add them to your shell profile or use a .env file (excluded from git). In production, use AWS Secrets Manager — see Article 10 in this series for the full deployment setup.

Your First ChatClient Call

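Spring AI's ChatClient is the main entry point. Spring Boot auto-configures a ChatClient.Builder backed by the chat model on your classpath. You inject it, set defaults, and build a client. (If both provider starters are present, see Switching Between Claude and OpenAI below.) The fluent API makes calls readable and composable: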

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class AiService {

    private final ChatClient chatClient;

    // Spring auto-injects ChatClient.Builder; .build() creates an immutable client
    public AiService(ChatClient.Builder builder) {
        this.chatClient = builder
            .defaultSystem("You are a helpful assistant for Java developers.")
            .build();
    }

    public String ask(String question) {
        return chatClient.prompt()
            .user(question)
            .call()
            .content();    // returns the text response directly
    }
}

The ChatClient.Builder accepts a defaultSystem prompt that applies to every call. You can override it for an individual call with .system(...) in the prompt chain. The .call().content() idiom blocks until the model has finished and returns the complete response as a String.

REST Controller: Expose the AI Endpoint

import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/ai")
public class AiController {

    private final AiService aiService;

    public AiController(AiService aiService) {
        this.aiService = aiService;
    }

    @PostMapping("/chat")
    public ChatResponse chat(@RequestBody ChatRequest request) {
        String answer = aiService.ask(request.question());
        return new ChatResponse(answer);
    }

    record ChatRequest(String question) {}
    record ChatResponse(String answer) {}
}

Streaming Responses with Server-Sent Events

For user-facing chatbots, streaming is essential. The user sees the first tokens as soon as the model produces them, instead of staring at a blank screen until a long answer (which can take several seconds) is complete. Spring AI returns a reactive Flux<String>:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

@Service
public class AiService {

    private final ChatClient chatClient;

    public AiService(ChatClient.Builder builder) {
        this.chatClient = builder
            .defaultSystem("You are a helpful assistant for Java developers.")
            .build();
    }

    // ask(...) from the previous section stays unchanged

    // Streaming: returns tokens as they arrive
    public Flux<String> stream(String question) {
        return chatClient.prompt()
            .user(question)
            .stream()
            .content();    // Flux emits one String per token chunk
    }
}

// In AiController (imports: org.springframework.http.MediaType, reactor.core.publisher.Flux):
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String question) {
    return aiService.stream(question);
}

The TEXT_EVENT_STREAM_VALUE media type causes Spring to send each Flux emission as an SSE event. The browser or frontend can consume this with EventSource or the Fetch API's ReadableStream.
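To make the wire format concrete, here is a minimal, JDK-only sketch of what a non-browser client does with that stream. It collects the data: payloads from the raw SSE lines so the chunks can be joined back into the full answer. The class SseDataParser is hypothetical. It follows the SSE parsing rules for data lines: a blank line ends an event, several data: lines in one event are joined with a newline, and a single space after the colon is dropped. It ignores event:, id:, retry: and comment lines. In a real client you might feed it the lines from java.net.http.HttpClient using HttpResponse.BodyHandlers.ofLines().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts event payloads from raw SSE lines.
final class SseDataParser {

    // Returns one String per dispatched event, following the SSE rules for "data:" lines.
    static List<String> events(List<String> lines) {
        List<String> events = new ArrayList<>();
        StringBuilder data = null; // null means no data line seen yet for the current event
        for (String line : lines) {
            if (line.isEmpty()) {
                // A blank line dispatches the current event, if it had any data
                if (data != null) {
                    events.add(data.toString());
                    data = null;
                }
            } else if (line.startsWith("data:")) {
                String value = line.substring(5);
                if (value.startsWith(" ")) {
                    value = value.substring(1); // the spec strips exactly one leading space
                }
                if (data == null) {
                    data = new StringBuilder(value);
                } else {
                    data.append('\n').append(value); // multiple data lines join with '\n'
                }
            }
            // Comments (":") and the event/id/retry fields are ignored in this sketch.
            // An unterminated event at the end of the input is discarded, as the spec requires.
        }
        return events;
    }
}
```

Joining the events with String.join("", events) rebuilds the answer. One caveat is worth checking in your own setup. Parsers drop one leading space after data:, so a token chunk that itself begins with a space can arrive without it, and words may run together. Wrapping each chunk in JSON on the server side is one common way around this.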

PromptTemplate: Structured, Reusable Prompts

Hard-coding prompts as string literals in Java is a maintenance trap. PromptTemplate renders templates with StringTemplate (hence the .st file extension), using {variable} placeholders, and can load templates from files or classpath resources:

import java.util.Map;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.core.io.ClassPathResource;
import org.springframework.stereotype.Service;

@Service
public class CodeReviewService {

    private final ChatClient chatClient;

    // Load template from src/main/resources/prompts/code-review.st
    private final PromptTemplate reviewTemplate = new PromptTemplate(
        new ClassPathResource("prompts/code-review.st")
    );

    public CodeReviewService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String reviewCode(String language, String code) {
        // Fill the {language} and {code} placeholders, producing a Prompt
        var prompt = reviewTemplate.create(Map.of(
            "language", language,
            "code",     code
        ));
        return chatClient.prompt(prompt).call().content();
    }
}

The template file at src/main/resources/prompts/code-review.st:

Review the following {language} code for bugs, performance issues, and best practice violations.
Be specific about line numbers when possible. Format your response as:

1. **Critical Issues** (bugs, security problems)
2. **Performance** (inefficiencies)
3. **Style & Best Practices**
4. **Suggested Refactoring** (if significant)

Code to review:
```{language}
{code}
```
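Conceptually, reviewTemplate.create(...) replaces each {name} placeholder with the matching map value. The JDK-only sketch below mimics only that substitution so you can see the rendered text. SimpleTemplate is a hypothetical illustration and not Spring AI's StringTemplate-based renderer, which supports much richer syntax. It fails fast on a missing variable rather than sending a prompt with an unfilled placeholder.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of {placeholder} substitution; not Spring AI's renderer.
final class SimpleTemplate {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    static String render(String template, Map<String, ?> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            if (!vars.containsKey(name)) {
                // Fail fast instead of sending a prompt with an unfilled placeholder
                throw new IllegalArgumentException("Missing template variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(vars.get(name))));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Only the template is scanned for placeholders, so braces inside the substituted values (common in Java code under review) do not trigger further substitution.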

Switching Between Claude and OpenAI

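When you have both starters on the classpath, Spring AI auto-configures a ChatModel bean for each provider. You can inject the one you want by type (AnthropicChatModel or OpenAiChatModel), or use a configuration property to choose one at startup. The auto-configured ChatClient.Builder expects a single ChatModel. In this setup, set spring.ai.chat.client.enabled=false and have your services inject the ChatClient bean below instead of a ChatClient.Builder: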

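import org.springframework.ai.anthropic.AnthropicChatModel;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AiConfig {

    @Value("${app.ai.provider:anthropic}")
    private String provider;

    @Bean
    public ChatClient chatClient(
            AnthropicChatModel anthropicModel,
            OpenAiChatModel openAiModel) {

        // Any value other than "openai" (case-insensitive) falls back to Claude
        ChatModel model = "openai".equalsIgnoreCase(provider) ? openAiModel : anthropicModel;
        return ChatClient.create(model);
    }
}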

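With this setup, deploying with APP_AI_PROVIDER=openai switches the entire application to GPT-4o without touching code. Spring's relaxed binding maps that variable to app.ai.provider, and the bean is created at startup, so a restart applies the change. This makes it easy to compare providers across deployments or to fail over when one provider has an outage.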
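One weakness of a simple string comparison is that a typo such as open-ai or opnai silently routes all traffic to Claude. The hypothetical, JDK-only sketch below shows stricter selection logic. It normalizes the value and fails at startup for unknown providers. To keep it runnable without Spring AI, the model type is a generic M. In AiConfig, the map values would be the injected AnthropicChatModel and OpenAiChatModel beans.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical selector; the map values stand in for AnthropicChatModel / OpenAiChatModel.
final class ProviderSelector {

    static <M> M select(String configured, Map<String, M> models, String fallback) {
        // Treat a missing or blank property as "use the default provider"
        String key = (configured == null || configured.isBlank())
                ? fallback
                : configured.trim().toLowerCase(Locale.ROOT);
        M model = models.get(key);
        if (model == null) {
            // Fail at startup instead of silently routing traffic to the wrong provider
            throw new IllegalStateException(
                "Unknown app.ai.provider '" + configured + "', expected one of " + models.keySet());
        }
        return model;
    }
}
```

Failing fast trades a little convenience for safety. A misconfigured deployment refuses to start rather than quietly billing the wrong provider.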

Per-Request Options: Model, Temperature, and Tokens

The global defaults in application.yml can be overridden per call using provider-specific options:

import org.springframework.ai.anthropic.AnthropicChatOptions;

public String summarize(String text) {
    return chatClient.prompt()
        .user("Summarize in 3 bullet points: " + text)
        .options(AnthropicChatOptions.builder()
            .model("claude-haiku-4-5-20251001")  // fastest Claude for simple tasks
            .maxTokens(300)
            .temperature(0.3)               // lower = more deterministic
            .build())
        .call()
        .content();
}
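Conceptually, per-request options win field by field, and any field you leave unset keeps its default from application.yml. The hypothetical Options record below is a simplified plain-Java model of that merge rule, where null means "not set". It is not Spring AI's ChatOptions API.

```java
// Hypothetical record illustrating field-by-field option merging; null means "not set".
record Options(String model, Integer maxTokens, Double temperature) {

    // Returns a new Options where every non-null field of `override` replaces the default.
    Options mergedWith(Options override) {
        return new Options(
            override.model() != null ? override.model() : model,
            override.maxTokens() != null ? override.maxTokens() : maxTokens,
            override.temperature() != null ? override.temperature() : temperature);
    }
}
```

For example, merging the YAML defaults (model claude-sonnet-4-6, 2048 tokens, temperature 0.7) with a per-call override that sets only the model and maxTokens keeps the 0.7 temperature.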

Conversation History with ChatMemory

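A single-turn request/response is stateless. For multi-turn conversations where context matters, Spring AI provides ChatMemory. In Spring AI 1.0, MessageWindowChatMemory keeps a sliding window of recent messages. Backing it with the in-memory InMemoryChatMemoryRepository is fine for development, but use a persistent repository in production: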

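import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.stereotype.Service;

@Service
public class ConversationalService {

    private final ChatClient chatClient;

    // Keeps the most recent messages per conversation, stored in memory
    private final ChatMemory memory = MessageWindowChatMemory.builder()
        .chatMemoryRepository(new InMemoryChatMemoryRepository())
        .maxMessages(20)
        .build();

    public ConversationalService(ChatClient.Builder builder) {
        this.chatClient = builder
            .defaultAdvisors(MessageChatMemoryAdvisor.builder(memory).build())
            .build();
    }

    public String chat(String conversationId, String userMessage) {
        return chatClient.prompt()
            .user(userMessage)
            .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId)) // tie messages to a session/user
            .call()
            .content();
    }
}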

Each conversationId maintains its own message history. The advisor automatically prepends prior messages to each request, giving the model the conversation context it needs.
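Below is a stripped-down, JDK-only sketch of the idea behind a message-window memory. It uses a map from conversation ID to the most recent N messages and evicts the oldest first. WindowMemory and its methods are hypothetical. Spring AI's MessageWindowChatMemory stores typed Message objects in a pluggable repository and handles more cases than this sketch does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-conversation memory that keeps only the last `maxMessages` entries.
final class WindowMemory {

    private final int maxMessages;
    private final Map<String, Deque<String>> store = new HashMap<>();

    WindowMemory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be >= 1");
        }
        this.maxMessages = maxMessages;
    }

    // synchronized keeps the sketch safe when concurrent web requests share one instance
    synchronized void add(String conversationId, String message) {
        Deque<String> history = store.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        history.addLast(message);
        while (history.size() > maxMessages) {
            history.removeFirst(); // evict the oldest message first
        }
    }

    // The history that would be prepended to the next request for this conversation
    synchronized List<String> get(String conversationId) {
        Deque<String> history = store.get(conversationId);
        return history == null ? List.of() : List.copyOf(history);
    }
}
```

The window size is the main design choice. A larger window gives the model more context, but it also makes every request longer and more expensive.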

Structured Output: Mapping Responses to POJOs

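Spring AI can map the LLM's JSON response directly to a Java class using the entity() method. It generates a JSON schema from your class and appends format instructions asking the model to reply with matching JSON. If the model ignores those instructions, deserialization fails with an exception: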

import java.util.List;

public record CodeReview(
    List<String> criticalIssues,
    List<String> performanceIssues,
    List<String> styleSuggestions,
    int overallScore
) {}

public CodeReview structuredReview(String code) {
    return chatClient.prompt()
        .user("Review this Java code and respond as JSON: " + code)
        .call()
        .entity(CodeReview.class);    // auto-deserializes the JSON response
}
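entity() only succeeds if the reply parses as JSON, and models sometimes wrap JSON in a Markdown code fence. You may sometimes parse the raw .content() string yourself, for example to log or repair malformed output. In that case, a small helper like the hypothetical JsonFence below removes a surrounding fence before the text reaches your JSON library. It is a sketch and not part of Spring AI.

```java
// Hypothetical helper: strips an optional Markdown code fence around a model reply.
final class JsonFence {

    static String strip(String reply) {
        String s = reply.strip();
        if (s.length() >= 6 && s.startsWith("```") && s.endsWith("```")) {
            int firstNewline = s.indexOf('\n');
            if (firstNewline < 0) {
                return s; // a single-line fenced reply is left untouched in this sketch
            }
            // Drop the opening fence line (which may carry a "json" tag) and the closing fence
            return s.substring(firstNewline + 1, s.length() - 3).strip();
        }
        return s; // no fence: return the trimmed reply as-is
    }
}
```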

Testing AI Integrations

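Never call the real API in unit tests. It's slow, flaky, and costs money. ChatClient accepts any ChatModel, so the simplest approach is a stub ChatModel that returns a canned response:

import static org.assertj.core.api.Assertions.assertThat;

import java.util.List;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;

// Plain unit test: no Spring context, so no API keys are needed
class AiServiceTest {

    @Test
    void shouldReturnFormattedResponse() {
        var mockResponse = "Here are 3 bullet points...";
        // call(Prompt) is the only method a ChatModel must implement, so a lambda works as a stub
        ChatModel stubModel = prompt -> new ChatResponse(
            List.of(new Generation(new AssistantMessage(mockResponse))));

        var service = new AiService(ChatClient.builder(stubModel));
        var result = service.ask("Explain Java records");

        assertThat(result).isEqualTo(mockResponse);
    }
}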

Production Checklist for Spring AI

API keys from environment variables or Secrets Manager — never in code or application.yml committed to git
Set max-tokens explicitly to cap response length and avoid billing surprises
Use a low temperature (for example 0) for tasks that need consistent output (extraction, classification) and around 0.7 for creative tasks
Tune spring.ai.retry.* so transient errors such as rate limits are retried sensibly
Enable Spring Boot Actuator; Spring AI publishes token usage metrics through Micrometer
Stub the ChatModel in tests; never call real APIs in CI

Retry Configuration

Spring AI retries failed provider calls automatically with exponential backoff. Tune the behavior in application.yml:

# application.yml
spring:
  ai:
    retry:
      max-attempts: 3           # total attempts, including the first call
      backoff:
        initial-interval: 1s    # wait before the first retry
        multiplier: 2           # doubles each retry
        max-interval: 30s       # caps each wait at 30s
      on-http-codes: 429,503    # also treat these status codes as retryable
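To sanity-check a retry configuration, it helps to compute the waits it produces. The hypothetical BackoffSchedule below does that in plain Java. It assumes that max-attempts counts the first call, as in Spring Retry, which Spring AI's retry support builds on. It also assumes each wait is the previous one times the multiplier, capped at max-interval.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical calculator for an exponential backoff schedule like the one configured above.
final class BackoffSchedule {

    // Returns the wait in ms before each retry; maxAttempts counts the first call too.
    static List<Long> delays(int maxAttempts, long initialMs, double multiplier, long maxMs) {
        List<Long> delays = new ArrayList<>();
        double next = initialMs;
        for (int retry = 1; retry < maxAttempts; retry++) {
            delays.add(Math.min((long) next, maxMs)); // cap each wait at maxMs
            next *= multiplier;
        }
        return delays;
    }
}
```

With the configuration above (3 attempts, 1 s initial interval, multiplier 2), a call that keeps failing waits 1 s and then 2 s before the error is surfaced to your code.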

What's Next in This Series

You now have a running Spring Boot application that calls Claude and OpenAI via a consistent ChatClient API. The remaining articles in this series go deeper:

Article 2 — Calling the Anthropic SDK directly for streaming, vision, and tool use before Spring AI supports them
Article 3 — Building a full RAG pipeline with PGVector, document ingestion, and semantic search
Article 4 — Function calling and @Tool annotations to connect LLMs to your Spring services
Articles 7–10 — Security, CI/CD, observability, and production deployment to AWS

Tools-Hut

Java & Spring AI Series
