Programming & Java

Spring Boot with AI: Complete Tutorial (Spring AI, ChatClient & REST API Examples) — 2026

Build generative AI into Spring Boot 3 using Spring AI: Maven BOM and OpenAI starter, application.yml for API keys, ChatClient REST endpoints, system prompts, streaming with WebFlux, security for production, testing, and a path to RAG — with detailed code examples.

28 March 2026
24 min read
Spring Boot & AI

This guide shows how to add generative AI to a Spring Boot 3 application using Spring AI: project setup, ChatClient, REST controllers, configuration for OpenAI-compatible APIs, streaming, and security best practices — with copy-paste examples you can run locally.

Why Spring AI with Spring Boot?

Spring Boot excels at building production HTTP APIs, configuration, and observability. Spring AI brings the same conventions to LLM integrations: you declare dependencies, set properties, and inject a ChatClient or ChatModel bean instead of hand-rolling HTTP clients for every provider. That keeps your Spring Boot AI code testable (swap in mocks), portable across models (OpenAI, Azure, Ollama, and others via starters), and ready to extend toward RAG (retrieval-augmented generation) when you add a vector store.

Prerequisites and versions

You need JDK 17 or later (JDK 21+ recommended for long-term support alignment), Maven 3.9+ or Gradle, and an API key from your chosen provider (for example OpenAI). Pin Spring Boot 3.2+ and a Spring AI release that matches — check the official Spring AI project page for the current BOM version. The examples below use property placeholders so you can substitute the exact version numbers from the documentation without changing structure.

Create a Spring Boot project

Use start.spring.io with Spring Web (and optionally Spring Reactive Web if you want streaming with WebFlux). Depending on the release, Initializr may already list Spring AI dependencies; if not, add the starter manually in your build file. Your main class is standard:

src/main/java/com/example/demo/DemoApplication.java
package com.example.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class DemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}

Maven dependencies (BOM + OpenAI starter)

Import the Spring AI BOM in dependencyManagement, then add the OpenAI starter. Replace ${spring-ai.version} with the version from the Spring AI reference (for example 1.0.0 or the latest stable). Note that the starter artifact ID has changed across releases (earlier milestones used spring-ai-openai-spring-boot-starter; newer releases rename it, e.g. spring-ai-starter-model-openai), so copy the exact coordinates from the documentation for your chosen version.

pom.xml (excerpt)
<properties>
    <java.version>21</java.version>
    <spring-ai.version>1.0.0</spring-ai.version>
</properties>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

For Gradle, use the BOM platform dependency and the same starter artifact. If your build cannot resolve the BOM, confirm that you are not mixing incompatible Spring Boot and Spring AI versions, and consult the release notes for any required Maven repository (milestones vs central).
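For reference, a Gradle (Groovy DSL) equivalent of the Maven setup above might look like the excerpt below — treat the artifact ID and the springAiVersion property as placeholders to verify against the Spring AI reference for your release:

```groovy
// build.gradle (excerpt) -- version and artifact ID are placeholders
dependencies {
    implementation platform("org.springframework.ai:spring-ai-bom:${springAiVersion}")
    implementation "org.springframework.boot:spring-boot-starter-web"
    implementation "org.springframework.ai:spring-ai-openai-spring-boot-starter"
    testImplementation "org.springframework.boot:spring-boot-starter-test"
}
```

The platform(...) call plays the role of Maven's dependencyManagement import, so the starter itself needs no explicit version.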

Configure API keys and model options

Store secrets outside source control. On your machine, export OPENAI_API_KEY (or your provider’s variable), then reference it from YAML:

src/main/resources/application.yml
spring:
  application:
    name: demo-ai
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      base-url: https://api.openai.com   # change for Azure or compatible endpoints
      chat:
        options:
          model: gpt-4o-mini
          temperature: 0.7

server:
  port: 8080

For Azure OpenAI, Spring AI provides a dedicated starter and property namespace; the pattern is the same: configure endpoint, API version, deployment name, and key via environment-specific profiles (application-prod.yml).
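As a rough sketch of the Azure shape (the property names below are illustrative of the pattern — confirm the exact namespace and keys in the Spring AI Azure OpenAI documentation for your version):

```yaml
# application-prod.yml (sketch; verify property names for your release)
spring:
  ai:
    azure:
      openai:
        api-key: ${AZURE_OPENAI_API_KEY}
        endpoint: ${AZURE_OPENAI_ENDPOINT}
        chat:
          options:
            deployment-name: my-gpt4o-deployment   # your Azure deployment name
```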

ChatClient: synchronous REST endpoint

Inject ChatClient.Builder (auto-configured) and expose a simple POST endpoint that accepts a user message and returns the model’s reply as plain text:

src/main/java/com/example/demo/ChatController.java
package com.example.demo;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public record ChatRequest(String message) {}
    public record ChatResponse(String reply) {}

    @PostMapping(path = "/chat", consumes = MediaType.APPLICATION_JSON_VALUE)
    public ChatResponse chat(@RequestBody ChatRequest request) {
        String reply = chatClient.prompt()
            .user(request.message())
            .call()
            .content();
        return new ChatResponse(reply);
    }

    @GetMapping("/chat")
    public ChatResponse chatGet(@RequestParam(defaultValue = "Hello") String message) {
        return chat(new ChatRequest(message));
    }
}

Test with curl:

Terminal
curl -s -X POST http://localhost:8080/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"Explain Spring AI in one sentence."}'

The fluent ChatClient API handles request assembly, calls the configured ChatModel, and returns content. You can later switch models or providers largely by configuration.

System prompts and user messages

For consistent behavior, set a system message that defines tone, safety rules, or output format. Spring AI lets you chain .system(...) before .user(...):

System + user prompt
String answer = chatClient.prompt()
    .system("You are a concise technical assistant. Answer in under 120 words.")
    .user("What is dependency injection in Spring?")
    .call()
    .content();

You can externalize long system prompts in classpath:/prompts/system.st templates (Spring AI supports template resources) to avoid hard-coding strings in Java for complex prompts.
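If you want to see the mechanics without framework specifics, a classpath template can be read with plain JDK resource handling — a minimal sketch assuming a prompts/system.st file on the classpath (PromptLoader and the fallback text are illustrative names, not Spring AI APIs; in a Spring app you would more typically inject a Resource):

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PromptLoader {

    // Reads a classpath resource into a String; falls back to a default
    // system prompt if the template file is missing, so startup still works.
    static String loadSystemPrompt(String path) {
        try (InputStream in = PromptLoader.class.getResourceAsStream(path)) {
            if (in == null) return "You are a helpful assistant.";
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            return "You are a helpful assistant.";
        }
    }

    public static void main(String[] args) {
        // Pass the loaded text to chatClient.prompt().system(...) in real code.
        System.out.println(loadSystemPrompt("/prompts/system.st"));
    }
}
```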

Streaming responses with WebFlux

For long answers, stream tokens to the client to reduce time-to-first-byte. Add spring-boot-starter-webflux and use a reactive return type. Example pattern with Flux<String> (the exact API may vary slightly by Spring AI version; consult the reference for stream() on the client):

Streaming endpoint (conceptual)
// Add: org.springframework.boot:spring-boot-starter-webflux

import reactor.core.publisher.Flux;

@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String message) {
    return chatClient.prompt()
        .user(message)
        .stream()
        .content();
}

If you stay on the servlet stack, you can still collect chunks in a callback or use Spring MVC async support; WebFlux is the most straightforward fit for SSE-style streaming.

Secrets, rate limits, and production tips

API keys: use environment variables locally, and secret managers (AWS Secrets Manager, Azure Key Vault, Kubernetes secrets) in production. Restrict CORS on browser-facing APIs and authenticate callers (OAuth2, API keys per tenant) so your LLM quota is not public.

Abuse: add rate limiting (Bucket4j, API gateway, or Spring Cloud Gateway) and payload size limits on message fields. Log request IDs, not raw prompts, if privacy requires it.

Cost: choose smaller models for classification or routing, and larger models only for complex reasoning. Cache repeated queries where safe.
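The caching idea can be sketched with a ConcurrentHashMap keyed by the prompt — a toy illustration (CachedChat is a hypothetical name, not a Spring AI class), appropriate only for deterministic, non-personalized queries:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CachedChat {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> model;

    // 'model' stands in for the real call, e.g.
    // prompt -> chatClient.prompt().user(prompt).call().content()
    public CachedChat(Function<String, String> model) {
        this.model = model;
    }

    // Returns a cached reply for identical prompts, so repeated
    // questions do not consume LLM quota a second time.
    public String ask(String prompt) {
        return cache.computeIfAbsent(prompt, model);
    }
}
```

In production you would also bound the cache size and expire entries (for example with Caffeine) rather than let the map grow forever.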

Testing with mocks

Replace the chat model with a test double or use Spring Boot’s test slices. A minimal approach is to mock ChatClient behavior at the service layer so controller tests stay fast and deterministic without calling external APIs.
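One pattern is to hide the LLM call behind a small interface your application owns, so tests substitute a deterministic stub — a minimal sketch (ChatService and StubChatService are illustrative names, not Spring AI types; the production implementation would delegate to ChatClient):

```java
// A thin abstraction the application owns; production code delegates
// to ChatClient, tests use a stub.
interface ChatService {
    String reply(String userMessage);
}

// Deterministic test double: returns a canned answer so tests
// never hit an external API.
class StubChatService implements ChatService {
    @Override
    public String reply(String userMessage) {
        return "stubbed answer for: " + userMessage;
    }
}

public class ChatServiceStubDemo {
    public static void main(String[] args) {
        ChatService service = new StubChatService();
        System.out.println(service.reply("What is Spring AI?"));
    }
}
```

In a Spring Boot test slice you would register the stub as a @Bean (or use @MockBean) so the controller under test receives it through normal dependency injection.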

Next steps: RAG and vector stores

Spring AI supports embeddings, VectorStore implementations (PGVector, Redis, etc.), and document loaders. A typical Spring Boot RAG pipeline: ingest documents, chunk text, embed with an embedding model, store vectors, then at query time retrieve top-k chunks and pass them as context in the prompt. That pattern grounds answers in your data and is the standard upgrade path after a basic chat endpoint works.
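To make the retrieve-top-k step concrete, here is a toy in-memory similarity search over embedding vectors — purely illustrative (Spring AI's VectorStore API differs; all names here are invented), but it shows the cosine-similarity ranking a real store performs before the matched chunks are appended to the prompt:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ToyVectorSearch {

    public record Chunk(String text, float[] embedding) {}

    // Cosine similarity between two equal-length vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the k chunks most similar to the query embedding --
    // in a RAG flow these become the context passed to the model.
    static List<Chunk> topK(List<Chunk> store, float[] query, int k) {
        List<Chunk> sorted = new ArrayList<>(store);
        sorted.sort(Comparator
                .comparingDouble((Chunk c) -> cosine(c.embedding(), query))
                .reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}
```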

On this blog, you can go deeper on the Java platform with Java 25 features and examples and Java interview questions with detailed answers. For REST payloads and APIs, see what JSON is and what XML is.


You now have a working pattern for Spring Boot with AI: dependencies, configuration, a ChatClient-based REST API, and a roadmap toward streaming and RAG. Adjust versions against the official Spring AI documentation, keep secrets out of git, and iterate with tests before exposing endpoints to untrusted clients.



Frequently asked questions

What is Spring AI and how does it fit with Spring Boot?

Spring AI is a Spring project that provides abstractions and auto-configuration for AI models (chat, embeddings, image, audio) and vector stores. It integrates with Spring Boot the same way Spring Data integrates with databases: you add starters, configure properties, and inject beans like ChatClient or ChatModel instead of writing low-level HTTP code for each provider.

Do I have to use OpenAI?

No. OpenAI is one supported provider via spring-ai-openai-spring-boot-starter. Spring AI also offers starters for Azure OpenAI, Ollama, Anthropic, and others. You choose the starter that matches your deployment and configure base URL, model name, and credentials accordingly.

How should I store API keys?

Never commit keys to git. Use environment variables (for example OPENAI_API_KEY) referenced from application.yml, or inject secrets from your cloud provider (AWS Secrets Manager, Azure Key Vault, Kubernetes secrets). Use separate profiles for dev and production.

What is ChatClient?

ChatClient is a fluent facade built on top of ChatModel that simplifies building prompts (system and user messages), calling the model, and retrieving text or structured output. It reduces boilerplate compared to calling the REST API of a provider directly.

Does Spring AI support RAG?

Spring AI supports document loading, text splitting, embedding models, and VectorStore implementations. Typical steps: chunk your documents, embed chunks, store vectors, then at query time retrieve relevant chunks and include them in the prompt context. Start with a working chat endpoint, then add embeddings and a vector store when you need grounded answers from your own data.

Can I stream responses?

Yes. Spring AI supports streaming token output; reactive stacks often use WebFlux with Server-Sent Events or a Flux of text chunks. Servlet-based apps can use async MVC or switch to WebFlux for streaming. Exact method names may vary slightly by Spring AI version, so follow the reference for your release.

