This guide shows how to add generative AI to a Spring Boot 3 application using Spring AI: project setup, ChatClient, REST controllers, configuration for OpenAI-compatible APIs, streaming, and security best practices — with copy-paste examples you can run locally.
Why Spring AI with Spring Boot?
Spring Boot excels at building production HTTP APIs, configuration, and observability. Spring AI brings the same conventions to LLM integrations: you declare dependencies, set properties, and inject a ChatClient or ChatModel bean instead of hand-rolling HTTP clients for every provider. That keeps your Spring Boot AI code testable (swap in mocks), portable across models (OpenAI, Azure, Ollama, and others via starters), and ready to extend toward RAG (retrieval-augmented generation) when you add a vector store.
Prerequisites and versions
You need JDK 17 or later (JDK 21+ recommended for long-term support alignment), Maven 3.9+ or Gradle, and an API key from your chosen provider (for example OpenAI). Pin Spring Boot 3.2+ and a Spring AI release that matches — check the official Spring AI project page for the current BOM version. The examples below use property placeholders so you can substitute the exact version numbers from the documentation without changing structure.
Create a Spring Boot project
Use start.spring.io with Spring Web (and optionally Spring Reactive Web if you want streaming with WebFlux). Depending on the release, Initializr may offer Spring AI directly; otherwise add the Spring AI starter manually in your build file. Your main class is standard:
package com.example.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class DemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}

Maven dependencies (BOM + OpenAI starter)
Import the Spring AI BOM in dependencyManagement, then add spring-ai-openai-spring-boot-starter. Replace ${spring-ai.version} with the version from the Spring AI reference (for example 1.0.0 or the latest stable).
<properties>
    <java.version>21</java.version>
    <spring-ai.version>1.0.0</spring-ai.version>
</properties>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

For Gradle, use the BOM platform dependency and the same starter artifact. If your build cannot resolve the BOM, confirm that you are not mixing incompatible Spring Boot and Spring AI versions, and consult the release notes for any required Maven repository (milestones vs central).
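As a sketch of the Gradle (Kotlin DSL) equivalent — the version numbers below are placeholders; substitute the current ones from the Spring Boot and Spring AI documentation:

```kotlin
// build.gradle.kts — sketch; pin the versions your docs specify
plugins {
    java
    id("org.springframework.boot") version "3.2.5"        // placeholder version
    id("io.spring.dependency-management") version "1.1.4" // placeholder version
}

dependencies {
    // Import the Spring AI BOM as a platform, mirroring dependencyManagement in Maven
    implementation(platform("org.springframework.ai:spring-ai-bom:1.0.0")) // placeholder version
    implementation("org.springframework.boot:spring-boot-starter-web")
    implementation("org.springframework.ai:spring-ai-openai-spring-boot-starter")
    testImplementation("org.springframework.boot:spring-boot-starter-test")
}
```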
Configure API keys and model options
Store secrets outside source control. On your machine, export OPENAI_API_KEY (or your provider’s variable), then reference it from YAML:
spring:
  application:
    name: demo-ai
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      base-url: https://api.openai.com # change for Azure or compatible endpoints
      chat:
        options:
          model: gpt-4o-mini
          temperature: 0.7
server:
  port: 8080

For Azure OpenAI, Spring AI provides a dedicated starter and property namespace; the pattern is the same: configure endpoint, API version, deployment name, and key via environment-specific profiles (application-prod.yml).
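As a sketch of that pattern, an application-prod.yml for the Azure starter might look like the following — the property names follow the Azure OpenAI starter's namespace, but verify them against the reference docs for your Spring AI version, and the deployment name shown is a placeholder:

```yaml
# application-prod.yml — Azure OpenAI sketch; confirm property names for your version
spring:
  ai:
    azure:
      openai:
        api-key: ${AZURE_OPENAI_KEY}        # from your secret manager / env
        endpoint: ${AZURE_OPENAI_ENDPOINT}  # e.g. your resource endpoint URL
        chat:
          options:
            deployment-name: my-gpt4o-deployment  # placeholder Azure deployment name
```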
ChatClient: synchronous REST endpoint
Inject ChatClient.Builder (auto-configured) and expose a simple POST endpoint that accepts a user message and returns the model’s reply as plain text:
package com.example.demo;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public record ChatRequest(String message) {}
    public record ChatResponse(String reply) {}

    @PostMapping(path = "/chat", consumes = MediaType.APPLICATION_JSON_VALUE)
    public ChatResponse chat(@RequestBody ChatRequest request) {
        String reply = chatClient.prompt()
                .user(request.message())
                .call()
                .content();
        return new ChatResponse(reply);
    }

    @GetMapping("/chat")
    public ChatResponse chatGet(@RequestParam(defaultValue = "Hello") String message) {
        return chat(new ChatRequest(message));
    }
}

Test with curl:
curl -s -X POST http://localhost:8080/api/chat \
-H "Content-Type: application/json" \
  -d '{"message":"Explain Spring AI in one sentence."}'

The fluent ChatClient API handles request assembly, calls the configured ChatModel, and returns content. You can later switch models or providers largely by configuration.
System prompts and user messages
For consistent behavior, set a system message that defines tone, safety rules, or output format. Spring AI lets you chain .system(...) before .user(...):
String answer = chatClient.prompt()
        .system("You are a concise technical assistant. Answer in under 120 words.")
        .user("What is dependency injection in Spring?")
        .call()
        .content();

You can externalize long system prompts in classpath:/prompts/system.st templates (Spring AI supports template resources) to avoid hard-coding strings in Java for complex prompts.
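If you just need the file's text rather than full template support, a small helper can read the prompt from the classpath and feed it to .system(...). The PromptLoader class below is a hypothetical utility (not part of Spring AI), shown as a minimal sketch:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: loads a prompt file from the classpath so long
// system prompts live in resources, not Java string literals.
class PromptLoader {
    static String load(String classpathLocation) throws IOException {
        try (InputStream in = PromptLoader.class.getResourceAsStream(classpathLocation)) {
            if (in == null) {
                throw new IOException("Prompt resource not found: " + classpathLocation);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

With this in place, the call site becomes chatClient.prompt().system(PromptLoader.load("/prompts/system.st")).user(...), keeping the Java code free of multi-line prompt strings.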
Streaming responses with WebFlux
For long answers, stream tokens to the client to reduce time-to-first-byte. Add spring-boot-starter-webflux and use a reactive return type. Example pattern with Flux<String> (the exact API may vary slightly by Spring AI version; consult the reference for stream() on the client):
// Add: org.springframework.boot:spring-boot-starter-webflux
import reactor.core.publisher.Flux;

@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String message) {
    return chatClient.prompt()
            .user(message)
            .stream()
            .content();
}

If you stay on the servlet stack, you can still collect chunks in a callback or use Spring MVC async support; WebFlux is the most straightforward fit for SSE-style streaming.
Secrets, rate limits, and production tips
API keys: use environment variables locally, and secret managers (AWS Secrets Manager, Azure Key Vault, Kubernetes secrets) in production. Restrict CORS on browser-facing APIs and authenticate callers (OAuth2, API keys per tenant) so your LLM quota is not public.
Abuse: add rate limiting (Bucket4j, API gateway, or Spring Cloud Gateway) and payload size limits on message fields. Log request IDs, not raw prompts, if privacy requires it.
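To illustrate the rate-limiting idea, here is a naive in-memory fixed-window limiter — a sketch only, not a substitute for Bucket4j or a gateway (it resets counts per window and keeps state in one JVM):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Naive fixed-window rate limiter: at most `capacity` requests per client
// per window. Illustrative only — production setups use Bucket4j or a gateway.
class SimpleRateLimiter {
    private record Window(long startMillis, int count) {}

    private final int capacity;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    SimpleRateLimiter(int capacity, long windowMillis) {
        this.capacity = capacity;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(String clientId) {
        long now = System.currentTimeMillis();
        Window w = windows.get(clientId);
        if (w == null || now - w.startMillis() >= windowMillis) {
            windows.put(clientId, new Window(now, 1)); // start a fresh window
            return true;
        }
        if (w.count() >= capacity) {
            return false; // over the limit for this window
        }
        windows.put(clientId, new Window(w.startMillis(), w.count() + 1));
        return true;
    }
}
```

In a controller you would call tryAcquire with an API key or client IP and respond with HTTP 429 when it returns false.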
Cost: choose smaller models for classification or routing, and larger models only for complex reasoning. Cache repeated queries where safe.
Testing with mocks
Replace the chat model with a test double or use Spring Boot’s test slices. A minimal approach is to mock ChatClient behavior at the service layer so controller tests stay fast and deterministic without calling external APIs.
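One way to sketch that service-layer seam — the ChatService interface here is an assumption for illustration; your app would define something similar and back it with ChatClient in production:

```java
// Hypothetical service boundary: controllers depend on this interface,
// so tests can substitute a deterministic stub for the real ChatClient.
interface ChatService {
    String reply(String message);
}

// Test double: returns canned text so controller and service tests
// stay fast and never call an external API.
class StubChatService implements ChatService {
    @Override
    public String reply(String message) {
        return "canned reply for: " + message;
    }
}
```

In a Spring test you would register the stub as a bean (for example via @TestConfiguration) so the controller under test receives it instead of the real implementation.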
Next steps: RAG and vector stores
Spring AI supports embeddings, VectorStore implementations (PGVector, Redis, etc.), and document loaders. A typical Spring Boot RAG pipeline: ingest documents, chunk text, embed with an embedding model, store vectors, then at query time retrieve top-k chunks and pass them as context in the prompt. That pattern grounds answers in your data and is the standard upgrade path after a basic chat endpoint works.
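The query-time half of that pipeline can be sketched without any Spring AI types — the Retriever interface below is a hypothetical stand-in for a vector-store similarity search, and the prompt wording is just one way to stuff retrieved context:

```java
import java.util.List;

// Hypothetical retriever: in a real app this would wrap a vector store's
// similarity search and return the top-k most relevant text chunks.
interface Retriever {
    List<String> topK(String query, int k);
}

// Builds a grounded prompt: retrieved chunks become context the model
// must answer from, which is the core of the RAG pattern.
class RagPromptBuilder {
    static String build(String question, List<String> chunks) {
        String context = String.join("\n---\n", chunks);
        return "Answer using only the context below.\n\nContext:\n"
                + context + "\n\nQuestion: " + question;
    }
}
```

At query time you would call retriever.topK(question, 4), pass the result to RagPromptBuilder.build, and send the assembled string as the user message through ChatClient.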
Related reading and tools
On this blog, you can go deeper on the Java platform with Java 25 features and examples and Java interview questions with detailed answers. For REST payloads and APIs, see what JSON is and what XML is.
Try AI tools on FreeToolSuite
These free tools complement what you build in Spring: experiment with prompts, documents, and logic helpers in the browser.
AI Equation Solver
Step-by-step math help for algebra through calculus using an on-screen keyboard.
Digital Logic Solver
Boolean algebra, truth tables, and K-maps with guided AI explanations.
Science Explainer
Biology, chemistry, and physics topics with visuals and storyboard-style learning.
PDF AI Summarizer
Upload PDFs, ask questions, and get AI-powered summaries and answers.
You now have a working pattern for Spring Boot with AI: dependencies, configuration, a ChatClient-based REST API, and a roadmap toward streaming and RAG. Adjust versions against the official Spring AI documentation, keep secrets out of git, and iterate with tests before exposing endpoints to untrusted clients.