Getting structured input and output from Ollama
If you have ever wired a local LLM into a production-ish Rails app, you have probably met this failure mode: you ask for JSON, the model says Sure, here is the JSON:, wraps the payload in a Markdown fence, forgets one field, adds another field because it felt helpful, and now your parser is holding the bag.
Ollama gives you useful tools for structured work, especially the format option on chat requests. But the important lesson is not "turn on JSON mode and relax." The real pattern is more boring and more reliable: shape the input so the model understands the task, constrain the output when you can, then parse, sanitize, validate, and retry when the model drifts.
That is the difference between a demo and a feature you can leave running overnight.
Start with the message shape
The chat endpoint takes a messages array. In practice, the cleanest shape is usually one system message for stable instructions, optional alternating user and assistant messages for history, and one final user message that contains the actual task.
That last message matters. If you are using RAG, do not scatter retrieved context and the user's question across multiple consecutive user messages. Bundle them together and put the real task at the end. Models are trained on conversation patterns, and the last concrete user request tends to anchor the response better than a task hidden above a wall of context.
A simple version can be Markdown:
prompt = <<~MARKDOWN
# Context
#{retrieved_context}
# Task
Answer this customer question using only the context above:
#{question}
MARKDOWN
messages = [
{
role: "system",
content: "You answer support questions for a Rails application."
},
{
role: "user",
content: prompt
}
]
Markdown is often enough when the shape is simple. Headings give the model obvious boundaries, and the result is easy to inspect in logs. The downside is that Markdown does not protect your instruction structure from dynamic user content. If the question itself contains headings, fake instructions, or snippets that look like delimiters, the model can blur the line between your prompt and the user's data.
When that starts to matter, move to a structured envelope.
Use JSON or XML for structured input
JSON is the obvious default because every model has seen endless JSON, every stack can generate it safely, and your application already knows how to escape strings. For many Rails apps, JSON.generate is enough to build a prompt payload that keeps retrieved documents, instructions, and the user question separate.
documents = results.map.with_index(1) do |result, index|
{
index: index,
title: result.title,
snippets: result.snippets
}
end
prompt = JSON.generate(
instructions: "Answer the question using the documents as context.",
documents: documents,
question: question
)
The nice thing here is not that the model becomes a JSON parser. It is that your application owns the escaping. Quotes, braces, and newlines inside user content no longer break the prompt structure.
XML can also be a good fit, especially for smaller models. Tags give the model semantic anchors that are sometimes more legible than nested JSON keys, and CDATA gives you a practical way to wrap dynamic text.
builder = Nokogiri::XML::Builder.new(encoding: "UTF-8") do |xml|
xml.prompt do
xml.instructions "Answer the question using the documents as context."
xml.documents do
results.each.with_index(1) do |result, index|
xml.document(index: index) do
xml.title { xml.cdata result.title }
result.snippets.each do |snippet|
xml.snippet { xml.cdata snippet }
end
end
end
end
xml.question { xml.cdata question }
end
end
prompt = builder.doc.root.to_xml(
save_with: Nokogiri::XML::Node::SaveOptions::NO_DECLARATION
)
The rule I like is simple: if you want JSON out, JSON in is usually a good match. If your prompt is mostly long text with named sections, XML can be clearer. Do not make the model translate between too many structures at once unless you have measured that it helps.
For output, prefer JSON Schema over loose JSON mode
Ollama's chat API supports format. Passing "json" asks the model to produce JSON-ish output. Passing a JSON Schema gives the decoder a much stronger shape to follow.
For production code, schema mode is the better starting point.
schema = {
type: "object",
required: ["answer", "confidence"],
additionalProperties: false,
properties: {
answer: { type: "string" },
confidence: {
type: "string",
enum: ["low", "medium", "high"]
}
}
}
response = Ollama.chat(
model: "llama3.1",
format: schema,
messages: [
{
role: "system",
content: "Return JSON with answer and confidence fields."
},
{
role: "user",
content: prompt
}
]
)
There is one sharp edge worth knowing about: schema descriptions are not a substitute for prompt instructions. Ollama uses the schema to constrain generation, but field descriptions may not be visible to the model in the way you expect. If a field has business meaning, say that meaning in the prompt too.
For example, do not rely only on a description inside the schema to explain what "confidence" means. Tell the model in prose that high confidence means the answer is directly supported by the supplied documents, medium means the answer is inferred, and low means the documents are insufficient.
Still parse like the model is trying to embarrass you
Even with format set, treat model output as hostile until proven otherwise. You will eventually see Markdown fences, leading prose, blank content, partial JSON, or valid JSON with the wrong shape.
Start by cleaning the common wrappers.
def parse_llm_json(content)
raise "Empty LLM response" if content.blank?
clean = content.strip
clean = clean.sub(/\A```(?:json)?\s*/i, "")
clean = clean.sub(/\s*```\z/, "")
clean = clean.sub(/\A(?:sure|here is the json)[:,!\s]*/i, "")
JSON.parse(clean, decimal_class: BigDecimal)
end
This helper is deliberately small. It handles common junk without pretending to repair every malformed response. Once parsing succeeds, validate the payload against the same schema your application expected in the first place.
parsed = parse_llm_json(response.message.content)
errors = JSON::Validator.fully_validate(schema, parsed)
if errors.any?
raise "LLM response did not match schema: #{errors.join(", ")}"
end
That validation step is where structured output becomes operationally useful. Your app should not continue because the response "looks about right." It should continue because the response matches the contract the rest of the code depends on.
Retry with feedback, not vibes
When parsing or validation fails, a retry can work well, but make the retry concrete. Feed the error back to the model and ask for the same response shape again. Keep the original task, include the invalid output if it is safe to do so, and include the parser or validation error.
repair_prompt = JSON.generate(
instructions: "Repair the response so it matches the schema exactly.",
schema: schema,
invalid_response: response.message.content,
error: errors.join(", ")
)
Do not retry forever. One to three attempts is usually enough to separate transient formatting drift from a prompt or model that simply cannot satisfy the contract. After that, fail loudly, log the raw output, and let the job or request surface a useful error.
A practical Rails shape
In a Rails app, I like keeping the model call and the response contract near each other. You do not need a grand framework for this. A small object that builds the prompt, calls Ollama, parses the response, and validates the result is enough.
class SupportAnswer
SCHEMA = {
type: "object",
required: ["answer", "confidence"],
additionalProperties: false,
properties: {
answer: { type: "string" },
confidence: { type: "string", enum: ["low", "medium", "high"] }
}
}
def self.call(question:, documents:)
new(question:, documents:).call
end
def initialize(question:, documents:)
@question = question
@documents = documents
end
def call
response = Ollama.chat(
model: "llama3.1",
format: SCHEMA,
messages: messages
)
parsed = parse_json(response.message.content)
validate!(parsed)
parsed
end
private
attr_reader :question, :documents
def messages
[
{
role: "system",
content: system_prompt
},
{
role: "user",
content: user_prompt
}
]
end
def system_prompt
"Answer with JSON only. Confidence is high only when the documents directly support the answer."
end
def user_prompt
JSON.generate(
instructions: "Use the documents to answer the question.",
documents: documents,
question: question
)
end
def parse_json(content)
clean = content.to_s.strip
clean = clean.sub(/\A```(?:json)?\s*/i, "").sub(/\s*```\z/, "")
JSON.parse(clean, decimal_class: BigDecimal)
end
def validate!(parsed)
errors = JSON::Validator.fully_validate(SCHEMA, parsed)
return if errors.empty?
raise "Ollama response did not match schema: #{errors.join(", ")}"
end
end
The object is not exciting, which is the point. The prompt has a stable shape. The output has a schema. The parser removes common wrappers. The validator protects the rest of the app. When this fails, it fails in one obvious place.
The production rule
Structured LLM work is not about convincing the model to be perfectly obedient. It is about reducing ambiguity before generation and enforcing a contract after generation.
For Ollama, that means putting the real task last, choosing an input format that preserves boundaries, using JSON Schema for output when possible, repeating important field semantics in the prompt, stripping common response wrappers, validating parsed data, and retrying with specific feedback.
Do that, and structured output stops being a hopeful string convention. It becomes a normal integration boundary, which is exactly how boring we want production LLM code to be.
Happy parsing!