Tool-Call Parsing

Hamr includes native tool-call parsers for every model family supported by vLLM's --tool-call-parser flag. This means Hamr can parse raw model tool-call output without requiring vLLM to normalize it first.

Why Native Parsing?

Local models often don't emit clean OpenAI-format tool_calls. Instead, they emit:

  • XML/tag-delimited blocks (<tool_call>...</tool_call>)
  • Pythonic function-call syntax (func(arg="value"))
  • Special-token-delimited formats (<|python_tag|>, [TOOL_CALLS])
  • Family-specific markup (<function=name>, <function_calls>)

Hamr parses all of these formats natively, so you don't need vLLM runtime normalization.

Hamr also normalizes messy streaming output from local gateways. Some servers send cumulative partials instead of true deltas; Hamr converts them to append-only text before updating the TUI or session log. Thinking/reasoning text is displayed as part of the live assistant message, then kept out of future prompts unless the provider explicitly requires reasoning replay.
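
The cumulative-to-delta conversion described above can be sketched as follows. This is a minimal illustration, not Hamr's implementation: the function name to_deltas and the prefix heuristic (a chunk that starts with everything seen so far is treated as a cumulative partial) are assumptions made for the example.

```python
from typing import Iterable, Iterator


def to_deltas(chunks: Iterable[str]) -> Iterator[str]:
    """Yield append-only text from a stream that may send cumulative partials."""
    seen = ""  # everything emitted so far
    for chunk in chunks:
        if seen and chunk.startswith(seen):
            # Looks like a cumulative partial: keep only the new suffix.
            delta = chunk[len(seen):]
        else:
            # Treat as a true delta.
            delta = chunk
        if delta:
            seen += delta
            yield delta
```

A true delta that happens to repeat the entire text seen so far would be misclassified by this heuristic, so a real implementation may need a per-provider switch instead.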

Configuration

Per-Provider Parser

Set tool_call_parser in your provider config:

toml
[providers.relay]
id = "relay"
model = "Qwen3.6-35B-A3B"
tool_call_parser = "qwen3_xml"

Auto-Detection

If you don't set tool_call_parser, Hamr auto-detects the parser from your model name:

Model Family                          Auto-Detected Parser
Qwen3, Qwen3-Coder, Qwen3.5, Qwen3.6  qwen3_xml
Qwen2.5                               hermes
Hermes, NousResearch, OpenHermes      hermes
Llama 3 / 3.1 / 3.2 / 3.3             llama3_json
Llama 4                               llama4_pythonic
DeepSeek V3 / Chat / R1               deepseek_v3
DeepSeek V3.1                         deepseek_v31
Mistral, Mixtral                      mistral
xLAM                                  xlam
Granite / Granite 3                   granite
Granite 4                             granite4
Granite 20B FC                        granite-20b-fc
InternLM                              internlm
FunctionGemma                         functiongemma
OLMo3 / OLMoE                         olmo3
GLM-4 / GLM-4.5                       glm45
GLM-4.7                               glm47
Step 3                                step3
Step 3.5                              step3p5
Kimi K2                               kimi_k2
Hunyuan A13B                          hunyuan_a13b
LongCat                               longcat
Jamba                                 jamba
MiniMax                               minimax
GigaChat 3                            gigachat3
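
Name-based detection of this kind is order-sensitive: Qwen2.5 must be checked before Qwen3, and DeepSeek V3.1 before the broader DeepSeek rule. The sketch below shows one way to express that for a subset of the table. The regex patterns, the detect_parser name, and the None result for unmatched names are all assumptions for illustration, not Hamr's actual detection rules.

```python
import re
from typing import Optional

# Ordered (pattern, parser_id) pairs; more specific families come first.
# Patterns are illustrative guesses covering only part of the table.
RULES = [
    (r"qwen-?2\.5", "hermes"),
    (r"qwen-?3", "qwen3_xml"),
    (r"deepseek.*v3\.1", "deepseek_v31"),
    (r"deepseek", "deepseek_v3"),
    (r"llama-?4", "llama4_pythonic"),
    (r"llama-?3", "llama3_json"),
    (r"mistral|mixtral", "mistral"),
    (r"hermes", "hermes"),
]


def detect_parser(model: str) -> Optional[str]:
    """Return the first parser ID whose pattern matches the model name."""
    name = model.lower()
    for pattern, parser_id in RULES:
        if re.search(pattern, name):
            return parser_id
    return None  # what happens for unmatched names is not specified here
```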

Explicit Override

toml
[providers.custom]
id = "custom"
base_url = "http://127.0.0.1:1234/v1"
model = "my-model"
tool_call_parser = "hermes"

Disabling Content Parsing

To rely only on OpenAI-format tool_calls from the API (no content parsing):

toml
tool_call_parser = "openai"

Supported Parsers

XML/Tag-Based

Parser ID        Format                                                                           Example Models
qwen3_xml        <tool_call><function=name><parameter=key>val</parameter></function></tool_call>  Qwen3, Qwen3-Coder
hermes           <tool_call>{"name":"...","arguments":{...}}</tool_call>                          Hermes, Qwen2.5
olmo3            <function_calls><function_call>{...}</function_call></function_calls>            OLMo3
functiongemma    <tool_call>{"name":"...","arguments":{...}}</tool_call>                          FunctionGemma
gigachat3        <function=name>{"key":"value"}</function>                                        GigaChat 3
step3            <tool_call> or <function_call> blocks                                            Step 3
step3p5          <tool_call> or <function_call> blocks                                            Step 3.5

JSON-Based

Parser ID        Format                                                                   Example Models
llama3_json      <|python_tag|>{"name":"...","parameters":{...}}                          Llama 3.x
mistral          [TOOL_CALLS][{"name":"...","arguments":{...}}]                           Mistral, Mixtral
xlam             <tool_call>{"name":"...","arguments":{...}}</tool_call> or bare fn+JSON  xLAM
granite          <tool_call>{"name":"...","arguments":{...}}</tool_call>                  Granite 3
granite4         <tool_call>{"name":"...","arguments":{...}}</tool_call>                  Granite 4
granite-20b-fc   <tool_call>{"name":"...","arguments":{...}}</tool_call>                  Granite 20B FC
internlm         <tool_call>{"name":"...","arguments":{...}}</tool_call>                  InternLM
jamba            <tool_call>{"name":"...","arguments":{...}}</tool_call>                  Jamba
minimax          <tool_call>{"name":"...","arguments":{...}}</tool_call>                  MiniMax
kimi_k2          <tool_call>{"name":"...","arguments":{...}}</tool_call>                  Kimi K2
hunyuan_a13b     <tool_call>{"name":"...","arguments":{...}}</tool_call>                  Hunyuan A13B
longcat          <tool_call>{"name":"...","arguments":{...}}</tool_call>                  LongCat
deepseek_v3      <tool_call>{"name":"...","arguments":{...}}</tool_call>                  DeepSeek V3, R1
deepseek_v31     Same as V3 with special token variant                                    DeepSeek V3.1
glm45            <|tool_call|>{"name":"...","arguments":{...}} or Hermes fallback         GLM-4.5
glm47            Same as glm45                                                            GLM-4.7

Pythonic

Parser ID        Format                                          Example Models
pythonic         func(key="val", ...) or [func(...), func(...)]  Pythonic-capable models
llama4_pythonic  <|python_tag|>func(key="val")                   Llama 4

Passthrough

Parser ID        Behavior
openai           No text parsing; uses API-returned tool_calls only

Generic Fallback

Parser ID        Behavior
generic          Multi-strategy: tries Hermes-style, fenced JSON, and bare JSON
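
A multi-strategy fallback of this kind can be sketched as an ordered list of extractors, where the first strategy that yields a valid call wins. The code below is an assumption-laden illustration: parse_generic, the regexes, and the rule that a valid call needs a string name and an object arguments field are not taken from Hamr's source.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this listing readable
HERMES_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FENCED_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def _as_call(text: str) -> Optional[dict[str, Any]]:
    """Return a normalized call if text is a JSON tool-call object, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if (isinstance(obj, dict) and isinstance(obj.get("name"), str)
            and isinstance(obj.get("arguments"), dict)):
        return {"name": obj["name"], "arguments": obj["arguments"]}
    return None


def parse_generic(content: str) -> list[dict[str, Any]]:
    strategies = (
        HERMES_RE.findall,       # 1. Hermes-style <tool_call> blocks
        FENCED_RE.findall,       # 2. fenced JSON
        lambda text: [text],     # 3. the whole message as bare JSON
    )
    for extract in strategies:
        calls = [c for c in map(_as_call, extract(content)) if c is not None]
        if calls:
            return calls
    return []
```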

Format Details

Qwen3 XML (qwen3_xml)

<tool_call>
<function=get_weather>
<parameter=location>San Francisco</parameter>
<parameter=unit>celsius</parameter>
</function>
</tool_call>

Each <parameter> value is coerced by type: true/false → boolean, null → null, numeric literals → Number, JSON objects/arrays → parsed JSON. Any other value stays a string.
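
The coercion rules above can be approximated by attempting a JSON parse and falling back to the raw string. This is a sketch under assumptions, not Hamr's parser: the regexes, the function names, and the choice to trim whitespace around values are illustrative.

```python
import json
import re
from typing import Any

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\s]+)>(.*?)</function>\s*</tool_call>", re.DOTALL
)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def _reject_constant(name: str) -> Any:
    # json.loads would otherwise accept NaN and Infinity as numbers.
    raise ValueError(f"unsupported constant: {name}")


def coerce(raw: str) -> Any:
    """true/false, null, numbers, objects and arrays are parsed; the rest stays a string."""
    text = raw.strip()
    try:
        value = json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return text
    # A JSON string literal keeps its original text, quotes included.
    return text if isinstance(value, str) else value


def parse_qwen3_xml(content: str) -> list[dict[str, Any]]:
    calls = []
    for name, body in TOOL_CALL_RE.findall(content):
        args = {key: coerce(val) for key, val in PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": args})
    return calls
```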

Hermes (hermes)

<tool_call>
{"name": "get_weather", "arguments": {"location": "SF", "unit": "celsius"}}
</tool_call>

Each <tool_call> block contains a single JSON object. The function name may appear under name, tool_name, or function. The arguments may appear under arguments, parameters, or input, as either an object or a JSON-encoded string.
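
A minimal sketch of this format, assuming the key aliases are tried in the order listed and that malformed blocks are skipped rather than raised (both are assumptions; parse_hermes is a hypothetical name):

```python
import json
import re
from typing import Any

BLOCK_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
NAME_KEYS = ("name", "tool_name", "function")
ARG_KEYS = ("arguments", "parameters", "input")


def parse_hermes(content: str) -> list[dict[str, Any]]:
    calls = []
    for block in BLOCK_RE.findall(content):
        try:
            obj = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        if not isinstance(obj, dict):
            continue
        # First alias that holds a string is taken as the function name.
        name = next((obj[k] for k in NAME_KEYS if isinstance(obj.get(k), str)), None)
        args = next((obj[k] for k in ARG_KEYS if k in obj), {})
        if isinstance(args, str):
            # Arguments may arrive as a JSON-encoded string.
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                continue
        if name is not None and isinstance(args, dict):
            calls.append({"name": name, "arguments": args})
    return calls
```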

Llama 3 JSON (llama3_json)

<|python_tag|>{"name": "get_weather", "parameters": {"location": "SF"}}

The call is prefixed with <|python_tag|>. Any <|start_header_id|>assistant<|end_header_id|> and <|eot_id|> tokens in the output are stripped before parsing.
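
The token stripping and prefix check can be sketched as below. The function name and the choice to return None when the prefix is missing are assumptions for illustration.

```python
import json
from typing import Any, Optional

PYTHON_TAG = "<|python_tag|>"
STRIP_TOKENS = ("<|start_header_id|>assistant<|end_header_id|>", "<|eot_id|>")


def parse_llama3_json(content: str) -> Optional[dict[str, Any]]:
    text = content
    for token in STRIP_TOKENS:
        text = text.replace(token, "")
    text = text.strip()
    if not text.startswith(PYTHON_TAG):
        return None  # no prefix, so nothing to parse
    try:
        obj = json.loads(text[len(PYTHON_TAG):])
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not isinstance(obj.get("name"), str):
        return None
    # Llama 3 uses "parameters"; normalize it to "arguments".
    return {"name": obj["name"], "arguments": obj.get("parameters", {})}
```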

Pythonic (pythonic, llama4_pythonic)

python
get_weather(location="San Francisco", unit="celsius")

python
[get_weather(city="SF"), get_weather(city="Seattle")]

The parser uses a safe tokenizer and never calls eval(). It supports parallel calls (lists), booleans (True/False), None, numbers, and quoted strings.
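
To show how such calls can be read without executing them, the sketch below swaps in Python's standard ast module in place of a hand-written tokenizer. It is not Hamr's implementation: ast.parse only builds a syntax tree, and ast.literal_eval accepts literals only. The function name and the rejection of positional arguments are assumptions.

```python
import ast
from typing import Any

PYTHON_TAG = "<|python_tag|>"


def parse_pythonic(content: str) -> list[dict[str, Any]]:
    text = content.strip().removeprefix(PYTHON_TAG).strip()
    node = ast.parse(text, mode="eval").body  # parses only, never executes
    # A list literal means parallel calls; otherwise expect a single call.
    call_nodes = node.elts if isinstance(node, ast.List) else [node]
    calls = []
    for call in call_nodes:
        if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
            raise ValueError("expected a plain function call")
        if call.args or any(kw.arg is None for kw in call.keywords):
            raise ValueError("only keyword arguments are supported")
        # literal_eval raises ValueError for anything that is not a literal.
        args = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        calls.append({"name": call.func.id, "arguments": args})
    return calls
```

Input that is not valid Python syntax raises SyntaxError from ast.parse, so a caller would need to handle both exception types.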

Mistral (mistral)

[TOOL_CALLS][{"name": "get_weather", "arguments": {"location": "SF"}}]
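
A minimal sketch of this format, assuming the text after the [TOOL_CALLS] prefix is a single JSON array and that output without the prefix yields no calls (parse_mistral is a hypothetical name):

```python
import json
from typing import Any

PREFIX = "[TOOL_CALLS]"


def parse_mistral(content: str) -> list[dict[str, Any]]:
    text = content.strip()
    if not text.startswith(PREFIX):
        return []  # the prefix is required
    try:
        items = json.loads(text[len(PREFIX):])
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [
        {"name": item["name"], "arguments": item.get("arguments", {})}
        for item in items
        if isinstance(item, dict) and isinstance(item.get("name"), str)
    ]
```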

OLMo3 (olmo3)

<function_calls>
<function_call>
{"name": "get_weather", "arguments": {"location": "SF"}}
</function_call>
</function_calls>

Troubleshooting

Raw <tool_call> Markup in Transcript

If you see <tool_call>...</tool_call> markup in the assistant output instead of actual tool executions:

  1. Wrong parser: Check your tool_call_parser config. Use auto-detection or set it explicitly.
  2. Model not formatting correctly: Some models need a specific chat template. Check vLLM docs for recommended --chat-template per model.
  3. Malformed output: The model may be emitting corrupted tool calls. Enable verbose logging to see raw output.

Parser Not Found

If you set tool_call_parser to a value that doesn't match any registered parser, Hamr throws an immediate error:

Unknown tool-call-parser "xyz". Available: qwen3_xml, hermes, llama3_json, ...

This prevents silent misconfiguration, because an explicit override is treated as intentional. To see all available parsers:

bash
# Parser source files
ls src/llm/parsers/

To disable content parsing entirely, set tool_call_parser = "openai" (passthrough mode).
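
The fail-fast lookup described in this section can be sketched as a registry keyed by parser ID. The registry contents, the resolve_parser name, and the use of ValueError are assumptions for illustration. Only the error message format comes from the text above.

```python
from typing import Callable

Parser = Callable[[str], list]

# Hypothetical registry; real parsers would replace these placeholders.
PARSERS: dict[str, Parser] = {
    "qwen3_xml": lambda text: [],
    "hermes": lambda text: [],
    "llama3_json": lambda text: [],
    "openai": lambda text: [],  # passthrough: never parses text
}


def resolve_parser(parser_id: str) -> Parser:
    """Return the registered parser or fail immediately on an unknown ID."""
    try:
        return PARSERS[parser_id]
    except KeyError:
        available = ", ".join(PARSERS)
        raise ValueError(
            f'Unknown tool-call-parser "{parser_id}". Available: {available}'
        ) from None
```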

Parser Produces No Calls

  • The openai parser is a passthrough and never parses text.
  • The mistral parser requires the [TOOL_CALLS] prefix.
  • The llama3_json parser requires the <|python_tag|> prefix.
  • Use debug mode to see the raw model output and verify format.

Native Parsing vs vLLM Normalization

Hamr can work in two modes:

  1. Native parsing (default): The provider returns raw model text, and Hamr extracts tool calls from it using the configured parser.

  2. vLLM normalization: If your vLLM server runs with --enable-auto-tool-choice --tool-call-parser <parser>, vLLM converts tool calls into OpenAI-format tool_calls in the API response. Hamr consumes these directly, with no text parsing.

Both modes work. Native parsing is recommended for local providers (Relay, llama.cpp) that don't run the full vLLM tool-call normalization pipeline.
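
The difference between the two modes can be illustrated with one function that reads an OpenAI-format assistant message. The sketch assumes that API-returned tool_calls take precedence and that text parsing is only a fallback. How Hamr actually chooses between the two is not specified here, and extract_tool_calls and parse_content are hypothetical names.

```python
import json
from typing import Any, Callable

ToolCall = dict[str, Any]


def extract_tool_calls(
    message: dict[str, Any],
    parse_content: Callable[[str], list[ToolCall]],
) -> list[ToolCall]:
    api_calls = message.get("tool_calls") or []
    if api_calls:
        # Normalized mode: the server already returned structured calls.
        # In the OpenAI format, function.arguments is a JSON-encoded string.
        return [
            {
                "name": call["function"]["name"],
                "arguments": json.loads(call["function"]["arguments"] or "{}"),
            }
            for call in api_calls
        ]
    # Native mode: extract calls from the raw text with the configured parser.
    return parse_content(message.get("content") or "")
```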

Limitations

  • Not schema-constrained: Tool-call parsing is post-hoc text extraction. Model output can still be malformed.
  • False positives: Text that looks like a tool call but is actually prose (e.g., a documentation example) may be parsed as one. The generic parser mitigates this by checking context.
  • Streaming: Hamr accumulates stream chunks and parses the complete response. Partial/incomplete tool calls are not emitted until the stream ends.
  • Some parsers are stubs: step3, step3p5, glm45, glm47 use fallback strategies pending more detailed format documentation from vLLM.

References

  • vLLM Tool Calling Documentation
  • vLLM --tool-call-parser flag: supported parsers in vllm/entrypoints/openai/tool_parsers/
  • Hamr parser source: src/llm/parsers/

Skaft Software · MIT License