
Local LLM (Ollama, LM Studio, Gemma)

If you're offline, or security policy prevents using cloud LLMs, you can attach an MCP client to a model running on Ollama, LM Studio, etc. and use FindIP as a tool. We recommend 8B+ models with tool-use support — Llama 3.1, Qwen 2.5, Gemma 2, and similar.

Prerequisites

A locally running LLM instance, a FindIP API key, and a client that supports tool calling (e.g. FastMCP, or the OpenAI Python SDK with a custom base URL).
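Before wiring anything up, it can help to fail fast if the environment is incomplete. The helper below is a minimal sketch; `FINDIP_API_KEY` is the variable name used by the snippets later in this guide.

```python
import os

REQUIRED_ENV = ["FINDIP_API_KEY"]  # API key consumed by the tool function below

def missing_prereqs(env=os.environ):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not env.get(name)]

# Warn early instead of failing mid-conversation
for name in missing_prereqs():
    print(f"Missing required setting: {name}")
```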

Setup steps

1. Run a model with Ollama

Start the Ollama server and pull a model that supports tool calling.

bash
ollama serve
ollama pull llama3.1:8b-instruct-q4_K_M
# or: ollama pull qwen2.5:7b-instruct

2. Define the FindIP tool

Register the tool against the OpenAI-compatible API (Ollama listens on localhost:11434/v1).

python
import os, requests
from openai import OpenAI

llm = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "findip_search",
        "description": "Semantic search across patents from KR/US/JP/CN/EP",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "top_k": {"type": "integer", "default": 10},
            },
            "required": ["query"],
        },
    },
}]

def findip_search(query, top_k=10):
    return requests.post(
        "https://api.findip.ai/api/v1/search/semantic",
        headers={"X-API-Key": os.environ["FINDIP_API_KEY"]},
        json={"query": query, "top_k": top_k},
        timeout=30,
    ).json()

3. Tool-call loop

When the model returns tool_calls, execute them and feed the results back to the model.

python
import json

messages = [{"role": "user", "content": "Top 5 solid-electrolyte patents for solid-state batteries"}]
res = llm.chat.completions.create(model="llama3.1:8b-instruct-q4_K_M", messages=messages, tools=tools)

msg = res.choices[0].message
if msg.tool_calls:
    messages.append(msg)  # append the assistant turn once, before its tool results
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments)  # arguments arrive as a JSON string; never eval() them
        result = findip_search(**args)
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)})
    final = llm.chat.completions.create(model="llama3.1:8b-instruct-q4_K_M", messages=messages, tools=tools)
    print(final.choices[0].message.content)

Sample prompt

Prompt

"Pick 5 key Korean / US patents on stability enhancement of perovskite solar cells and summarize them with their main claims."

Troubleshooting

The model never issues a tool call.

Check whether the model supports tool calling at all. Llama 3.1 8B/70B Instruct, Qwen 2.5 7B+, and Gemma 2 27B work reliably. Anything below 7B has a high failure rate.

Responses are too slow.

Letting the local model read and summarize raw data is heavy. Extracting only title and abstract from the FindIP response before passing it back cuts token usage to roughly 1/5.
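A minimal sketch of that trimming step. It assumes the search response carries a `results` list whose items have `title` and `abstract` fields; these field names are an assumption, so adjust them to the actual response shape.

```python
def trim_results(response, fields=("title", "abstract")):
    """Keep only lightweight fields from each hit before handing it back to the model."""
    # NOTE: "results", "title", and "abstract" are assumed field names, not a confirmed API shape.
    return [{f: hit.get(f) for f in fields} for hit in response.get("results", [])]

# Usage: in the tool-call loop, pass json.dumps(trim_results(result)) as the tool message content.
```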

I want to use the MCP standard directly.

You can connect to https://api.findip.ai/mcp directly via FastMCP or LangChain's MCP adapter. The OAuth flow needs extra work in headless environments though, so the API-key approach is usually simpler.

FindIP — Semantic Patent Search