OpenClaw DeepSeek Setup: API + Local in 5 Minutes

Set up DeepSeek in OpenClaw two ways: cloud API at $0.28/M tokens or free local via Ollama. Both take under 5 minutes. Full config included.

9 min read

OpenClaw DeepSeek setup showing the OpenClaw mascot connected to the DeepSeek logo on a dark background

TL;DR: Two ways to run DeepSeek in OpenClaw. Cloud API: export DEEPSEEK_API_KEY=sk-... then openclaw models set deepseek/deepseek-chat. Local via Ollama: ollama pull deepseek-r1:14b then openclaw models set ollama/deepseek-r1:14b. API costs $0.28 per million input tokens. Local costs nothing. Both work in under five minutes.


I run DeepSeek on three of my four OpenClaw agents. The coding agent stays on Claude Opus because reasoning quality matters there. Everything else? DeepSeek. My research agent processes thousands of tokens per day on deepseek-chat, and the bill last month was $4.20. The same workload on GPT-5.4 would have cost over $150.

OpenClaw treats DeepSeek like any other provider. One environment variable, one CLI command, done. And if you want zero cost and full privacy, you can run distilled R1 models locally through Ollama. Same OpenClaw instance, same agent routing, no data leaving your network.

This guide covers both paths.

What DeepSeek ships in March 2026

DeepSeek's API runs on V3.2, a 671-billion parameter Mixture-of-Experts model that activates 37 billion parameters per token. Two model IDs point to it:

  • deepseek-chat runs V3.2 in non-thinking mode. Fast, cheap, good for conversation and bulk work.
  • deepseek-reasoner runs V3.2 in thinking mode. It shows chain-of-thought reasoning before answering, similar to OpenAI's o1 or Claude with extended thinking. Slower, but better at math, code, and multi-step logic.

Both share a 128K context window. Default max output is 4K tokens for chat, 32K for reasoner. You can push chat to 8K and reasoner to 64K if you need longer responses.

Pricing makes it the cheapest serious API available:

Input (cache miss)Input (cache hit)Output
DeepSeek V3.2$0.28$0.028$0.42
GPT-5.4$2.50N/A$15.00
Claude Sonnet 4.6$3.00N/A$15.00

Cache hits happen automatically when your prompts share a prefix. System prompts, repeated instructions, anything that stays the same across requests. At $0.028 per million tokens, cached input is almost free.

New accounts get 5 million free tokens. No credit card required.

V4 has been rumored since February 2026 but hasn't shipped as of this writing. When it lands, the model IDs will likely change, and I'll update this guide. For now, V3.2 is what you get.

Path 1: DeepSeek cloud API

Get an API key from platform.deepseek.com. Sign up, go to API Keys, generate one. Takes about 30 seconds.

Set the environment variable in your OpenClaw server's shell:

export DEEPSEEK_API_KEY=sk-your-key-here

To make it persist across reboots, add it to your shell profile (~/.bashrc, ~/.zshrc, or your OpenClaw .env file).

Now tell OpenClaw to use it:

openclaw models set deepseek/deepseek-chat

That's it. Send a message and DeepSeek V3.2 answers.

Want the thinking model instead? Swap the model string:

openclaw models set deepseek/deepseek-reasoner

Reasoner takes longer to respond because it reasons through the problem first. Use it for coding tasks, math, or anything that needs step-by-step logic. For casual chat, scheduling, or bulk processing, deepseek-chat is faster and cheaper.

Sick of managing API keys and server configs? OpenclawVPS ships pre-configured with every major provider. Set your DeepSeek key once in the dashboard and it handles the rest. Plans start at $19/month.

Path 2: DeepSeek local via Ollama

DeepSeek R1 distilled models run locally through Ollama. These are smaller versions trained on reasoning data from the full 671B model. They're dumber than the API, but they're free and nothing leaves your machine.

First, pull the model that fits your hardware:

ModelSize on diskRAM neededBest for
deepseek-r1:1.5b1.1 GB4 GBTesting, tiny tasks
deepseek-r1:7b4.7 GB8 GBLight reasoning on laptops
deepseek-r1:8b5.2 GB8 GBR1-0528 Qwen3 update, best small model
deepseek-r1:14b9 GB16 GBGood balance of speed and quality
deepseek-r1:32b20 GB24 GB+Serious local reasoning
deepseek-r1:70b43 GB64 GB+Near-API quality, needs beefy GPU

I use the 14B on a Mac Mini with 16GB RAM. Good enough for summarization and simple Q&A. For anything that needs real reasoning, I switch to the cloud API.

ollama pull deepseek-r1:14b

Wait for the download, then tell OpenClaw:

openclaw models set ollama/deepseek-r1:14b

No API key needed. Ollama must be running on the same machine (or reachable via network). If you haven't started it: ollama serve.

OpenClaw DeepSeek setup diagram showing two paths: cloud API on the left and local Ollama on the right

The trade-off is speed and intelligence. The 14B model is roughly equivalent to GPT-3.5 on most benchmarks. The 32B gets closer to GPT-4 territory. None of them match the full V3.2 API. But for tasks where privacy matters or where you're processing data you can't send to a third party, local is the right call.

Per-agent model routing

The real power shows up when you assign different models to different agents. In openclaw.json:

{
  "agents": {
    "coder": {
      "model": {
        "primary": "anthropic/claude-opus-4-6"
      }
    },
    "researcher": {
      "model": {
        "primary": "deepseek/deepseek-chat"
      }
    },
    "private": {
      "model": {
        "primary": "ollama/deepseek-r1:14b"
      }
    }
  }
}

Coding agent gets Opus for the heavy thinking. Research agent gets DeepSeek at $0.28 per million tokens. The private agent runs locally for anything sensitive. Three agents, three cost tiers, one OpenClaw instance.

Set up an alias so you can switch models mid-conversation without typing the full string:

openclaw models aliases add ds deepseek/deepseek-chat
openclaw models aliases add dsr deepseek/deepseek-reasoner
openclaw models aliases add dsl ollama/deepseek-r1:14b

Now /model ds in any conversation switches to DeepSeek cloud. /model dsl switches to local. Fast.

OpenClaw DeepSeek terminal showing model aliases and per-agent configuration commands

Fallback chains with DeepSeek

DeepSeek can sit anywhere in a fallback chain. I put it as the secondary for my coding agent. If Claude hits a rate limit or goes down, OpenClaw tries DeepSeek before falling back to Gemini:

{
  "coder": {
    "model": {
      "primary": "anthropic/claude-opus-4-6",
      "fallbacks": [
        "deepseek/deepseek-reasoner",
        "google/gemini-2.5-flash"
      ]
    }
  }
}

Or flip it. Make DeepSeek the primary and use a more expensive model as the fallback for when you need guaranteed quality:

{
  "researcher": {
    "model": {
      "primary": "deepseek/deepseek-chat",
      "fallbacks": [
        "anthropic/claude-sonnet-4-6"
      ]
    }
  }
}

Mixing providers in the fallback chain is smart because outages, rate limits, and latency spikes tend to hit one provider at a time. If DeepSeek's API has a bad day (it happens, especially during peak hours in Chinese time zones), your agent keeps working on a different provider without you touching anything.

The cost math

Real numbers from my setup. I run three agents that collectively process about 2 million tokens per day (input + output combined). Monthly cost by provider if I used a single one everywhere:

ProviderMonthly cost (60M tokens)
DeepSeek V3.2 (chat)~$21
DeepSeek V3.2 (with cache hits)~$8
GPT-5.4~$525
Claude Sonnet 4.6~$540
Gemini 2.5 Flash~$6
Ollama local$0 (electricity only)

DeepSeek sits in a sweet spot. It's not the absolute cheapest (Gemini Flash edges it out on raw price), but it's significantly smarter than Flash on reasoning tasks. For agents that need to think, DeepSeek gives you 90% of the quality at 10% of the price compared to GPT or Claude.

And those cache hits add up fast. If your agent has a long system prompt that stays the same across conversations, every request after the first one gets the 90% discount on input tokens. My research agent's system prompt is about 2,000 tokens. Over thousands of requests, that cached prefix saved me roughly $15 last month compared to what the same traffic would have cost without caching.

Troubleshooting

"401 Unauthorized" or "Invalid API key" Your DEEPSEEK_API_KEY is either not set or wrong. Run echo $DEEPSEEK_API_KEY to check. If it's empty, re-export it. If you're using a .env file, restart OpenClaw after editing.

"Model not found" The model string must include the provider prefix. deepseek/deepseek-chat works. Just deepseek-chat doesn't. OpenClaw needs the prefix to route to the right API.

Ollama "connection refused" Ollama isn't running. Start it with ollama serve in a separate terminal or as a system service. If it's running on a different machine, set the Ollama base URL in your OpenClaw config.

Slow responses from local model Smaller distilled models on CPU are slow. Two options: use a smaller model (drop from 14B to 7B), or offload to GPU if you have one. For Ollama, set OLLAMA_NUM_GPU=999 to push everything to the GPU. If you don't have a GPU, accept the latency or switch to the API for time-sensitive tasks.

Context limit hit API supports 128K tokens. Local models default to a much smaller context (usually 2048-4096). In Ollama, set num_ctx higher in the Modelfile or pass it at runtime, but watch your memory. A 14B model at 32K context can eat 20GB+ of RAM.

Debugging server configs at 2am gets old fast. OpenclawVPS handles Ollama, API keys, and model routing out of the box. Managed hosting so you can focus on building agents, not fixing infrastructure.

Ready to run DeepSeek on your agents?

DeepSeek at $0.28 per million tokens. Ollama distilled models at $0. Both work in OpenClaw with one command. If you want managed hosting where providers are pre-configured and agents run 24/7 without server babysitting:

Get started with OpenclawVPS →


Frequently asked questions

Can OpenClaw use DeepSeek models?
Yes. DeepSeek is a pre-configured provider in OpenClaw. Set your DEEPSEEK_API_KEY environment variable, then run openclaw models set deepseek/deepseek-chat. You get DeepSeek V3.2 with 128K context. For reasoning tasks, use deepseek/deepseek-reasoner instead.
How much does DeepSeek cost through OpenClaw?
DeepSeek V3.2 costs $0.28 per million input tokens and $0.42 per million output tokens. Cache hits drop input to $0.028/M. New accounts get 5 million free tokens. For comparison, GPT-5.4 costs $2.50/$15 per million tokens, making DeepSeek roughly 10-36x cheaper.
Can I run DeepSeek locally with OpenClaw?
Yes. Pull a distilled R1 model through Ollama (ollama pull deepseek-r1:14b for a good balance), then set it in OpenClaw with openclaw models set ollama/deepseek-r1:14b. No API key needed, no data leaves your machine. Sizes range from 1.5B to 70B parameters.
What is the difference between deepseek-chat and deepseek-reasoner?
Both run DeepSeek V3.2. deepseek-chat is the non-thinking mode, fast and cheap, best for conversation and bulk processing. deepseek-reasoner is the thinking mode with chain-of-thought reasoning, better for math, code, and multi-step logic but slower. Same pricing for both.
Is DeepSeek as good as GPT-5.4 or Claude Sonnet?
For most tasks, DeepSeek V3.2 performs within 5-10% of GPT-5.4 and Claude Sonnet 4.6 on benchmarks. It scores 90.8 on MMLU versus GPT-5.4's comparable range. The reasoner mode rivals OpenAI's o1 on math and code. At 10-36x lower cost, the quality-per-dollar ratio is unmatched.

Keep reading