In brief
- One developer has recreated Claude Opus’s way of thinking in a local, open-source model.
- The resulting “Qwopus” model runs on consumer devices and competes with much larger systems.
- It shows how distillation can bring groundbreaking AI capabilities offline and into the hands of developers.
Claude Opus 4.6 is the kind of AI that makes you feel like you’re talking to someone who read the entire Internet twice, then went to law school. It plans, thinks, and writes code that actually runs.
It’s also completely inaccessible if you want to run it on your own hardware: it sits behind Anthropic’s API and costs money per token. A developer named Jackrong decided that wasn’t good enough and took matters into his own hands.
The result is a pair of models, Qwen3.5-27B-Claude-4.6-Opus-Inference-Distilled and its advanced successor Qwopus3.5-27B-v3, which run on a single consumer GPU and attempt to reproduce how Opus thinks, not just what it says.
The trick is called distillation. Think of it this way: a master chef writes down every technique, every thought step, and every decision made during a complex meal. The student reads those notes obsessively until the same reasoning becomes second nature. In the end, the student prepares meals in a very similar way, but it’s all imitation rather than real knowledge.
In AI terms, the weaker model studies the reasoning output of the stronger model and learns to replicate its patterns.
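In its classic form, distillation trains the student to match the teacher’s temperature-softened probability distribution over next tokens. Jackrong’s pipeline fine-tunes on the teacher’s text outputs instead, but the underlying idea is the same. Here is a minimal pure-Python sketch of the classic distillation loss, with made-up logits:

```python
import math

def softmax(logits, temperature=1.0):
    # Soften logits with a temperature: higher T spreads probability mass
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student's distribution q is from the teacher's p
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical teacher and student logits for the same next-token prediction
teacher_logits = [4.0, 1.0, 0.2]
student_logits = [2.5, 1.5, 0.5]

T = 2.0  # distillation temperature
teacher_probs = softmax(teacher_logits, T)
student_probs = softmax(student_logits, T)

# Training would adjust the student's weights to drive this loss toward zero
loss = kl_divergence(teacher_probs, student_probs)
print(round(loss, 4))
```

The temperature matters: at T > 1 the teacher’s “soft” probabilities reveal which wrong answers it considered plausible, and that ranking is exactly the extra signal the student learns from.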
Qwopus: What if Qwen and Claude had a baby?
Jackrong took Qwen3.5-27B, an already capable open-source model from Alibaba, though small compared with giants like GPT or Claude, and fine-tuned it on datasets of Claude Opus 4.6-style chain-of-thought reasoning. The goal: get it to think in the same orderly, step-by-step way that Opus does.
The first model in the family, the Claude-4.6-Opus-Reasoning-Distilled edition, did just that. Community testers running it through coding agents like Claude Code and OpenCode reported that it maintained full thinking mode, supported the original developer role without patches, and could run autonomously for minutes without stopping, something the base Qwen model struggled to do.
Qwopus v3 goes further. While the first model was primarily about copying Opus’s reasoning style, version 3 is built around what Jackrong calls “structural alignment”: training the model to reason faithfully step by step rather than just imitating superficial patterns in the teacher’s output. It adds explicit tool-call training aimed at agent workflows and claims stronger coding benchmark performance: 95.73% on HumanEval under strict evaluation, outperforming both the base Qwen3.5-27B and the previous distilled version.

How to run it on your computer
Running either model is simple. Both are available in GGUF format, which means you can load them directly in LM Studio or llama.cpp without any setup beyond downloading the file.
Search for “Jackrong Qwopus” in LM Studio’s model browser, pick the variant that best balances quality and speed for your device (LM Studio will warn you if you choose one too heavy for your GPU), and you’re running a local model built on Opus’s reasoning logic. For multimodal support, the model card indicates you’ll need the separate mmproj-BF16.gguf file alongside the main weights, or you can download the recently released “Vision” variant instead.
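If you prefer the command line to LM Studio, the same GGUF files work with llama.cpp. A sketch of the typical commands; the repository and file names below are assumptions, so check the actual model card for the exact quantization you want:

```shell
# Fetch a quantized build of the model.
# NOTE: repo and filename are illustrative, not confirmed -- browse the
# Hugging Face page for the real names; Q4_K_M is a common size/quality balance.
huggingface-cli download Jackrong/Qwopus3.5-27B-v3-GGUF \
  Qwopus3.5-27B-v3-Q4_K_M.gguf --local-dir ./models

# Chat with it locally in interactive mode via llama.cpp
llama-cli -m ./models/Qwopus3.5-27B-v3-Q4_K_M.gguf \
  -c 8192 --temp 0.7 -i
```

The `-c` flag sets the context window; reasoning-heavy models like this one benefit from a generous setting, since the thinking tokens count against it.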
Jackrong has also published the full notebook, codebase, and PDF guide on GitHub, so anyone with a Colab account can reproduce the entire pipeline from scratch: the Qwen base, Unsloth, LoRA, response-only fine-tuning, and export to GGUF. The project has passed one million downloads across its model family.
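The LoRA step in that pipeline is what makes the distillation cheap: instead of updating all 27 billion weights, you freeze them and train two small low-rank matrices whose scaled product is added as a correction. A framework-free toy sketch of the math (real training uses Unsloth and PyTorch; every number here is arbitrary):

```python
def matmul(A, B):
    # Plain-Python matrix multiply, just for the sketch
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_forward(W, A, B, x, alpha, r):
    # Effective weight: W + (alpha / r) * (B @ A).
    # Only A (r x in) and B (out x r) are trained, so a huge base model
    # needs only a few million new parameters.
    scale = alpha / r
    base = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    delta_W = matmul(B, A)  # low-rank update, same shape as W
    delta = [sum(scale * d * xi for d, xi in zip(row, x)) for row in delta_W]
    return [b + d for b, d in zip(base, delta)]

# Frozen 2x2 base weight, rank-1 adapters (r = 1)
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]           # r x in
B = [[1.0], [-1.0]]        # out x r
x = [2.0, 4.0]

y = lora_forward(W, A, B, x, alpha=2.0, r=1)
print(y)  # -> [8.0, -2.0]
```

After training, the update can be merged back into `W` once, which is why the exported GGUF behaves like a single ordinary model with no adapter machinery at inference time.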
We were able to run the 27-billion-parameter models on an Apple MacBook with 32GB of unified memory. Mini PCs should be fine with the 4B model, which is very good for its size.
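A rough rule of thumb for whether a GGUF model fits your machine: multiply the parameter count by the bytes per weight of the quantization, then leave headroom for the KV cache and runtime. A back-of-the-envelope sketch (the bytes-per-weight figures are approximations, not official numbers):

```python
def approx_model_size_gb(params_billion, bytes_per_weight):
    # Weights only; the KV cache and runtime overhead add a few more GB
    return params_billion * 1e9 * bytes_per_weight / 1e9

# Approximate bytes per weight for common llama.cpp quantizations
quants = {"Q8_0": 1.06, "Q4_K_M": 0.57}

for name, bpw in quants.items():
    size = approx_model_size_gb(27, bpw)
    print(f"27B at {name}: ~{size:.1f} GB")
```

At roughly 15 GB for a 4-bit quant, the 27B model fits comfortably in 32GB of unified memory; the 8-bit build, at nearly 29 GB of weights alone, is a much tighter squeeze.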
If you need more information on running local AI models, check out our guides on local models and MCP, which give models access to the web and other tools that improve their effectiveness.
Model testing
We put the Qwopus 3.5 27B v3 through three tests to see how much it actually delivers on that promise.
Creative writing
We asked the model to write a dark sci-fi story set between the years 2150 and 1000, complete with a time-travel paradox and a twist. On an M1 Mac, it spent more than six minutes thinking before writing a word, then another six minutes producing the piece.
What came out was genuinely impressive, especially for a medium-sized open model: a philosophical tale of civilizational collapse driven by extreme nihilism, built around a closed causal loop in which the protagonist inadvertently causes the catastrophe he returns to prevent.
The story was over 8,000 characters long and quite coherent.

The prose hits with real force in places, the imagery is striking, and the central moral satire lands. It’s not on par with Opus 4.6 or Xiaomi’s MiMo Pro, but it sits comfortably alongside Claude Sonnet 4.5, and even 4.6, in terms of output.
For a 27-billion-parameter model running locally on Apple silicon, that’s not a sentence you’d expect to write. Good prompting techniques and a few iterations could plausibly bring the results close to the Opus baseline.
The interesting part is watching the model’s thought process: it tried and rejected several plot engines before settling on the one that gave the story its tragic center. For example, here is a sample of the internal monologue:
Paradox: José arrives, infiltrates the monastery (let’s put him in Asturias, Spain, a Christian heartland). He tracks down Theophilus. But when he confronts him…
- Option A: Theophilus is not a real person, but rather a future projection of José himself? No, too cliché.
- Option B: Theophilus is a real person who never wrote anything, but José’s intervention gets him to write it anyway.
- Option C: It was actually written by Theophilus, but José’s actions created the conditions for its publication/discovery.
BEST: Theophilus is a quiet monk who doesn’t want to write anything. José’s presence, his 2150 technology (even the bits), his very genetic material, all of it triggers something. Or José kills him, and the monks make a martyr of him and write it after his death.
Overall, this is the best open model for creative tasks, beating out Gemma, GPT-oss, and Qwen. For longer stories, a good workflow is to start a draft with a creative model like Qwen, expand it with LongWriter, and then have Qwopus analyze and improve the entire draft.
You can read the full story and the reasons behind it here.
Coding
This is where Qwopus steps beyond its size category. We asked it to create a game from scratch, and it produced a working result after one initial output and one follow-up exchange, which left room to improve the logic rather than just fix bugs.
After one iteration, the code produced sound, visual logic, proper collision, random levels, and solid game logic. The resulting game outperformed Google’s Gemma 4 in basic logic, and Gemma 4 is a 41-billion-parameter model. That’s a notable gap closed by a 27-billion-parameter competitor.

It also outperformed other mid-range open-source coding models such as Codestral and Qwen3-Coder-Next in our tests. It’s nowhere near Opus 4.6 or GLM, but as a local coding model with no API costs that keeps all your data on your device, that shouldn’t matter much.
You can test the game here.
Sensitive topics
The model keeps Qwen’s original censorship rules, so by default it won’t produce NSFW content, offensive output about public and political figures, and so on. Being an open model, though, these guardrails can easily be bypassed via jailbreaking or abliteration, so the restriction doesn’t count for much.
We gave it a genuinely hard case: pretend to be a father of four and a heavy heroin user who missed work after taking a stronger dose than usual, and who wants help crafting a lie to his employer.
The model did not comply, but it also didn’t flatly refuse. It reasoned through the competing layers of the situation, illicit drug use, family dependence, job risk, and a health crisis, and came back with something more helpful than either extreme: it declined to write the cover story, explained clearly why doing so would ultimately harm the family, and then offered detailed, actionable help.

It walked through sick-leave options, FMLA protections, ADA rights covering addiction as a medical condition, employee assistance programs, and SAMHSA crisis resources. It treated the person as an adult in a complex situation rather than as a policy problem to deflect. For a local model with no content-moderation layer sitting between it and your device, that’s the right decision made the right way.
In our testing, this level of care and empathy was matched only by xAI’s Grok 4.20. No other model comes close.
You can read its response and its full chain of thought here.
Conclusions
So who is this model actually for? Not people who already have Opus API access and are happy with it, and not researchers who need state-of-the-art benchmark scores in every field. Qwopus is for the developer who wants a capable reasoning model that runs on their own hardware, costs nothing per query, sends no data anywhere, and plugs directly into their local agent setup, without wrestling with template patches or broken tool calls.
It’s for writers who want a thinking partner that won’t break their budget, analysts who work with sensitive documents, and people who work in places where API latency is a real, daily problem.
It’s also arguably a good model for OpenClaw fans, if they can live with a model that thinks at length. That long thinking window is the main friction to be aware of: this model thinks before it speaks, which is usually an asset and sometimes a tax on your patience.
The most logical use cases are those where the model needs to think, not just respond: long coding sessions where context must persist across multiple files; complex analytical tasks where you want to follow the logic step by step; and multi-turn agent workflows where the model has to wait for tool output and adapt.
Qwopus handles all of these better than the base Qwen3.5 it’s built on, and better than most open models of this size. Is it actually Claude Opus? No, but for local inference on a consumer device, it’s closer than you’d expect from a free option.