FullSense™ vs. the field
Honest competitive positioning. We list capabilities we’re behind on too — hiding them would just delay the gap-closing work.
At a glance
| Capability | Claude Code | Perplexity | Codex CLI | Gemini CLI | FullSense (llive + ll*) |
|---|---|---|---|---|---|
| Code editing (SOTA) | A | C | A | A− | A |
| Web research + citations | B | A+ | C | B+ (Google) | A (RAD-backed) |
| Autonomous resident loop | C (Stop on prompt) | F | C (full-auto) | C | A (ResidentRunner) |
| HITL workbench | F | F | C (approval mode) | C (approval mode) | A (llove TUI) |
| Memory persistence | C (file) | C (chat) | C (session) | C | A (SQLite Ledger) |
| On-prem inference | F (cloud) | F (cloud) | F (cloud) | F (cloud) | A (Ollama / LM Studio / vLLM) |
| CLI OSS | F | F | A (Apache-2.0) | A (Apache-2.0) | A |
| Backend OSS (end-to-end) | F | F | F | F | A (full OSS stack) |
| Audit Ledger (SIL) | F | F | F | F | A (per-action persistent log) |
| Dangerous-op gate | C (warning only) | F | A (approval mode) | A (approval mode) | A (Approval Bus) |
| MCP tool ecosystem | A | F | C | B (recent) | B (ll{domain} family) |
| Domain-specific adapters | C (generic) | C | C | C | A (lldesign / lltrade / planned llcad/lleda/llchip) |
Letter grades reflect 2026-05-16 state of the art. We update this page when any row shifts ±1 grade.
Where we are clearly ahead
- On-prem inference — All four competitors are cloud-only. FullSense runs end-to-end against your own Ollama / LM Studio / vLLM / TGI deployment.
- End-to-end OSS — Codex / Gemini publish OSS CLIs, but their backends (GPT, Gemini) are closed cloud APIs. FullSense lets you swap in any OSS model.
- Audit Ledger — None of the competitors persist a per-action audit log for compliance / law / reproducibility. llive’s SQLite Ledger does.
- HITL workbench — Competitors that have approval modes (Codex / Gemini) show one signal at a time in a CLI prompt. llove gives you a full TUI workbench where the human stays in the loop.
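As a rough illustration of what a per-action audit log in SQLite can look like, the sketch below appends each action to a table and links entries with the SHA-256 hash chain that the SEC-03 row on this page mentions. The table layout and the function names (`open_ledger`, `append_action`, `verify_chain`) are assumptions made for this example, not llive's actual schema or API.

```python
import hashlib
import json
import sqlite3

GENESIS = "0" * 64  # assumed placeholder hash that precedes the first entry


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a ledger table; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ledger ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "action TEXT NOT NULL, "
        "prev_hash TEXT NOT NULL, "
        "entry_hash TEXT NOT NULL)"
    )
    return conn


def _digest(prev_hash: str, action_json: str) -> str:
    # Each entry hash covers the previous hash plus the serialized action.
    return hashlib.sha256((prev_hash + action_json).encode("utf-8")).hexdigest()


def append_action(conn: sqlite3.Connection, action: dict) -> str:
    """Append one action and return its entry hash."""
    row = conn.execute(
        "SELECT entry_hash FROM ledger ORDER BY id DESC LIMIT 1"
    ).fetchone()
    prev_hash = row[0] if row else GENESIS
    action_json = json.dumps(action, sort_keys=True)
    entry_hash = _digest(prev_hash, action_json)
    conn.execute(
        "INSERT INTO ledger (action, prev_hash, entry_hash) VALUES (?, ?, ?)",
        (action_json, prev_hash, entry_hash),
    )
    conn.commit()
    return entry_hash


def verify_chain(conn: sqlite3.Connection) -> bool:
    """Recompute every hash in order; any edited row breaks the chain."""
    prev = GENESIS
    rows = conn.execute(
        "SELECT action, prev_hash, entry_hash FROM ledger ORDER BY id"
    )
    for action_json, prev_hash, entry_hash in rows:
        if prev_hash != prev or _digest(prev_hash, action_json) != entry_hash:
            return False
        prev = entry_hash
    return True
```

Because every hash depends on the one before it, rewriting an old row invalidates all later entries, which is what makes such a log useful for reproducibility checks.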
Where we are behind (and what we’re doing about it)
vs. Claude Code
- Coding precision — Claude Code is the current SOTA. We compete on safety (Approval Bus blocks dangerous ops at the API level, not as a text warning) and audit (every edit lands in the Ledger).
- MCP tool ecosystem — Anthropic ships dozens of MCP servers. Our path is domain-specific MCP servers from the ll{domain} family. Generic MCP parity is not the goal.
vs. Perplexity
- Citation UX — Perplexity’s citation rendering is excellent. We have the data (RAD 49 domains, ~49K documents, frozen so citations don’t rot), but the UX has to be built in llove.
- Reasoning UI — Perplexity Pro shows step-by-step thinking. The llive Ledger has this data; llove needs a timeline view to surface it.
vs. Codex CLI
- Approval mode polish: Codex's suggest / auto-edit / full-auto modes are battle-tested. The llive Approval Bus has the right architecture but needs UX work: timeout policies, retry semantics, grouped approvals.
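To make the missing UX pieces concrete, here is a minimal sketch of a timeout policy combined with grouped approvals: one human reply decides a whole group of operations, and silence counts as a timeout rather than consent. `Verdict`, `ApprovalRequest`, `await_verdict`, and `run_group` are hypothetical names invented for this example; they are not the Approval Bus API.

```python
import queue
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    APPROVED = "approved"
    DENIED = "denied"
    TIMED_OUT = "timed_out"


@dataclass(frozen=True)
class ApprovalRequest:
    group: str                   # operations in one group share one decision
    operations: tuple[str, ...]


def await_verdict(replies: "queue.Queue[bool]", timeout_s: float) -> Verdict:
    """Wait for a human reply; no reply within timeout_s is a timeout."""
    try:
        approved = replies.get(timeout=timeout_s)
    except queue.Empty:
        return Verdict.TIMED_OUT
    return Verdict.APPROVED if approved else Verdict.DENIED


def run_group(
    request: ApprovalRequest,
    replies: "queue.Queue[bool]",
    execute: Callable[[str], None],
    timeout_s: float = 30.0,
) -> Verdict:
    """Execute every operation in the group only after explicit approval."""
    verdict = await_verdict(replies, timeout_s)
    if verdict is Verdict.APPROVED:
        for op in request.operations:
            execute(op)
    return verdict
```

Retry semantics could be layered on top by calling `run_group` again after a `TIMED_OUT` verdict; that policy is left out here because the source does not describe it.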
vs. Gemini CLI
- Web search integration — Gemini’s Google Search integration is first-class. Our RAD is frozen (great for stability, bad for “what happened this week”). The plan is to add WebFetch / SearXNG / Brave Search MCP integration through the ll{domain} layer, keeping the RAD as the high-trust tier.
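One way to picture the planned two-tier setup is a merge step that always lists frozen RAD hits first and labels every hit with its trust tier. This is only a sketch under assumptions: `Hit`, `tiered_search`, and the searcher callables are hypothetical, and no real WebFetch, SearXNG, or Brave Search client is called.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

HIGH_TRUST = "high-trust"  # frozen RAD corpus
LIVE = "live"              # live web search results


@dataclass(frozen=True)
class Hit:
    source: str
    text: str
    tier: str


# A searcher takes a query and yields (source, text) pairs.
Searcher = Callable[[str], Iterable[tuple[str, str]]]


def tiered_search(query: str, rad_search: Searcher, web_search: Searcher) -> list[Hit]:
    """Return RAD hits first, then live hits, each tagged with its tier."""
    hits = [Hit(src, text, HIGH_TRUST) for src, text in rad_search(query)]
    hits += [Hit(src, text, LIVE) for src, text in web_search(query)]
    return hits
```

Keeping the tier on each hit lets a citation UI show readers which claims rest on the frozen corpus and which come from the live web.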
vs. Qwen / any single LLM weight set
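LLM weights themselves (Qwen / Llama / Mistral / DeepSeek and similar) are not competitors to FullSense; they are the raw material it calls internally. Through the Brief API (LLIVE-002, implemented 2026-05-16), any OSS LLM can be swapped in transparently as llive's LLMBackend. The differentiation lies not in the model itself but in the framework layer built on top of it.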
| Layer | Bare OSS LLM (Qwen / Llama / Mistral / …) | llive (wraps the LLM) | Implementation status |
|---|---|---|---|
| Inference core | Decoder-only LLM weights | Calls the OSS LLM as an LLMBackend | Implemented (OllamaBackend / OpenAIBackend / AnthropicBackend / MockBackend) |
| Memory | Single context window | 4-layer memory (semantic / episodic / structural / parameter) + hippocampal-cortical consolidation (FR-12) | semantic / episodic implemented; structural / parameter partial |
| Decision-making | Single-turn generation | FullSense 6-stage loop (salience → curiosity → thought → ego/altruism → plan → output) | Implemented |
| Input contract | A single prompt | Brief API: structured work unit + constraints + success_criteria + tool whitelist | Implemented (2026-05-16) |
| Safety | Prompt-level | Approval Bus + Policy gate + Quarantined Memory (SEC-01) + Ed25519 Signed Adapter (SEC-02) | Implemented |
| Audit | None | Append-only SIL ledger (BriefLedger / SqliteLedger) + SHA-256 hash chain (SEC-03) | Implemented (Brief path since 2026-05-16) |
| Self-evolution | Pre-training + fine-tuning only | Online proposal → Z3 formal verification (EVO-04) → review → promotion (EVO-06/07) | Phase 3 complete |
| Idea source | None | Built-in TRIZ 40 principles + 39×39 contradiction matrix (FR-23 to FR-27) | Implemented |
| HITL | None | llove TUI Candidate Arena (FR-20) | Designed, not yet integrated |
| Industrial IoT | None | llmesh MQTT / OPC-UA sensor bridge (FR-19) | Designed, not yet integrated |
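The backend-swapping idea in the table can be sketched with a small structural interface. Only the names `LLMBackend` and `success_criteria` come from this page; the `generate` signature, the other `Brief` field names, `StubBackend`, `render_prompt`, and `run_brief` are assumptions for illustration and may not match llive's real code.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMBackend(Protocol):
    """Assumed interface; the real llive signature may differ."""

    def generate(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class Brief:
    """Hypothetical field names modelled on the input-contract row."""

    work_unit: str
    constraints: tuple[str, ...] = ()
    success_criteria: tuple[str, ...] = ()
    tool_whitelist: tuple[str, ...] = ()


class StubBackend:
    """Stand-in for a real backend; it echoes the model name and task line."""

    def __init__(self, model: str) -> None:
        self.model = model

    def generate(self, prompt: str) -> str:
        return f"[{self.model}] {prompt.splitlines()[0]}"


def render_prompt(brief: Brief) -> str:
    """Flatten the structured Brief into one prompt string."""
    lines = [brief.work_unit]
    lines += [f"Constraint: {c}" for c in brief.constraints]
    lines += [f"Success criterion: {s}" for s in brief.success_criteria]
    return "\n".join(lines)


def run_brief(backend: LLMBackend, brief: Brief) -> str:
    """The caller never depends on which concrete backend is plugged in."""
    return backend.generate(render_prompt(brief))
```

Swapping models then means passing a different backend object, for example `run_brief(StubBackend("qwen2.5:7b"), brief)` instead of `run_brief(StubBackend("llama3.2:3b"), brief)`, with no change to the Brief itself.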
Measured results (2026-05-16 progressive validation matrix)
Running xs / s / m × {llama3.2:3b, qwen2.5:7b, qwen2.5:14b} through Brief API → FullSenseLoop, on-prem only, showed:
- Brief API + loop overhead < 1 % (LLM-only wall time / total wall time > 99.8 %)
- Raw LLM output passes through the Brief Runner / Ledger / Decision layers before it is emitted: Qwen acts as a material generator, not as the decision-maker
Details: D:/projects/llive/docs/benchmarks/2026-05-16-progressive-merged/
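The overhead figure above is the share of wall time not spent inside the LLM call. A minimal sketch of that arithmetic follows; `overhead_fraction` is a hypothetical helper, and the timings in the usage note are illustrative values, not measurements from the benchmark.

```python
def overhead_fraction(llm_wall_s: float, total_wall_s: float) -> float:
    """Fraction of total wall time spent outside the LLM call.

    overhead = 1 - (LLM-only wall time / total wall time)
    """
    if total_wall_s <= 0:
        raise ValueError("total_wall_s must be positive")
    if not 0 <= llm_wall_s <= total_wall_s:
        raise ValueError("llm_wall_s must lie between 0 and total_wall_s")
    return 1.0 - llm_wall_s / total_wall_s
```

For instance, an illustrative run with 99.8 s of LLM time out of 100.0 s total gives `overhead_fraction(99.8, 100.0)` of roughly 0.002, which is 0.2 % and therefore under the 1 % bound.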
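Areas where we concede a tie
- Generation quality itself: llive's output-quality floor depends on the embedded OSS LLM (Qwen and others)
- On-prem execution: calling Ollama directly is also on-prem. An OSS LLM alone is enough for on-prem operation, without going through llive
- Multilingual support: bare models such as Qwen already handle it; llive adds no value here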
Benchmark methodology
For every new feature in FullSense, we:
- Define a Brief that exercises the feature
- Run the same Brief against Claude Code, Perplexity, Codex CLI, Gemini CLI, and the FullSense stack
- Score on: correctness, speed, citation quality, dangerous-op handling, cost, on-prem capability, backend OSS, audit log presence
- Publish the per-product benchmark results under each product's docs/benchmarks/<date>.md
- Open issues for any axis where FullSense lost
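The last step of the methodology, finding the axes where FullSense lost, can be sketched as a comparison against the best competitor on each scoring axis. The axis names come from the list above; the `axes_lost` helper, the numeric 0 to 1 score scale, and the product keys are assumptions for this example.

```python
AXES = (
    "correctness",
    "speed",
    "citation quality",
    "dangerous-op handling",
    "cost",
    "on-prem capability",
    "backend OSS",
    "audit log presence",
)


def axes_lost(
    scores: dict[str, dict[str, float]], ours: str = "FullSense"
) -> list[str]:
    """Return every axis on which some competitor outscored `ours`.

    `scores` maps product name -> axis name -> score (higher is better).
    """
    lost = []
    for axis in AXES:
        best_rival = max(
            per_axis[axis] for name, per_axis in scores.items() if name != ours
        )
        if scores[ours][axis] < best_rival:
            lost.append(axis)
    return lost
```

Each returned axis would then become one issue, so a loss cannot be dropped without leaving a trace in the tracker.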
Detailed operating rules: see Benchmark Policy. It fixes, as official portal policy, the separation of series A/B/C/D, the xs/s/m/l/xl progressive curve, and the mandatory honest-disclosure items.
Methodology details: feedback_competitor_benchmark in the maintainer’s tooling repo.
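Honest disclosure (as of 2026-05-18)
The A/F grading has the following constraints, which we disclose rather than hide:
- Cloud series (B) is 3/4 unmeasured: Anthropic Haiku 4.5 / Gemini 2.0 Flash / OpenAI Codex are all waiting on credential / quota recovery. The comparison is backed by a single data point, Perplexity Sonar. Re-evaluation is planned after recovery (NEXT_SESSION).
- On-prem series (A): llama3.2:3b suffers from the "lllive" typo problem. The recommended models are qwen2.5:7b / 14b, but the recent benchmarks are not yet complete. The on-prem column in this table represents "ollama in general", not the absolute performance of a specific model.
- llive itself (series C/D): the Brief API overhead of < 1 % is measured (published at /benchmarks/2026-05-16-progressive), but the absolute generation quality depends on the embedded OSS LLM weights. Saying "llive is fast / high quality" on its own is inaccurate.
- Grader bias: the grading was done by the maintainer (furuse-kazufumi), and there has been no external review yet. Read the arguments behind each ±1 difference rather than the absolute A/F values.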
Empirical benchmarks (2026-05-16)
Four Briefs run against llive (FullSenseLoop.process) + ollama llama3.2:3b (on-prem) + Perplexity Sonar (cloud). Anthropic Haiku 4.5, Gemini 2.0 Flash, OpenAI Codex were attempted but failed for credential / quota reasons (operator action queued).
- Mermaid family-tree generation
- Quick Start section + MCP sequence diagram
- lltrade paper-trading strategy YAML
Headline of the day:
- llive does not yet generate (LLIVE-001 / LLIVE-002 in docs/BUGS_2026-05-16_brief_ab.md)
- ollama llama3.2:3b is the working on-prem option but produces the "lllive" typo (3 Ls) twice across 4 Briefs (tokenisation hostility to the ll* naming convention). Recommended replacement: qwen2.5:14b+
- Perplexity Sonar scores 4/4 on spec compliance at ~$0.005/brief
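A check for the three-L typo is easy to automate. The sketch below counts how many outputs contain "llive" written with three or more leading Ls; the `TYPO` pattern and `count_typo_outputs` are hypothetical helpers, and the sample strings in the usage note are made up rather than taken from the benchmark run.

```python
import re
from typing import Iterable

# Matches "lllive", "llllive", ... as a whole word; the correct "llive" has
# only two leading Ls and therefore does not match.
TYPO = re.compile(r"\bl{3,}ive\b")


def count_typo_outputs(outputs: Iterable[str]) -> int:
    """Count outputs that contain the typo at least once."""
    return sum(1 for text in outputs if TYPO.search(text))
```

With made-up outputs such as `["llive runs the loop", "Install lllive first", "lllive wraps llive", "no mention"]`, `count_typo_outputs` returns 2.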
Last updated
- 2026-05-18: added the Honest disclosure section and the Benchmark Policy link (benchmarks/policy/), making the operating rules for series A/B/C/D, the progressive curve, and honest disclosure official portal policy. The grades on this page remain as of 2026-05-16 and will be updated in a re-evaluation once credentials are restored.
- 2026-05-16: initial publication + first 4-Brief A/B run. Reviewed at: portal-side PROGRESS.md, in the two Phase 0.3 entries (umbrella expansion and competitive positioning).