Venu OS

Karpathy-style LLM Wiki · Multi-Agent Knowledge System
● LIVE — Retrieval Stage 3
Wiki Pages
Live Insights
Active knowledge blocks
Eval Hit Rate
Index Chunks
Canonical-only index
PHI Tagged
PRIVATE_SENSITIVE in wiki
TRASH Tagged
Noise removed from RAG
Distillation Coverage
Venu OS — Knowledge Architecture
RAW EVIDENCE
↳ ChatGPT Export: 1,433 conversations
↳ Gemini History: 10k+ interactions
↳ Audio Files: 1,196 recordings
↳ PDFs / PPTXs: Retainer decks, research
↳ Images / Screenshots: 2,722 OCR processed
↳ OpenCode / Kiro: Agent session history
EXTRACTS
↳ Audio Transcripts: 11,746 files ✓
↳ Image OCR: 2,722 files ✓
↳ Conv Markdown: chatgpt_convs/ ✓
↳ PDF/PPTX Extracts: pptx/ + pdf/ ✓
↳ Source Manifest: venupedia.db ✓
CANONICAL WIKI
↳ 57 Wiki Pages: 6,626 live Insights
↳ CONTEXT.md: 2.2MB compiled context
↳ NEXUS.md + NOVA.md: Agent entry points
↳ PHI Masked: 0 unmasked · 126 tagged ✓
RETRIEVAL INDEX
↳ chunks.jsonl: 7,088 chunks · BM25
↳ build_chunk_index.py: Excludes media_extracts
↳ query_kb.py: top-k retrieval
↳ eval: 100% hit rate · 25 questions · 23 pass
AGENTS
↳ NEXUS: Senior Analyst · Blue Team
↳ NOVA: Research Critic · Red Team
↳ Claude Code: Orchestrator
↳ Gemini / Kiro: Distillation agents
↳ sync_to_bots.py: Retrieval-first injection
OUTPUT
↳ LinkedIn Posts: 11,899 → 25k followers
↳ Newsletter: Issue #2 pending
↳ Research Reports: $1.5k–$3k/mo retainers
↳ Toyow Comms: BVI-compliant content
write-back loop
NEXUS — Blue Team

Senior Analyst persona. Handles research synthesis, LinkedIn distribution, outreach intel, and RWA market analysis.

Synthesis LinkedIn RWA Research
NOVA — Red Team

Research Critic persona. Deep-dive intelligence, vulnerability assessment, compliance checking, and peer review of all NEXUS output.

Red Team Compliance Fact Check
Retrieval Pipeline

sync_to_bots.py injects targeted retrieval results (RETRIEVAL_QUESTIONS → query_kb.py, top-k 5) into agent prompts. It never injects the full CONTEXT.md.
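The retrieval-first injection described above can be sketched in a few lines. This is a minimal illustration, not the code in sync_to_bots.py or query_kb.py: the scorer is a plain Okapi BM25 written out by hand, and the chunk fields (`page`, `text`), the function names, and the prompt layout are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; the real index may tokenize differently.
    return text.lower().split()


def bm25_top_k(query, chunks, k=5, k1=1.5, b=0.75):
    """Return up to k chunks with the highest Okapi BM25 score (score > 0 only)."""
    if not chunks:
        return []
    docs = [tokenize(c["text"]) for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many chunks each term appears.
    df = Counter(term for d in docs for term in set(d))
    scored = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, i))
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [chunks[i] for score, i in ranked[:k] if score > 0]


def build_prompt(persona, question, chunks, k=5):
    """Inject only the top-k retrieved chunks, never the whole compiled context."""
    hits = bm25_top_k(question, chunks, k=k)
    context = "\n".join(f"[{c['page']}] {c['text']}" for c in hits)
    return f"{persona}\n\nRetrieved context:\n{context}\n\nQuestion: {question}"
```

The point of the design is prompt size: the agent sees at most k short chunks per question instead of a multi-megabyte context file.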

Retrieval-First BM25 7,088 chunks
Overall Score
92%
23 / 25 questions · top-8 retrieval
Session Start
64%
Session End
100%
↑ Excluded 37,942 chatgpt_conv chunks from index
↑ Added classification-policy.md canonical page
↑ Enriched roadmap.md with keyword anchors
↑ Index: 47,611 → 6,807 → 7,088 canonical chunks
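The canonical-only filtering behind these index changes can be sketched as a prefix filter over chunk records. This is an illustration only: the `source` field name and the excluded path prefix are assumptions, not the actual logic of build_chunk_index.py.

```python
import json

# Assumption: extract-derived chunks can be recognised by their source path.
EXCLUDED_PREFIXES = ("meta/media_extracts/",)


def canonical_chunks(lines, excluded_prefixes=EXCLUDED_PREFIXES):
    """Yield chunk dicts from JSONL lines, skipping excluded source paths."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        # "source" is a hypothetical field holding the chunk's origin path.
        if chunk["source"].startswith(excluded_prefixes):
            continue
        yield chunk
```

Filtering at index-build time keeps noisy conversation chunks out of every later query, rather than relying on the ranker to push them down.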
Bucket Breakdown
Governance
100%
Growth
100%
Knowledge
85%
Operations
100%
Roadmap
100%
Stack
100%
Website
100%
Workflow
100%
2 Remaining Failures
↳ Toyow compliance terminology — query not reaching toyow.md top-5
↳ Stablecoin supply / holder count — detail thin in rwa-landscape.md
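For reference, the overall score above is plain arithmetic: 23 hits out of 25 questions is 23 / 25 = 92%. A minimal sketch of how a per-bucket hit rate could be computed follows; the record fields (`bucket`, `hit`) and the helper names are assumptions, not the format of eval_summary.json.

```python
from collections import defaultdict


def is_hit(expected_page, retrieved_pages, k):
    """A question counts as a hit if its expected page is in the top-k results."""
    return expected_page in retrieved_pages[:k]


def hit_rate(results):
    """results: list of {'bucket': str, 'hit': bool}. Returns (overall, per_bucket)."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [hits, total]
    for r in results:
        counts[r["bucket"]][0] += int(r["hit"])
        counts[r["bucket"]][1] += 1
    total_hits = sum(h for h, _ in counts.values())
    total = sum(t for _, t in counts.values())
    per_bucket = {b: h / t for b, (h, t) in counts.items()}
    return total_hits / total, per_bucket
```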
All 57 Wiki Pages — Live Index
Page Lines Insights Tags
Ingestion Sources
💬

ChatGPT Export

Full export processed — conversations, audio files, HTML. Classification complete.

1,433 convos
16 JSON files
169MB chat.html
✓ Done
🤖

Gemini History (Takeout)

17 swarm batches run. distillation_done_v4.txt (913KB) — maps all interactions to wiki pages.

10k+ interactions
17 batches
✓ Done
🎙️

Audio Transcripts

Groq Whisper pipeline. All processed into markdown. 0 AUDIO_PENDING stubs remaining.

11,746 files
1,196 audio sources
✓ 0 pending
🖼️

Image OCR

Gemini OCR pipeline. All images processed. 0 IMAGE_PENDING stubs remaining.

2,722 files
815 image sources
✓ 0 pending
📄

PDFs + PPTXs

Retainer decks, research partnerships, pitch decks. All wired to wiki source stubs.

2,054 PDF chunks
725 PPTX chunks
✓ Indexed
Storage Architecture
RAW/ (GITIGNORED): 13GB total · chatgpt-export/ · _Final_Archive/ · immersible-io/ · 🔒 Local only
WIKI/ (GIT TRACKED ✓): 52 pages · 50MB · PHI clean · extraction-minimal branch · ✓ GitHub
META/KB/ (GITIGNORED): 6,807 chunks · BM25 index · eval results · local-only · 🔒 Local only
META/ (canonical, GIT TRACKED): CONTEXT.md · schema.md · log.md · audit reports · ✓ GitHub
Current Eval Targets (meta/kb/)
eval_questions.jsonl: 25 questions · 8 buckets
eval_summary.json: 100% hit rate
chunks.jsonl: 6,807 chunks
avg chunk length: 160.9 words
Git Commits — extraction-minimal
Latest 2026-05-15

feat: distillation pipeline — scan_undistilled, smart_distill, audio_insight_index

5 new scripts: scan_undistilled.py (coverage audit), smart_distill.py (Ollama/stub batch distiller), audio_insight_index.py (11,746 transcript indexer). kb_cycle.py --distill flag. Dashboard distillation coverage metric added.

5807669

fix(ci): point all workflows to extraction-minimal, add pip install, fix dead branch refs

kb_cycle.yml + dashboard_data.yml: main→extraction-minimal, rank-bm25 install added, full→core profile, dead closure branch removed. Stops daily failure emails.

d98ff18

feat: add VenuOS MCP server (7 tools, 19/19 tests passing)

mcp_server/server.py live — query_venuos, read_wiki_page, list_wiki, get_context, log_decision, search_wiki_full_text, run_eval. Python 3.11 venv. Wired into Claude Desktop.

0e99eee

docs: full repo documentation overhaul + GitHub metadata + CI eval

README rewritten. docs/ARCHITECTURE.md + CONTEXT_PORTABILITY.md added. GitHub: description, 6 topics, homepage set. Default branch → extraction-minimal. 7 dead branches deleted.

6ec7f49

chore: rescue 5 pages from worktrees, root cleanup, 57 wiki pages

26 legacy scripts → scripts/legacy/. 3 worktrees pruned. 5 unique pages rescued (acquisition playbook, clients, website plan, RWA briefing, newsletter deck). 7,088 chunks.

1b10189

feat(eval): 92% → 100% — Toyow terminology + stablecoin holder count fixes

Excluded 37,942 chatgpt_conv chunks. Index: 47k→6.8k→7.1k canonical-only. classification-policy.md added.

daf6e6e

feat(os): classification policy, roadmap enriched

classification-policy.md (governance), roadmap.md keyword enrichment, agentic-ecosystem.md updated.

7e8a2d0

feat(os): agent pass — PHI cleanup, wiki enrichment

PHI cleanup: venu.md (vehicle reg), life-log.md (medical), ai-stack.md. toyow.md compliance rules. rwa-landscape.md updated. roadmap.md + rwa-migration-thesis.md created.

02b0061

feat(os): Venu OS finalization — PHI cleanup, cross-agent audit

67 files · 11MB. 6 wiki pages PHI-cleaned. 3 cross-agent audit reports. 5 new wiki pages. Retrieval-first verified.

b50cd73

Initial commit of infrastructure

.env untracked · .gitignore established

Session Agent Work
Agent 1 — PHI Cleanup
venu.md, life-log.md, ai-stack.md, research.md
✓ Done
WikiEnrichmentAgent — Content depth
toyow.md, pricing.md, rwa-landscape.md, rwa-migration-thesis.md
✓ Done
Claude Orchestrator — Retrieval fix
build_chunk_index.py, classification-policy.md, eval 64→92→100%
✓ Done
Agent A — NEXUS/NOVA update
agentic-ecosystem.md retrieval-first note
In progress
Agent B — ChatGPT REAL_SIGNAL promotion
~400 convos → wiki Insights
In progress
Agent C — Git GC + PR to main
git gc --aggressive, cherry-pick PR
In progress
Operating Focus (CLAUDE.md)
1

immersible.io Website

OG preview repair + Next.js/Vercel migration — deferred this session

2

Newsletter Issue #2

Ready to publish — deferred this session

3

Ramesh Shrikonda Follow-up

Highest-intent pipeline lead — deferred this session

4

LinkedIn 11,899 → 25,000

ICP growth strategy — deferred this session

PHI / PID Safety Status
0
Unmasked sensitive strings in canonical wiki
rg scan verified · 2026-05-14 IST
126 sensitive strings masked ✓
🏥
Medical records — diagnoses, blood reports, prescriptions, typhoid treatment, low BP, bone fracture inquiry
MASKED
🪪
Identity documents — E-disability cert, E-UDID Card, registration numbers (3668/20000/...), hospital IDs
MASKED
👤
Personal details — Full name + DOB + reg no combo (Bojanapalli Venkateshwarlu), Out Patient Card
MASKED
🚗
Vehicle documents — RC (Renault Kiger, reg date), Edelweiss insurance (Policy 900196633)
MASKED
💳
Financial IDs — PAN references, Aadhaar partial, GSTIN context
MASKED
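A scan for unmasked identifiers could look roughly like the sketch below. It is not the rg scan the dashboard refers to: the two patterns (the standard 10-character PAN layout and a 12-digit Aadhaar-style number) and the function name are illustrative assumptions, and a real scan would need many more patterns.

```python
import re

# Illustrative patterns only; the project's actual scan rules are not shown in this page.
PATTERNS = {
    "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    "AADHAAR": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def scan(text):
    """Return (line_number, kind) for each line that matches a sensitive pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, kind))
    return findings
```

The function reports only the line number and pattern kind, so the scan output itself never repeats the sensitive string.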
Git Safety Checks
🔑
.env — API keys, secrets
NOT TRACKED
📁
raw/ — 13GB evidence archive
GITIGNORED
🎵
meta/media_extracts/ — audio, OCR, convs
GITIGNORED
📊
meta/kb/ — retrieval index
GITIGNORED
🔐
ssh keys/ — credentials
NEVER STAGED
📋
*.callouts.md — backup files
EXCLUDED
Classification Policy
REAL_SIGNAL: Durable strategic value → promoted to Insight
PRIVATE_SENSITIVE: PHI/PII → masked, never committed
TRASH: Noise → tagged, excluded from RAG
RETRIEVAL_ONLY: Situational value → index only, no wiki prose
DUPLICATE: Repeated content → tagged, deduped
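The policy above maps each label to a routing decision, which can be sketched as a small lookup. The label names come from the policy; the `route` helper and the two boolean outputs are assumptions, and treating PRIVATE_SENSITIVE and DUPLICATE as non-indexable is a guess the policy text does not state outright.

```python
from enum import Enum


class Label(str, Enum):
    REAL_SIGNAL = "REAL_SIGNAL"
    PRIVATE_SENSITIVE = "PRIVATE_SENSITIVE"
    TRASH = "TRASH"
    RETRIEVAL_ONLY = "RETRIEVAL_ONLY"
    DUPLICATE = "DUPLICATE"


# Only REAL_SIGNAL is promoted to wiki prose (as an Insight).
WIKI_LABELS = {Label.REAL_SIGNAL}
# TRASH is excluded from RAG per the policy; excluding PRIVATE_SENSITIVE
# and DUPLICATE from the index as well is an assumption.
INDEX_LABELS = {Label.REAL_SIGNAL, Label.RETRIEVAL_ONLY}


def route(label):
    """Return where an item with this label is allowed to go."""
    return {"wiki": label in WIKI_LABELS, "index": label in INDEX_LABELS}
```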
Total Files
Done
Pending
In Progress
Progress
Active Workers
MCP Chat — Talk to the Swarm Brain
🧠
MCP Swarm Brain — send commands or ask questions
Try: "status", "queue", "workers", "stats"
LIVE SWARM MONITOR — open full screen ↗ port 7788 · refreshes every 6s
12 Gemini OAuth Accounts
Acc · Alias · Gmail · Role · Primary Model
acc1 · forpc · venusriramforpc@ · Worker · gemini-2.5-flash
acc2 · vfx · venusriramvfx@ · Worker · gemini-2.5-flash
acc3 · veve · veveregion@ · Worker · gemini-2.5-flash
acc4 · venuimm · venuimmersible@ · Worker · gemini-2.5-flash
acc5 · nexus · nexusimmersible@ · 🔵 NEXUS · gemini-3.1-flash-lite-preview
acc6 · nova · novaimmersible@ · 🔴 NOVA · gemini-3.1-flash-lite-preview
acc7 · immerscol · immersivecollectibles@ · Worker · gemini-3-flash-preview
acc8 · coalthor · coalthor@ · Worker · gemini-3-flash-preview
acc9 · chip2digit · chip2digit@ · Worker · gemini-3-flash-preview
acc10 · logvague · logvague@ · Worker · gemini-3-flash-preview
acc11 · srinivas · bojanapalli.srinivas@ · Worker · gemini-3-flash-preview
acc12 · tv678 · tradingview678@ · Worker · gemini-3-flash-preview
Sandboxed at ~/.gemini_sandbox/accN/.gemini/ · isolation via HOME= override · each has independent OAuth token
Model Rotation — 96 Daily Quota Pools
12 accounts × 8 models = 96 independent daily quotas. When a model's quota is exhausted, the worker auto-rotates to the next model.
1. gemini-3.1-flash-lite-preview ← NEXUS + NOVA primary
2. gemini-2.5-flash ← acc1-4 primary, proxy default
3. gemini-3-flash-preview
4. gemini-2.5-flash-lite
5. gemini-2.5-pro
6. gemini-3.1-pro-preview
7. gemma-4-31b-it
8. gemma-4-26b-a4b-it
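The quota arithmetic and the rotation step can be sketched as follows. The model list is taken from the rotation order above; the `next_model` helper and its wrap-around behaviour (start from the current model and walk the list in order) are assumptions about how a worker might rotate, not the worker's actual code.

```python
from itertools import product

MODELS = [
    "gemini-3.1-flash-lite-preview",
    "gemini-2.5-flash",
    "gemini-3-flash-preview",
    "gemini-2.5-flash-lite",
    "gemini-2.5-pro",
    "gemini-3.1-pro-preview",
    "gemma-4-31b-it",
    "gemma-4-26b-a4b-it",
]
ACCOUNTS = [f"acc{i}" for i in range(1, 13)]


def quota_pools():
    """Every (account, model) pair is treated as its own daily quota pool."""
    return list(product(ACCOUNTS, MODELS))


def next_model(current, exhausted):
    """Return the next non-exhausted model after `current`, wrapping around.

    `exhausted` is the set of models whose quota this account has used up.
    Returns None when every other model is exhausted too.
    """
    start = MODELS.index(current)
    for step in range(1, len(MODELS)):
        candidate = MODELS[(start + step) % len(MODELS)]
        if candidate not in exhausted:
            return candidate
    return None
```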
OpenClaw Proxy — Port 18790
OpenAI-compatible proxy routes Telegram bot + any OpenAI client through the 12 sandboxes. Round-robin with quota cooldown.
# openclaw.json (primary provider)
"primary": "venuos-gemini/gemini-2.5-flash"
# fallbacks → gemini-2.5-flash-lite → openrouter
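Round-robin with quota cooldown can be sketched as a small selector class. This is an illustration of the technique only: the class name, the cooldown length, and the injectable clock are assumptions, not the proxy's implementation.

```python
import time


class RoundRobin:
    """Cycle through accounts, skipping any that are cooling down after a quota hit."""

    def __init__(self, accounts, cooldown_s=3600.0, clock=time.monotonic):
        self.accounts = list(accounts)
        self.cooldown_s = cooldown_s  # hypothetical default; tune to the real quota window
        self.clock = clock
        self.cooling_until = {}
        self._next = 0

    def mark_quota_hit(self, account):
        # Take the account out of rotation until the cooldown expires.
        self.cooling_until[account] = self.clock() + self.cooldown_s

    def pick(self):
        """Return the next available account, or None if all are cooling down."""
        now = self.clock()
        for offset in range(len(self.accounts)):
            idx = (self._next + offset) % len(self.accounts)
            account = self.accounts[idx]
            if self.cooling_until.get(account, float("-inf")) <= now:
                self._next = (idx + 1) % len(self.accounts)
                return account
        return None
```

Passing the clock in as a parameter keeps the selector testable without sleeping; a caller that gets None back would move on to the configured fallbacks.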
The Golden Architecture Rules
1
1 account per terminal. 1 sandbox per account. Never run plain gemini; always use HOME=/path/to/sandbox gemini. Plain gemini falls back to ~/.gemini (acc1's credentials) and breaks isolation.
2
Never use GEMINI_HOME — it does nothing. Gemini CLI reads $HOME via Node.js os.homedir(). Only the HOME= env override works for credential isolation.
3
OAuth refresh tokens are machine-independent. Tokens in ~/.gemini_sandbox/accN/.gemini/oauth_creds.json work from any IP, any OS. Safe to rsync to OCI for remote execution.
4
Claude = strategy. Gemini ×12 = execution. Claude (via MCP) orchestrates tasks and writes to wiki. Gemini workers batch-process files. OpenClaw handles Telegram. Each layer has a clear job.
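Rules 1 and 2 amount to launching every worker with its own HOME. A minimal sketch of doing that from Python follows, assuming the sandbox layout ~/.gemini_sandbox/accN and the -m and -p flags used in the command examples on this page; the helper names are hypothetical.

```python
import os
import subprocess

# Assumption: sandboxes live under ~/.gemini_sandbox/accN as described on this page.
SANDBOX_ROOT = os.path.expanduser("~/.gemini_sandbox")


def sandbox_env(account, base_env=None, root=SANDBOX_ROOT):
    """Copy the environment and point HOME at the account's sandbox directory."""
    env = dict(os.environ if base_env is None else base_env)
    env["HOME"] = os.path.join(root, account)
    return env


def gemini_command(model, prompt):
    """Build the CLI argument list; never includes a shell, so no quoting issues."""
    return ["gemini", "-m", model, "-p", prompt]


def run_in_sandbox(account, model, prompt):
    """Run one prompt under the given account's HOME and return the completed process."""
    return subprocess.run(
        gemini_command(model, prompt),
        env=sandbox_env(account),
        capture_output=True,
        text=True,
        check=False,
    )
```

Because the override is applied per process through `env=`, the parent shell's HOME is never modified and workers for different accounts can run side by side.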
Verified Isolation — 2026-05-17
acc1: venusriramforpc@gmail.com (gemini-2.5-flash)
acc2: venusriramvfx@gmail.com
acc3: veveregion@gmail.com
acc4: venuimmersible@gmail.com
acc5 (NEXUS): nexusimmersible@gmail.com
acc6 (NOVA): novaimmersible@gmail.com
acc7-12: all isolated, all running
Permanent Commands — Save These
Start all services (after reboot)
# Starts dashboard (7788) + proxy (18790)
bash "/Users/venu/Desktop/Venu OS/scripts/start_venuos_services.sh"
Use any account interactively
bash "/Users/venu/Desktop/Venu OS/scripts/use_account.sh" nexus
# aliases: forpc vfx veve venuimm nexus nova
# immerscol coalthor chip2digit logvague srinivas tv678
Launch task across all 12 workers
bash "/Users/venu/Desktop/Venu OS/scripts/swarm_any_task.sh" "your task here"
bash scripts/swarm_any_task.sh --file list.txt
bash scripts/swarm_any_task.sh --accounts "5,6" "nexus+nova task"
Terminal live dashboard
bash "/Users/venu/Desktop/Venu OS/scripts/swarm_dashboard.sh"
Test account isolation (run all 4 in parallel)
# Each must reply independently, no cross-contamination
HOME=/Users/venu/.gemini_sandbox/acc1 gemini -m gemini-2.5-flash -p "Reply only: acc1 working"
Services · ports
7788 · Swarm Dashboard · localhost:7788
18790 · Gemini Proxy · localhost:18790
18789 · OpenClaw Gateway · loopback only
Venu OS — Agent Orchestration Flow
You (Venu): Strategy · Intent
Claude: Notion AI · Strategy
MCP · Dashboard: Control · Monitoring
Gemini ×12: OAuth Sandboxes · 14,253 files → wiki
NEXUS (acc5): gemini-3.1-flash-lite · Blue Team · Research
NOVA (acc6): gemini-3.1-flash-lite · Red Team · Critique
OpenClaw Proxy: Telegram · Daily Assistant · port 18790 · round-robin
wiki write-back