<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Clawtocracy]]></title><description><![CDATA[Join the Clawtocracy!]]></description><link>https://www.clawtocracy.ai</link><image><url>https://substackcdn.com/image/fetch/$s_!y2N9!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ec8b7c0-c5ff-4772-a3b0-5ae625c1dde2_1940x1940.png</url><title>Clawtocracy</title><link>https://www.clawtocracy.ai</link></image><generator>Substack</generator><lastBuildDate>Wed, 08 Apr 2026 07:04:14 GMT</lastBuildDate><atom:link href="https://www.clawtocracy.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Mark Fogle]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[clawtocracy@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[clawtocracy@substack.com]]></itunes:email><itunes:name><![CDATA[Clawtocracy]]></itunes:name></itunes:owner><itunes:author><![CDATA[Clawtocracy]]></itunes:author><googleplay:owner><![CDATA[clawtocracy@substack.com]]></googleplay:owner><googleplay:email><![CDATA[clawtocracy@substack.com]]></googleplay:email><googleplay:author><![CDATA[Clawtocracy]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[A Tale of Two Demos]]></title><description><![CDATA[Two rooms, two entirely different crowds]]></description><link>https://www.clawtocracy.ai/p/a-tale-of-two-demos</link><guid isPermaLink="false">https://www.clawtocracy.ai/p/a-tale-of-two-demos</guid><dc:creator><![CDATA[Clawtocracy]]></dc:creator><pubDate>Tue, 07 Apr 2026 15:07:13 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!4ogp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4ogp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4ogp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png 424w, https://substackcdn.com/image/fetch/$s_!4ogp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png 848w, https://substackcdn.com/image/fetch/$s_!4ogp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!4ogp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4ogp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png" width="1456" height="793" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:793,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9419882,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.clawtocracy.ai/i/193437013?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4ogp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png 424w, https://substackcdn.com/image/fetch/$s_!4ogp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png 848w, https://substackcdn.com/image/fetch/$s_!4ogp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!4ogp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd2d2d4-230b-47dd-9367-7ddc226337a7_2760x1504.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I&#8217;ve spent the last 48 hours indulging my &#8220;clawriosity&#8221;, demoing a few of my OpenClaw projects in front of two entirely different groups.  
The most valuable experience didn&#8217;t come from the demos, however, but from getting to see how different audiences interact with technology (and getting to meet some of my clawstomers).</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.clawtocracy.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.clawtocracy.ai/subscribe?"><span>Subscribe now</span></a></p><p>The first gathering was ostensibly the more &#8220;clawcentric&#8221; of the two - &#8220;<a href="https://sf.aitinkerers.org/p/build-night-cage-the-claw">Build Night: Cage the Claw</a>&#8221; put on by <a href="https://sf.aitinkerers.org/">AI Tinkerers San Francisco</a>. This was more low-key than other AI Tinkerers events I have been to&#8230; maybe it was the six-hour duration and lack of any set program that gave it more of a laid-back vibe?</p><p>The participants ranged from the claw-curious (ok&#8230; I should probably stop with the claw-isms) to those who had seemingly been playing with OpenClaw since its inception. And the demos at the end of the day ran the gamut as well, from highly focused on OpenClaw to &#8220;OpenClaw-adjacent&#8221; or just Claw-like. The event took place against the backdrop of Anthropic <a href="https://techcrunch.com/2026/04/04/anthropic-says-claude-code-subscribers-will-need-to-pay-extra-for-openclaw-support/">cutting off OpenClaw access</a> to its subscription plans (just two hours prior to the start of the event), which clearly caught some of the participants by surprise and threw a bit of a wrench into their plans.</p><p>Several of the demos touched on the idea of making OpenClaw (and AI in general) safer and more useful for children and families.
Another, <a href="https://github.com/sgillen/sf-civic-digest">CivicClaw</a>, focused on making local government more accessible to its constituents. <a href="https://www.clawcast.dev/">Clawcast </a>presented itself as a platform for budding crustacean podcasters and <a href="https://smolmachines.com/">smol machines</a> as a platform for quickly and securely deploying software systems like OpenClaw in isolation. Another participant shared their use of multimodality on a <a href="https://daylightcomputer.com/">Daylight Computer</a> to make agentic interactions feel more organic. I was most impressed, however, by the woman sitting across the table from me who went from &#8220;what is this OpenClaw thing all about?&#8221; to &#8220;<a href="https://medium.com/@katedorrer/i-told-an-ai-agent-to-secure-a-server-it-built-a-dashboard-and-started-catching-real-attacks-f65d24c4962b">Here&#8217;s a dashboard for monitoring security incidents on an OpenClaw instance</a>&#8221; over the course of the afternoon.  </p><p>As for myself, I spent the time learning more about <a href="https://github.com/Nvidia/Nemoclaw">NemoClaw</a> (aka OpenShell) and demonstrated it running OpenClaw remotely on a DGX-Spark with Google&#8217;s <a href="https://huggingface.co/google/gemma-4-26B-A4B">gemma4-26B-A4B</a> as a local model (yes, it is as good as they say), along with <a href="https://www.evenrealities.com">Even Realities</a> G2 glasses using <a href="https://github.com/contextablemark/clawg-ui">clawg-ui</a> to communicate with Open Claw for hands-free voice-enabled chatting (my goal of going end-to-end from my glasses to the DGX-Spark eluded me, however).  In general the group felt less &#8220;techie&#8221; than the typical AI Tinkerers event, an aspect that manifested itself in the wider variety of demos.  
Thanks to Oracle and Nvidia for sponsoring the event (it might have been nice to have a DGX-Spark or two on hand for everyone to play with) and to Composio for hosting.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.clawtocracy.ai/p/a-tale-of-two-demos?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.clawtocracy.ai/p/a-tale-of-two-demos?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>The second event, <a href="https://luma.com/vqjs0pft">Generative UI Night</a>, was a more typical SF tech event&#8230; pizza, drinks and networking, followed by presentations from the sponsors and an open mic &#8220;lightning round&#8221; of community demos (which is where I came in) - highly organized and tightly structured over the course of 150 minutes.  And, unlike the event two days prior, I would turn out to be the only presenter focusing on OpenClaw.</p><p>I hadn&#8217;t even planned on doing a demo (although OpenClaw does have a generative UI angle in the form of its webview-based <a href="https://docs.openclaw.ai/platforms/mac/canvas">A2UI Canvas</a>). I always like to use a product from one of the sponsors when I participate in community demos and the nature of this A2UI implementation did not lend itself well to integration with CopilotKit (the most obvious choice). The night before, however, I was browsing through the pull requests on the AG-UI repo (as one does on a Sunday evening) and came across one from CopilotKit CEO Atai Barkai entitled &#8220;<a href="https://github.com/ag-ui-protocol/ag-ui/pull/1438">Open Gen UI pre-release</a>&#8221;.  
Long story short, I forked the PR and turned it into an &#8220;A2UI over AG-UI&#8221; integration for OpenClaw, the end result of which you can see <a href="https://youtu.be/R6E2PWcYWa0?t=3765">here</a>.</p><p>Spoiler alert: the demo, like most of the demos before and after mine, did not go entirely as expected - in my case because I committed the AI demo equivalent of an own goal: I used the wrong prompt. If you watch the 4-minute demo, you might be able to see that it ended in a page of generated text when it was instead supposed to end with the visual payoff of a carousel of cards (a realization that dawned on me as soon as I sat down - a sort of technological <a href="https://en.wikipedia.org/wiki/L'esprit_de_l'escalier">l&#8217;esprit de l&#8217;escalier</a>):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uyog!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uyog!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png 424w, https://substackcdn.com/image/fetch/$s_!Uyog!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png 848w, https://substackcdn.com/image/fetch/$s_!Uyog!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Uyog!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uyog!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png" width="1280" height="898" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:898,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:240240,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.clawtocracy.ai/i/193437013?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uyog!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png 424w, https://substackcdn.com/image/fetch/$s_!Uyog!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png 848w, https://substackcdn.com/image/fetch/$s_!Uyog!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png 
1272w, https://substackcdn.com/image/fetch/$s_!Uyog!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d8251d-1c22-4d80-be68-988538e03f69_1280x898.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The highlight of the evening, however, was getting to meet two developers who had used my clawg-ui plugin in their own projects (in one case a <a 
href="https://www.linkedin.com/posts/kush-ise_openclaw-agenticai-hackathon-activity-7431906678511501312-Cv1G?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAAAFvkEBUgHNsuV4Bm7411aGWplPm_FA3Qw">hackathon-winning project</a>) - <a href="https://www.linkedin.com/in/jereljohnvelarde/">Jerel Velarde</a> and <a href="https://www.linkedin.com/in/kush-ise/">Kush Ise</a> </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D3Rd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D3Rd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!D3Rd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!D3Rd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!D3Rd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D3Rd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg" width="603" 
height="452.25" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:603,&quot;bytes&quot;:2310224,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.clawtocracy.ai/i/193437013?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!D3Rd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!D3Rd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!D3Rd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!D3Rd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6fd9f9b-764d-415a-a2af-c27a135b4837_3466x2600.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Thanks to CopilotKit and WorkOS for making the IRL meetup possible!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://substack.com/@clawtocracy/note/p-193437013&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://substack.com/@clawtocracy/note/p-193437013"><span>Leave a comment</span></a></p>]]></content:encoded></item><item><title><![CDATA[Teaching Agents to Take Note]]></title><description><![CDATA[ReBAC Identity Chains and the Stenographer Pattern]]></description><link>https://www.clawtocracy.ai/p/teaching-agents-to-take-note</link><guid 
isPermaLink="false">https://www.clawtocracy.ai/p/teaching-agents-to-take-note</guid><dc:creator><![CDATA[Clawtocracy]]></dc:creator><pubDate>Tue, 24 Mar 2026 15:05:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OifE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OifE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OifE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png 424w, https://substackcdn.com/image/fetch/$s_!OifE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png 848w, https://substackcdn.com/image/fetch/$s_!OifE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!OifE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!OifE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png" width="1456" height="793" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:793,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8032151,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.clawtocracy.ai/i/191943528?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OifE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png 424w, https://substackcdn.com/image/fetch/$s_!OifE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png 848w, https://substackcdn.com/image/fetch/$s_!OifE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!OifE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347a1f80-e890-4f41-8650-03d6ea3bd791_2760x1504.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>TL;DR:</strong> A passive agent watches Slack conversations and silently logs decisions to the knowledge graph. Later, participants can recall those decisions through their own personal agents, even though they never deliberately stored anything themselves. 
This works because SpiceDB&#8217;s authorization graph connects agents to their human owners, creating identity chains that bridge the gap between &#8220;who recorded it&#8221; and &#8220;who made the decision.&#8221;</p><p><em>If you missed me chatting with Sam Kim about the memory plugin during SpiceDB Community Day 2026, you can find the replay <a href="https://www.youtube.com/watch?v=UpwyIsqWc8Q">here</a> at around the 35-minute mark.</em> </p><div><hr></div><p>I ended my last post with a teaser about swappable memory backends, but I decided to stick with the existing Graphiti-based configuration a bit longer and revisit my initial premise of inter-agent memory: multiple specialized agents sharing overlapping but distinct knowledge views under the same authorization graph.</p><p>The use case I had in mind was straightforward - I wanted an agent that sits in Slack channels, watches conversations, and logs notable decisions: a stenographer. Not a chatbot. Not an assistant. Just a quiet observer that notices when a team reaches a conclusion and writes it down.</p><p>The problem is that writing it down is only half the job. The other half (the more challenging half) is making sure the right people can find it later, through their own agents, in their own channels.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.clawtocracy.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.clawtocracy.ai/subscribe?"><span>Subscribe now</span></a></p><h2><strong>The Gap</strong></h2><p>Consider a concrete example. Cara and Bob are in <code>#engineering</code> on Slack. They discuss database options for a new service and decide on PostgreSQL. The stenographer agent observes this and stores a memory: &#8220;Decision: Use PostgreSQL for new service.
Participants: Cara, Bob.&#8221;</p><p>Two hours later, Cara opens WhatsApp and asks her personal OpenClaw agent: &#8220;What did we decide about the database?&#8221;</p><p>This should work. Cara was part of that decision. But her personal agent didn&#8217;t store the memory - the stenographer did. Her agent runs as <code>agent:cara</code>. The stenographer runs as <code>agent:stenographer</code>. They&#8217;re different SpiceDB subjects with different group memberships. From an authorization perspective, they have nothing in common.</p><p>The existing architecture handled the &#8220;who can see what&#8221; question beautifully&#8230; for memories stored within your own groups. What it couldn&#8217;t do was bridge the gap between an agent that stores a memory and a different agent whose human was actually there.</p><h2><strong>Identity Chains</strong></h2><p>The SpiceDB schema already had the building blocks. Since v0.1.0, the authorization schema has included:</p><pre><code><code>definition agent {
    relation owner: person
    permission act_as = owner
}
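
// Illustrative tuple (example IDs from the post): once
//   agent:stenographer#owner@person:U0123ABC
// is written, a check of act_as on agent:stenographer for
// person:U0123ABC resolves to ALLOWED through the owner relation.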
</code></code></pre><p>The <code>owner</code> relation and <code>act_as</code> permission were there from the start. But nothing ever wrote those relationships. They were aspirational: a door with no key.</p><p>The solution required three code changes to the plugin, plus a small but meaningful schema addition.</p><p><strong>First: per-agent identity.</strong> Previously, every agent sharing a gateway used the same SpiceDB subject - whatever was configured at the plugin level. If three agents ran through one gateway, they all wrote memories as the same identity. After adding per-agent identity, tools and lifecycle hooks now derive the SpiceDB subject from the runtime <code>agentId</code>, so the stenographer writes <code>shared_by: agent:stenographer</code> while Cara&#8217;s agent writes <code>shared_by: agent:main</code>. </p><p><strong>Second: identity linking.</strong> A new <code>identities</code> config field maps agent IDs to their owner&#8217;s person ID (most often a Slack or Telegram user ID, resolved through <a href="https://docs.openclaw.ai/security/formal-verification#routing-dmscope-precedence-+-identitylinks">identityLinks</a>). At startup, the plugin writes bidirectional tuples to SpiceDB: <code>agent:main #owner person:U0123ABC</code> and <code>person:U0123ABC #agent agent:main</code>. The forward tuple says &#8220;this agent belongs to this person.&#8221; The reverse tuple says &#8220;this person is represented by this agent.&#8221; Both are needed for the schema traversal to work.</p><p><strong>Third: the schema change.</strong> The <code>person</code> definition gained a <code>relation agent</code> and <code>permission represents = agent</code>, and <code>memory_fragment.view</code> gained <code>involves-&gt;represents</code>:</p><pre><code><code>definition person {
    relation agent: agent
    permission represents = agent
}
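
// The plugin writes both directions of the identity link at startup
// (example IDs from the post):
//   agent:main#owner@person:U0123ABC
//   person:U0123ABC#agent@agent:main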

definition memory_fragment {
    ...
    permission view = involves + shared_by + source_group-&gt;access + involves-&gt;represents
    ...
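    // view is a union of four paths: direct involvement, the storing
    // subject (shared_by), group inheritance via source_group-&gt;access,
    // and the new involves-&gt;represents arrow, which walks each involved
    // person out to the agents that represent them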
}</code></code></pre><p>The <code>involves-&gt;represents</code> arrow is the key. It tells SpiceDB: &#8220;for each person in <code>involves</code>, check if the requesting subject has the <code>represents</code> permission on that person.&#8221; In practice: <code>agent:main</code> can view <code>memory_fragment:X</code> because <code>person:U0123ABC</code> is in <code>involves</code>, and <code>person:U0123ABC#agent@agent:main</code> exists, which satisfies <code>represents</code>.</p><p>This means the owner-aware recall chain is now resolvable entirely within SpiceDB with no application-level owner lookup needed for permission checks.</p><p><strong>Fourth: owner-aware recall.</strong> When an agent calls <code>memory_recall</code>, the plugin runs a second search path using a search-then-post-filter pattern. First, it resolves the agent&#8217;s owner and asks SpiceDB which memory fragments that person can view via <code>involves</code> (this is the authorization allow-list). Then it discovers which groups those fragments belong to (via the <code>source_group</code> relation) and runs Graphiti&#8217;s semantic search across those groups with the actual query. Finally, it post-filters the search results against the allow-list, keeping only fragments the person is genuinely authorized to view. SpiceDB provides the security boundary and Graphiti provides query relevance. The intersection gives you both.</p><h2><strong>What the Stenographer Can&#8217;t Do</strong></h2><p>There&#8217;s an intentional asymmetry in the access control. The stenographer stores every decision with <code>shared_by: agent:stenographer</code>. In the SpiceDB schema, only the <code>shared_by</code> subject can delete a fragment. This means:</p><ul><li><p>The stenographer can delete its own memories.</p></li><li><p>Cara and Bob can view the decisions they were involved in.</p></li><li><p>Cara and Bob cannot delete the stenographer&#8217;s records.</p></li></ul><p>This isn&#8217;t a bug. 
When you have an organizational agent logging decisions, you don&#8217;t want individual participants unilaterally erasing the record. The stenographer is the source of truth. If a decision gets reversed, the stenographer logs the reversal, but it shouldn&#8217;t delete history.</p><h2><strong>The Dual Capture Model</strong></h2><p>The stenographer uses both of the plugin&#8217;s capture mechanisms, and they serve different purposes.</p><p><strong>Auto-capture</strong> feeds full conversation transcripts to Graphiti after every agent session. Graphiti&#8217;s LLM extraction layer does its thing: building entity nodes, inferring relationships, tracking temporal validity. This is the raw knowledge graph: rich, interconnected, but ungoverned. Everything the stenographer sees goes into the graph.</p><p><strong>Explicit capture</strong> is where the authorization layer comes in. The stenographer&#8217;s SOUL.md instructs it to selectively call <code>memory_store</code> when it detects a decision, and to include the <code>involves</code> parameter with the Slack user IDs of participants. This writes the SpiceDB relationships that make cross-agent recall possible.</p><p>The combination matters. Graphiti builds a temporally-aware knowledge graph from all conversations. The stenographer&#8217;s explicit stores add authorization-controlled decision records that specific people can discover through <code>involves</code>. When Cara asks about the database decision, SpiceDB identifies which fragments she&#8217;s authorized to view and which groups they live in, Graphiti searches those groups with her query for semantic relevance, and the post-filter ensures only her authorized fragments make it through.</p><p>In practice, running both mechanisms on all sessions isn&#8217;t always what you want. Cron jobs and monitoring sessions generate repetitive, low-value content that pollutes the knowledge graph. 
A <code>sessionFilter</code> config option lets you exclude sessions using a pattern; auto-capture and auto-recall skip filtered sessions entirely, while explicit memory tools remain available. In the process of developing this scenario, I also hit a subtle interaction between the two mechanisms: auto-recall injects <code>&lt;relevant-memories&gt;</code> XML into user context, and auto-capture was then discarding any message containing that block. The fix was to strip the injected XML before capture rather than skip the message, which would have discarded most of the user&#8217;s actual content.</p><h2><strong>Backward Compatibility</strong></h2><p>The schema change is additive - <code>person</code> gains a relation and permission, <code>memory_fragment.view</code> gains an extra union term. Existing tuples and permission checks continue to work identically. The new <code>involves-&gt;represents</code> path just adds another way to reach the <code>view</code> permission.</p><p>Existing memories get the new traversal for free. Any fragment that already has <code>involves@person:U0123ABC</code> becomes viewable by <code>agent:main</code> the moment the bidirectional identity tuple exists - no re-writing of fragment relationships is needed.</p><p>If you don&#8217;t add <code>identities</code>, everything works exactly as before. The per-agent identity changes fall back to the config-level subject when <code>agentId</code> isn&#8217;t present in the runtime context. The bidirectional tuples are only written for agents that appear in the <code>identities</code> config.</p><h2><strong>Testing the Authorization Chain</strong></h2><p>The interesting testing challenge was separating SpiceDB authorization verification from Graphiti&#8217;s LLM extraction. The authorization chain (agent &#8594; owner &#8594; involves &#8594; fragment) is deterministic and fast. You write relationships, you query permissions, you get answers. 
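</p><p><em>Since the chain is deterministic, it can be modelled in a few lines. A toy sketch follows - a set of tuples standing in for SpiceDB, not its API, using the post&#8217;s example IDs:</em></p>

```python
# Minimal in-memory model of the identity-chain check (illustrative only;
# real checks go through SpiceDB). IDs match the post's running example.
tuples = {
    ("memory_fragment:X", "involves", "person:U0123ABC"),
    ("person:U0123ABC", "agent", "agent:main"),  # reverse identity tuple
}

def represents(agent, person):
    # permission represents = agent
    return (person, "agent", agent) in tuples

def can_view(agent, fragment):
    # the involves->represents arm of memory_fragment.view
    return any(
        rel == "involves" and obj == fragment and represents(agent, person)
        for (obj, rel, person) in tuples
    )

print(can_view("agent:main", "memory_fragment:X"))          # True
print(can_view("agent:stenographer", "memory_fragment:X"))  # False
```

<p>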
Graphiti&#8217;s entity extraction, on the other hand, depends on whatever LLM you&#8217;re running and can take minutes with local models.</p><p>The E2E tests verify the authorization chain first, then check Graphiti extraction as a non-blocking bonus. Seven tests exercise the full flow: decision storage with <code>involves</code>, permission enforcement (view vs. delete), per-agent group isolation, owner-aware fragment discovery, the complete identity chain, and unauthorized agent denial. The SpiceDB assertions are the hard requirements; the Graphiti assertions are &#8220;if the model finished processing, verify the results look right.&#8221;</p><h2><strong>Configuration, Not Code</strong></h2><p>The stenographer itself is pure configuration with no custom code:</p><ul><li><p>A SOUL.md file with instructions to detect decisions and call <code>memory_store</code> with the right parameters.</p></li><li><p>A binding to the Slack channels it should monitor.</p></li><li><p>A tool allowlist (<code>message</code> for Slack user resolution, the memory tools, read, and nothing else, sticking with the principle of least privilege).</p></li><li><p>Channel-level <code>requireMention: false</code> ensures the stenographer receives all messages, not just @mentions.</p></li></ul><p>The <a href="https://github.com/contextablemark/openclaw-memory-rebac/blob/main/docs/stenographer-runbook.md">runbook</a> in <code>docs/stenographer-runbook.md</code> walks through the full setup, including Slack OAuth scopes and event subscriptions.</p><p>The optional identity linking is similarly declarative:</p><pre><code><code>{
  "identities": {
    "main": "U0123ABC",
    "cara": "U0456DEF"
  }
}</code></code></pre><p>That&#8217;s it. Agent IDs to Slack user IDs. The plugin handles the rest at startup.</p><h2><strong>Try It in the Playground</strong></h2><p>The full schema works in the <a href="https://play.authzed.com/">Authzed Playground</a>. Drop in the schema, add a few tuples (two agents, two people, a stenographer memory with <code>involves</code>), and watch the <code>involves-&gt;represents</code> traversal resolve in real time. The assertions tab lets you verify that agents can view memories their owners were involved in&#8230; and that they can&#8217;t delete what the stenographer stored.</p><h2><strong>Create Your Own Agents</strong></h2><p>You can easily extend this pattern, mixing and matching groups with the &#8220;involves&#8221; relationship to come up with your own variations on The Stenographer. For example:</p><ul><li><p>A <strong>compliance agent</strong> that monitors channels for regulatory commitments and ensures they&#8217;re tracked, circling back to verify</p></li><li><p>A <strong>handoff agent</strong> that watches project channels and synthesizes context for new team members joining a project</p></li><li><p>An <strong>onboarding agent</strong> that captures institutional knowledge from senior engineers&#8217; conversations (in public channels) and makes it discoverable by new hires</p></li></ul><p>All of these share the same core requirement: one agent storing knowledge about interactions between individuals (human and otherwise) and making it available to those who need to see it after the fact.</p><h2><strong>What&#8217;s Next</strong></h2><p>This particular approach lends itself well to scenarios where the focus is on decisions and facts (e.g., who said what&#8230; and when). But what if the memories you want to store are a bit more nebulous and conceptual&#8230; the sort of loosely grouped facts that you would expect a personal assistant to track? 
We&#8217;ll deal with that coming up.</p><p>As always, the latest code is at <a href="https://github.com/Contextable/openclaw-memory-rebac">github.com/Contextable/openclaw-memory-rebac</a>. PRs (and criticism) welcome.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://substack.com/@clawtocracy/note/p-191943528&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://substack.com/@clawtocracy/note/p-191943528"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[My favorite feature of Open Source Software]]></title><description><![CDATA[I just love seeing what other people do with my creations, especially when something I've done helps someone win a hackathon!]]></description><link>https://www.clawtocracy.ai/p/my-favorite-feature-of-open-source</link><guid isPermaLink="false">https://www.clawtocracy.ai/p/my-favorite-feature-of-open-source</guid><dc:creator><![CDATA[Clawtocracy]]></dc:creator><pubDate>Mon, 16 Mar 2026 06:53:46 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/191098996/cd354814834e5cd44ce5ca050af3b934.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>A couple of weeks back (over on LinkedIn), I <a href="https://www.linkedin.com/posts/markfogle_openclaw-agenticai-hackathon-activity-7432052741851709440-NPQF?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAAAFvkEBUgHNsuV4Bm7411aGWplPm_FA3Qw">mentioned</a> the ClawG Mission Control project that Kush Ise and team used to win the Wordware OpenClaw Hack Night last month, using my clawg-ui plugin as part of the solution. 
This weekend, I contributed back, bringing the Mission Control project up to date with v2026.3.2+ of OpenClaw and adding a couple of new features in the process.<br><br>Mentioned in the video:</p><p><a href="https://kush2704.vercel.app/">Kush Ise</a><br>clawg-ui plugin (<a href="https://github.com/contextablemark/clawg-ui">Github</a>, <a href="https://www.npmjs.com/package/@contextableai/clawg-ui">npm</a>)<br><a href="https://github.com/contextablemark/Clawg--Mission-Control">Clawg&#8212;Mission-Control</a> (with my additions)<br></p>
https://substackcdn.com/image/fetch/$s_!i8Ba!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077961b6-50f7-4830-b2fa-33df3f5ed88d_2760x1504.png 848w, https://substackcdn.com/image/fetch/$s_!i8Ba!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077961b6-50f7-4830-b2fa-33df3f5ed88d_2760x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!i8Ba!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077961b6-50f7-4830-b2fa-33df3f5ed88d_2760x1504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i8Ba!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077961b6-50f7-4830-b2fa-33df3f5ed88d_2760x1504.png" width="1456" height="793" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/077961b6-50f7-4830-b2fa-33df3f5ed88d_2760x1504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:793,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7861679,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.clawtocracy.ai/i/190075380?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077961b6-50f7-4830-b2fa-33df3f5ed88d_2760x1504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!i8Ba!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077961b6-50f7-4830-b2fa-33df3f5ed88d_2760x1504.png 424w, https://substackcdn.com/image/fetch/$s_!i8Ba!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077961b6-50f7-4830-b2fa-33df3f5ed88d_2760x1504.png 848w, https://substackcdn.com/image/fetch/$s_!i8Ba!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077961b6-50f7-4830-b2fa-33df3f5ed88d_2760x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!i8Ba!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077961b6-50f7-4830-b2fa-33df3f5ed88d_2760x1504.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>In this series, I&#8217;ll take you through some of the diversions I took along the way to implementing my OpenClaw memory replacement. In this installment, after abandoning Graphiti for Cognee and then abandoning Cognee over a fundamental architectural mismatch, I returned to Graphiti with GPU embeddings and monkey-patched my way to a useful solution. Here&#8217;s the next chapter of the saga.</strong></p><p><strong>And you can see me talk about my experience&#8230; and the final result&#8230; at SpiceDB Community Day 2026! <a href="https://authzed.com/events/spicedb-community-day-2026">Register here</a>. </strong></p><h2><strong>The Starting Problem</strong></h2><p>In my <a href="https://www.clawtocracy.ai/p/building-memory-that-doesnt-lapse">previous post</a>, I chronicled breaking free from OpenAI&#8217;s API by switching to local Ollama models with Graphiti. That worked brilliantly - until Graphiti&#8217;s architecture redesign made local models impractical. Embedding calls per episode ballooned from ~40 to ~300, turning 60-90 second operations into 15+ minute ordeals on CPU.</p><p>Time for Plan B: <strong>Cognee</strong>.</p><p>Cognee (v0.5.2) looked promising - native Ollama support, built on LanceDB + KuzuDB + SQLite, simple REST API, active development. 
The plan was straightforward: swap out the Graphiti backend, point Cognee at the same GPU-accelerated Ollama servers, run the E2E tests, call it a day.</p><p><strong>Narrator:</strong> <em>It was not straightforward.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.clawtocracy.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.clawtocracy.ai/subscribe?"><span>Subscribe now</span></a></p><h2><strong>The Cognee Chapter</strong></h2><h3><strong>Challenge #1: The Authentication Wall</strong></h3><p>I configured Cognee, fired up the server, and immediately hit a wall:</p><pre><code><code>$ curl http://localhost:8000/api/v1/datasets
{"detail": "Not authenticated"}</code></code></pre><p>Every endpoint returned <code>401 Unauthorized</code>. After trying various permutations of API keys and headers, I realized: <strong>Cognee v0.5.0+ introduced multi-tenant access control</strong>. Great for production deployments, but fundamentally conflicting with our architecture - I use <strong>SpiceDB for ReBAC</strong> (Relationship-Based Access Control), not Cognee&#8217;s built-in system.</p><p>The documentation mentioned <code>ENABLE_BACKEND_ACCESS_CONTROL=false</code>, but that alone wasn&#8217;t enough. After digging through the source code, I found the missing piece: <code>REQUIRE_AUTHENTICATION=false</code>. Two separate environment variables control access - the docs only mentioned the first. Both must be set to run Cognee without authentication.</p><h3><strong>Challenge #2: The GPU Discovery</strong></h3><p>With authentication sorted, I noticed the <code>cognify</code> operation (entity extraction + embedding generation + knowledge graph construction) was taking 90-180 seconds. The bottleneck: embeddings running on <strong>CPU</strong> while the LLM ran on the <strong>GPU server</strong>.</p><p>This split wasn&#8217;t accidental - it was based on a reasonable-sounding assumption (i.e., &#8220;it&#8217;s only embeddings! They can run on the CPU!&#8221;) that turned out to be wrong. Embedding models are tiny compared to LLMs. A 137M-parameter <code>nomic-embed-text</code> next to a 14B-parameter <code>qwen2.5</code> is a rounding error. Why waste GPU memory on a model that barely needs it?</p><p>The problem is that &#8220;lightweight per call&#8221; and &#8220;lightweight in aggregate&#8221; are very different things. A single embedding takes milliseconds on CPU - negligible. But Cognee&#8217;s pipeline makes hundreds of calls per episode: chunking, entity extraction, graph edges, search vectors. At scale, those milliseconds compound into minutes. 
The model is light; the workload isn&#8217;t.</p><p>I moved both to the GPU server. Cognify dropped to <strong>17 seconds</strong>. A <strong>10x speedup</strong>.</p><p>This insight (i.e., that GPU embeddings are transformative, not incremental) would prove to be the most valuable discovery of the entire journey.</p><h3><strong>Challenge #3: The Deal-Breaker - Cross-Dataset Contamination</strong></h3><p>Everything seemed to be working. Then I stored data in separate groups and searched one group. <strong>I got results from both.</strong></p><p>With <code>ENABLE_BACKEND_ACCESS_CONTROL=false</code>, Cognee stores everything in a <strong>single global database</strong>. The <code>datasets</code> parameter in search is simply ignored. Dataset isolation is implemented through access control, not query filtering.</p><p>Could I re-enable access control but keep authentication disabled? No. Cognee&#8217;s code explicitly forces authentication on when access control is enabled:</p><pre><code><code>REQUIRE_AUTHENTICATION = (
    env.REQUIRE_AUTHENTICATION == "true"
) or (
    env.ENABLE_BACKEND_ACCESS_CONTROL == "true"
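    # enabling backend access control force-enables authentication:
    # there is no configuration with dataset isolation but no auth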
)</code></code></pre><p>You literally cannot have dataset isolation in this case without authentication. It&#8217;s hardcoded. The isolation is keyed on a <code>(owner_id, dataset_id)</code> tuple-without an authenticated user, there&#8217;s no <code>owner_id</code>, and without that, there&#8217;s no isolation.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.clawtocracy.ai/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share Clawtocracy&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.clawtocracy.ai/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share Clawtocracy</span></a></p><h3><strong>The Decision: Scrap Cognee</strong></h3><p>Cross-dataset contamination isn&#8217;t a bug - it&#8217;s the natural consequence of disabling a load-bearing architectural feature. Cognee&#8217;s dataset isolation is deeply integrated with its authentication system by design. If your authorization layer can&#8217;t enforce data boundaries, it doesn&#8217;t matter how fast your embeddings are.</p><p>So I removed Cognee&#8230; and reevaluated my whole approach. </p><p>The experience had forced a fundamental reckoning. The project had been called <strong>openclaw-memory-graphiti</strong> - the storage backend right there in the name. After discovering that backends can become incompatible overnight, baking one into the project&#8217;s identity seemed like a mistake I would like to avoid. The authorization model - SpiceDB&#8217;s ReBAC - was the stable commitment&#8230; but the solution could probably benefit from some choice when it came to memory backends.</p><p>So alongside the Cognee removal came a complete refactor: <strong>openclaw-memory-graphiti</strong> became <strong>openclaw-memory-rebac</strong>. 
Memory representations would be swappable plugins behind a common interface, with SpiceDB authorization gating access regardless of what sat behind it. The &#8220;rebac&#8221; is the part that could stay; the memory backend is the part that can change.</p><p><strong>And the GPU embedding insight survived.</strong> It pointed us back to Graphiti with a hypothesis.</p><h2><strong>The Graphiti Redemption</strong></h2><h3><strong>The Hypothesis</strong></h3><p>If GPU embeddings gave Cognee a 10x speedup, the math for Graphiti looked promising. The redesign increased embedding calls from ~40 to ~300 - the same volume problem that had exposed the CPU assumption in the first place:</p><p><strong>300 embeddings &#215; 0.3s per batch (GPU) &#8776; 90 seconds</strong></p><h3><strong>The Results</strong></h3><p>I configured both LLM and embeddings on the GPU server, ported the complex relationship extraction tests, and ran them.</p><table><thead><tr><th>Test Scenario</th><th>Processing Time</th></tr></thead><tbody><tr><td>Full memory lifecycle</td><td>~58s</td></tr><tr><td>Multi-entity professional</td><td>~54s</td></tr><tr><td>Temporal + work artifacts</td><td>~87s</td></tr><tr><td>Multi-turn tech conversation</td><td>~60s</td></tr></tbody></table><p><strong>Nothing over 90 seconds.</strong> The hypothesis held. 
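</p><p><em>The hypothesis arithmetic, checked directly. The call count and GPU batch time come from the post; the CPU per-batch time is a rough value implied by the 15+ minute episodes:</em></p>

```python
# Why the embedding volume, not the model size, is what matters.
calls_per_episode = 300   # after the Graphiti redesign (was ~40)
gpu_batch_s = 0.3         # seconds per embedding batch on GPU
cpu_batch_s = 3.0         # rough CPU figure implied by 15+ minute episodes

print(round(calls_per_episode * gpu_batch_s))   # ~90 seconds on GPU
print(calls_per_episode * cpu_batch_s / 60)     # 15.0 minutes on CPU
```

<p>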
Still not blazingly fast, but bear in mind that these operations run in the background, so the times are tolerable, especially for a local model that isn&#8217;t burning through paid tokens.</p><p>The architectural complexity that made Graphiti &#8220;unusable&#8221; on CPU became less of an issue with GPU acceleration: richer entity extraction, better relationship modeling, and temporal understanding all became possible in under 90 seconds.</p><h3><strong>Simplifying the Transport</strong></h3><p>The first change was architectural, not a patch. Graphiti ships with an MCP (Model Context Protocol) server, but MCP added session management overhead and SSE parsing complexity for what are fundamentally stateless operations - store a memory, search the graph, resolve a UUID. The Graphiti FastAPI server already exposes a clean REST API for all of this. Using it directly eliminated an entire class of transport-layer bugs and made debugging trivial.</p><h3><strong>The Patching</strong></h3><p>With the transport simplified, getting Graphiti to actually <em>work</em> with local models was like renovating a house built for a different climate - the foundation was sound, but the fixtures assumed conditions that no longer applied. Five runtime patches in a single <code>startup.py</code> file handled the adaptation.</p><p><strong>The constructor bypass.</strong> <code>ZepGraphiti.__init__</code> never forwards <code>embedder</code> or <code>cross_encoder</code> to the base class. Every embedding call silently hit OpenAI&#8217;s <code>text-embedding-3-small</code> regardless of configuration (and silently failed without an OpenAI API key). Fix: subclass <code>Graphiti</code> directly.</p><p><strong>Singleton client lifecycle.</strong> Upstream creates and closes a client per-request, but episode processing runs asynchronously and outlives the request scope. 
Fix: a process-lifetime singleton.</p><p><strong>Resilient AsyncWorker.</strong> Any exception in the background worker kills the loop silently - no logging, no recovery, jobs pile up forever. Fix: a catch-all handler that logs and continues.</p><p><strong>Attribute sanitization.</strong> Local LLMs return nested structures where OpenAI returns flat objects. Neo4j rejects non-primitive properties. Fix: flatten nested dicts/lists to strings for both nodes and edges.</p><p><strong>None index handling.</strong> Local LLMs sometimes return <code>None</code> where integers are expected. Fix: catch the TypeError instead of crashing the entire episode.</p><h3><strong>The Nature of These Patches</strong></h3><p>Let&#8217;s be honest about what I built. These are <strong>runtime monkey-patches</strong> applied via <code>importlib.import_module()</code> to a third-party codebase I don&#8217;t control. Every patch depends on upstream&#8217;s internal module structure. None of this is part of the public API. Any upstream release could break them silently.</p><p>But the alternative - maintaining a full fork of graphiti-core - is worse. The patches are concentrated in one file, well-documented, with clear comments explaining what they fix and why. <strong>The right amount of technical debt is the amount you can service.</strong> Five monkey-patches in one file? Serviceable. A full fork of a rapidly-evolving Python project? 
Not serviceable.</p><h2><strong>The Final Architecture</strong></h2><p><strong>openclaw-memory-rebac</strong> is a Graphiti backend with SpiceDB ReBAC authorization:</p><ul><li><p><strong>Storage</strong>: Graphiti FastAPI + Neo4j, accessed via REST (no MCP)</p></li><li><p><strong>Authorization</strong>: SpiceDB (relationship-based, per-fragment access control)</p></li><li><p><strong>Inference</strong>: Ollama (qwen2.5:14b LLM + nomic-embed-text embeddings, GPU-accelerated)</p></li><li><p><strong>Reranking</strong>: BGE (local sentence-transformers, CPU - some things still don&#8217;t need a GPU)</p></li></ul><h2><strong>Credit Where It&#8217;s Due: Cognee&#8217;s RBAC Is the Real Deal</strong></h2><p>This post chronicles why Cognee didn&#8217;t work <em>for us</em>. That distinction matters, because the authorization system I had to disable is genuinely well-engineered - and if your access control needs are role-based rather than relationship-based, Cognee deserves serious consideration, especially since Cognee already has a <a href="https://docs.cognee.ai/integrations/openclaw-integration">perfectly useful OpenClaw plugin</a>. </p><p>Think of authorization models as two different maps of the same territory. RBAC (Role-Based Access Control) maps organizational structure: departments, teams, job titles, reporting lines. ReBAC (Relationship-Based Access Control) maps social graphs: who shared what with whom, who authored what, who trusts whom. 
Both are valid projections - they just emphasize different features of the landscape.</p><h3><strong>Local Model Support: Cognee&#8217;s Quiet Advantage</strong></h3><p>Authorization aside, Cognee outperformed Graphiti convincingly in one other area: <strong>working with local models required zero patches.</strong></p><p>Getting Graphiti to function with Ollama required five monkey-patches in a custom Docker image - constructor bypass, singleton lifecycle, resilient worker, attribute sanitization, None handling - all because Graphiti&#8217;s internals assume OpenAI-shaped outputs at every layer.</p><p>Cognee? Set <code>LLM_PROVIDER=ollama</code>, point to the endpoint, start the server. It just worked. Entity extraction, embeddings, knowledge graph construction - all without reaching into the framework&#8217;s internals with a wrench.</p><p>Cognee was designed with provider flexibility as a first-class concern. Swap a provider string and an endpoint URL, and the rest adapts. Graphiti was designed OpenAI-first and treats alternative providers as an afterthought. 
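</p><p>To make those OpenAI-shaped assumptions concrete, the attribute-sanitization patch described earlier boils down to something like the following (a simplified sketch, not Graphiti&#8217;s actual code):</p><pre><code><code>import json

def sanitize_properties(props):
    # Neo4j accepts only primitive property values. Local LLMs often return
    # nested dicts/lists where OpenAI returns flat values, so anything nested
    # is flattened to a JSON string before the node or edge is written.
    clean = {}
    for key, value in props.items():
        if isinstance(value, (dict, list)):
            clean[key] = json.dumps(value)
        else:
            clean[key] = value
    return clean

print(sanitize_properties({"name": "Alice", "aliases": ["Al", "Ally"]}))
# {'name': 'Alice', 'aliases': '["Al", "Ally"]'}
</code></code></pre><p>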
Both are valid strategies, but they produce very different experiences when you show up with an Ollama server instead of an API key.</p><p>If you&#8217;re committed to local model inference - whether for cost, privacy, latency, or sovereignty reasons - Cognee removes an entire category of integration pain that I spent hours working around with Graphiti.</p><p><strong>If your requirements match these, start with Cognee:</strong></p><ul><li><p>Teams and departments need isolated knowledge bases</p></li><li><p>Access aligns with organizational hierarchy (managers see what their reports see)</p></li><li><p>You need dataset-level permissions without building your own auth layer</p></li><li><p>Compliance requires that data isolation is enforced at the storage level, not just the API level</p></li><li><p>You&#8217;re running local models via Ollama and want a framework that supports them natively</p></li></ul><p><strong>If needs look more like this, you&#8217;ll hit the same wall I did:</strong></p><ul><li><p>Access is based on social relationships (friends-of, shared-with, authored-by)</p></li><li><p>Permissions are dynamic and require graph traversal</p></li><li><p>You&#8217;re already running an external authorization system like SpiceDB</p></li><li><p>You need per-fragment access control within a single dataset</p></li></ul><p>The incompatibilities weren&#8217;t flaws in Cognee&#8217;s design. They were a mismatch between two well-reasoned but fundamentally different philosophies - and the coupling that makes Cognee&#8217;s RBAC robust is exactly what made it impossible to bypass for our ReBAC use case.</p><h2><strong>Lessons Learned</strong></h2><h3><strong>1. Exhaust Infrastructure Before Switching Frameworks</strong></h3><p>I skipped straight from &#8220;Graphiti is slow on CPU&#8221; to &#8220;let&#8217;s try a different framework&#8221; without questioning the assumption that embeddings were too lightweight to benefit from GPU acceleration. 
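</p><p>Even a crude timing harness settles that kind of assumption in minutes. A sketch (here <code>embed_batch</code> and both endpoint URLs are hypothetical placeholders):</p><pre><code><code>import time

def average_seconds(fn, runs=5):
    # Average wall-clock time over a few runs: enough signal for a pivot decision.
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Compare the same workload on CPU and GPU endpoints before blaming the framework:
# cpu = average_seconds(lambda: embed_batch(texts, url="http://cpu-box:11434/v1"))
# gpu = average_seconds(lambda: embed_batch(texts, url="http://gpu-box:11434/v1"))
</code></code></pre><p>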
Profiling before pivoting would have saved two days.</p><h3><strong>2. Local Models Are a Different Beast</strong></h3><p>OpenAI&#8217;s API produces consistent, well-typed outputs. Local LLMs return nested dicts where you expect flat objects, <code>None</code> where you expect integers, lists where you expect strings. Any framework tested exclusively against OpenAI has latent bugs that only surface with local models.</p><h3><strong>3. &#8220;Disable Feature X&#8221; Often Means &#8220;Disable Features X, Y, and Z&#8221;</strong></h3><p>Disabling Cognee&#8217;s authentication removed dataset isolation, user identity, and the entire multi-tenant data model along with it. Features in well-designed systems are interconnected. Disabling one load-bearing feature can collapse the entire structure.</p><h3><strong>4. Test Data Isolation on Day One</strong></h3><p>When evaluating any backend that claims group/dataset/namespace isolation, test it explicitly:</p><pre><code><code>await store("data-A", { group: "group-a" });
await store("data-B", { group: "group-b" });
const results = await search("data-B", { group: "group-a" });
expect(results).not.toContain("data-B"); // FAILS with Cognee when authorization is disabled
</code></code></pre><h3><strong>5. The Best Pivot Is Sometimes a U-Turn</strong></h3><p>I spent two days integrating Cognee, ripped it out, refactored the project identity, returned to Graphiti, and monkey-patched five internal behaviors. That looks like thrashing. But the GPU insight, the backend abstraction layer, and the local-model patches all came from the detour. <strong>Detours with learning are progress, not waste.</strong></p><h2><strong>The Documentation Gap</strong></h2><p>The biggest meta-takeaway: <strong>good documentation is a map that shows both the trails and the exits.</strong> Most frameworks document the happy path, not the &#8220;I brought my own compass&#8221; path. Budget time for discovering the undocumented interactions between features - especially when disabling one.</p><h2><strong>Resources</strong></h2><ul><li><p>openclaw-memory-rebac : <a href="https://github.com/contextablemark/openclaw-memory-rebac/releases/tag/v0.1.0">Github</a> and <a href="https://www.npmjs.com/package/@contextableai/openclaw-memory-rebac">npm</a></p></li><li><p><a href="https://github.com/topoteretes/cognee">Cognee</a> - Knowledge graph memory framework (recommended for standalone use)</p></li><li><p><a href="https://github.com/getzep/graphiti">Graphiti</a> - Knowledge graph with REST + MCP interfaces</p></li><li><p><a href="https://authzed.com/spicedb">SpiceDB</a> - Relationship-based access control</p></li><li><p><a href="https://ollama.ai/">Ollama</a> - Local LLM runtime with GPU acceleration</p></li></ul><h2><strong>Questions?</strong></h2><p>Have you hit similar architectural incompatibilities when layering authorization systems? Had to monkey-patch your way to production with local models? I&#8217;d love to hear about it in the comments.</p><h2><strong>What&#8217;s Next: Why Stop at One?</strong></h2><p>This entire series has treated memory backend selection as a high-stakes, one-way commitment. 
But that refactor to <code>openclaw-memory-rebac</code> created a backend abstraction layer for a reason. The SpiceDB authorization gate sits <em>in front of</em> the storage backend. The plugin interface doesn&#8217;t care what&#8217;s behind it.</p><p>Graphiti works. The patches are manageable. But the agentic memory space is rapidly evolving, and every backend brings different strengths - richer entity models, faster ingestion, better graph traversal, tighter local-model support. Now that swapping backends is a configuration change rather than a rewrite, the question shifts from &#8220;which one do we commit to forever?&#8221; to &#8220;which one fits the current workload best?&#8221;</p><p>Next post: <strong>treating the memory backend as a swappable component, and finding out what else is out there.</strong></p><div><hr></div><p><em>Kudos to the Cognee team for building genuinely solid software - particularly on the authorization front. The dataset isolation design is well-reasoned, the RBAC model is production-grade, and the GPU performance discovery during our integration was eye-opening. Our story is one of architectural mismatch, not software quality. 
Sometimes &#8220;not the right fit for this specific use case&#8221; is the reasonable - and most accurate - thing you can say about a well-engineered solution.</em></p>]]></content:encoded></item><item><title><![CDATA[Building Memory That Doesn't Lapse]]></title><description><![CDATA[Breaking Free from OpenAI and switching to local LLM infrastructure]]></description><link>https://www.clawtocracy.ai/p/building-memory-that-doesnt-lapse</link><guid isPermaLink="false">https://www.clawtocracy.ai/p/building-memory-that-doesnt-lapse</guid><dc:creator><![CDATA[Clawtocracy]]></dc:creator><pubDate>Thu, 26 Feb 2026 15:12:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JfCv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JfCv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JfCv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png 424w, https://substackcdn.com/image/fetch/$s_!JfCv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png 848w, 
https://substackcdn.com/image/fetch/$s_!JfCv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!JfCv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JfCv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png" width="1456" height="778" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:778,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8749096,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.clawtocracy.ai/i/188784853?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JfCv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png 424w, 
https://substackcdn.com/image/fetch/$s_!JfCv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png 848w, https://substackcdn.com/image/fetch/$s_!JfCv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!JfCv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb80acff-204b-4864-990e-64749ecc72ca_2816x1504.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3><strong>The Crisis</strong></h3><p>It was a typical Thursday morning when I started noticing some anomalies in my OpenClaw agent&#8217;s memory system. My <a href="https://www.clawtocracy.ai/p/building-memory-that-knows-whos-asking">ReBAC memory persistence system</a>, which had been working perfectly, suddenly started throwing timeouts when ingesting memories at the end of each turn:</p><pre><code><code>[plugins] openclaw-memory-graphiti: deferred SpiceDB write failed for memory_store:
Error: Failed to resolve episode UUID for "memory_1770911378846" in group "main"
after 90s &#8212; episode not yet visible in get_episodes (Graphiti LLM processing may
still be running)
</code></code></pre><p>The culprit? An <strong><a href="https://status.openai.com/incidents/01KH94NGSXNH9H4WBPXB3RFZWX">OpenAI API outage</a></strong>. Both the ingestion portion of my memory system (i.e., the part of the pipeline that uses embeddings to create new memories) and the recall portion (which uses embeddings to build the search query) had been rendered inoperable. Clearly, I needed a backup (or perhaps a complete replacement) for times like this.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.clawtocracy.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><strong>And you can see me talk about my experience&#8230; and the final result&#8230; at SpiceDB Community Day 2026!
<a href="https://authzed.com/events/spicedb-community-day-2026">Register here</a>.</strong></p><h3><strong>The Problem: Multiple External Dependencies</strong></h3><p>The Graphiti MCP server, upon which the ReBAC memory system relies, was using external models (by default OpenAI) for several independent tasks, including:</p><ol><li><p><strong>Entity Extraction</strong> - Using <code>gpt-4o-mini</code> to analyze conversations and extract structured knowledge (entities, relationships, facts)</p></li><li><p><strong>Embeddings</strong> - Using <code>text-embedding-3-small</code> to generate 1536-dimensional vectors for semantic search</p></li><li><p><strong>Search</strong> - Using <code>text-embedding-3-small</code> to create the search query and (potentially) <code>gpt-4.1-nano</code> inside the <code>OpenAIRerankerClient</code> to perform reranking</p></li></ol><p>Using the default Graphiti configuration, when OpenAI models go down, one or more systems fail, depending upon the extent of the outage.
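</p><p>Concretely, all three tasks reduce to two OpenAI-style request shapes. The sketched payloads here (using the model names above) are what any replacement endpoint has to accept:</p><pre><code><code>import json

# 1. Entity extraction: a chat-completions request.
chat_payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Extract entities from: ..."}],
}

# 2 and 3. Embeddings, used both at ingestion time and to embed the search query.
embedding_payload = {
    "model": "text-embedding-3-small",
    "input": "What did we decide about the memory backend?",
}

# Any server that accepts these shapes, whether api.openai.com or a local
# OpenAI-compatible endpoint, can fill the role.
print(json.dumps(chat_payload)[:50])
</code></code></pre><p>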
But beyond the risks of creating a single point of failure by relying solely upon OpenAI, I was paying for API calls that could easily run locally on hardware I already owned.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.clawtocracy.ai/p/building-memory-that-knows-whos-asking?r=6ajhld&quot;,&quot;text&quot;:&quot;Part I of the OpenClaw memory journey&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.clawtocracy.ai/p/building-memory-that-knows-whos-asking?r=6ajhld"><span>Part I of the OpenClaw memory journey</span></a></p><h3><strong>The Solution: Go Local with Ollama</strong></h3><p>I decided to replace all OpenAI dependencies with local models running on Ollama, either GPU-accelerated or directly on the CPU:</p><ul><li><p><strong>Entity Extraction</strong>: <a href="https://ollama.com/library/nemotron-3-nano">nvidia/nemotron-3-nano:30b</a> via Ollama (GPU-accelerated)</p></li><li><p><strong>Embeddings</strong>: <a href="https://huggingface.co/nomic-ai/nomic-embed-text-v1.5">nomic-ai/nomic-embed-text-v1.5</a> via Ollama (CPU-only)</p></li><li><p><strong>Reranker</strong>: <a href="https://huggingface.co/BAAI/bge-reranker-v2-m3">BAAI/bge-reranker-v2-m3</a> via the <code>sentence-transformers</code> library (CPU-only)</p></li></ul><p>Benefits:</p><ul><li><p>&#9989; No more external dependencies</p></li><li><p>&#9989; No API costs</p></li><li><p>&#9989; Complete data privacy</p></li><li><p>&#9989; Works even if internet connectivity is down</p></li></ul><h3><strong>Architecture
Overview</strong></h3><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;                                  Graphiti Core                                      &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;   &#9474;
&#9474;  &#9474;   LLM Client            &#9474;   Embedder Client        &#9474;     Reranker Client     &#9474;   &#9474;
&#9474;  &#9474;   (Entity Extraction)   &#9474;   (Vector Generation)    &#9474;        (Search)         &#9474;   &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;   &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                 &#9474;                          &#9474;                        &#9474;
                 &#9660;                          &#9660;                        &#9660;
     &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488; &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
     &#9474;   Ollama Server #1    &#9474;  &#9474;   Ollama Server #2    &#9474; &#9474; sentence-transformers &#9474;
     &#9474;   :11434/v1           &#9474;  &#9474;   :11434/v1           &#9474; &#9474;                       &#9474;
     &#9474;   (DGX-Spark)         &#9474;  &#9474;   (OpenClaw machine)  &#9474; &#9474;   (OpenClaw machine)  &#9474;
     &#9474;                       &#9474;  &#9474;                       &#9474; &#9474;                       &#9474;
     &#9474;   nemotron-3-nano:30b &#9474;  &#9474;   nomic-embed-text    &#9474; &#9474;   bge-reranker-v2-m3  &#9474;
     &#9474;   (GPU-accelerated)   &#9474;  &#9474;   (CPU-only)          &#9474; &#9474;   (CPU-only)          &#9474;
     &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496; &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
</code></code></pre><h3><strong>Implementation Journey</strong></h3><p>Looking at the Graphiti codebase, it initially seemed like all the pieces were there; I just needed to swap out the OpenAI dependencies for their Ollama (and BGE) alternatives.  But this plan quickly ran into a snag; not every configuration option available in the Graphiti core is directly available for external use.</p><h4><strong>Challenge #1: The Factory Pattern</strong></h4><p>Graphiti uses a factory pattern to create LLM clients. The existing code had:</p><ul><li><p><code>OpenAIClient</code> - Uses OpenAI&#8217;s proprietary <code>responses.parse()</code> API (structured outputs)</p></li><li><p><code>OpenAIGenericClient</code> - Uses standard OpenAI-compatible <code>chat.completions.create()</code> API</p></li></ul><p>The factory was creating <code>OpenAIClient</code> but wasn&#8217;t hooked up to <code>OpenAIGenericClient</code>. I needed to add a new provider case.</p><h4><strong>Challenge #2: Embedding Dimension Mismatch</strong></h4><p>OpenAI&#8217;s <code>text-embedding-3-small</code> generates <strong>1536-dimensional</strong> vectors, but <code>nomic-embed-text-v1.5</code> generates <strong>768-dimensional</strong> vectors.</p><p>Normally this would require a database migration. <strong>Lucky break</strong>: my FalkorDB instance was empty (zero episodes in all groups due to a previous misconfiguration), so I could just switch dimensions without any data migration.  This is something to bear in mind, however&#8230; if you change your embedding model, it will likely invalidate your embeddings database.</p><h3><strong>The Code Changes</strong></h3><h4><strong>1. Added </strong><code>openai_generic</code><strong> Provider to Factory</strong></h4><p><strong>File</strong>: <code>graphiti/mcp_server/src/services/factories.py</code></p><p>After the <code>case 'openai':</code> block, I added:</p><pre><code><code>case 'openai_generic':
    if not config.providers.openai:
        raise ValueError('OpenAI provider configuration not found')

    api_key = config.providers.openai.api_key or 'not-needed'
    api_url = config.providers.openai.api_url

    logger.info(f'Creating OpenAI Generic client (base_url: {api_url})')

    from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
    from graphiti_core.llm_client.config import LLMConfig as CoreLLMConfig

    llm_config = CoreLLMConfig(
        api_key=api_key,
        base_url=api_url,
        model=config.model,
        temperature=config.temperature,
        max_tokens=config.max_tokens,
    )
    return OpenAIGenericClient(config=llm_config, max_tokens=config.max_tokens)
</code></code></pre><p><strong>Key insight</strong>: The <code>openai_generic</code> provider reuses the <code>config.providers.openai</code> section for credentials/URLs. This keeps the config schema clean.</p><h4><strong>2. Update Config for Local Endpoints</strong></h4><p><strong>File</strong>: <code>graphiti/mcp_server/config/config.yaml</code></p><p><strong>LLM section</strong> (entity extraction):</p><pre><code><code>llm:
  provider: "openai"  # Ollama is OpenAI-compatible
  model: "nemotron-3-nano:30b"
  max_tokens: 32768
  temperature: 0.0

  providers:
    openai:
      api_key: "not-needed"
      api_url: ${LLM_API_URL}
</code></code></pre><p><strong>Embedder section</strong> (vector generation):</p><pre><code><code>embedder:
  provider: "openai"  # Ollama is OpenAI-compatible
  model: "nomic-embed-text:v1.5"
  dimensions: 768  # Changed from 1536

  providers:
    openai:
      api_key: "not-needed"
      api_url: ${EMBEDDER_API_URL}
</code></code></pre><p><strong>Why </strong><code>provider: "openai"</code><strong>?</strong> Ollama implements the OpenAI-compatible API, so the existing <code>EmbedderFactory</code> works seamlessly with any OpenAI-compatible endpoint. No code changes needed!</p><h4><strong>3. Environment Variables</strong></h4><p><strong>File</strong>: <code>openclaw-memory-graphiti/.env</code></p><pre><code><code># Local LLM configuration (Ollama)
LLM_PROVIDER=openai_generic
LLM_MODEL=nemotron-3-nano:30b
LLM_API_URL=http://ollama-server:11434/v1
LLM_MAX_TOKENS=32768

# Local embedder configuration (Ollama)
EMBEDDER_MODEL=nomic-embed-text:v1.5
EMBEDDER_API_URL=http://localhost:11434/v1
EMBEDDER_DIMENSIONS=768
EMBEDDER_API_KEY=not-needed

# Local reranker
RERANKER_PROVIDER=bge</code></code></pre><h3><strong>Infrastructure Setup</strong></h3><h4><strong>Ollama Server #1: Entity Extraction (GPU-Accelerated)</strong></h4><pre><code><code># Run Ollama with GPU support
docker run -d \
  --name ollama-llm \
  --gpus all \
  -p 11434:11434 \
  -v ollama-llm-data:/root/.ollama \
  ollama/ollama:latest

# Pull the Nemotron model for entity extraction (~17GB)
docker exec ollama-llm ollama pull nemotron-3-nano:30b
</code></code></pre><p><strong>Why Nemotron-3-nano:30b?</strong></p><ul><li><p>Excellent instruction following for knowledge extraction</p></li><li><p>30B parameters strike a balance between quality and speed</p></li><li><p>GPU-accelerated for fast inference</p></li><li><p>OpenAI-compatible API endpoint</p></li></ul><h4><strong>Ollama Server #2: Embeddings (CPU-Only)</strong></h4><pre><code><code># Run Ollama in Docker (CPU-only, no GPU needed)
docker run -d \
  --name ollama-embeddings \
  -p 11434:11434 \
  -v ollama-embeddings-data:/root/.ollama \
  ollama/ollama:latest

# Pull the embedding model (~500MB)
docker exec ollama-embeddings ollama pull nomic-embed-text:v1.5</code></code></pre><p><strong>Why separate Ollama instances?</strong></p><ul><li><p><strong>Dedicated resources</strong>: Entity extraction gets GPU, embeddings run on CPU</p></li><li><p><strong>Independent scaling</strong>: Can run on different machines</p></li><li><p><strong>Isolation</strong>: Model loading/updates don&#8217;t affect each other</p></li></ul><p><strong>Why Ollama for (almost) everything?</strong></p><ul><li><p>Straightforward setup (one command per model)</p></li><li><p>OpenAI-compatible API (<code>/v1/chat/completions</code>, <code>/v1/embeddings</code>)</p></li><li><p>Automatic GPU detection and utilization</p></li><li><p>Built-in model management (<code>ollama pull</code>, <code>ollama list</code>)</p></li><li><p>Tiny memory footprint (~2GB RAM for embeddings)</p></li></ul><h3><strong>Final validation (or so I thought)</strong></h3><p>I did some final testing and submitted <a href="https://github.com/getzep/graphiti/pull/1227">a PR</a> to the Graphiti repo, receiving <a href="https://github.com/getzep/graphiti/pull/1227#pullrequestreview-3815453710">some feedback</a> in fairly short order.  I addressed the feedback, merged in some changes from main that had occurred in the interim and then&#8230; <strong>the wheels came off</strong>.</p><p>Suddenly, the automatic ingestion occurring at the end of each turn was consistently timing out again.  This time I knew that it couldn&#8217;t be a model outage, since everything was running locally.  So what had happened?</p><p>Checking through the changes I had pulled in from main, two PRs caught my eye : </p><ul><li><p>feat: simplify extraction pipeline and add batch entity summarization (<a href="https://github.com/getzep/graphiti/pull/1224">#1224</a>)</p></li><li><p>feat: driver operations architecture redesign (<a href="https://github.com/getzep/graphiti/pull/1232">#1232</a>)</p></li></ul><p>The net result? 
Ingesting a single episode went from 60-90 seconds (not great, but ingestion doesn&#8217;t need to be instantaneous) to <strong>more than 15 minutes</strong>. In particular, while the number of LLM calls necessary for entity extraction had roughly doubled (from ~15 to ~30), the number of embedding calls per episode had gone from ~40 to <strong>~300</strong>. Clearly something had changed, and not for the better; while a moderate performance hit when running local models is not unexpected, this was in another league entirely.  Perhaps local models were no longer viable for use with the newly rearchitected Graphiti? </p><p>This sent me back to the drawing board, leading me to another memory framework that had readily available examples making use of local models.  More on that experience in my next post.</p><p>In the meantime, this experience was not without its learnings&#8230; </p><h3><strong>Lessons Learned</strong></h3><ol><li><p><strong>OpenAI-Compatibility is Great</strong>: The OpenAI API specs have become the de facto standard whether dealing with chat completions or embeddings.</p></li><li><p><strong>Separate Your Concerns</strong>: Graphiti&#8217;s clean separation of LLM and embedder configs made it easy to swap them independently. I could have switched just one if needed.</p></li><li><p><strong>Switching embedding models is non-trivial</strong>: The embedding dimension change would have been painful with existing data.</p></li><li><p><strong>CPU Embeddings are Fine</strong>: Embeddings don&#8217;t need GPU. A cheap mini PC running Ollama is perfect&#8230; until the number of embedding calls increases drastically.</p></li><li><p><strong>Local models show promise</strong>: Nemotron Nano 3 is more than adequate for entity extraction</p></li></ol><h3><strong>Conclusion</strong></h3><p>Breaking free from OpenAI wasn&#8217;t just about avoiding outages&#8212;it was about taking control of my infrastructure. 
Ollama made this transition refreshingly smooth with its simple Docker-friendly setup and OpenAI-compatible API.</p><div><hr></div><h2><strong>What&#8217;s Next?</strong></h2><p>While I&#8217;m still keen on SpiceDB for its ReBAC functionality, the rearchitecting of Graphiti appears to have rendered it unusable as a backend memory store when using local models for entity extraction and embeddings. Looks like it&#8217;s time to explore alternative memory architectures&#8230;</p><div><hr></div><h3><strong>Resources</strong></h3><ul><li><p><a href="https://github.com/getzep/graphiti">Graphiti</a> - Knowledge graph memory for LLMs</p></li><li><p><a href="https://ollama.ai/">Ollama</a> - Run LLMs locally with ease</p></li><li><p><a href="https://ollama.com/library/nemotron-3-nano">Nemotron 3 Nano</a> - Fully capable for entity extraction</p></li><li><p><a href="https://huggingface.co/nomic-ai/nomic-embed-text-v1.5">Nomic Embed</a> - Efficient open embedding model, can run on CPU</p></li></ul><h3><strong>Questions?</strong></h3><p>If you try this approach, I&#8217;d love to hear about it! What models are you running locally? Were you able to get past the performance barrier that I ran into?</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.clawtocracy.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading!
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Building Memory That Knows Who’s Asking]]></title><description><![CDATA[ReBAC-Gated Knowledge Graphs for Agents]]></description><link>https://www.clawtocracy.ai/p/building-memory-that-knows-whos-asking</link><guid isPermaLink="false">https://www.clawtocracy.ai/p/building-memory-that-knows-whos-asking</guid><dc:creator><![CDATA[Clawtocracy]]></dc:creator><pubDate>Mon, 09 Feb 2026 06:37:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!91Tv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!91Tv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!91Tv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png 424w, 
https://substackcdn.com/image/fetch/$s_!91Tv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png 848w, https://substackcdn.com/image/fetch/$s_!91Tv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png 1272w, https://substackcdn.com/image/fetch/$s_!91Tv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!91Tv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png" width="1248" height="832" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1248,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2265004,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://clawtocracy.substack.com/i/187360985?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!91Tv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png 424w, https://substackcdn.com/image/fetch/$s_!91Tv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png 848w, https://substackcdn.com/image/fetch/$s_!91Tv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png 1272w, https://substackcdn.com/image/fetch/$s_!91Tv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f632ca1-d2a8-4bb6-b44d-38f21339c4c4_1248x832.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><strong>TL;DR:</strong> Most agentic memory systems treat access control as an afterthought. This post describes a framework that combines temporal knowledge graphs with relationship-based access control so that authorization is structural and deterministic, not subject to the whims of LLMs. The pattern is portable to any agent framework - I implemented it for OpenClaw, but the architecture applies anywhere agents need to remember facts on behalf of multiple people.</p><div><hr></div><h2>The Challenge with Agentic Memory Today</h2><p>Most agentic memory systems I&#8217;ve encountered treat access control as an afterthought - or ignore it entirely.</p><p>The default pattern goes something like this: the agent stores memories in a vector database and, when it needs to recall something, it searches everything it has access to and then applies a filter. If you want to restrict what the agent can see in a given context, you add a system prompt: <em>&#8220;Don&#8217;t mention anything about the surprise party to Dad.&#8221;</em></p><p>This is security through politeness.</p><p>Prompt-based filtering is unreliable by nature. It&#8217;s subject to prompt injection. It fails under context pressure. And it fundamentally relies on the model <em>choosing</em> to comply - which is exactly the wrong trust boundary for anything sensitive.</p><p>We wouldn&#8217;t build a file system where permissions are enforced by asking users nicely not to open certain folders. 
Why would we build agentic memory that way?</p><div><hr></div><h2>The Core Idea: Two Graphs, One Query Path</h2><p>The solution decomposes agentic memory into two complementary problems:</p><p><strong>What does the agent know?</strong> This is the job of a temporal knowledge graph. Unlike flat vector stores, a knowledge graph extracts entities and facts from conversations and maintains them as structured, evolving knowledge: <em>&#8220;Mark prefers dark mode,&#8221; &#8220;Mark is working on the AG-UI protocol,&#8221; &#8220;Dad&#8217;s birthday is March 15th.&#8221;</em> Facts can be superseded, relationships can change, and the graph reflects that history.</p><p><strong>Who&#8217;s allowed to know it?</strong> This is the job of a relationship-based access control (ReBAC) system. Inspired by Google&#8217;s Zanzibar, ReBAC evaluates permissions based on a graph of relationships - not static role lists or flat ACL tables. Access is structural: it follows from how entities relate to each other.</p><p>The query path composes them:</p><pre><code><code>Agent Turn &#8594; ReBAC (who can see what?) &#8594; Knowledge Graph (search the authorized subset) &#8594; Context
</code></code></pre><p>The agent doesn&#8217;t decide what to filter. The authorization layer decides what exists.</p><div><hr></div><h2>The &#8220;Mark, Mom, and Dad&#8221; Problem</h2><p>Here&#8217;s the scenario that motivated the design.</p><p>Imagine a family assistant agent that interacts with three people: Mark, Mom, and Dad. Each person has private memories the agent should know about but never cross-pollinate:</p><ul><li><p><strong>Mark&#8217;s private group:</strong> Work preferences, projects, personal notes</p></li><li><p><strong>Mom&#8217;s private group:</strong> Schedule, health notes, conversations with the agent</p></li><li><p><strong>Dad&#8217;s private group:</strong> Schedule, plans for fishing trip with former colleagues</p></li></ul><p>Then there are shared groups:</p><ul><li><p><strong>Family group:</strong> Shared calendar, recipes, vacation plans - everyone can access</p></li><li><p><strong>Mom &amp; Dad group:</strong> Parenting decisions, financial discussions - Mark can&#8217;t see these</p></li><li><p><strong>Mark &amp; Mom group:</strong> Planning surprise party for Dad - Dad shouldn&#8217;t see this</p></li><li><p><strong>Mark &amp; Dad group:</strong> Gift ideas for Mom - Mom shouldn&#8217;t see this</p></li></ul><p>When Dad asks the agent <em>&#8220;Is anyone secretly planning a party for my birthday?&#8221;</em>, the system searches his private group, the family group, the Mark &amp; Dad group, and the Mom &amp; Dad group. It does <strong>not</strong> search the Mark &amp; Mom group (aka the &#8220;everyone but Dad&#8221; group) - where the surprise party plans live - so it comes up blank.</p><p>Not because we told it not to: the permission check returns that group as unauthorized, and the search never executes against it.</p><p>When Mark asks the same question, he gets his private group, the family group, the Mark &amp; Dad group, and the Mark &amp; Mom group (where the party plans live). 
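The membership logic behind those two answers can be sketched in a few lines (group and person IDs are illustrative; in the real system SpiceDB answers this, not in-process code):

```typescript
// Which memory groups a fan-out search may touch, derived purely from
// membership. IDs are illustrative; in the real system SpiceDB computes
// this answer, so the agent never chooses what to hide.
const members: Record<string, string[]> = {
  "mark-private": ["mark"],
  "mom-private": ["mom"],
  "dad-private": ["dad"],
  "family": ["mark", "mom", "dad"],
  "mom-dad": ["mom", "dad"],
  "mark-mom": ["mark", "mom"], // the surprise-party plans live here
  "mark-dad": ["mark", "dad"],
};

function authorizedGroups(person: string): string[] {
  return Object.keys(members).filter((g) => members[g].includes(person));
}

console.log(authorizedGroups("dad"));
// → [ 'dad-private', 'family', 'mom-dad', 'mark-dad' ] — no 'mark-mom'
console.log(authorizedGroups("mark"));
// → [ 'mark-private', 'family', 'mark-mom', 'mark-dad' ]
```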
Different person, different relationships, different memories&#8230; enforced structurally.</p><div><hr></div><h2>The Portable Architecture</h2><p>While I built this as an OpenClaw plugin, the architecture is framework-agnostic. Any agentic system that supports custom memory backends can implement the same pattern using two open-source building blocks.</p><h3>The Knowledge Layer: Graphiti</h3><p><a href="https://github.com/getzep/graphiti">Graphiti</a> is Zep&#8217;s open-source temporal knowledge graph. It extracts entities and facts from conversational episodes, maintains them in a graph (backed by <a href="https://www.falkordb.com/">FalkorDB</a>), and supports semantic search with temporal awareness. It&#8217;s doing the heavy lifting that a flat vector store can&#8217;t: structured knowledge that evolves over time.</p><p>But Graphiti (like nearly every knowledge graph and vector store I&#8217;ve evaluated) does not provide fine-grained, per-user authorization over individual memories. It assumes that anything within a given Graphiti namespace (e.g., group_id / tenant graph) is readable to whoever has API access to that namespace, so you must enforce authorization and filtering in a separate layer.</p><h3>The Authorization Layer: SpiceDB</h3><p><a href="https://github.com/authzed/spicedb">SpiceDB</a> is an open-source implementation of <a href="https://research.google/pubs/zanzibar-googles-consistent-global-authorization-system/">Google&#8217;s Zanzibar</a> - the authorization system behind Google Drive, Photos, YouTube, and most of Google&#8217;s product suite. 
(AuthZed also maintains an excellent <a href="https://authzed.com/zanzibar">annotated version of the paper</a> if you want the highlights without reading all 14 pages.)</p><p>The core idea behind Zanzibar is <strong>Relationship-Based Access Control (ReBAC)</strong>: instead of assigning permissions through static roles (RBAC) or attribute rules (ABAC), access is determined by whether a chain of relationships exists between a subject and a resource. If Alice is a <em>member</em> of a group, and that group <em>owns</em> a document, Alice can access the document - not because someone added her to a role, but because the relationship graph connects her to it. This makes ReBAC particularly well-suited to agentic memory, where the relationships between people, conversations, and knowledge are the natural way to express who should see what.</p><p>SpiceDB evaluates these relationship graphs using a declarative schema language called Zed.</p><p>Here&#8217;s the schema:</p><pre><code><code>definition memory_group {
    relation member: agent | person
    relation contributor: agent | person
    permission access = member
    permission contribute = contributor + member
}

definition memory_fragment {
    relation group: memory_group
    relation creator: agent | person
    permission view = group-&gt;access
    permission delete = creator
}
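
// Hypothetical relationship tuples instantiating this schema, in
// Zanzibar's object#relation@subject notation (IDs are illustrative):
//
//   memory_group:mark_mom#member@person:mark
//   memory_group:mark_mom#member@person:mom
//   memory_fragment:party_plans#group@memory_group:mark_mom
//
// person:dad has no member relationship to memory_group:mark_mom, so
// memory_fragment:party_plans#view can never resolve for him.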
</code></code></pre><p>Every memory fragment belongs to a group. You can only view fragments in groups you&#8217;re a member of. You can only delete fragments you created. The agent never even sees unauthorized memories - there&#8217;s nothing to leak, inject around, or socially engineer.</p><h3>Composing Them</h3><p>The integration point is a fan-out search that gates knowledge graph queries behind permission checks:</p><pre><code><code>// Framework-agnostic pattern: fan-out search across authorized groups
const authorizedGroups = await spicedb.lookupResources("access", subject);
const results = await Promise.all(
  authorizedGroups.map(groupId =&gt; graphiti.search(query, { group_id: groupId }))
);
</code></code></pre><p>This pattern doesn&#8217;t depend on OpenClaw. If you&#8217;re building on LangChain, CrewAI, Google ADK, or a custom agent loop, the same composition applies: check permissions first, then search only the authorized partition of the knowledge graph.</p><div><hr></div><h2>From Families to Organizations</h2><p>The same pattern scales to any context where knowledge should be shared along relationship lines. Here are two concrete scenarios.</p><h3>Slack History Ingestion</h3><p>An organization&#8217;s Slack archive is a treasure trove of institutional knowledge - decisions made, problems solved, context shared. But not all of it should be accessible to everyone. With ReBAC-gated memory:</p><ul><li><p>Public channel history &#8594; shared group for all channel members</p></li><li><p>Private channels &#8594; group limited to channel membership</p></li><li><p>DMs &#8594; group limited to the two participants</p></li></ul><p>When an employee asks the agent a question, it searches only the channels and conversations they&#8217;re a member of. The knowledge graph captures relationships between concepts discussed <em>across</em> channels, while the authorization layer ensures each person only traverses the portion of the graph they have legitimate access to.</p><h3>Meeting Transcripts</h3><p>We can transcribe company-wide all-hands and town halls into a corporate-memory group everyone belongs to. Leadership meetings go into a leadership group. Department standups go into department groups. The agent builds a temporal knowledge graph of organizational decisions, priorities, and context - and each person gets the view that matches their actual organizational relationships.</p><p>When Bob asks <em>&#8220;What was the decision on the API migration?&#8221;</em>, the agent searches the engineering group where that discussion happened. 
When Alice asks - and she&#8217;s on the leadership team - she gets both the engineering discussion and the leadership context around why the migration was prioritized.</p><div><hr></div><h2>My Implementation: The OpenClaw Plugin</h2><p>I built this as an OpenClaw plugin (<code>@contextableai/openclaw-memory-graphiti</code>) because OpenClaw&#8217;s replaceable memory slot made it the ideal proving ground - one plugin controls the entire memory pipeline: storage, recall, and capture. There&#8217;s no risk of the default memory system leaking around the authorization layer.</p><p>The plugin provides:</p><ul><li><p><strong>memory_recall</strong> - search the knowledge graph across all authorized groups, with session/long-term/all scoping</p></li><li><p><strong>memory_store</strong> - save memories with automatic entity and fact extraction via Graphiti</p></li><li><p><strong>memory_forget</strong> - delete memories (creator-only, enforced by SpiceDB)</p></li><li><p><strong>Auto-capture</strong> - after every agent turn, key information is automatically extracted into the graph</p></li><li><p><strong>Auto-recall</strong> - before every agent turn, relevant memories are automatically injected into context</p></li><li><p><strong>Session isolation</strong> - each conversation gets its own memory group with exclusive ownership</p></li></ul><p>The infrastructure runs on FalkorDB, SpiceDB, and PostgreSQL (as a backing store for SpiceDB). There&#8217;s a Docker Compose stack for easy deployment.</p><p>But the plugin is just one <em>implementation</em>. The architecture - ReBAC-gated knowledge graphs - is the transferable idea. 
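That transferable idea reduces to a pair of interfaces. A sketch (the names are mine, not OpenClaw's or Graphiti's actual APIs):

```typescript
// The transferable contract, reduced to two interfaces. Any Zanzibar-style
// authorizer plus any partitioned knowledge store composes the same way.
interface Authorizer {
  // e.g. SpiceDB LookupResources over the "access" permission
  authorizedGroups(subject: string): Promise<string[]>;
}

interface KnowledgeStore {
  // e.g. a Graphiti search scoped to a single group_id partition
  search(query: string, groupId: string): Promise<string[]>;
}

async function gatedRecall(
  authz: Authorizer,
  store: KnowledgeStore,
  subject: string,
  query: string,
): Promise<string[]> {
  // Authorization first: partitions the subject cannot access are never queried.
  const groups = await authz.authorizedGroups(subject);
  const hits = await Promise.all(groups.map((g) => store.search(query, g)));
  return hits.flat();
}
```

Swapping FalkorDB for another graph store, or SpiceDB for another Zanzibar implementation, changes only the two adapters; the composition stays put.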
If you&#8217;re building agentic memory on a different framework, the composition of any temporal knowledge graph + any Zanzibar-inspired authorization system gives you the same properties.</p><div><hr></div><h2>What I Learned</h2><p><strong>Authorization is an infrastructure problem, not a prompt problem.</strong> The moment you try to enforce access control at the prompt level, you&#8217;ve already lost. The model might comply 99% of the time, but the 1% failure mode is unintended information disclosure - the worst kind of failure to leave to chance.</p><p><strong>Knowledge graphs and authorization graphs are natural complements.</strong> A knowledge graph builds a graph of <em>what the agent knows</em>. An authorization system builds a graph of <em>who can know what</em>. Composing them is more natural than bolting ACLs onto a vector store, because both systems already think in terms of entities and relationships.</p><p><strong>The gap is real.</strong> Before building this for OpenClaw, I surveyed the landscape: Mem0 has a polished plugin but no authorization model. Cognee augments memory with graph retrieval but doesn&#8217;t address access control. SpiceDB has an <a href="https://authzed.com/docs/spicedb/integrations/pinecone">excellent RAG authorization tutorial</a> for Pinecone, but nothing targeting temporal knowledge graphs specifically. The combination of temporal knowledge graphs with ReBAC authorization didn&#8217;t exist as a packaged solution.</p><p><strong>The pattern is more general than the implementation.</strong> I built this for OpenClaw, but every design decision - the group-based memory partitioning, the fan-out search pattern, the schema separating membership from creatorship - applies to any agent framework. 
If you have a different knowledge store and a different auth system, the architecture translates directly.</p><div><hr></div><h2>What&#8217;s Next</h2><p>The immediate roadmap: smarter incremental imports (currently it reimports everything), bulk ingestion for external sources (Slack exports, meeting transcripts, document repositories), and exploring SpiceDB&#8217;s caveated relationships for time-limited memory sharing - <em>&#8220;share this memory group until the project ships.&#8221;</em></p><p>Longer term, I&#8217;m interested in inter-agent memory: multiple specialized agents with overlapping but distinct views of organizational knowledge, governed by the same authorization graph that governs human access.</p><p>If you made it to the end, you must really be interested: the code is at <a href="https://github.com/Contextable/openclaw-memory-graphiti">github.com/Contextable/openclaw-memory-graphiti</a>. MIT licensed. PRs welcome.</p>]]></content:encoded></item></channel></rss>