How I Built an AI Agent That Manages My Infrastructure

AI Automation SRE OpenClaw

Not Another Chatbot

When I tell people I have an "AI assistant," they picture ChatGPT in a browser. That's not what I built. Nixie — my AI agent — is an autonomous system that runs 24/7 on my infrastructure, manages servers, tracks tasks, and proactively solves problems. She doesn't wait for questions. She acts.

This post is about how I built her, what works, what breaks, and whether it's actually worth the effort.

The Stack

Architecture Decisions

Why Not Persistent Memory?

Most AI agent tutorials push you toward vector databases and persistent state. I went the opposite direction: flat files. Nixie wakes up fresh each session and reads her memory files. This has three advantages:

Multi-Model Routing

Not every task needs the most expensive model. I built a routing strategy:

Task Type              → Model      → Cost/1M tokens
Quick status checks    → Haiku      → $0.80
Standard reasoning     → Sonnet     → $3.00
Deep analysis          → Opus       → $15.00
Heartbeats/budget work → Ollama     → Free

This keeps daily costs under $5 while preserving quality where it matters. A local model served through Ollama handles heartbeats and other budget work at zero API cost.
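To make the routing idea concrete, here is a minimal Python sketch of a lookup-based router. The prices are copied from the table above; the task-type keys, the `Route` class, and the fallback to the mid-tier model are assumptions for illustration, not Nixie's actual code. It also treats the listed price as a single flat rate per token, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    usd_per_million_tokens: float


# Prices come from the routing table; the dictionary keys are hypothetical labels.
ROUTES = {
    "status_check": Route("haiku", 0.80),
    "standard": Route("sonnet", 3.00),
    "deep_analysis": Route("opus", 15.00),
    "heartbeat": Route("ollama", 0.0),
}


def pick_route(task_type: str) -> Route:
    """Return the route for a task type, falling back to the mid-tier model.

    The fallback choice is an assumption; a stricter router could raise instead.
    """
    return ROUTES.get(task_type, ROUTES["standard"])


def estimate_cost(task_type: str, tokens: int) -> float:
    """Estimate the API cost in USD for a task that uses `tokens` tokens."""
    route = pick_route(task_type)
    return tokens / 1_000_000 * route.usd_per_million_tokens
```

Keeping the table as data rather than as branching logic means a price change or a new model tier is a one-line edit.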

The Butler Protocol

I gave Nixie a strict operating framework:

Act freely within bounds (read files, run diagnostics, explore).
Ask first for destructive ops, external comms, or uncertainty.
Be proactive — offer solutions, not just problem reports.
Earn trust through competence, not performance.

This guards against two common failure modes of AI agents: doing nothing useful (too cautious) and doing something destructive (too aggressive).
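One way to encode the first two rules is an allowlist gate: anything not explicitly marked as safe requires approval. The sketch below is a hypothetical illustration in Python. The action names, the `Decision` enum, and the `confident` flag are assumptions, not part of Nixie's real implementation.

```python
from enum import Enum


class Decision(Enum):
    ACT = "act"  # proceed without asking
    ASK = "ask"  # pause and request human approval


# Hypothetical names for the "act freely" category (read, diagnose, explore).
FREE_ACTIONS = {"read_file", "run_diagnostic", "explore"}


def decide(action: str, confident: bool = True) -> Decision:
    """Return ACT only for allowlisted actions taken with confidence.

    Destructive operations, external communication, and anything unrecognized
    fall through to ASK, so the default is the cautious one.
    """
    if not confident:
        return Decision.ASK
    if action in FREE_ACTIONS:
        return Decision.ACT
    return Decision.ASK
```

An allowlist is used rather than a denylist so that a new, unclassified action is blocked until someone decides where it belongs.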

What She Actually Does

Infrastructure Management

Nixie manages my NixOS configurations. She runs nixos-rebuild dry-build before applying changes, monitors system health, and knows the rollback path. When I tell her to "add Prometheus to the server," she edits the NixOS config, validates it, and waits for my approval before rebuilding.
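The validate, approve, then rebuild flow might look roughly like the Python sketch below. The function names, the return strings, and the injectable `run` and `approve` callables are assumptions for illustration. Only `nixos-rebuild dry-build` and `nixos-rebuild switch` are real commands, and `switch` normally needs root.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def apply_config(approve: Callable[[], bool], run: Runner = run_command) -> str:
    """Validate the config, wait for approval, then rebuild."""
    # Step 1: dry-build evaluates the configuration without building or activating it.
    if run(["nixos-rebuild", "dry-build"]) != 0:
        return "validation_failed"
    # Step 2: nothing is applied until a human says yes.
    if not approve():
        return "rejected"
    # Step 3: apply. Rolling back (nixos-rebuild switch --rollback) is left to
    # a human here, since whether it is appropriate depends on where the
    # switch failed.
    if run(["nixos-rebuild", "switch"]) != 0:
        return "apply_failed"
    return "applied"
```

Passing the runner in as a parameter keeps the flow testable on a machine without NixOS: a fake runner can record the commands instead of executing them.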

Task Tracking

She integrates with Todoist via API. Every task gets progress comments, blocker detection, and automatic escalation. She has a heartbeat system that checks every 30 minutes for:

Code Analysis

She can index repositories, analyze code structure, find dead code, and detect circular dependencies. This is powered by jCodeMunch, an MCP server that provides AST-level code intelligence.
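To show what circular-dependency detection involves, here is a generic depth-first search over a module dependency graph in Python. This is not jCodeMunch's API or algorithm, which this post does not describe. The graph format (module name mapped to the modules it imports) and the function name are assumptions.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    path: List[str] = []  # nodes on the current DFS path

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in graph.get(node, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # Back edge: dep is already on the current path, so it closes a cycle.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None
```

For example, `find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})` returns `["a", "b", "c", "a"]`. The recursive form is fine for a sketch, but a very deep graph would need an iterative version to stay within Python's recursion limit.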

The Cost

Running Nixie 24/7 costs:

Total: roughly $90-150/month. For a 24/7 SRE assistant that never sleeps? That's cheaper than 2 hours of on-call time.

What Breaks

It's not all smooth. Here's what actually goes wrong:

Is It Worth It?

Yes, but with caveats. Nixie handles ~70% of my routine infrastructure work. The remaining 30% needs human judgment — architectural decisions, security reviews, anything requiring business context.

The real value isn't replacing myself. It's never starting from scratch. When I sit down to work, Nixie has already checked the systems, identified issues, and prepared options. I go straight to decision-making instead of discovery.

Try It Yourself

If you want to build something similar:

Want to discuss AI agents for SRE work? Get in touch.
