We Build Pipelines That Run Themselves

Autonomous data collection, real-time processing, multi-stage workflows, and AI-driven generation pipelines. Hundreds of scheduled jobs across production systems, running 24/7 without manual intervention.

Pipelines We've Built

Four real systems, four different pipeline architectures — all in production

Traffic Intelligence Pipeline

170+ automated jobs / real-time + batch processing / multi-region AWS

Real-time traffic data flows in from hardware sensors across multiple vendors — Bluetooth detectors, radar units, video analytics systems. Each vendor has a different API, different data format, different polling interval. We normalize all of it into a unified schema and run it through travel-time calculation, segment aggregation, and anomaly detection before it hits the database.
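As a rough illustration of the normalization step, here is a minimal sketch with one adapter per vendor. The payload field names (`detector_id`, `ts`, `speed_mph`, `unit`, `epoch_ms`, `kph`), the `Reading` type, and both adapter functions are hypothetical and not taken from any real vendor API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """Unified schema (hypothetical): one speed observation from one sensor."""
    vendor: str
    sensor_id: str
    observed_at: datetime  # always timezone-aware UTC
    speed_kph: float


def from_bluetooth(payload: dict) -> Reading:
    # Hypothetical vendor A: ISO-8601 timestamps with offset, speeds in mph.
    return Reading(
        vendor="bluetooth",
        sensor_id=str(payload["detector_id"]),
        observed_at=datetime.fromisoformat(payload["ts"]).astimezone(timezone.utc),
        speed_kph=round(payload["speed_mph"] * 1.609344, 2),
    )


def from_radar(payload: dict) -> Reading:
    # Hypothetical vendor B: epoch milliseconds, speeds already in km/h.
    return Reading(
        vendor="radar",
        sensor_id=str(payload["unit"]),
        observed_at=datetime.fromtimestamp(payload["epoch_ms"] / 1000, tz=timezone.utc),
        speed_kph=float(payload["kph"]),
    )


# One adapter per vendor; adding a vendor means adding one entry here.
ADAPTERS = {"bluetooth": from_bluetooth, "radar": from_radar}


def normalize(vendor: str, payload: dict) -> Reading:
    """Dispatch a raw payload to the adapter registered for its vendor."""
    return ADAPTERS[vendor](payload)
```

Keeping vendor quirks inside the adapters means the travel-time, aggregation, and anomaly stages only ever see `Reading` objects.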

On top of the real-time layer, batch jobs run hourly and daily aggregations: FHWA-compliant travel time indices (TTI/PTI/BTI), near-miss safety metrics (TTC/PET), and EPA MOVES emissions estimates. Archive jobs compress and store raw data. Monitoring jobs watch the whole thing and alert when collectors go silent.
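For reference, the three reliability indices can be sketched from a list of observed travel times and a free-flow travel time. This uses the commonly cited definitions (TTI as mean over free-flow, PTI as 95th percentile over free-flow, BTI as the 95th-percentile buffer relative to the mean) and a nearest-rank percentile. The function names are illustrative, and a production job may use a different percentile method or averaging window.

```python
import math
from statistics import mean


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def reliability_indices(travel_times_s: list[float], free_flow_s: float) -> dict[str, float]:
    """Compute TTI, PTI, and BTI for one segment over one analysis period."""
    avg = mean(travel_times_s)
    p95 = percentile_nearest_rank(travel_times_s, 95)
    return {
        "tti": avg / free_flow_s,   # Travel Time Index
        "pti": p95 / free_flow_s,   # Planning Time Index
        "bti": (p95 - avg) / avg,   # Buffer Time Index
    }
```

With travel times of 100, 110, 120, 130, and 140 seconds against a 100-second free-flow time, the mean is 120 and the 95th percentile is 140, giving a TTI of 1.2, a PTI of 1.4, and a BTI of about 0.17.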

170+ Scheduled Jobs

24/7 Real-Time Collection

Multi-Region AWS Deployment

60+ EC2 Instances

Django / Celery / Redis / PostgreSQL / AWS Batch / Lambda / ElastiCache / S3 / CloudWatch

Intelligence Collection Pipeline

167 automated collectors / knowledge graph ingestion / LLM inference chains

A distributed intelligence system that autonomously collects, processes, and connects information across domains. 167 collectors run on scheduled cycles — scraping, polling APIs, monitoring feeds, and ingesting structured data. Each collector normalizes its output into a common signal format tagged with source, confidence, and domain metadata.
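A common signal format along these lines could be sketched as a small validated record. The `Signal` class, its field names, and the 0-to-1 confidence range are assumptions made for illustration; the actual format is not described in more detail here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Signal:
    """Hypothetical common envelope that every collector emits."""
    source: str        # which collector or feed produced the signal
    domain: str        # subject area the signal belongs to
    confidence: float  # assumed scale: 0.0 (unverified) to 1.0 (certain)
    body: str          # normalized content
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        # Reject malformed signals at the collector boundary.
        if not self.source or not self.domain:
            raise ValueError("source and domain are required")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")
```

Validating at construction time keeps malformed signals out of the enrichment stages entirely.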

Incoming signals feed into a multi-stage enrichment pipeline: entity extraction, relationship mapping into a Neo4j knowledge graph, vector embedding into Qdrant for semantic search, and time-series storage in TimescaleDB. LLM inference runs on local GPU hardware, producing summaries, assessments, and cross-domain correlation reports on 4-hour autonomous cycles.

167 Collectors

12 Machines

4-Hour Autonomous Cycles

Local GPU LLM Inference

FastAPI / Celery / Neo4j / Qdrant / TimescaleDB / PostgreSQL / Redis / Ollama / Elasticsearch

Decision Engine Pipeline

Multi-source aggregation / scheduled analysis / LLM-augmented recommendations

A personal command center that aggregates data from multiple life domains — journal entries, habit tracking, financial transactions, goals, project status, and external feeds. Incoming data flows through validation and normalization before landing in domain-specific stores.

Scheduled analysis jobs run pattern detection across domains: correlating habits with goal progress, tracking financial trends against budgets, and surfacing tasks that are falling behind. An LLM layer generates weekly summaries and actionable recommendations based on the cross-domain analysis. Everything runs on self-hosted infrastructure with zero external data custody.
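One simple way to correlate two domains is a Pearson coefficient over aligned daily series, for example habit completions against goal progress. The text above does not say which method the analysis jobs use, so treat this as one plausible sketch; the function name and inputs are illustrative.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length series (range -1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 would suggest the habit and the goal move together, a value near -1 that they move in opposite directions. Neither establishes cause.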

6+ Data Domains

Self-Hosted Infrastructure

Cross-Domain Correlation

Zero External Data Custody

FastAPI / React / PostgreSQL / Redis / Ollama / Celery

Autonomous Code Generation Pipeline

16-state workflow / 3 async queues / proposal to tested branch

An AI-driven development pipeline that accepts build proposals and runs them through a 16-state process to produce tested code. Proposals enter a planning queue where an LLM breaks them into implementation steps. Each step flows through a generation queue that writes code on isolated Git branches, then into a validation queue that runs tests and static analysis.

Three Celery queues handle the different stages with different concurrency and priority settings — planning is serial, generation is parallel across proposals, validation runs with dedicated resources. Failed validations loop back to generation with error context. Successful builds produce tested branches ready for human review. The whole pipeline runs autonomously once a proposal is submitted.
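The loop from validation back to generation can be sketched as a small retry loop that passes error context forward. The four states shown are a hypothetical subset (the 16 real states are not named here), the attempt cap is an assumption, and `generate` and `validate` stand in for the real queue workers.

```python
from enum import Enum


class State(Enum):
    # Hypothetical subset; the source does not name its 16 states.
    GENERATING = "generating"
    VALIDATING = "validating"
    READY_FOR_REVIEW = "ready_for_review"
    FAILED = "failed"


def run_step(generate, validate, max_attempts: int = 3) -> list:
    """Run one implementation step and return the states it passed through.

    generate(error_context) returns code; validate(code) returns
    (ok, error_context). The attempt cap is an assumed safeguard.
    """
    history = []
    error_context = None
    for _ in range(max_attempts):
        history.append(State.GENERATING)
        code = generate(error_context)  # error_context is None on the first try
        history.append(State.VALIDATING)
        ok, error_context = validate(code)
        if ok:
            history.append(State.READY_FOR_REVIEW)
            return history
    history.append(State.FAILED)
    return history
```

Feeding the validator's error output into the next generation attempt is what separates this loop from a blind retry.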

16 Pipeline States

3 Async Queues

Autonomous End-to-End

Tested Output Branches

FastAPI / Celery / Redis / PostgreSQL / Ollama / Gitea / React

How We Build Pipelines

Celery + Redis for Everything Async

Every pipeline we build runs on Celery with Redis as the broker. That covers scheduled tasks, event-driven triggers, priority queues, retry logic with exponential backoff, and dead-letter handling. We know this stack deeply and we push it hard.
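Celery exposes retry behavior through task options such as `autoretry_for`, `retry_backoff`, `retry_backoff_max`, and `retry_jitter`. The sketch below is plain Python that only illustrates the idea: delays double per attempt up to a cap, and a payload that exhausts its retries goes to a dead-letter list. The function names, defaults, and the in-memory list are illustrative assumptions, not Celery internals.

```python
import time


def backoff_delay(retries: int, base: float = 1.0, cap: float = 600.0) -> float:
    """Delay before the next attempt: base * 2^retries, capped."""
    return min(cap, base * (2 ** retries))


def run_with_retries(task, payload, max_retries: int = 3,
                     dead_letters=None, sleep=time.sleep):
    """Call task(payload), retrying on any exception with exponential backoff.

    After the final failure the payload is appended to dead_letters
    (a stand-in for a real dead-letter queue) and None is returned.
    """
    for retries in range(max_retries + 1):
        try:
            return task(payload)
        except Exception as exc:
            if retries == max_retries:
                if dead_letters is not None:
                    dead_letters.append({"payload": payload, "error": repr(exc)})
                return None
            sleep(backoff_delay(retries))
```

Passing `sleep` as a parameter keeps the function testable without real waiting. In production, adding random jitter to the delay helps avoid many workers retrying in lockstep.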

Normalize Early, Enrich Later

Raw data hits a staging layer first. Validation, schema normalization, and deduplication happen before anything touches the main database. Enrichment — entity extraction, embedding, aggregation — runs as separate downstream jobs that can fail without losing the source data.
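A staging pass of this kind might look like the following sketch: normalize field names, reject records missing required fields, and drop duplicates by content hash. The required fields (`source`, `value`), the hashing scheme, and the in-memory `seen_keys` set are assumptions for illustration; a real staging layer would persist the seen keys.

```python
import hashlib
import json


def record_key(record: dict) -> str:
    """Stable content hash: identical records always produce the same key."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stage(raw_records, seen_keys: set, required=("source", "value")):
    """Return (accepted, rejected); nothing here touches the main database."""
    accepted, rejected = [], []
    for raw in raw_records:
        # Schema normalization: consistent lowercase field names.
        record = {str(k).strip().lower(): v for k, v in raw.items()}
        # Validation: keep the raw input so rejects can be inspected later.
        if any(name not in record for name in required):
            rejected.append(raw)
            continue
        # Deduplication: skip records already seen in this or earlier batches.
        key = record_key(record)
        if key in seen_keys:
            continue
        seen_keys.add(key)
        accepted.append(record)
    return accepted, rejected
```

Because only accepted records move on and the raw input is left untouched, a failing enrichment job can be rerun from staging without another trip to the source.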

Every Job Is Observable

Every pipeline job logs its start, completion, duration, and record count. Monitoring jobs watch for silent failures — collectors that stop collecting, jobs that run longer than expected, queues that back up. We know something is broken before users notice.
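Silent-failure detection can be sketched as a check over the run log: a job counts as silent if it has not finished a run with at least one record within some multiple of its expected interval. The `JobRun` shape, the `expected_every` mapping, and the 2x grace factor are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRun:
    """One logged run (hypothetical shape): start, completion, record count."""
    job: str
    started_at: datetime
    finished_at: datetime
    record_count: int

    @property
    def duration(self) -> timedelta:
        return self.finished_at - self.started_at


def silent_jobs(runs, expected_every: dict, now: datetime, grace: float = 2.0) -> list:
    """Names of jobs with no productive run within grace * expected interval."""
    last_productive = {}
    for run in runs:
        # A run that collected nothing does not count as a sign of life.
        if run.record_count <= 0:
            continue
        if run.job not in last_productive or run.finished_at > last_productive[run.job]:
            last_productive[run.job] = run.finished_at
    return sorted(
        job for job, interval in expected_every.items()
        if job not in last_productive or now - last_productive[job] > grace * interval
    )
```

Ignoring zero-record runs catches the collector that still runs on schedule but has stopped returning data, which is easy to miss when only exit status is tracked.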

Self-Hosted When It Matters

Pipelines that handle sensitive data or need GPU compute run on hardware we own and maintain. No third-party data custody, no variable cloud costs for inference workloads. We use AWS where it makes sense and bare metal where it doesn't.

Need a Pipeline Built?

Whether it's real-time data collection, batch ETL, AI-driven workflows, or something we haven't seen yet — we can build it.

Email

contact@legacycoder.com

Phone

+1 (720) 767-3986

Location

Denver, CO