Technical Infrastructure
The guts
Production infrastructure that powers the things I design and build. I use a wide variety of tools, systems, and custom-built models, adapted specifically to what you need.
Technology Stack
Core capabilities
Backend
- Python (FastAPI, async/await, Celery)
- Go for high-performance services
- Node.js / Express for real-time systems
- SQLAlchemy + Alembic migrations
Frontend
- React 18+ / Next.js 14
- TypeScript for type safety
- React Native for mobile
- TailwindCSS, shadcn/ui, Radix UI
- Vite & Webpack build systems
AI & Machine Learning
- Ollama local model deployment
- OpenAI & Anthropic Claude APIs
- LangChain orchestration
- PyTorch & TorchVision
- OpenCV for computer vision
- scikit-learn for ML pipelines
Databases
- PostgreSQL (optimized queries, partitioning)
- Redis (caching, pub/sub, queues)
- Neo4j for graph relationships
- Weaviate vector database (2.8M+ vectors)
Infrastructure
- Docker Swarm (30+ services, 6 nodes)
- Docker Compose for local development
- Terraform for IaC
- Nginx reverse proxy & load balancing
- CI/CD pipelines (GitHub Actions)
- Linux server management
Security
- JWT & OAuth 2.0 authentication
- bcrypt password hashing
- Rate limiting & DDoS protection
- Vulnerability scanning (SCAFU)
- SSL/TLS certificate management
System Architecture
Production infrastructure
A modular production stack: gateway at the edge, services in the middle, knowledge and storage beneath. The specifics change per project—the structure stays consistent.
Edge & Gateway
Receives all external traffic, routes to internal services, enforces global policies, and handles TLS. First line of defense and traffic control.
Application Services
Stateless services handling REST/GraphQL APIs, real-time channels, and async workers. Message queues decouple workloads for resilience.
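The decoupling idea can be sketched in a few lines, with Python's standard-library queue and a worker thread standing in for Celery and a real message broker (the `handle_request` and `worker` names are illustrative, not actual service code):

```python
import queue
import threading

# The API handler only enqueues and returns; a background worker
# drains the queue independently, so a slow job never blocks a request.
jobs: "queue.Queue[dict | None]" = queue.Queue()
results: list[str] = []

def handle_request(payload: dict) -> str:
    """Stateless handler: validate, enqueue, respond immediately."""
    jobs.put(payload)
    return "accepted"

def worker() -> None:
    """Async worker: processes jobs until it sees the None sentinel."""
    while (job := jobs.get()) is not None:
        results.append(f"processed:{job['id']}")
        jobs.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

for i in range(3):
    handle_request({"id": i})

jobs.join()     # wait for the backlog to drain
jobs.put(None)  # stop the worker
t.join()
```

In production the queue would be durable (RabbitMQ, Redis streams), which is what lets workers restart or scale out without dropping work.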
Orchestration
150+ workflows orchestrate data movement, scheduled tasks, and event-driven automations. 8.5K daily executions coordinate the entire stack.
AI & ML Layer
Privacy-first AI with local Ollama models and 2.8M vector embeddings. Semantic search, code generation, and intelligent automation—all running in-house.
Data Plane
Multi-model persistence: PostgreSQL for transactions, Neo4j for relationships, Redis for sub-millisecond reads. Purpose-built for each data pattern.
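The Redis fast path above is the classic cache-aside pattern. A minimal sketch, with an in-memory dict plus TTLs standing in for Redis and a hypothetical `load_from_db` standing in for a PostgreSQL query:

```python
import time

# key -> (timestamp, value); a stand-in for Redis SET with EX
CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 60.0

def load_from_db(key: str) -> str:
    """Placeholder for a real SQL lookup."""
    return f"row-for-{key}"

def get(key: str) -> str:
    now = time.monotonic()
    hit = CACHE.get(key)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]              # fast path: cache hit
    value = load_from_db(key)      # slow path: hit the database
    CACHE[key] = (now, value)      # populate for subsequent reads
    return value
```

The same shape applies per store: each read checks the cheap tier first and falls back to the authoritative one, which is what makes sub-millisecond reads compatible with a transactional source of truth.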
Monitoring & Security
End-to-end observability with Prometheus metrics, Grafana dashboards, and real-time alerting. Security policies enforced at every layer.
All services communicate via internal Docker network with automatic service discovery. Prometheus monitors health metrics, Grafana visualizes performance, and automated backups run daily.
AI Infrastructure
Local AI deployment
Privacy-first AI architecture. All models run locally via Ollama—no data leaves your infrastructure. Cloud APIs (OpenAI, Claude) used only for non-sensitive workloads.
```python
# Model routing logic (simplified)
if task.sensitive_data:
    model = "llama3.1:8b"          # Local only
elif task.requires_code:
    model = "deepseek-coder:6.7b"
elif task.context_length > 10000:
    model = "claude-3.5-sonnet"
else:
    model = "mistral:7b"           # Fast default
```
Vector Database
RAG & semantic search
Weaviate vector database with 2.8M+ embeddings enables sub-100ms semantic search across millions of entities. Hybrid search combines vector similarity with keyword filtering for precision.
Embedding Pipeline: Documents chunked to 512 tokens → Voyage AI embeddings → Weaviate index → Redis cache for hot queries. Nuculair uses this for instant profile context retrieval across 300+ data sources.
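The chunking step can be sketched like this; whitespace-separated words approximate real tokenizer tokens here, and the overlap value is illustrative (overlap keeps sentence context from being cut off at chunk boundaries):

```python
def chunk_tokens(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens words."""
    words = text.split()
    step = max_tokens - overlap  # advance by this many words per chunk
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_tokens]
        chunks.append(" ".join(window))
        if start + max_tokens >= len(words):
            break  # last window already covers the tail
    return chunks
```

Each chunk then gets embedded and indexed individually, so retrieval can surface the specific passage rather than a whole document.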
Hybrid Search: Vector similarity (cosine distance) + BM25 keyword matching + metadata filters. Weighted fusion algorithm combines scores for optimal relevance ranking.
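One common shape for that weighted fusion, sketched in plain Python (min-max normalization and the 0.7 vector weight are assumptions for illustration, not the exact production algorithm):

```python
def fuse_scores(vector_scores: dict[str, float],
                keyword_scores: dict[str, float],
                alpha: float = 0.7) -> list[tuple[str, float]]:
    """Min-max normalize each score set, then blend: alpha*vector + (1-alpha)*keyword."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v, k = normalize(vector_scores), normalize(keyword_scores)
    docs = set(v) | set(k)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

Normalizing first matters because cosine similarities and BM25 scores live on very different scales; without it the keyword side would dominate.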
Workflow Orchestration
n8n automation platform
150+ n8n workflows handle data ingestion, processing, and delivery. 8,500+ daily executions power OSINT aggregation, security scanning, and AI model orchestration.
OSINT Workflows
- Social media scraping (120+ platforms)
- Professional network aggregation (80+ sources)
- Public records collection (60+ databases)
- Real-time alerts & monitoring
- Data enrichment pipelines
Security Automation
- Scheduled vulnerability scans
- CVE database synchronization
- Exploit code generation triggers
- Remediation report delivery
- False positive filtering
AI Orchestration
- Model selection routing
- Prompt template management
- Response quality validation
- Context aggregation
- Embedding generation batch jobs
Frequently Asked Questions
Infrastructure FAQ
What infrastructure powers these AI systems?
The production stack runs on Docker Swarm orchestrating containers across multiple nodes. Core services include Ollama for local AI model inference, PostgreSQL for relational data, Neo4j for graph relationships, Redis for caching, and FastAPI/Python backends with React/Next.js frontends.
Why Docker Swarm instead of Kubernetes?
Docker Swarm provides sufficient orchestration while dramatically reducing operational complexity. For single-team operations running fewer than 50 services, Swarm's native Docker integration and lower resource overhead outweigh Kubernetes' advanced scheduling.
How are local AI models deployed in production?
Ollama serves as the local model runtime, hosting quantized versions of Llama 3.1 and Mistral. A routing layer directs security-sensitive inference to local models and generic tasks to cloud APIs.
What database strategy supports both OSINT and security?
A polyglot persistence strategy: PostgreSQL for structured records, Neo4j for entity relationships, Redis for sub-millisecond caching, and vector stores for semantic search across 2.8 million embeddings.
How is monitoring handled?
Prometheus collects metrics, Grafana provides dashboards, and custom alerting monitors AI model latency, inference quality scores, and resource utilization.
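A toy check in the spirit of that latency alerting, assuming raw inference samples in milliseconds and a hypothetical 2-second p95 threshold:

```python
from statistics import quantiles

def p95_ms(samples: list[float]) -> float:
    """95th-percentile latency from raw samples (milliseconds)."""
    return quantiles(samples, n=100)[94]  # 95th of the 99 cut points

def check_alert(samples: list[float], threshold_ms: float = 2000.0) -> bool:
    """Fire when p95 inference latency breaches the threshold."""
    return p95_ms(samples) > threshold_ms
```

Alerting on a high percentile rather than the mean is the usual choice here: a handful of slow inferences can ruin the user experience while barely moving the average.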