Technical Infrastructure
The guts
Production infrastructure that powers the things I design and build. I use a wide variety of tools, systems, and custom-built models, adapted specifically to what you need.
Technology Stack
Core capabilities
Backend
- Python (FastAPI, async/await, Celery)
- Go for high-performance services
- Node.js / Express for real-time systems
- SQLAlchemy + Alembic migrations
Frontend
- React 18+ / Next.js 14
- TypeScript for type safety
- React Native for mobile
- TailwindCSS, shadcn/ui, Radix UI
- Vite & Webpack build systems
AI & Machine Learning
- Ollama local model deployment
- OpenAI & Anthropic Claude APIs
- LangChain orchestration
- PyTorch & TorchVision
- OpenCV for computer vision
- scikit-learn for ML pipelines
Databases
- PostgreSQL (optimized queries, partitioning)
- Redis (caching, pub/sub, queues)
- Neo4j for graph relationships
- Weaviate vector database (2.8M+ vectors)
Infrastructure
- Docker Swarm (30+ services, 6 nodes)
- Docker Compose for local development
- Terraform for IaC
- Nginx reverse proxy & load balancing
- CI/CD pipelines (GitHub Actions)
- Linux server management
Security
- JWT & OAuth 2.0 authentication
- bcrypt password hashing
- Rate limiting & DDoS protection
- Vulnerability scanning (SCAFU)
- SSL/TLS certificate management
System Architecture
Production infrastructure
A modular production stack: gateway at the edge, services in the middle, knowledge and storage beneath. The specifics change per project—the structure stays consistent.
Edge & Gateway
Receives all external traffic, routes to internal services, enforces global policies, and handles TLS. First line of defense and traffic control.
Application Services
Stateless services handling REST/GraphQL APIs, real-time channels, and async workers. Message queues decouple workloads for resilience.
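The decoupling idea can be sketched in a few lines, with Python's standard-library queue and a worker thread standing in for Celery and a real message broker (the `handle_request` and `worker` names are illustrative, not actual service code):

```python
import queue
import threading

# The API handler only enqueues and returns; a background worker
# drains the queue independently, so a slow job never blocks a request.
jobs: "queue.Queue[dict | None]" = queue.Queue()
results: list[str] = []

def handle_request(payload: dict) -> str:
    """Stateless handler: validate, enqueue, respond immediately."""
    jobs.put(payload)
    return "accepted"

def worker() -> None:
    """Async worker: processes jobs until it sees the None sentinel."""
    while (job := jobs.get()) is not None:
        results.append(f"processed:{job['id']}")
        jobs.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

for i in range(3):
    handle_request({"id": i})

jobs.join()     # wait for the backlog to drain
jobs.put(None)  # stop the worker
t.join()
```

In production the queue would be durable (RabbitMQ, Redis streams), which is what lets workers restart or scale out without dropping work.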
Orchestration
150+ workflows orchestrate data movement, scheduled tasks, and event-driven automations. 8.5K daily executions coordinate the entire stack.
AI & ML Layer
Privacy-first AI with local Ollama models and 2.8M vector embeddings. Semantic search, code generation, and intelligent automation—all running in-house.
Data Plane
Multi-model persistence: PostgreSQL for transactions, Neo4j for relationships, Redis for sub-millisecond reads. Purpose-built for each data pattern.
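The Redis fast path above is the classic cache-aside pattern. A minimal sketch, with an in-memory dict plus TTLs standing in for Redis and a hypothetical `load_from_db` standing in for a PostgreSQL query:

```python
import time

# key -> (timestamp, value); a stand-in for Redis SET with EX
CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 60.0

def load_from_db(key: str) -> str:
    """Placeholder for a real SQL lookup."""
    return f"row-for-{key}"

def get(key: str) -> str:
    now = time.monotonic()
    hit = CACHE.get(key)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]              # fast path: cache hit
    value = load_from_db(key)      # slow path: hit the database
    CACHE[key] = (now, value)      # populate for subsequent reads
    return value
```

The same shape applies per store: each read checks the cheap tier first and falls back to the authoritative one, which is what makes sub-millisecond reads compatible with a transactional source of truth.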
Monitoring & Security
End-to-end observability with Prometheus metrics, Grafana dashboards, and real-time alerting. Security policies enforced at every layer.
All services communicate via internal Docker network with automatic service discovery. Prometheus monitors health metrics, Grafana visualizes performance, and automated backups run daily.
AI Infrastructure
Local AI deployment
Privacy-first AI architecture. All models run locally via Ollama—no data leaves your infrastructure. Cloud APIs (OpenAI, Claude) used only for non-sensitive workloads.
```python
# Model routing logic (simplified)
if task.sensitive_data:
    model = "llama3.1:8b"          # Local only
elif task.requires_code:
    model = "deepseek-coder:6.7b"
elif task.context_length > 10000:
    model = "claude-3.5-sonnet"
else:
    model = "mistral:7b"           # Fast default
```
Vector Database
RAG & semantic search
Weaviate vector database with 2.8M+ embeddings enables sub-100ms semantic search across millions of entities. Hybrid search combines vector similarity with keyword filtering for precision.
Embedding Pipeline: Documents chunked to 512 tokens → Voyage AI embeddings → Weaviate index → Redis cache for hot queries. Nuculair uses this for instant profile context retrieval across 300+ data sources.
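The chunking step can be sketched like this; whitespace-separated words approximate real tokenizer tokens here, and the overlap value is illustrative (overlap keeps sentence context from being cut off at chunk boundaries):

```python
def chunk_tokens(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens words."""
    words = text.split()
    step = max_tokens - overlap  # advance by this many words per chunk
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_tokens]
        chunks.append(" ".join(window))
        if start + max_tokens >= len(words):
            break  # last window already covers the tail
    return chunks
```

Each chunk then gets embedded and indexed individually, so retrieval can surface the specific passage rather than a whole document.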
Hybrid Search: Vector similarity (cosine distance) + BM25 keyword matching + metadata filters. Weighted fusion algorithm combines scores for optimal relevance ranking.
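One common shape for that weighted fusion, sketched in plain Python (min-max normalization and the 0.7 vector weight are assumptions for illustration, not the exact production algorithm):

```python
def fuse_scores(vector_scores: dict[str, float],
                keyword_scores: dict[str, float],
                alpha: float = 0.7) -> list[tuple[str, float]]:
    """Min-max normalize each score set, then blend: alpha*vector + (1-alpha)*keyword."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v, k = normalize(vector_scores), normalize(keyword_scores)
    docs = set(v) | set(k)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

Normalizing first matters because cosine similarities and BM25 scores live on very different scales; without it the keyword side would dominate.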
Workflow Orchestration
n8n automation platform
150+ n8n workflows handle data ingestion, processing, and delivery. 8,500+ daily executions power OSINT aggregation, security scanning, and AI model orchestration.
OSINT Workflows
- Social media scraping (120+ platforms)
- Professional network aggregation (80+ sources)
- Public records collection (60+ databases)
- Real-time alerts & monitoring
- Data enrichment pipelines
Security Automation
- Scheduled vulnerability scans
- CVE database synchronization
- Exploit code generation triggers
- Remediation report delivery
- False positive filtering
AI Orchestration
- Model selection routing
- Prompt template management
- Response quality validation
- Context aggregation
- Embedding generation batch jobs
Frequently Asked Questions
Infrastructure FAQ
What infrastructure powers these AI systems?
The production stack runs on Docker Swarm orchestrating containers across multiple nodes. Core services include Ollama for local AI model inference, PostgreSQL for relational data, Neo4j for graph relationships, Redis for caching, and FastAPI/Python backends with React/Next.js frontends.
Why Docker Swarm instead of Kubernetes?
Docker Swarm provides sufficient orchestration while dramatically reducing operational complexity. For single-team operations running fewer than 50 services, Swarm's native Docker integration and lower resource overhead outweigh Kubernetes' advanced scheduling.
How are local AI models deployed in production?
Ollama serves as the local model runtime, hosting quantized versions of Llama 3.1 and Mistral. A routing layer directs security-sensitive inference to local models and generic tasks to cloud APIs.
What database strategy supports both OSINT and security?
A polyglot persistence strategy: PostgreSQL for structured records, Neo4j for entity relationships, Redis for sub-millisecond caching, and vector stores for semantic search across 2.8 million embeddings.
How is monitoring handled?
Prometheus collects metrics, Grafana provides dashboards, and custom alerting monitors AI model latency, inference quality scores, and resource utilization.
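A toy check in the spirit of that latency alerting, assuming raw inference samples in milliseconds and a hypothetical 2-second p95 threshold:

```python
from statistics import quantiles

def p95_ms(samples: list[float]) -> float:
    """95th-percentile latency from raw samples (milliseconds)."""
    return quantiles(samples, n=100)[94]  # 95th of the 99 cut points

def check_alert(samples: list[float], threshold_ms: float = 2000.0) -> bool:
    """Fire when p95 inference latency breaches the threshold."""
    return p95_ms(samples) > threshold_ms
```

Alerting on a high percentile rather than the mean is the usual choice here: a handful of slow inferences can ruin the user experience while barely moving the average.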