COMPUTER VISION
R&D 2025

Frozen Moments of Machine Perception

These are visual snapshots captured mid-thought—neural activations extracted from the hidden layers of deep learning models and transformed into art. Each image is a window into how machines decode reality, translating pixels into patterns, edges into meaning, chaos into understanding.

STEP 1
INPUT
Data Ingestion & Preprocessing
Image tensors normalized via ImageNet statistics or audio converted to mel-spectrogram representation
• Shape: (3, 224, 224)
• Normalize: μ=[0.485,0.456,0.406]
• Audio: n_mels=128, sr=22050Hz
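Step 1 in a minimal PyTorch sketch. The mean values are the ones quoted above; the standard deviations (0.229, 0.224, 0.225) are the conventional ImageNet values, which the card omits:

```python
import torch

# ImageNet channel statistics (means from the card above; stds are the standard ones)
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(img: torch.Tensor) -> torch.Tensor:
    """Normalize a (3, 224, 224) float image in [0, 1] with ImageNet stats."""
    assert img.shape == (3, 224, 224)
    return (img - MEAN) / STD

x = preprocess(torch.rand(3, 224, 224))  # ready for the CNN forward pass
```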
STEP 2
CNN
Neural Network Forward Pass
Convolutional feature extraction through progressively deeper layers with non-linear activations
• Models: ResNet18 | CSM-1B
• Layers: conv1 → layer1...layer4
• Activation: f(x) = max(0, x)
STEP 3
HOOK
Activation Map Extraction
PyTorch forward hooks capture intermediate feature tensors before pooling, batch norm, or output layers
• Hook: register_forward_hook()
• Outputs: (B, C, H, W) tensors
• Layers: 6 convolutional blocks
STEP 4
RENDER
Visualization & Encoding
Feature maps transformed via custom codecs—spatial gradients, colormap encoding, pattern synthesis algorithms
• Modes: 8 render pipelines
• Normalize: (x - min) / (max - min)
• Output: RGB uint8 [0-255]
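Step 4's min-max normalization in NumPy, with a grayscale encoding as a stand-in for the eight stylized pipelines:

```python
import numpy as np

def to_rgb(fmap: np.ndarray) -> np.ndarray:
    """Min-max normalize one (H, W) activation map and encode as RGB uint8."""
    lo, hi = fmap.min(), fmap.max()
    norm = (fmap - lo) / (hi - lo + 1e-8)  # epsilon guards constant maps
    gray = (norm * 255).astype(np.uint8)   # [0, 255]
    return np.stack([gray, gray, gray], axis=-1)  # (H, W, 3)

img = to_rgb(np.random.randn(56, 56))
```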

Extracting the In-Between

Neural Vision captures the moments before a model makes its decision—the raw, unfiltered activations that exist in the hidden layers. What we see is normally invisible: millions of activations firing, transforming input into understanding. By extracting these activation maps and rendering them as visual art, we reveal the alien beauty of machine perception.

Upload Anything
Feed the network an image or record live audio. Watch as it dissects what you give it—layer by layer, neuron by neuron—and renders its internal processing as visual art.
8 Render Styles
Matrix rain, circuit boards, hexadecimal memory, ASCII terminals, thermal spectrums, binary code, pointillist dots, raw RGB channels—each mode tells a different story about how the machine interprets your input.
Deep Layer Extraction
Neural Vision hooks directly into ResNet18 and CSM-1B models, extracting activation maps from 6 convolutional layers. You're seeing the network's raw thoughts before it reaches a conclusion.
Creative Control
Adjust temperature, intensity, noise, edge detection, contrast, pattern density. Turn machine learning into generative art—controllable, repeatable, endlessly curious.

Built With PyTorch & Flask

Full-stack application combining deep learning visualization with real-time audio processing. Flask backend handles model inference while WebAudio API enables live microphone capture and waveform display.
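The backend's inference endpoint might look like this minimal Flask sketch; the route name and response fields are illustrative, not the project's actual API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/visualize", methods=["POST"])  # route name is hypothetical
def visualize():
    # In the real app: decode the uploaded image or audio clip, run PyTorch
    # inference with forward hooks, render the chosen mode, return images.
    mode = request.form.get("mode", "neural_rgb")
    return jsonify({"mode": mode, "layers": 6})

if __name__ == "__main__":
    app.run(debug=True)
```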

Neural Models
ResNet18
CSM-1B
PyTorch
Activation Hooks
Layer Visualization
Audio Processing
Librosa
Mel-Spectrogram
FFmpeg
WebAudio API
MediaRecorder
Real-time Waveforms
Backend & Processing
Flask
OpenCV
NumPy
Pillow
Matplotlib
Visualization Modes
Matrix Code Rain
Circuit Board PCB
Hexadecimal Memory
ASCII Terminal Art
Thermal Spectrum
Binary Patterns
Pointillist Dots
Neural RGB

Progressive Feature Extraction

Neural Vision captures activations from 6 key convolutional layers, showing how the network builds increasingly complex representations from simple edges to high-level semantic features.

Layer 01
Early Convolution
Edge detection, basic shapes, color gradients
Layer 02-03
Mid-Level Features
Textures, patterns, simple object parts
Layer 04-06
High-Level Semantics
Complex shapes, object recognition, contextual understanding

Educational & Creative Tool

Neural Vision makes deep learning interpretable and accessible. Perfect for understanding how convolutional neural networks process visual and audio information, teaching AI concepts, or creating generative art from neural activations.

Education
Demonstrate how CNNs extract features at different layers, from edge detection to high-level patterns.
Research
Analyze model behavior, compare activation patterns, and debug neural network architectures visually.
Generative Art
Create unique artwork from neural activations with customizable aesthetic parameters and render modes.
Audio Visualization
Transform speech, music, or ambient sound into visual representations through mel-spectrogram analysis.

Neural Vision FAQ

What is Neural Vision?

Neural Vision is an interactive tool that extracts hidden-layer activations from deep learning models and renders them as visual art. It reveals how convolutional neural networks perceive and process images and audio.

Does Neural Vision require a GPU?

No. The Flask backend handles PyTorch inference server-side, so no GPU is needed on the client. Users need only a modern web browser to upload input and view the rendered activations.

What CNN concepts does Neural Vision demonstrate?

Neural Vision demonstrates convolutional feature extraction, activation maps, forward hooks, progressive layer abstraction from edges to semantics, and 8 distinct rendering pipelines for visualizing internal network states.

Can Neural Vision process audio inputs?

Yes. Audio is converted to mel-spectrograms using Librosa at 22050 Hz sample rate with 128 mel bands, then fed through the same convolutional pipeline as images for visualization.

Is Neural Vision available to use online?

Neural Vision is currently a research and development project. A public demo is planned. The codebase uses Flask, PyTorch, OpenCV, and WebAudio API for real-time processing.