Experiment 2 Post-Mortem: Narrated Neurons and the Quest for Explainable, Resource-Efficient AI
By TinkerForge AI Research Team
Introduction
At TinkerForge AI, we opened our doors with a vision: to push the boundaries of AI towards systems that are not only powerful but also understandable and resource-aware. Our second experiment, documented here, was a bold attempt to design a neural simulation where neurons narrate their own decision-making and the system evolves in a lightweight, event-driven, and transparent manner—without relying on TensorFlow or PyTorch.
Below, we share a candid post-mortem that blends our founder’s personal reflection with technical insights from the project. Our hope is that this openness will help others in the AI research community—and our future selves—learn from both our successes and dead ends.
The Vision: Narrated, Sparse, Hierarchical Neurons
Experiment 2 was born from a simple but radical question:
What if every neuron in an AI system could explain, in human-readable language, why it made each decision?
We sought to build a biologically-inspired neural architecture with these core principles:
- Sparse & Event-Driven: Neurons compute only when relevant events fire, minimizing wasted compute and mimicking real brains.
- Narrative & Explainable: Every action, state change, and adaptation is logged in plain English—full stack-trace style.
- Hierarchical & Emergent: Clusters and higher-order controllers form organically when groups of neurons co-fire.
- Lightweight & Scalable: Minimal state per neuron, allowing for potentially millions of neurons to run on modest hardware.
- Pattern-Driven: A persistent PatternWatcher observes for recurring activity, suggesting adaptations or new clusters.
- Locally Adaptive: Neurons autonomously tweak their own rules and thresholds, based on feedback and performance.
- Parallel by Design: The system harnesses async and multiprocessing to model brain-like parallelism.
What We Built: Technical Accomplishments
Despite not reaching the finish line, Experiment 2 produced several technical milestones:
1. Event-Driven Neuron Core
We designed a custom `Neuron` class (see `src/neuron.py`) that:
- Activates only on relevant input “spikes” or signals.
- Maintains local memory of recent events and firing history.
- Adjusts its own firing thresholds based on “lessons learned” from near-misses or repeated patterns.
- Narrates every significant action, e.g.,
“Lesson learned: Lowering threshold from 1.0 to 0.9 after repeated near-misses.”
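
To make the mechanics concrete, here is a condensed sketch of the pattern. The window size, adaptation rule, and method names are simplifications chosen for illustration, not the exact API of `src/neuron.py`:

```python
from collections import deque

class Neuron:
    """Event-driven neuron that narrates its own decisions.

    Condensed sketch: the window size, adaptation rule, and method
    names are illustrative, not the exact API of src/neuron.py.
    """

    def __init__(self, name, threshold=1.0, log=print):
        self.name = name
        self.threshold = threshold
        self.recent = deque(maxlen=10)  # local memory of recent events
        self.near_misses = 0
        self.log = log

    def on_spike(self, strength):
        """Compute only when an event arrives (no polling loop)."""
        self.recent.append(strength)
        if strength >= self.threshold:
            self.near_misses = 0
            self.log(f"{self.name}: Fired at strength {strength:.2f}.")
            return True
        self.near_misses += 1
        self.log(f"{self.name}: Near miss ({strength:.2f} < {self.threshold:.2f}).")
        if self.near_misses >= 3:  # adapt the local rule from feedback
            old, self.threshold = self.threshold, round(self.threshold - 0.1, 2)
            self.near_misses = 0
            self.log(f"{self.name}: Lesson learned: Lowering threshold "
                     f"from {old} to {self.threshold} after repeated near-misses.")
        return False
```

Three consecutive near-misses reproduce the “Lesson learned” line quoted above.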
2. Narrative Logging System
- Every neuron action and system event is logged in human-auditable Markdown, organized with “folds” for digestibility.
- Logs include introspective commentary:
“Reflecting on my experience, I am now confident in my own pattern recognition abilities.”
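
For a sense of the mechanism, here is one minimal way to implement the folds, assuming they map to HTML `<details>` blocks (which most Markdown renderers collapse); the actual helpers in `utils.py` may differ:

```python
def fold(title, lines):
    """Render a collapsible Markdown 'fold'. This assumes folds are
    HTML <details> blocks (which most renderers collapse); the real
    utils.py helper may work differently."""
    body = "\n".join(f"- {line}" for line in lines)
    return f"<details>\n<summary>{title}</summary>\n\n{body}\n\n</details>\n\n"

def log_event(path, title, lines):
    """Append one folded entry to a human-auditable Markdown log."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(fold(title, lines))

log_event("run_log.md", "neuron-7 reflection", [
    "Reflecting on my experience, I am now confident in my own "
    "pattern recognition abilities.",
])
```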
3. Emergent Clustering and PatternWatcher
- Neurons naturally cluster when exhibiting correlated activity—no manual grouping.
- The `PatternWatcher` acts as a “meta-observer,” recognizing and reporting recurring patterns, and nudging neurons to adapt:
“PatternWatcher: Detected repeated near-miss events. Notifying neuron.”
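
In spirit, the watcher is a tally over (source, event-type) pairs plus a notification hook. The sketch below uses hypothetical names; the real `pattern_watcher.py` interface is richer:

```python
from collections import Counter

class PatternWatcher:
    """Meta-observer that tallies (source, event-type) pairs and fires
    a notification when one recurs. Hypothetical names throughout."""

    def __init__(self, repeat_threshold=3, log=print):
        self.counts = Counter()
        self.repeat_threshold = repeat_threshold
        self.log = log

    def observe(self, source, event_type, notify):
        key = (source, event_type)
        self.counts[key] += 1
        if self.counts[key] >= self.repeat_threshold:
            self.counts[key] = 0
            self.log(f"PatternWatcher: Detected repeated {event_type} "
                     "events. Notifying neuron.")
            notify(event_type)  # nudge the source neuron to adapt

# After three near-miss reports from the same neuron, the watcher nudges it.
watcher = PatternWatcher()
for _ in range(3):
    watcher.observe("n1", "near-miss",
                    notify=lambda e: print(f"n1 adapting to {e}"))
```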
4. No External ML Libraries
- The entire pipeline is implemented in pure Python 3.10+, no TensorFlow or PyTorch dependencies.
- All modules are custom:
  - `neuron.py`: Neuron logic and self-narration
  - `dispatcher.py`: Event loop and message passing (sketched below)
  - `cluster.py`: Higher-order grouping logic
  - `pattern_watcher.py`: Persistent pattern detection
  - `utils.py`: Logging helpers
  - `main.py`: Experiment runner and scenario orchestration
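
To give a flavor of the event loop, here is a minimal async publish/subscribe bus in the spirit of `dispatcher.py`, covering the asyncio half of “parallel by design” (the multiprocessing side is omitted). Names are illustrative:

```python
import asyncio

class Dispatcher:
    """Minimal async publish/subscribe bus: neurons register handlers
    for topics and are awakened only when a matching event fires.
    A sketch of the idea, not dispatcher.py's actual interface."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of async handlers

    def subscribe(self, topic, handler):
        self.subscribers.setdefault(topic, []).append(handler)

    async def emit(self, topic, payload):
        # Only subscribed handlers run; idle neurons cost nothing.
        handlers = self.subscribers.get(topic, [])
        await asyncio.gather(*(h(payload) for h in handlers))

async def demo():
    bus = Dispatcher()

    async def neuron_handler(strength):
        print(f"n1 woke on spike of strength {strength}")

    bus.subscribe("spike", neuron_handler)
    await bus.emit("spike", 0.9)

asyncio.run(demo())
```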
5. Experiment Framework and Auditable Logs
- Every run is documented using a Markdown template for easy review and comparison.
- Experiment logs include summaries of neuron behavior, cluster events, and detected patterns.
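
The per-run template can be as simple as a format string. The headings and sample notes below are illustrative reconstructions, not the verbatim template:

```python
TEMPLATE = """\
# Experiment Run: {run_id}

## Neuron Behavior
{neuron_notes}

## Cluster Events
{cluster_notes}

## Detected Patterns
{pattern_notes}
"""

def write_run_log(path, **fields):
    """Fill the per-run Markdown template for later review and
    comparison. Section headings are illustrative, not the real file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(TEMPLATE.format(**fields))

write_run_log(
    "run_042.md",
    run_id="042",
    neuron_notes="- n1 lowered threshold 1.0 -> 0.9 after near-misses",
    cluster_notes="- cluster C3 formed from co-firing n1, n4, n7",
    pattern_notes="- PatternWatcher flagged repeated near-miss events",
)
```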
Honest Reflections: What Worked & What Didn’t
“In Experiment 2, we attempted to get much more granular with narrative AI. Eventually we reached a point where areas of the code needed complete rewrites, and we hadn’t committed previous code fast enough to keep up with our AI Copilot counterpart. This forced us to abandon the project, though we had been on a good (if long) path.”
— Project Lead
What Went Well
- Deep Explainability: The approach produced remarkably transparent logs—every neuron “thought” and adaptation is documented and traceable.
- Emergent Behavior: We observed clusters forming and neurons “mentoring” each other, with logs capturing both successes and failures.
- Modular, Lightweight Design: The system ran efficiently and could potentially scale to millions of lightweight neuron processes.
What Broke Down
- System Cohesion: As the system grew, the decentralized, narrative-driven architecture became unwieldy. Lack of baseline metrics and visualization tools made it hard to track learning progress.
- Version Control Chaos: Rapid, experimental changes—sometimes made in collaboration with AI Copilot—outpaced our ability to commit code regularly, leading to confusion and loss of work.
- No External ML Baseline: By eschewing TensorFlow/PyTorch, we challenged ourselves to “build the car and the road at the same time.” Without standard metrics or baselines, progress was hard to measure.
- Exponential Complexity: The combinatorial explosion of neuron interactions, especially with narrative logging, made debugging and code management increasingly difficult.
Lessons & The Road Ahead
This experiment reinforced some crucial lessons:
- Commit Early, Commit Often: Especially when collaborating with coding assistants, regular commits are non-negotiable.
- Transparency ≠ Simplicity: Full introspection is powerful, but can add overwhelming complexity if not managed with rigorous structure and logging discipline.
- Resource Efficiency is Hard to Measure: Designing for compute efficiency and resource allocation is valuable, but needs clear baselines and visualization tools.
- It’s Okay to Pivot: Sometimes, the most important progress is knowing when to pause, reflect, and set a new course.
What’s Next? (Teaser for Experiment 3)
We’re now building a solid baseline using traditional machine learning frameworks (TensorFlow / PyTorch), so we can compare our novel approaches head-to-head. Our goal is to find the most compute-efficient, transparent, and auditable form of machine learning.
The narrative AI approach from Experiment 2 still excites us—and we’ll definitely revisit it. For now, clarity, metrics, and careful iteration are the priority.
Stay tuned for Experiment 3!
Interested in our research or want to collaborate? Reach out to us at TinkerForge AI!