Philosophy

ModelCub is built on a set of core principles that guide every design decision.

Local-First

Your data never leaves your machine.

This isn't just a feature; it's our fundamental architecture. ModelCub works 100% offline, which pays off in four ways:

Security

No network requests means:

  • No data exfiltration risk
  • No man-in-the-middle attacks
  • No cloud breaches
  • No third-party access

Privacy

Perfect for sensitive data:

  • Medical imaging (supports HIPAA compliance)
  • Pharmaceutical research
  • Defense applications
  • Proprietary datasets

Performance

Local processing is faster:

  • No network latency
  • No upload/download time
  • Full GPU utilization
  • Works on slow connections

Cost

Zero recurring fees:

  • No monthly subscriptions
  • No per-API-call charges
  • No surprise bills
  • No vendor lock-in

Stateless Backend

The backend is a view layer. All state lives in files.

```text
State Storage:
├── .modelcub/config.yaml      # Configuration
├── .modelcub/datasets.yaml    # Dataset registry
├── .modelcub/runs.yaml        # Training runs
└── data/datasets/             # Actual data
```

Benefits

Multiple instances: Run several UI servers simultaneously. They all see the same state.

Easy backup: Copy the directory. That's it.

Version control: Git can track everything.

No synchronization: No database to keep in sync.

Transparent: All state is human-readable YAML/JSON.

Implications

  • Kill the server → restart → nothing lost
  • No "database migrations"
  • No connection pooling
  • No ORM complexity
  • No cache invalidation
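
The stateless design is easy to demonstrate: because all state is plain files on disk, any number of processes can load it independently and agree. A minimal sketch using JSON and an invented state shape (the real files are YAML under .modelcub/):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical state file; ModelCub keeps the real equivalents as YAML.
state_file = Path(tempfile.mkdtemp()) / "runs.json"
state_file.write_text(json.dumps({"runs": [{"id": "run-1", "status": "done"}]}))

def load_state(path: Path) -> dict:
    """Any process can read the full state with nothing but the filesystem."""
    return json.loads(path.read_text())

# Two independent "servers" see identical state: no database, no sync protocol.
server_a = load_state(state_file)
server_b = load_state(state_file)
assert server_a == server_b
```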

API-First

Everything is accessible through clean APIs.

```python
# Python SDK
from modelcub import Project
project = Project.init("my-project")
```

```bash
# CLI
modelcub project init my-project
```

```typescript
// Web API
const project = await api.createProject({ path: "my-project" });
```

All three interfaces use the same underlying core API.
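
One way to picture the layering (names here are illustrative, not ModelCub's actual internals): a single core function, with the SDK and CLI as thin adapters over it.

```python
# Hypothetical core layer: the one place the real work happens.
def core_create_project(path: str) -> dict:
    return {"path": path, "created": True}

# SDK-style adapter.
class Project:
    @staticmethod
    def init(path: str) -> dict:
        return core_create_project(path)

# CLI-style adapter, e.g. for `modelcub project init my-project`.
def cli_main(argv: list) -> dict:
    assert argv[:2] == ["project", "init"]
    return core_create_project(argv[2])

# Both interfaces produce identical results because they share one core.
assert Project.init("my-project") == cli_main(["project", "init", "my-project"])
```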

Benefits

Composable: Mix and match tools as needed.

Automation: Script any workflow.

Testing: Easy to test business logic.

Integration: Works with existing tools.

Future-proof: New interfaces can be added without changing core.

Format-Agnostic

YOLO internally, import/export anything.

```text
Import              Internal            Export
─────────────────────────────────────────────────
YOLO       ─────┐                  ┌───→  YOLO
Roboflow   ─────┤                  ├───→  COCO
COCO       ─────┼────→  YOLO  ────┼───→  VOC
Images     ─────┘                  └───→  TFRecord
```

Why YOLO Internally?

Simple: Text-based format, easy to parse.

Universal: Every CV library supports it.

Git-friendly: Human-readable diffs.

Fast: No complex parsing required.

Standard: Industry-wide adoption.
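
Concretely, a YOLO label file holds one line per box: a class index followed by normalized center-x, center-y, width, and height. A small parser sketch shows how little machinery the format needs:

```python
def parse_yolo_line(line: str) -> dict:
    """Parse one YOLO annotation line: 'class cx cy w h', all coords in [0, 1]."""
    cls, cx, cy, w, h = line.split()
    return {
        "class": int(cls),
        "cx": float(cx),
        "cy": float(cy),
        "w": float(w),
        "h": float(h),
    }

box = parse_yolo_line("0 0.50 0.50 0.20 0.30")
assert box["class"] == 0 and box["w"] == 0.2
```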

Format Conversion

Transparent conversion on import/export:

```python
# Import COCO
dataset = Dataset.from_coco("./coco", name="v1")

# Export to TFRecord
dataset.export("./output", format="tfrecord")
```

Users never need to think about the internal format.

Git-Friendly

Version datasets like code.

```bash
# Commit changes
modelcub commit "Added 100 new samples"

# View history
modelcub history

# Compare versions
modelcub diff v1 v2

# Rollback
modelcub checkout v1
```

Why Version Control?

Reproducibility: Exact state of data for every experiment.

Collaboration: Multiple people can work on the same dataset.

Experimentation: Safe to try changes, easy to rollback.

Audit trail: Know exactly what changed and when.

Debugging: Bisect to find when an issue was introduced.

Implementation

File-based: All state in text files Git can track.

Diff-friendly: Changes show up clearly in diffs.

Commit metadata: Full provenance for every change.

Branch support: Experiment in branches, merge when ready.

Developer-Friendly

Built by engineers who felt the pain.

Clear Error Messages

Bad:

```text
Error: Invalid input
```

Good:

```text
❌ Dataset not found: "production-v1"

Available datasets:
  • production-v2 (847 images)
  • test-v1 (120 images)

Use: modelcub dataset list
```

Sensible Defaults

Auto-detect:

  • GPU (CUDA, MPS, or CPU)
  • Optimal batch size
  • Image size
  • Number of workers

User only specifies what they care about.
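
GPU detection typically follows a simple preference order: CUDA, then Apple's MPS, then CPU. A sketch of that logic (not necessarily ModelCub's exact implementation), using PyTorch when it is available:

```python
def detect_device() -> str:
    """Return the best available compute device, falling back to CPU."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass  # torch not installed: CPU is always a safe default
    return "cpu"

device = detect_device()
assert device in {"cuda", "mps", "cpu"}
```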

Good Documentation

Every API has:

  • Clear description
  • Parameter documentation
  • Return value documentation
  • Code examples
  • Common use cases

Type Safety

Full type hints throughout:

```python
from pathlib import Path
from typing import List, Optional

def import_dataset(
    source: Path,
    name: str,
    classes: Optional[List[str]] = None,
) -> Dataset:
    ...
```

Transparent

No black boxes. No hidden state.

Configuration

All config in .modelcub/config.yaml:

```yaml
project:
  name: my-project
defaults:
  device: cuda
  batch_size: 16
```

No hidden registry files. No system-wide configuration.

State

All state in human-readable files:

```yaml
# .modelcub/datasets.yaml
datasets:
  v1:
    name: v1
    classes: [cat, dog]
    images: 1000
```

No binary databases. No opaque blobs.

Logs

Clear, structured logs:

```text
[2025-01-26 10:30:15] INFO: Importing dataset from ./data
[2025-01-26 10:30:16] INFO: Found 1000 images
[2025-01-26 10:30:17] INFO: Detected 2 classes: cat, dog
[2025-01-26 10:30:18] SUCCESS: Import complete
```

Errors

Full stack traces in debug mode. Clear messages in normal mode.
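
The two modes can be sketched with the standard traceback module (the function name below is hypothetical):

```python
import traceback

def format_error(exc: Exception, debug: bool = False) -> str:
    """Full stack trace in debug mode, a short human-readable message otherwise."""
    if debug:
        return "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return f"❌ {exc}"

try:
    raise FileNotFoundError('Dataset not found: "production-v1"')
except FileNotFoundError as err:
    short = format_error(err)
    full = format_error(err, debug=True)

assert short.startswith("❌")
assert "FileNotFoundError" in full
```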

Composable

Use what you need. Ignore what you don't.

Standalone Components

Each piece works independently:

```python
# Just dataset management
from modelcub import Dataset
dataset = Dataset.load("v1")

# Just annotation
from modelcub import Annotator
annotator = Annotator(dataset)

# Just training
from modelcub import Trainer
trainer = Trainer(dataset, model)
```

No Forced Workflows

Use ModelCub how you want:

  • CLI only
  • SDK only
  • UI only
  • Mix and match

Easy Integration

Works with existing tools:

```python
# Use with your own training loop
dataset = Dataset.load("v1")
train_loader = dataset.to_pytorch_dataloader()

# Your code here
for batch in train_loader:
    ...
```

Performance

Fast enough to not be annoying.

Benchmarks

  • Import 10k images: <30 seconds
  • Validate dataset: <10 seconds
  • Load dataset metadata: <100ms
  • Render UI: 60fps

Optimization

Lazy loading: Only load what's needed.

Caching: Cache expensive computations.

Pagination: Don't load all images at once.

Async: Use async I/O where beneficial.

Parallelism: Worker processes for CPU-bound tasks (Python threads are limited by the GIL).
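
Caching, for example, can be as simple as functools.lru_cache around an expensive computation (the function below is invented for illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def dataset_stats(name: str) -> dict:
    """Pretend-expensive computation; memoized after the first call per name."""
    dataset_stats.calls += 1
    return {"name": name, "images": 1000}

dataset_stats.calls = 0
dataset_stats("v1")
dataset_stats("v1")  # second call is served from the cache
assert dataset_stats.calls == 1
```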

Security

Privacy by design. Security by default.

No Remote Code

No eval(), no exec(), no pickle of untrusted data.

Input Validation

All paths validated:

  • No directory traversal
  • No symlink attacks
  • File extension checking
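
A typical traversal guard resolves the candidate path and checks that it stays inside the project root; this sketch (helper name invented) uses pathlib:

```python
import tempfile
from pathlib import Path

def safe_join(root: Path, user_path: str) -> Path:
    """Resolve user_path under root; reject anything that escapes it."""
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes project root: {user_path}")
    return candidate

root = Path(tempfile.mkdtemp()).resolve()
assert safe_join(root, "data/images") == root / "data" / "images"

try:
    safe_join(root, "../../etc/passwd")
    raise AssertionError("traversal was not blocked")
except ValueError:
    pass  # expected: the resolved path left the root
```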

Safe Parsing

YAML/JSON parsing with safe loaders only.

SQL Safety

Parameterized queries only (for optional SQLite cache).
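
With Python's built-in sqlite3, parameter binding means hostile input is stored as data, never executed as SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (name TEXT)")

# A hostile value: harmless when bound with '?', never string-interpolated.
hostile = "v1'; DROP TABLE datasets; --"
conn.execute("INSERT INTO datasets (name) VALUES (?)", (hostile,))

row = conn.execute(
    "SELECT name FROM datasets WHERE name = ?", (hostile,)
).fetchone()
assert row == (hostile,)  # stored verbatim, and the table still exists
```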

No Network

Zero outbound connections:

  • No telemetry
  • No update checks
  • No analytics
  • No crash reporting

Extensibility

Designed for future growth.

Plugin System (Future)

```python
# plugins/my_augmentation.py
import modelcub
from modelcub import Plugin

class MyAugmentation(Plugin):
    def augment(self, image):
        ...

# Register
modelcub.register_plugin(MyAugmentation)
```

Hook Points

Events for extension:

```python
from modelcub import DatasetImported, bus

@bus.subscribe(DatasetImported)
def on_import(event):
    print(f"Dataset {event.name} imported")
```

Custom Formats

Add new import/export formats:

```python
from modelcub import register_format

@register_format("custom")
class CustomFormat:
    def parse(self, path): ...
    def export(self, dataset, path): ...
```
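
Under the hood, a decorator-based registry like register_format usually amounts to a dictionary of handler classes; an illustrative sketch (not the real API):

```python
FORMATS = {}

def register_format(name: str):
    """Register a format handler class under a string key."""
    def decorator(cls):
        FORMATS[name] = cls
        return cls
    return decorator

@register_format("custom")
class CustomFormat:
    def parse(self, path):
        return {"source": path}

    def export(self, dataset, path):
        return f"wrote {dataset} to {path}"

handler = FORMATS["custom"]()
assert handler.parse("labels.txt") == {"source": "labels.txt"}
```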

Summary

ModelCub is:

  • Local-First: Your data, your machine
  • Stateless: No hidden databases
  • API-First: Everything composable
  • Format-Agnostic: Use any format
  • Git-Friendly: Version like code
  • Developer-Friendly: Clear, simple APIs
  • Transparent: No black boxes
  • Composable: Use what you need
  • Performant: Fast enough
  • Secure: Privacy by design
  • Extensible: Ready for growth

These principles guide every decision we make.

Released under the MIT License.