
Phony Cloud Platform - Implementation Plan


Overview: Three-Phase Strategy

┌─────────────────────────────────────────────────────────────────────────┐
│                    PHONY IMPLEMENTATION STRATEGY                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  PHASE 1: OPEN SOURCE FOUNDATION                                        │
│  ══════════════════════════════════                                     │
│  Goal: Become the best Faker alternative for PHP/Laravel                 │
│  Revenue: $0 (community building)                                        │
│  Timeline: Year 1, Q1-Q2                                                 │
│                                                                          │
│         │                                                                │
│         ▼                                                                │
│                                                                          │
│  PHASE 2: CLOUD PLATFORM                                                │
│  ═══════════════════════════                                            │
│  Goal: Monetize with SaaS features OSS can't provide                     │
│  Revenue: $0 → $150K ARR                                                 │
│  Timeline: Year 1 Q3 → Year 2                                            │
│                                                                          │
│         │                                                                │
│         ▼                                                                │
│                                                                          │
│  PHASE 3: SCALE & OPTIMIZATION                                          │
│  ═════════════════════════════════                                      │
│  Goal: Enterprise features, performance, exit preparation                │
│  Revenue: $150K → $600K+ ARR → Exit                                      │
│  Timeline: Year 3-5                                                      │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

PHASE 1: Open Source Foundation

Strategic Goal: Establish Phony as THE modern Faker alternative for PHP

Scope Boundary: OSS = Faker alternative ONLY. No sync, no anonymization, no hosted features.

Why OSS First:

  • Build trust and community before asking for money
  • Validate N-gram engine with real-world usage
  • Create upgrade path: OSS users need sync → Cloud customers
  • SEO/visibility: GitHub stars, Packagist downloads
  • Avoid the Neosync mistake: they open-sourced their sync features, could not monetize them, and were acquired

1.1 Core Engine (phonyland/ngram)

Foundation for everything. Must be rock-solid.

| Deliverable | Description |
|---|---|
| N-gram tokenization | Word, character, and hybrid modes |
| Frequency map building | Efficient storage of n-gram frequencies |
| Probability-weighted generation | Weighted random walk algorithm |
| Seed support | Deterministic output for CI/CD |
| excludeOriginals option | Never output training data |
| Performance | 100K+ generations/second target |

Success Criteria: 100% test coverage, benchmarks documented, zero external dependencies
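
The deliverables above can be illustrated with a minimal character-level sketch. It is written in Python purely for illustration and is not the phonyland/ngram API: the function names, the `^`/`$` boundary markers, and the rejection loop used to mimic excludeOriginals are all assumptions.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # boundary markers; assumed not to occur in the training data


def build_frequency_map(samples, n=2):
    """Count how often each character follows each n-character context."""
    freq = defaultdict(lambda: defaultdict(int))
    for word in samples:
        padded = START * n + word + END
        for i in range(len(padded) - n):
            freq[padded[i:i + n]][padded[i + n]] += 1
    return freq


def generate(freq, n=2, rng=None, max_len=20):
    """Weighted random walk: pick each next character in proportion to its count."""
    rng = rng or random.Random()
    context, out = START * n, []
    while len(out) < max_len:
        followers = freq[context]
        chars = sorted(followers)  # sorted so a given seed always walks the same path
        nxt = rng.choices(chars, weights=[followers[c] for c in chars])[0]
        if nxt == END:
            break
        out.append(nxt)
        context = (context + nxt)[-n:]
    return "".join(out)


def generate_new(freq, originals, n=2, seed=None, max_tries=1000):
    """Like generate(), but rejects outputs that appear in the training data."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        word = generate(freq, n, rng)
        if word and word not in originals:
            return word
    raise RuntimeError("could not produce a non-original value")


samples = ["anna", "hanna", "johanna", "joan", "nora", "norah"]
freq = build_frequency_map(samples)
print(generate_new(freq, set(samples), seed=12345))
```

Passing the same seed reproduces the same value, which is the property the "Seed support" row asks for in CI/CD runs.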

1.2 Model System (phonyland/language-model)

| Deliverable | Description |
|---|---|
| Model file format | .phony binary or .json |
| Model metadata | N-gram size, training source, version |
| Model loading/caching | Lazy loading for memory efficiency |
| Model training | Train from arrays, files (txt, csv, json) |
| Model saving | Export trained models to .phony files |
| Pre-trained models | Ship with library (names, emails, etc.) |

Note: Local training from files is included in OSS. Cloud adds training from DB columns, versioning, and team sharing.
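
As a rough illustration of the .json variant with metadata and lazy loading, the following Python sketch stores the frequencies next to a metadata block and defers reading the file until first use. The layout (`meta`, `frequencies`) and the names `save_model` and `LazyModel` are hypothetical, not the actual model format.

```python
import json
import os
import tempfile


def save_model(path, frequencies, n, source, version="1.0"):
    """Write frequencies plus metadata as one JSON document."""
    model = {
        "meta": {"n": n, "source": source, "version": version},
        "frequencies": frequencies,
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(model, fh, ensure_ascii=False)


class LazyModel:
    """Reads the file only when the model is first accessed."""

    def __init__(self, path):
        self.path = path
        self._data = None

    def _load(self):
        if self._data is None:
            with open(self.path, encoding="utf-8") as fh:
                self._data = json.load(fh)
        return self._data

    @property
    def meta(self):
        return self._load()["meta"]

    @property
    def frequencies(self):
        return self._load()["frequencies"]


path = os.path.join(tempfile.mkdtemp(), "names.json")
save_model(path, {"an": {"n": 3, "$": 1}}, n=2, source="names.txt")
model = LazyModel(path)  # nothing has been read from disk yet
print(model.meta)
```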

1.3 Generator Engine (phonyland/language-generator)

| Deliverable | Description |
|---|---|
| Generate from model | Core generation API |
| Constraints | Min/max length, prefix, suffix |
| Batch generation | Efficient for large volumes |
| Format templates | "{first} {last}" patterns |
| Pipe/chain operations | Composable transformations |
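
One possible shape for templates, constraints, and pipes is sketched below in Python. This is an illustration under assumptions, not the phonyland/language-generator API: `fill_template`, `constrained`, and `pipe` are hypothetical names, constraints are enforced by simple rejection sampling, and the stand-in generators pick from fixed lists instead of a trained model.

```python
import random
import re


def fill_template(template, generators, rng):
    """Replace each {name} placeholder with a value from generators[name]."""
    return re.sub(r"\{(\w+)\}", lambda m: generators[m.group(1)](rng), template)


def constrained(generator, rng, min_len=0, max_len=None, prefix="", max_tries=100):
    """Rejection sampling: retry until the value satisfies every constraint."""
    for _ in range(max_tries):
        value = generator(rng)
        if len(value) < min_len or not value.startswith(prefix):
            continue
        if max_len is not None and len(value) > max_len:
            continue
        return value
    raise ValueError("no value satisfied the constraints")


def pipe(value, *steps):
    """Apply transformations left to right."""
    for step in steps:
        value = step(value)
    return value


# Stand-in generators; a real one would draw from a trained model.
generators = {
    "first": lambda rng: rng.choice(["Ayşe", "Mehmet", "Elif"]),
    "last": lambda rng: rng.choice(["Yılmaz", "Kaya", "Demir"]),
}

rng = random.Random(12345)
full_name = fill_template("{first} {last}", generators, rng)
slug = pipe(full_name, str.lower, lambda s: s.replace(" ", "."))
short = constrained(generators["first"], random.Random(1), max_len=4)
```

Rejection sampling is the simplest way to honour constraints; a production engine would more likely prune the random walk itself so that tight constraints do not waste attempts.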

1.4 Pre-trained Models

Priority models (trained from public data):

  • person_names (multi-cultural)
  • company_names
  • street_names
  • city_names
  • email_usernames
  • product_names
  • lorem_words

1.5 Main Library (phonyland/phony)

```php
// API Design
Phony::name()->first();              // "Mehmet"
Phony::name()->full();               // "Ayşe Yılmaz"
Phony::email()->generate();          // "mehmet.yilmaz@example.com"
Phony::model('custom')->generate();  // From custom model
Phony::seed(12345)->name();          // Deterministic
Phony::unique()->email();            // Guaranteed unique
```

1.6 Laravel Integration (phonyland/phony-laravel)

| Deliverable | Description |
|---|---|
| Service Provider | Auto-discovery |
| Facade | Phony:: |
| Factory integration | fn() => Phony::name()->full() |
| Artisan commands | phony:generate, phony:train, phony:models |
| Config file | config/phony.php |
| PhonySeeder trait | Easy seeding |

Artisan Commands:

```bash
# Generate data
php artisan phony:generate users 1000

# Train model from file
php artisan phony:train --source=names.txt -o=names.phony

# List available models (pre-trained + custom)
php artisan phony:models
```

Phase 1 Success Criteria

  • [ ] 500+ GitHub stars
  • [ ] 200+ weekly Packagist downloads
  • [ ] 10+ community issues/PRs
  • [ ] Featured in Laravel News or similar
  • [ ] Local model training working & documented
  • [ ] Ready for Cloud development

PHASE 2: Cloud Platform

Strategic Goal: Monetize with features that require infrastructure

Implementation Order

| Order | Feature | Rationale |
|---|---|---|
| 1 | Platform Foundation | Everything depends on this |
| 2 | Schema-First Generation | Simpler than DB sync, validates core |
| 3 | Custom Model Training UI | Differentiator from competitors |
| 4 | Database Sync (MVP) | Core enterprise feature, MySQL only |
| 5 | Mock API (Read-Only) | Unique feature, no competitor has it |
| 6 | Database Sync (Full) | Add PostgreSQL, incremental, scheduled |
| 7 | Mock API (Stateful) | After read-only proves valuable |
| 8 | Hybrid LLM Mode | Nice-to-have, not critical path |
| 9 | MCP Server | Once core features stable |

2.1 Platform Foundation

Technical Stack (Nuxt + Go + Rust):

DASHBOARD (Nuxt)                 GO ENGINE
├── Nuxt 3                       ├── Go 1.22+
├── Vue 3 + Composition API      ├── pgx (PostgreSQL)
├── TypeScript                   ├── go-sql-driver/mysql
├── Tailwind CSS                 ├── go-sqlite3
├── Auth.js (authentication)     ├── net/http (API server)
├── Stripe SDK (billing)         └── asynq (job queue)
└── Nuxt UI (components)

RUST CORE                        INFRASTRUCTURE
├── N-gram engine (5M/sec)       ├── PostgreSQL 15
├── MessagePack (serde)          ├── Redis (cache, queue)
└── FFI exports (C ABI)          └── S3-compatible storage

Why This Stack:

  • Nuxt: Vue is already familiar (from the earlier Inertia planning), TypeScript support, easy deployment
  • Go: strong database libraries (pgx is cited as 30-50% faster), goroutines, fast HTTP for the Mock API
  • Rust: maximum N-gram performance (5M/sec), memory efficient
  • No Laravel: Go handles all backend work; Laravel would only add overhead

Core Features:

  • User registration & authentication
  • Team/organization support
  • Project management
  • API key management
  • Usage tracking & metering
  • Billing & subscription management

2.2 Database Sync & Anonymization

┌─────────────────────────────────────────────────────────────────────────┐
│                    SYNC ARCHITECTURE                                     │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Customer Environment              Phony Cloud                           │
│  ┌─────────────┐                  ┌─────────────────────────┐           │
│  │ Production  │                  │                         │           │
│  │ Database    │──── Secure ─────▶│  1. Schema Analysis     │           │
│  │             │     Connection   │  2. PII Detection       │           │
│  └─────────────┘                  │  3. Transform Rules     │           │
│                                   │  4. Generate/Anonymize  │           │
│  ┌─────────────┐                  │  5. Output              │           │
│  │ Staging     │◀─── Output ──────│                         │           │
│  │ Database    │                  └─────────────────────────┘           │
│  └─────────────┘                                                        │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Database Connectors (Priority Order):

  1. MySQL (first)
  2. PostgreSQL
  3. SQLite (for local dev)
  4. MariaDB

Schema Introspection:

  • Auto-detect tables, columns, types
  • Identify relationships (FK inference)
  • Detect PII columns (heuristics + patterns)
  • Generate transformation recommendations
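
The "heuristics + patterns" step for PII detection could look roughly like the Python sketch below. The rule set, the labels, and the 80% sample threshold are illustrative assumptions only; a real detector would need far more rules and locale awareness.

```python
import re

# Hypothetical rule set: (label, column-name pattern, value pattern or None).
RULES = [
    ("email", re.compile(r"e[-_]?mail", re.I), re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("phone", re.compile(r"phone|mobile|gsm", re.I), None),
    ("name", re.compile(r"(first|last|full)_?name|surname", re.I), None),
    ("ip_address", re.compile(r"(^|_)ip(_|$)", re.I), re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")),
]


def detect_pii(column, samples=(), threshold=0.8):
    """Return a PII label if the column name or most sample values match a rule."""
    # Pass 1: the column name is the cheapest and usually strongest signal.
    for label, name_re, _ in RULES:
        if name_re.search(column):
            return label
    # Pass 2: fall back to the share of non-empty sample values that match.
    values = [str(v) for v in samples if v]
    for label, _, value_re in RULES:
        if value_re and values:
            hits = sum(1 for v in values if value_re.match(v))
            if hits / len(values) >= threshold:
                return label
    return None


print(detect_pii("email_address"))
print(detect_pii("contact", ["a@b.co", "c@d.org"]))
print(detect_pii("status", ["active", "pending"]))
```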

Sync Modes:

  • Full sync (first run)
  • Incremental sync (subsequent runs)
  • Subset sync (% of data with FK integrity)
  • Scheduled syncs (cron-like)

Subset Sync Algorithm

┌─────────────────────────────────────────────────────────────────────────┐
│                    SUBSET SYNC WITH FK INTEGRITY                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Problem: Get 1% of data while maintaining FK relationships              │
│                                                                          │
│  Algorithm: Breadth-First Dependency Resolution                          │
│                                                                          │
│  1. Build dependency graph from FK constraints                           │
│     orders → users → addresses → countries                               │
│     orders → products → categories                                       │
│                                                                          │
│  2. Start from "root" tables (configurable)                              │
│     Default: Tables with most FKs pointing to them                       │
│     Example: users, products (high fan-out)                              │
│                                                                          │
│  3. Sample root tables (target %)                                        │
│     SELECT * FROM users ORDER BY RAND(seed) LIMIT target_count           │
│                                                                          │
│  4. Resolve dependencies (breadth-first)                                 │
│     For each sampled user:                                               │
│       → Include their orders                                             │
│       → Include products from those orders                               │
│       → Include addresses for user                                       │
│       → Continue until all FKs resolved                                  │
│                                                                          │
│  5. Handle circular dependencies                                         │
│     Mark visited, skip cycles                                            │
│                                                                          │
│  Result: Consistent subset with valid FK relationships                   │
│                                                                          │
│  Configuration:                                                          │
│  ├── root_tables: ["users", "products"]                                 │
│  ├── target_percentage: 1                                               │
│  ├── max_depth: 5 (prevent infinite traversal)                          │
│  └── seed: 12345 (reproducible)                                         │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
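
The algorithm in the box above can be sketched on a toy in-memory dataset. This Python version is an illustration under assumptions, not the Go engine's implementation: every table is assumed to have an `id` key, child rows are followed only downward from sampled roots (bounded by `max_depth`), and parent rows are always pulled in so that no foreign key dangles.

```python
import random
from collections import deque

# Toy in-memory "database": table name -> rows. Every table has an "id" key.
tables = {
    "countries": [{"id": 1}, {"id": 2}],
    "users": [{"id": i, "country_id": 1 + i % 2} for i in range(1, 101)],
    "products": [{"id": i} for i in range(1, 11)],
    "orders": [{"id": i, "user_id": 1 + i % 100, "product_id": 1 + i % 10}
               for i in range(1, 501)],
}
# (child table, FK column, parent table)
foreign_keys = [
    ("users", "country_id", "countries"),
    ("orders", "user_id", "users"),
    ("orders", "product_id", "products"),
]


def subset(tables, foreign_keys, root_tables, target_percentage, max_depth=5, seed=0):
    rng = random.Random(seed)
    by_id = {t: {r["id"]: r for r in rows} for t, rows in tables.items()}
    selected = {t: set() for t in tables}  # doubles as the "visited" set
    queue = deque()

    def add(table, row_id, depth, follow_children):
        if row_id not in selected[table]:  # already visited: skip, which breaks cycles
            selected[table].add(row_id)
            queue.append((table, row_id, depth, follow_children))

    # Step 3: sample the root tables.
    for table in root_tables:
        ids = sorted(by_id[table])
        count = max(1, len(ids) * target_percentage // 100)
        for row_id in rng.sample(ids, count):
            add(table, row_id, 0, True)

    # Step 4: breadth-first resolution of dependencies.
    while queue:
        table, row_id, depth, follow_children = queue.popleft()
        row = by_id[table][row_id]
        for child, column, parent in foreign_keys:
            if child == table and row[column] is not None:
                # Parents are always included, otherwise the FK would dangle.
                add(parent, row[column], depth + 1, False)
            if parent == table and follow_children and depth < max_depth:
                for r in tables[child]:
                    if r[column] == row_id:
                        add(child, r["id"], depth + 1, True)
    return {t: [by_id[t][i] for i in sorted(ids)] for t, ids in selected.items()}


result = subset(tables, foreign_keys, root_tables=["users"],
                target_percentage=10, seed=12345)
print({t: len(rows) for t, rows in result.items()})
```

Rows reached only as parents are not expanded further. Expanding them would, for example, pull in every order for each included product and could grow the subset back toward the full database.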

2.3 Schema-First Generation

Generate data without a source database.

Schema Definition Options:

  • YAML/JSON schema file
  • Visual schema builder (drag-drop UI)
  • Import from SQL DDL
  • Import from Laravel migrations
  • Import from OpenAPI/JSON Schema
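
To make the idea concrete, here is a small Python sketch that turns a JSON schema definition into rows without any source database. The schema layout and the column type names (`sequence`, `choice`, `int`, `bool`) are hypothetical; the real schema format is not specified here.

```python
import json
import random

# Hypothetical schema format; the "type" names are illustrative only.
SCHEMA = json.loads("""
{
  "table": "users",
  "rows": 3,
  "columns": {
    "id":     {"type": "sequence", "start": 1},
    "name":   {"type": "choice", "values": ["Ayşe", "Mehmet", "Elif"]},
    "age":    {"type": "int", "min": 18, "max": 90},
    "active": {"type": "bool"}
  }
}
""")


def generate_rows(schema, seed=None):
    """Build schema["rows"] rows, one value per column, from a seeded RNG."""
    rng = random.Random(seed)
    makers = {
        "sequence": lambda spec, i: spec.get("start", 1) + i,
        "choice": lambda spec, i: rng.choice(spec["values"]),
        "int": lambda spec, i: rng.randint(spec["min"], spec["max"]),
        "bool": lambda spec, i: rng.random() < 0.5,
    }
    return [
        {name: makers[spec["type"]](spec, i) for name, spec in schema["columns"].items()}
        for i in range(schema["rows"])
    ]


rows = generate_rows(SCHEMA, seed=12345)
for row in rows:
    print(row)
```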

2.4 Mock API Generation

See /product/database-sync for detailed algorithm.

Phase 2 Implementation:

  • Read-Only mode first (GET endpoints)
  • Deterministic via seed + request hashing
  • Infinite pagination support
  • Custom subdomain routing
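
"Deterministic via seed + request hashing" and "infinite pagination" can be combined as in the Python sketch below. The hashing scheme, the `/users` path, and the response shape are assumptions for illustration, not the Mock API's actual design.

```python
import hashlib
import random


def request_rng(project_seed, path, page):
    """Derive a stable RNG from the project seed and the request itself."""
    key = f"{project_seed}:{path}:{page}".encode()
    digest = hashlib.sha256(key).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def list_users(project_seed, page, per_page=3):
    """Any page number yields a full page, so pagination never runs out."""
    rng = request_rng(project_seed, "/users", page)
    first_id = (page - 1) * per_page + 1
    return {
        "data": [{"id": first_id + i, "score": rng.randint(0, 100)}
                 for i in range(per_page)],
        "page": page,
        "next_page": page + 1,
    }


page_7 = list_users(12345, page=7)
print(page_7)
```

Because each page derives its own generator from the request, no state has to be stored between requests: repeating a request returns the same body, and pages can be fetched in any order.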

2.5 Custom Model Training (Web UI)

Cloud training UI adds value beyond OSS local training:

Cloud-Only Data Sources:

  • Database column (from connected DB) ← Key differentiator
  • API endpoint (fetch & extract)
  • S3/GCS bucket

Also Supported (convenience over CLI):

  • File upload (TXT, CSV, JSON)
  • Paste text (quick training)

Cloud Training Advantages:

  • Web UI (no CLI knowledge needed)
  • Train directly from production DB columns
  • Preview source data before training
  • Model versioning & history
  • Team sharing & collaboration
  • Scheduled re-training jobs
  • Quality metrics & comparisons

Phase 2 Success Criteria

  • [ ] Internal production use (dogfooding) working
  • [ ] 10+ beta customers using sync feature
  • [ ] 5+ paying customers
  • [ ] $30K+ ARR
  • [ ] <5% monthly churn

PHASE 3: Scale & Optimization

Strategic Goal: Enterprise readiness, exit preparation

3.1 Rust Core (Performance Optimization)

Trigger Conditions (any of):

  • Customer needs >1M records/minute
  • Memory issues with large models
  • Enterprise RFP requires performance guarantee
  • Competitor ships faster solution

Architecture:

Rust Core (phony-core)
├── N-gram tokenization
├── Frequency map building
├── Weighted random generation
└── Model serialization

    ┌────┴────┬────────┬────────┐
    │         │        │        │
    ▼         ▼        ▼        ▼
 PHP FFI  Node FFI  Python   Go CGO

Benefits:

  • 10-100x performance improvement
  • Single codebase for all languages
  • Can compile to WASM for browser

3.2 Additional Language Support

Decision Signals:

  • GitHub issues requesting language
  • Cloud customers asking for language
  • Gap in market (no good Faker alternative)
  • Strategic partnership opportunity

Candidates (Revenue-Optimized Order):

  • Python (data science, ML, enterprise budgets) ★ Year 2 Priority
  • TypeScript (huge ecosystem, volume play) — Year 3 Optional
  • Go (cloud-native)
  • Ruby (Rails ecosystem)

3.3 Enterprise Features

| Category | Features |
|---|---|
| Auth | SSO (SAML, OIDC), SCIM provisioning, RBAC, IP allowlisting |
| Compliance | SOC2 Type II, GDPR docs, HIPAA BAA, Data residency |
| Deployment | Dedicated instance, VPC peering, On-premise, Air-gapped |
| Support | 99.9%+ SLA, 24/7 support, Dedicated success manager |

3.4 Advanced Data Features (Tonic-Inspired)

| ARR Trigger | Features |
|---|---|
| $300K+ | Database subsetting, NER, Data discovery |
| $600K+ | Unstructured de-identification, Document redaction |
| $1M+ | Guided redaction, Expert determination, Compliance reports |

Phase 3 Success Criteria

  • [ ] $600K-1M+ ARR
  • [ ] 300+ paying customers
  • [ ] SOC2 Type II certified
  • [ ] At least 2 enterprise customers ($5K+/mo)
  • [ ] Rust core (if performance became an issue)
  • [ ] At least 1 additional language (if there is demand)
  • [ ] Exit-ready metrics

Dependency Graph

PHASE 1 (Must complete before Phase 2)

ngram ─────────┬──────────────────────────────────┐
               │                                   │
               ▼                                   ▼
language-model ────▶ language-generator ────▶ phony (main)
                                                  │
                                                  ▼
                                            phony-laravel
                                                  │
                                                  ▼
                                            Documentation
                                                  │
                                                  ▼
                                            OSS LAUNCH

───────────────────────────────────────────────────────────────────

PHASE 2 (Can start after OSS launch)

Platform Foundation ───────┬────────────────────────┐
       │                   │                        │
       ▼                   ▼                        ▼
Schema-First ────▶ Custom Model UI          Database Sync (MVP)
       │                   │                        │
       └──────────┬────────┘                        │
                  │                                 │
                  ▼                                 ▼
            Mock API (Read-Only)            Database Sync (Full)
                  │                                 │
                  ▼                                 │
            Mock API (Stateful)◀────────────────────┘
                  │
                  ▼
            MCP Server

───────────────────────────────────────────────────────────────────

PHASE 3 (Signal-driven, not sequential)

┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│   Rust Core      │  │  Enterprise      │  │  Advanced Data   │
│   (if needed)    │  │  Features        │  │  Features        │
└────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘
         │                     │                     │
         └─────────────────────┴─────────────────────┘
                               │
                               ▼
                        Additional Languages
