Phony Cloud Platform - Implementation Plan
Overview: Three-Phase Strategy
┌─────────────────────────────────────────────────────────────────────────┐
│ PHONY IMPLEMENTATION STRATEGY │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ PHASE 1: OPEN SOURCE FOUNDATION │
│ ══════════════════════════════════ │
│ Goal: Become the best Faker alternative for PHP/Laravel │
│ Revenue: $0 (community building) │
│ Timeline: Year 1, Q1-Q2 │
│ │
│ │ │
│ ▼ │
│ │
│ PHASE 2: CLOUD PLATFORM │
│ ═══════════════════════════ │
│ Goal: Monetize with SaaS features OSS can't provide │
│ Revenue: $0 → $150K ARR │
│ Timeline: Year 1 Q3 → Year 2 │
│ │
│ │ │
│ ▼ │
│ │
│ PHASE 3: SCALE & OPTIMIZATION │
│ ═════════════════════════════════ │
│ Goal: Enterprise features, performance, exit preparation │
│ Revenue: $150K → $600K+ ARR → Exit │
│ Timeline: Year 3-5 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
PHASE 1: Open Source Foundation
Strategic Goal: Establish Phony as THE modern Faker alternative for PHP
Scope Boundary: OSS = Faker alternative ONLY. No sync, no anonymization, no hosted features.
Why OSS First:
- Build trust and community before asking for money
- Validate N-gram engine with real-world usage
- Create upgrade path: OSS users need sync → Cloud customers
- SEO/visibility: GitHub stars, Packagist downloads
- Avoid Neosync mistake: They open-sourced sync features, couldn't monetize, got acquired
1.1 Core Engine (phonyland/ngram)
Foundation for everything. Must be rock-solid.
| Deliverable | Description |
|---|---|
| N-gram tokenization | Word, character, and hybrid modes |
| Frequency map building | Efficient storage of n-gram frequencies |
| Probability-weighted generation | Weighted random walk algorithm |
| Seed support | Deterministic output for CI/CD |
| excludeOriginals option | Never output training data |
| Performance | 100K+ generations/second target |
Success Criteria: 100% test coverage, benchmarks documented, zero external dependencies
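The deliverables above can be illustrated with a minimal character-mode sketch. It is not the phonyland/ngram implementation (which targets PHP); the function names, the `^`/`$` boundary markers, and the retry limit are assumptions made for illustration. It covers frequency map building, a probability-weighted random walk, seeded output, and an `exclude_originals` flag mirroring the excludeOriginals option.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # assumed boundary markers; must not occur in the training data


def build_frequency_map(samples, n=2):
    """Count how often each character follows each (n-1)-character context."""
    if n < 2:
        raise ValueError("this sketch needs n >= 2")
    freq = defaultdict(lambda: defaultdict(int))
    for sample in samples:
        padded = START * (n - 1) + sample.lower() + END
        for i in range(len(padded) - n + 1):
            context, nxt = padded[i:i + n - 1], padded[i + n - 1]
            freq[context][nxt] += 1
    return freq


def generate(freq, n, rng, max_len=20):
    """Weighted random walk: pick each next character in proportion to its count."""
    context, out = START * (n - 1), []
    while len(out) < max_len:
        followers = freq[context]
        chars = sorted(followers)  # fixed order keeps output reproducible per seed
        nxt = rng.choices(chars, weights=[followers[c] for c in chars])[0]
        if nxt == END:
            break
        out.append(nxt)
        context = (context + nxt)[-(n - 1):]
    return "".join(out)


def generate_batch(samples, count, n=2, seed=12345, exclude_originals=True):
    """Generate up to `count` values, optionally rejecting exact training samples."""
    rng = random.Random(seed)
    freq = build_frequency_map(samples, n)
    originals = {s.lower() for s in samples}
    results, attempts = [], 0
    while len(results) < count and attempts < count * 100:  # assumed retry limit
        attempts += 1
        word = generate(freq, n, rng)
        if exclude_originals and word in originals:
            continue
        results.append(word)
    return results


if __name__ == "__main__":
    names = ["mehmet", "ahmet", "ayse", "aylin", "merve", "melek", "murat", "metin"]
    print(generate_batch(names, 5))
```

Rejecting originals by retrying is the simplest approach, but with a small training set and a large `n` the walk may mostly reproduce its inputs, so the batch can come back shorter than requested.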
1.2 Model System (phonyland/language-model)
| Deliverable | Description |
|---|---|
| Model file format | .phony binary or .json |
| Model metadata | N-gram size, training source, version |
| Model loading/caching | Lazy loading for memory efficiency |
| Model training | Train from arrays, files (txt, csv, json) |
| Model saving | Export trained models to .phony files |
| Pre-trained models | Ship with library (names, emails, etc.) |
Note: Local training from files is included in OSS. Cloud adds training from DB columns, versioning, and team sharing.
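As a rough illustration of the `.json` variant of the model format, the sketch below stores the three metadata fields from the table next to the n-gram counts and loads the file lazily with a cache. The key names (`metadata`, `frequencies`, `ngram_size`, and so on) and the format version number are assumptions; the plan does not define the actual layout.

```python
import json
from collections import Counter

MODEL_FORMAT_VERSION = 1  # hypothetical version number


def train_model(samples, n=2, source="inline array"):
    """Count raw n-grams and wrap them with the metadata listed in the table."""
    counts = Counter()
    for sample in samples:
        padded = "^" * (n - 1) + sample + "$"  # assumed boundary markers
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return {
        "metadata": {
            "ngram_size": n,
            "training_source": source,
            "version": MODEL_FORMAT_VERSION,
        },
        "frequencies": dict(counts),
    }


def save_model(model, path):
    """Export a trained model to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(model, fh, ensure_ascii=False, sort_keys=True)


_cache = {}


def load_model(path):
    """Lazy loading: read the file on first request, then serve it from memory."""
    if path not in _cache:
        with open(path, encoding="utf-8") as fh:
            _cache[path] = json.load(fh)
    return _cache[path]
```

A binary `.phony` format would presumably carry the same fields in a more compact encoding; JSON is used here only because it is easy to inspect.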
1.3 Generator Engine (phonyland/language-generator)
| Deliverable | Description |
|---|---|
| Generate from model | Core generation API |
| Constraints | Min/max length, prefix, suffix |
| Batch generation | Efficient for large volumes |
| Format templates | "{first} {last}" patterns |
| Pipe/chain operations | Composable transformations |
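The sketch below shows one way the constraint, template, and pipe deliverables could fit together. All three helper names are hypothetical, and rejection sampling is an assumed strategy for constraints, not one the plan prescribes.

```python
import random


def generate_with_constraints(generate, min_len=0, max_len=None,
                              prefix="", suffix="", max_attempts=1000):
    """Rejection sampling: call generate() until a value meets every constraint."""
    for _ in range(max_attempts):
        value = generate()
        if len(value) < min_len:
            continue
        if max_len is not None and len(value) > max_len:
            continue
        if not (value.startswith(prefix) and value.endswith(suffix)):
            continue
        return value
    raise ValueError("no value satisfied the constraints")


def render_template(template, generators):
    """Fill "{first} {last}" style slots by calling the generator named in each slot."""
    return template.format(**{name: gen() for name, gen in generators.items()})


def pipe(value, *transforms):
    """Apply transformations left to right."""
    for transform in transforms:
        value = transform(value)
    return value


if __name__ == "__main__":
    rng = random.Random(12345)
    first = lambda: rng.choice(["ayse", "mehmet", "ali"])
    last = lambda: rng.choice(["yilmaz", "kaya"])
    full = render_template("{first} {last}", {"first": first, "last": last})
    print(pipe(full, str.title))
```

Rejection sampling is easy to reason about but wastes work when constraints are tight; a production engine would more likely push constraints such as a prefix into the generation walk itself.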
1.4 Pre-trained Models
Priority models (trained from public data):
- person_names (multi-cultural)
- company_names
- street_names
- city_names
- email_usernames
- product_names
- lorem_words
1.5 Main Library (phonyland/phony)
// API Design
Phony::name()->first(); // "Mehmet"
Phony::name()->full(); // "Ayşe Yılmaz"
Phony::email()->generate(); // "mehmet.yilmaz@example.com"
Phony::model('custom')->generate(); // From custom model
Phony::seed(12345)->name(); // Deterministic
Phony::unique()->email(); // Guaranteed unique
1.6 Laravel Integration (phonyland/phony-laravel)
| Deliverable | Description |
|---|---|
| Service Provider | Auto-discovery |
| Facade | Phony:: |
| Factory integration | fn() => Phony::name()->full() |
| Artisan commands | phony:generate, phony:train, phony:models |
| Config file | config/phony.php |
| PhonySeeder trait | Easy seeding |
Artisan Commands:
# Generate data
php artisan phony:generate users 1000
# Train model from file
php artisan phony:train --source=names.txt -o=names.phony
# List available models (pre-trained + custom)
php artisan phony:models
Phase 1 Success Criteria
- [ ] 500+ GitHub stars
- [ ] 200+ weekly Packagist downloads
- [ ] 10+ community issues/PRs
- [ ] Featured in Laravel News or similar
- [ ] Local model training working & documented
- [ ] Ready for Cloud development
PHASE 2: Cloud Platform
Strategic Goal: Monetize with features that require infrastructure
Implementation Order
| Order | Feature | Rationale |
|---|---|---|
| 1 | Platform Foundation | Everything depends on this |
| 2 | Schema-First Generation | Simpler than DB sync, validates core |
| 3 | Custom Model Training UI | Differentiator from competitors |
| 4 | Database Sync (MVP) | Core enterprise feature, MySQL only |
| 5 | Mock API (Read-Only) | Unique feature; no competitor offers it |
| 6 | Database Sync (Full) | Add PostgreSQL, incremental, scheduled |
| 7 | Mock API (Stateful) | After read-only proves valuable |
| 8 | Hybrid LLM Mode | Nice-to-have, not critical path |
| 9 | MCP Server | Once core features stable |
2.1 Platform Foundation
Technical Stack (Nuxt + Go + Rust):
DASHBOARD (Nuxt)
├── Nuxt 3
├── Vue 3 + Composition API
├── TypeScript
├── Tailwind CSS
├── Auth.js (authentication)
├── Stripe SDK (billing)
└── Nuxt UI (components)
GO ENGINE
├── Go 1.22+
├── pgx (PostgreSQL)
├── go-sql-driver/mysql
├── go-sqlite3
├── net/http (API server)
└── asynq (job queue)
RUST CORE
├── N-gram engine (5M/sec)
├── MessagePack (serde)
└── FFI exports (C ABI)
INFRASTRUCTURE
├── PostgreSQL 15
├── Redis (cache, queue)
└── S3-compatible storage
Why This Stack:
- Nuxt: Vue is already familiar (from Inertia planning), TypeScript support, easy deployment
- Go: Best DB libs (pgx 30-50% faster), goroutines, fast HTTP for Mock API
- Rust: Maximum N-gram performance (5M/sec), memory efficient
- No Laravel: Go handles all backend; Laravel would be overhead
Core Features:
- User registration & authentication
- Team/organization support
- Project management
- API key management
- Usage tracking & metering
- Billing & subscription management
2.2 Database Sync & Anonymization
┌─────────────────────────────────────────────────────────────────────────┐
│ SYNC ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Customer Environment Phony Cloud │
│ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Production │ │ │ │
│ │ Database │──── Secure ─────▶│ 1. Schema Analysis │ │
│ │ │ Connection │ 2. PII Detection │ │
│ └─────────────┘ │ 3. Transform Rules │ │
│ │ 4. Generate/Anonymize │ │
│ ┌─────────────┐ │ 5. Output │ │
│ │ Staging │◀─── Output ──────│ │ │
│ │ Database │ └─────────────────────────┘ │
│ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Database Connectors (Priority Order):
- MySQL (first)
- PostgreSQL
- SQLite (for local dev)
- MariaDB
Schema Introspection:
- Auto-detect tables, columns, types
- Identify relationships (FK inference)
- Detect PII columns (heuristics + patterns)
- Generate transformation recommendations
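A minimal sketch of the "heuristics + patterns" idea for PII detection: check the column name first, then fall back to sampling values. The pattern tables, the 0.8 match threshold, and the recommendation mapping are all assumptions for illustration; a real detector would need far broader coverage and would still produce false positives and negatives.

```python
import re

# Hypothetical column-name heuristics.
NAME_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.I),
    "phone": re.compile(r"phone|mobile", re.I),
    "person_name": re.compile(r"(first|last|full|sur)[-_]?name", re.I),
}

# Hypothetical value patterns, used when the column name is not conclusive.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

# Assumed mapping from detected kind to a suggested transformation.
RECOMMENDATIONS = {
    "email": "Phony::email()->generate()",
    "person_name": "Phony::name()->full()",
}


def detect_pii(column_name, sample_values, threshold=0.8):
    """Return the detected PII kind for a column, or None if nothing matches."""
    for kind, pattern in NAME_PATTERNS.items():
        if pattern.search(column_name):
            return kind
    values = [str(v) for v in sample_values if v is not None]
    if values:
        for kind, pattern in VALUE_PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                return kind
    return None


def recommend(column_name, sample_values):
    """Suggest a transformation for a column, or None if it looks safe or unknown."""
    return RECOMMENDATIONS.get(detect_pii(column_name, sample_values))
```

Checking names before values keeps the common case cheap, since value sampling requires reading rows from the customer database.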
Sync Modes:
- Full sync (first run)
- Incremental sync (subsequent runs)
- Subset sync (% of data with FK integrity)
- Scheduled syncs (cron-like)
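Subset sync is the least obvious of these modes, so here is a hedged in-memory sketch of the breadth-first dependency resolution this plan describes for it. It assumes single-column primary keys and foreign keys that reference the parent's primary key; the data layout and the function name are hypothetical, and a real implementation would issue SQL rather than scan Python dicts. Child rows are followed up to `max_depth`, while parent rows are always pulled in so that no selected row is left with a dangling reference.

```python
import random
from collections import deque


def subset_sync(tables, foreign_keys, root_tables, target_percentage, seed, max_depth=5):
    """Pick a subset of rows whose foreign keys all resolve inside the subset.

    tables:       {table: {primary_key: row_dict}}
    foreign_keys: [(child_table, child_column, parent_table)]
    Returns {table: set of selected primary keys}.
    """
    rng = random.Random(seed)
    selected = {name: set() for name in tables}
    queue = deque()

    # Sample the root tables reproducibly.
    for table in root_tables:
        keys = sorted(tables[table])
        count = min(len(keys), max(1, round(len(keys) * target_percentage / 100)))
        queue.extend((table, key, 0) for key in rng.sample(keys, count))

    # Breadth-first walk down to dependent (child) rows, bounded by max_depth.
    while queue:
        table, key, depth = queue.popleft()
        if key in selected[table]:
            continue  # already visited; this also breaks cycles
        selected[table].add(key)
        if depth >= max_depth:
            continue
        for child, column, parent in foreign_keys:
            if parent == table:
                queue.extend((child, child_key, depth + 1)
                             for child_key, row in tables[child].items()
                             if row.get(column) == key)

    # Pull in every referenced parent row until no foreign key dangles.
    pending = deque((t, k) for t, keys in selected.items() for k in keys)
    while pending:
        table, key = pending.popleft()
        for child, column, parent in foreign_keys:
            if child != table:
                continue
            parent_key = tables[table][key].get(column)
            if parent_key is not None and parent_key not in selected[parent]:
                selected[parent].add(parent_key)
                pending.append((parent, parent_key))
    return selected
```

Rows reached only as parents do not expand their own children. Without that restriction, a popular product would drag in every order that references it, and the subset would grow toward the full database.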
Subset Sync Algorithm
┌─────────────────────────────────────────────────────────────────────────┐
│ SUBSET SYNC WITH FK INTEGRITY │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Problem: Get 1% of data while maintaining FK relationships │
│ │
│ Algorithm: Breadth-First Dependency Resolution │
│ │
│ 1. Build dependency graph from FK constraints │
│ orders → users → addresses → countries │
│ orders → products → categories │
│ │
│ 2. Start from "root" tables (configurable) │
│ Default: Tables with most FKs pointing to them │
│ Example: users, products (high fan-out) │
│ │
│ 3. Sample root tables (target %) │
│ SELECT * FROM users ORDER BY RAND(seed) LIMIT target_count │
│ │
│ 4. Resolve dependencies (breadth-first) │
│ For each sampled user: │
│ → Include their orders │
│ → Include products from those orders │
│ → Include addresses for user │
│ → Continue until all FKs resolved │
│ │
│ 5. Handle circular dependencies │
│ Mark visited, skip cycles │
│ │
│ Result: Consistent subset with valid FK relationships │
│ │
│ Configuration: │
│ ├── root_tables: ["users", "products"] │
│ ├── target_percentage: 1 │
│ ├── max_depth: 5 (prevent infinite traversal) │
│ └── seed: 12345 (reproducible) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
2.3 Schema-First Generation
Generate data without a source database.
Schema Definition Options:
- YAML/JSON schema file
- Visual schema builder (drag-drop UI)
- Import from SQL DDL
- Import from Laravel migrations
- Import from OpenAPI/JSON Schema
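To make the JSON schema option concrete, here is a small sketch that turns a schema document into rows without touching any database. The schema layout (`count`, `columns`, and the `sequence`, `integer`, `choice`, and `boolean` types) is entirely hypothetical; the plan does not specify a schema format.

```python
import json
import random

# Hypothetical field types; each generator receives the RNG and the field definition.
GENERATORS = {
    "integer": lambda rng, field: rng.randint(field.get("min", 0), field.get("max", 1000)),
    "choice": lambda rng, field: rng.choice(field["values"]),
    "boolean": lambda rng, field: rng.random() < 0.5,
}


def generate_rows(schema, seed):
    """Generate schema["count"] rows; the same schema and seed give the same rows."""
    rng = random.Random(seed)
    rows = []
    for i in range(schema["count"]):
        row = {}
        for name, field in schema["columns"].items():
            if field["type"] == "sequence":
                row[name] = i + 1  # auto-increment style key
            else:
                row[name] = GENERATORS[field["type"]](rng, field)
        rows.append(row)
    return rows


if __name__ == "__main__":
    schema = json.loads("""
    {
      "count": 3,
      "columns": {
        "id": {"type": "sequence"},
        "age": {"type": "integer", "min": 18, "max": 90},
        "plan": {"type": "choice", "values": ["free", "pro"]}
      }
    }
    """)
    for row in generate_rows(schema, seed=12345):
        print(row)
```

The other import paths in the list (SQL DDL, Laravel migrations, OpenAPI) could plausibly be converted into the same intermediate structure before generation.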
2.4 Mock API Generation
See /product/database-sync for detailed algorithm.
Phase 2 Implementation:
- Read-Only mode first (GET endpoints)
- Deterministic via seed + request hashing
- Infinite pagination support
- Custom subdomain routing
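One way to read "deterministic via seed + request hashing" with infinite pagination: derive a separate random generator for every item from the project seed, the request path, and the item index, so any page can be produced on demand without storing anything. The sketch below assumes that interpretation; the function names, the response shape, and the placeholder `score` field are hypothetical.

```python
import hashlib
import random


def item_rng(seed, path, index):
    """Derive an independent RNG from the project seed, request path, and item index."""
    digest = hashlib.sha256(f"{seed}:{path}:{index}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def get_page(seed, path, page, per_page=20):
    """Return page `page` (1-based) of an endless collection; every page number is valid."""
    start = (page - 1) * per_page
    items = []
    for index in range(start, start + per_page):
        rng = item_rng(seed, path, index)
        items.append({"id": index + 1, "score": rng.randint(0, 100)})
    return {"page": page, "per_page": per_page, "data": items, "next_page": page + 1}


if __name__ == "__main__":
    first = get_page(12345, "/users", page=1, per_page=3)
    again = get_page(12345, "/users", page=1, per_page=3)
    print(first == again)
```

Because each item depends only on its own index, changing `per_page` regroups the same items instead of producing different ones, and a client can jump straight to an arbitrarily high page.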
2.5 Custom Model Training (Web UI)
Cloud training UI adds value beyond OSS local training:
Cloud-Only Data Sources:
- Database column (from connected DB) ← Key differentiator
- API endpoint (fetch & extract)
- S3/GCS bucket
Also Supported (convenience over CLI):
- File upload (TXT, CSV, JSON)
- Paste text (quick training)
Cloud Training Advantages:
- Web UI (no CLI knowledge needed)
- Train directly from production DB columns
- Preview source data before training
- Model versioning & history
- Team sharing & collaboration
- Scheduled re-training jobs
- Quality metrics & comparisons
Phase 2 Success Criteria
- [ ] Internal production use (dogfooding) working
- [ ] 10+ beta customers using sync feature
- [ ] 5+ paying customers
- [ ] $30K+ ARR
- [ ] <5% monthly churn
PHASE 3: Scale & Optimization
Strategic Goal: Enterprise readiness, exit preparation
3.1 Rust Core (Performance Optimization)
Trigger Conditions (any of):
- Customer needs >1M records/minute
- Memory issues with large models
- Enterprise RFP requires performance guarantee
- Competitor ships faster solution
Architecture:
Rust Core (phony-core)
├── N-gram tokenization
├── Frequency map building
├── Weighted random generation
└── Model serialization
     │
┌────┴────┬────────┬────────┐
│         │        │        │
▼         ▼        ▼        ▼
PHP FFI   Node FFI Python   Go CGO
Benefits:
- 10-100x performance improvement
- Single codebase for all languages
- Can compile to WASM for browser
3.2 Additional Language Support
Decision Signals:
- GitHub issues requesting language
- Cloud customers asking for language
- Gap in market (no good Faker alternative)
- Strategic partnership opportunity
Candidates (Revenue-Optimized Order):
- Python (data science, ML, enterprise budgets) ★ Year 2 Priority
- TypeScript (huge ecosystem, volume play) — Year 3 Optional
- Go (cloud-native)
- Ruby (Rails ecosystem)
3.3 Enterprise Features
| Category | Features |
|---|---|
| Auth | SSO (SAML, OIDC), SCIM provisioning, RBAC, IP allowlisting |
| Compliance | SOC2 Type II, GDPR docs, HIPAA BAA, Data residency |
| Deployment | Dedicated instance, VPC peering, On-premise, Air-gapped |
| Support | 99.9%+ SLA, 24/7 support, Dedicated success manager |
3.4 Advanced Data Features (Tonic-Inspired)
| ARR Trigger | Features |
|---|---|
| $300K+ | Database subsetting, NER, Data discovery |
| $600K+ | Unstructured de-identification, Document redaction |
| $1M+ | Guided redaction, Expert determination, Compliance reports |
Phase 3 Success Criteria
- [ ] $600K-1M+ ARR
- [ ] 300+ paying customers
- [ ] SOC2 Type II certified
- [ ] At least 2 enterprise customers ($5K+/mo)
- [ ] Rust core (if performance became an issue)
- [ ] At least 1 additional language (if there is demand)
- [ ] Exit-ready metrics
Dependency Graph
PHASE 1 (Must complete before Phase 2)
ngram ─────────┬──────────────────────────────────┐
│ │
▼ ▼
language-model ────▶ language-generator ────▶ phony (main)
│
▼
phony-laravel
│
▼
Documentation
│
▼
OSS LAUNCH
───────────────────────────────────────────────────────────────────
PHASE 2 (Can start after OSS launch)
Platform Foundation ───────┬────────────────────────┐
│ │ │
▼ ▼ ▼
Schema-First ────▶ Custom Model UI Database Sync (MVP)
│ │ │
└──────────┬────────┘ │
│ │
▼ ▼
Mock API (Read-Only) Database Sync (Full)
│ │
▼ │
Mock API (Stateful)◀────────────────────┘
│
▼
MCP Server
───────────────────────────────────────────────────────────────────
PHASE 3 (Signal-driven, not sequential)
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Rust Core │ │ Enterprise │ │ Advanced Data │
│ (if needed) │ │ Features │ │ Features │
└────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘
│ │ │
└─────────────────────┴─────────────────────┘
│
▼
Additional Languages