Cluster Tenant Handling
Status: Draft | Last updated: 2025-07-15 | Version: v0.0.3
Table of Contents
- Overview
- Environment Architecture
- Provisioning Model
- Migration System
- Versioning & Coordination
- Lifecycle Management
- Platform Integration
- Appendix
1. Overview
This document describes a Kubernetes-native, operator-driven workflow for Arematics infrastructure provisioning, supporting multi-tenant and geo-aware deployments. The system is modular, secure, and environment-aware, enabling:
- Automated provisioning of databases and schemas for every tenant, app, and (optional) country.
- Application of service-versioned SQL migrations embedded within each service container image.
- Tracking of schema versions per service and support for controlled upgrades (opt-in).
- Safe operation across test, QA, and prod environments.
Operators are deployed per-cluster but share the same code base. Image tags (e.g., v1.2.3-test) isolate config differences. The system supports both global and country-scoped apps, robust migration and rollback mechanics, and integrates with the platform backend for provisioning and deletion flows.
2. Environment Architecture
2.1 Environment Layout
| Env | K8s cluster | Postgres instance(s) | Vault namespace |
|---|---|---|---|
| test | test | shared dev instance | vault-test |
| qa | qa | dedicated staging | vault-qa |
| prod | prod | regional HA pairs | vault-prod |
2.2 Data-Isolation Model
Each tenant receives one main Postgres database (e.g., tenant_xyz). Apps with country scoping result in an additional database per country (e.g., tenant_xyz_de). Within each DB, schemas are created per service and app, named `<service>-<app>` (e.g., entity-crm).
```
DB: tenant_xyz (organization)
├── auth          (base service)
├── entity        (base service)
├── payments      (base service)
└── entity-crm    (app, no country)

DB: tenant_xyz_de (country-specific DB)
├── entity-crm    (schema)
└── payments-crm  (schema)
```
- One Postgres DB per organization (tenant)
- Extra DB per tenant+country (only when an app is country-scoped)
- Schemas isolate services and apps
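Because these naming rules are mechanical, a small helper can keep them in one place. A minimal Go sketch, assuming tenant slug, app, service, and country identifiers are already validated; the function names are illustrative, not part of the operator code:

```go
package naming

import "fmt"

// TenantDB returns the main Postgres database for a tenant,
// e.g. "tenant_xyz".
func TenantDB(slug string) string { return fmt.Sprintf("tenant_%s", slug) }

// CountryDB adds the country suffix used by country-scoped apps,
// e.g. "tenant_xyz_de".
func CountryDB(slug, country string) string {
	return fmt.Sprintf("tenant_%s_%s", slug, country)
}

// SchemaName follows the <service>-<app> convention, e.g. "entity-crm".
// Base services use the bare service name as their schema.
func SchemaName(service, app string) string {
	if app == "" {
		return service
	}
	return fmt.Sprintf("%s-%s", service, app)
}
```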
2.3 Country-Scope Transitions
- Downgrade (country ➜ global): select the origin country → copy its data into the new service-app schema → keep the old country DB read-only for 30 days.
- Upgrade (global ➜ country): select the target origin country → split the data; other countries start empty.
Cleanup jobs delete obsolete DBs/schemas after retention expiry.
3. Provisioning Model
3.1 Custom Resource Definitions (CRDs) & Operators
| CRD | Trigger Type | Key actions |
|---|---|---|
| TenantProvisionJob | K8s Job | • Create DB `tenant_<slug>` • Create base service schemas • Run one-time provisioning tasks |
| AppProvisionJob | K8s Job | • Resolve target DB • Create service-app schemas • Run one-time provisioning tasks |
| TenantSchemaVersion | Migration Operator | • Track currentVersion / desiredVersion • Handle upgrades, retries, dry-runs |
Provisioning jobs are:
- Executed once and idempotent (safe to retry)
- Destroyed or garbage collected after success
- Reversible via a dedicated deletion job, not via finalizers
Reconciler vs Job Model
- Tenant and App provisioning use Kubernetes Jobs created from CRDs.
- A central controller (optional) may monitor Job completion and status for observability.
- The Migration Operator is a long-lived reconciler for version tracking, upgrades, dry-runs, and dependency checks.
3.2 Vault Secret Handling & Schema Registry
All K8s Jobs used for provisioning (e.g., schema creation, backup) are responsible for writing necessary secrets into Vault (per-tenant path). Secrets are created once and reused unless explicitly rotated.
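A provisioning Job might persist generated credentials like this. A minimal sketch using the official `github.com/hashicorp/vault/api` client; the `secret` mount point and the `tenants/<slug>/db` path layout are assumptions, not prescribed by this document:

```go
package main

import (
	"context"
	"fmt"
	"log"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// Address and token come from VAULT_ADDR / VAULT_TOKEN by default.
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical per-tenant path. Written once by the provisioning
	// Job; reused until explicitly rotated.
	path := fmt.Sprintf("tenants/%s/db", "xyz")
	_, err = client.KVv2("secret").Put(context.Background(), path, map[string]interface{}{
		"username": "tenant_xyz_owner",
		"password": "generated-once-then-reused",
	})
	if err != nil {
		log.Fatal(err)
	}
}
```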
Each tenant DB includes a schema_registry table for tracking migration state:
```sql
CREATE TABLE schema_registry (
    service         text,
    schema_name     text,
    current_version text,
    desired_version text,
    status          text  -- pending | migrating | complete | failed
);
```
Operators update this table and CRD status fields in lock-step.
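"Lock-step" means the registry row and the CRD status should never disagree for long. A minimal sketch of the Postgres half with `pgx`; table and column names come from the DDL above, and the operator would patch the CRD `status.phase` immediately after this call:

```go
package registry

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// SetStatus records a migration state transition in schema_registry.
// The operator patches the corresponding CRD status.phase right after
// this call so both views stay in lock-step.
func SetStatus(ctx context.Context, conn *pgx.Conn, service, schema, status string) error {
	_, err := conn.Exec(ctx,
		`UPDATE schema_registry
		    SET status = $3
		  WHERE service = $1 AND schema_name = $2`,
		service, schema, status)
	return err
}
```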
3.3 Service Discovery & Targeting
- Operators locate service pods for migration by custom labels:
  - `arematics.com/service-name: <service-name>`
  - `arematics.com/service-version: <semver>`
- Operators exec into the first Ready pod matching the label selectors in authorized namespaces (e.g., `arematics-*`); a selector sketch follows this list.
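Target selection reduces to a label-selector query. A minimal `client-go` sketch, assuming a single namespace per call (iterating the authorized `arematics-*` namespaces is left out):

```go
package discovery

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// FirstReadyPod returns the first Ready pod of a given service version
// in one namespace, using the labels described above.
func FirstReadyPod(ctx context.Context, cs kubernetes.Interface, ns, service, version string) (*corev1.Pod, error) {
	pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
		LabelSelector: fmt.Sprintf(
			"arematics.com/service-name=%s,arematics.com/service-version=%s",
			service, version),
	})
	if err != nil {
		return nil, err
	}
	for i := range pods.Items {
		for _, c := range pods.Items[i].Status.Conditions {
			if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
				return &pods.Items[i], nil
			}
		}
	}
	return nil, fmt.Errorf("no ready pod for %s@%s in %s", service, version, ns)
}
```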
4. Migration System
4.1 Migration Delivery & Execution
- Each service image contains its own migrations under `/migrations/<service>/*.sql`.
- The Operator execs into the running service Pod, reads the SQL files, and pipes them to Postgres via `pgx`.
- Dry-Run Mode: set `spec.dryRun=true` on `TenantSchemaVersion` to execute migrations inside a transaction and roll back, emitting a diff report.
- Each SQL migration file must contain a `-- ROLLBACK:` section describing how to undo the changes. The Operator runs each file inside a transaction by default.
- If `backupBeforeMigration: true` is set, a snapshot job is triggered before applying the migration:
  - PostgreSQL-level dumps via `pg_dump` to tenant S3 buckets
  - Optional: Velero for volume snapshots
- Jobs respect a configured timeout (default: 10m) and retry twice on failure before blocking the migration with a warning.
- Operators emit backup completion logs and Prometheus metrics (`tenant_backup_duration_seconds`, `tenant_backup_failed_total`).
- Cluster-wide concurrency limit (default: 3 parallel migrations), configurable via the `MigrationControllerConfig` CRD or an environment variable. Operators queue excess jobs and expose the queue depth via the `migration_queue_length` metric.
Migration Execution (Locking, Rollback, Backup, Metrics)
- PostgreSQL advisory locks (`pg_advisory_lock(hashtext('tenant:service'))`) are held during the migration phase, per schema; see the sketch after this list.
- The Operator gracefully skips or retries if the lock is held by another executor.
- A migration run updates both `schema_registry.status` in Postgres and `status.phase` in the relevant CRD.
- States: `pending`, `migrating`, `complete`, `failed`, `blocked`
- Failed states are surfaced to Prometheus and the UI. Retry requires manual intervention.
- Retrying a failed migration:
  - Locks the schema
  - Re-runs all SQL steps not yet marked as applied (based on file naming)
  - Transitions state back to `migrating` → `complete` on success
- An admin `kubectl` plugin may list failed schemas, trigger retries, and view logs.
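The skip-or-retry behavior maps directly onto `pg_try_advisory_lock`. A minimal `pgx` sketch under the key scheme above; the real operator would also update `schema_registry` and CRD status around this call:

```go
package migrate

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5"
)

// RunLocked applies one migration file under the per-schema advisory
// lock. key follows the "tenant:service" convention from the document.
func RunLocked(ctx context.Context, conn *pgx.Conn, key, sql string) error {
	var locked bool
	if err := conn.QueryRow(ctx,
		`SELECT pg_try_advisory_lock(hashtext($1))`, key).Scan(&locked); err != nil {
		return err
	}
	if !locked {
		// Held by another executor: skip now, retry later.
		return fmt.Errorf("schema lock %q busy", key)
	}
	defer conn.Exec(ctx, `SELECT pg_advisory_unlock(hashtext($1))`, key)

	// Each file runs inside a transaction by default.
	tx, err := conn.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx) // no-op after a successful Commit

	if _, err := tx.Exec(ctx, sql); err != nil {
		return err
	}
	return tx.Commit(ctx)
}
```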
4.2 Schema-Registry Table
See Provisioning Model.
4.3 Migration UI & Workflow
Feature Overview
| Feature | Triggered by | Available in UI? | Automation Option? |
|---|---|---|---|
| List current schema versions | Platform | ✅ Yes | N/A |
| Check for updates | Operator | ✅ Yes | ✅ Background scan |
| Dry-run migration | Tenant/Admin | ✅ Yes | ✅ (via CRD) |
| Apply migration | Tenant/Admin | ✅ Yes | ❌ Manual only |
| Retry failed migration | Admin only | ✅ Yes | ✅ (admin tools) |
| View logs/status/history | Platform | ✅ Yes | ✅ via CRD + Prometheus |
Migration Journey per Schema
```
START
  │  (background scan or manual)
  ▼
Check for Upgrade Availability
  │
  ├── No  → Stay on current version
  │
  └── Yes
        │
        ▼
  Dry-Run Option → Show Diff Output
        │
        ▼
  Apply Migration
        │
        ▼
  Track Status: migrating → complete / failed
        │
        ├── Success → Done
        └── Failure → Retry via UI (admin only)
```
UI Components
- Schema Overview: shows schema name, current/desired version, status, last migration time, logs (preview), filter/search, and warnings for failed/blocked schemas.
- Dry-Run Interface: the `Try Migration to vX.Y.Z` button executes the SQL in a transaction (rolled back) and shows diff output:

```diff
+ ALTER TABLE user ADD COLUMN tfa_enabled BOOLEAN;
- DROP TABLE old_sessions;
```

  Actions: Cancel, Apply Migration, Download SQL Diff
- Apply Migration: triggers a patch (see the sketch after this list):

```yaml
spec:
  desiredVersion: vX.Y.Z
  dryRun: false
```

  The UI shows progress, logs, and status transitions. If `backupBeforeMigration: true`, the UI prompts for backup confirmation.
- Retry Failed Migration: admin-only; the `Retry Migration` button re-applies the patch for the same version.
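Behind the Apply Migration action, the backend only needs to patch the `TenantSchemaVersion` resource. A minimal sketch with the `client-go` dynamic client; the group/version `arematics.com/v1alpha1` and the resource name are assumptions based on the CRD names in this document:

```go
package ui

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

// ApplyMigration sets desiredVersion and turns dry-run off, which is
// exactly the patch the UI issues. Retry re-sends the same patch.
func ApplyMigration(ctx context.Context, dc dynamic.Interface, ns, name, version string) error {
	gvr := schema.GroupVersionResource{
		Group:    "arematics.com", // assumed API group
		Version:  "v1alpha1",      // assumed version
		Resource: "tenantschemaversions",
	}
	patch := []byte(fmt.Sprintf(
		`{"spec":{"desiredVersion":%q,"dryRun":false}}`, version))
	_, err := dc.Resource(gvr).Namespace(ns).Patch(
		ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```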
RBAC/Access Control
| Role | Can Migrate | Can Dry Run | Can Retry |
|---|---|---|---|
| Tenant Admin | ✅ | ✅ | ❌ |
| Platform Admin | ✅ | ✅ | ✅ |
API/Backend Integration
| Action | How it's triggered |
|---|---|
| Start dry-run | PATCH CRD: dryRun: true |
| Start real migration | PATCH CRD: dryRun: false, set desiredVersion |
| Retry failed migration | PATCH same desiredVersion again |
| Read status and logs | From CRD .status and schema_registry table |
| Check dependency conflicts | Surfaced via .status (blocked) |
5. Versioning & Coordination
5.1 Service Pods per Version
- The deployment controller keeps ≥1 Pod for each service version still used by at least one tenant.
- Version Deprecation policy: Notify tenants N days ahead → auto-upgrade or block.
5.2 Multi-Service Coordination & Version Planning
- Services may depend on specific versions of others (e.g., auth@1.4 requires payments@2.0).
- Each service version may define dependency rules in `dependencies.yml` located in `/migrations/<service>/`.
- The Migration Operator checks these rules before applying migrations; if a dependency is not met, it blocks the migration, sets the schema status to `blocked`, and emits an event plus a Prometheus alert.
- Cross-service dependencies must be tracked cluster-wide.
Example:
```yaml
# /migrations/entity/dependencies.yml
requires:
  - service: auth
    minVersion: v1.2.0
  - service: payments
    minVersion: v2.1.0
```
- A central service version registry table (e.g. `infra.service_versions`) at cluster level tracks which service versions must be deployed to satisfy all tenant needs. This registry can drive GitOps pipelines or Helm values per namespace.
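The dependency gate itself is a straightforward semver comparison. A minimal Go sketch parsing the file format above with `gopkg.in/yaml.v3` and `golang.org/x/mod/semver` (which expects the `v` prefix the document's versions already use); `deployed` is assumed to come from the version registry:

```go
package deps

import (
	"fmt"

	"golang.org/x/mod/semver"
	"gopkg.in/yaml.v3"
)

// Rules mirrors dependencies.yml.
type Rules struct {
	Requires []struct {
		Service    string `yaml:"service"`
		MinVersion string `yaml:"minVersion"`
	} `yaml:"requires"`
}

// Check blocks a migration when any requirement is unmet; deployed maps
// service name to its currently deployed version.
func Check(raw []byte, deployed map[string]string) error {
	var r Rules
	if err := yaml.Unmarshal(raw, &r); err != nil {
		return err
	}
	for _, req := range r.Requires {
		v, ok := deployed[req.Service]
		if !ok || semver.Compare(v, req.MinVersion) < 0 {
			return fmt.Errorf("blocked: need %s >= %s, have %q",
				req.Service, req.MinVersion, v)
		}
	}
	return nil
}
```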
5.3 Version Compatibility & Dependencies
- Service-A@v1.4 can declare it requires service-B >= v2.0.
- Dependencies encoded in a YAML file inside the image.
- Operator blocks migration if dependency not satisfied.
6. Lifecycle Management
6.1 Retention Policies & Schema Lifecycle
Certain transitions (e.g., downgrade from country-level to app-level) require retention of old schemas:
```yaml
retentionPolicy:
  schemaTransition: 30d   # Retain downgraded country schemas
  deprecatedVersions: 60d # Keep old versions active for rollbacks or audits
```
- Cleanup Jobs scan these policies and delete data beyond expiration.
- Per-tenant or per-app override via annotations or CRD fields.
- Policies are tracked per schema and surfaced via `schema_registry.retention_expires_at`.
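A cleanup Job can find expired schemas directly from the registry. A minimal `pgx` sketch, assuming the `retention_expires_at` column mentioned above has been added to `schema_registry`:

```go
package cleanup

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// ExpiredSchemas lists schemas whose retention window has passed and
// which are therefore eligible for deletion by the cleanup Job.
func ExpiredSchemas(ctx context.Context, conn *pgx.Conn) ([]string, error) {
	rows, err := conn.Query(ctx,
		`SELECT schema_name
		   FROM schema_registry
		  WHERE retention_expires_at IS NOT NULL
		    AND retention_expires_at < now()`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var out []string
	for rows.Next() {
		var s string
		if err := rows.Scan(&s); err != nil {
			return nil, err
		}
		out = append(out, s)
	}
	return out, rows.Err()
}
```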
6.2 Failure Recovery & Deletion Policy
- Tenant deletion policies must comply with legal requirements (e.g., GDPR):
```yaml
deletionPolicy:
  mode: retained | immediate | manual
  gracePeriod: 30d
```

- `retained`: preserves data for `gracePeriod` before purging
- `immediate`: drops all schemas and databases after the deletion request
- `manual`: requires manual intervention to finalize
- The setting is stored in CRD metadata and reflected in the cleanup controller.
6.3 Failure Recovery & Migration State Management
- Every migration run updates both `schema_registry.status` in Postgres and `status.phase` in the CRD.
- States: `pending`, `migrating`, `complete`, `failed`
- Failed states are surfaced to Prometheus and the UI. Retry must be triggered manually.
- Retrying a failed migration locks the schema, re-runs unapplied steps, and transitions state on success.
- An admin `kubectl` plugin may list failed schemas, trigger retries, and view logs.
7. Platform Integration
While operators automate infrastructure provisioning inside Kubernetes, they rely on CRDs being created to trigger actions.
To connect the Arematics Platform backend to the infrastructure layer:
- The Platform backend must create or update:
  - `TenantProvisionJob` CRDs when a new organization is registered
  - `AppProvisionJob` CRDs when a user creates or modifies an app
- This is typically performed via:
  - A secure service account with CRD write access in the `infra-system` namespace
  - The Kubernetes API via client-go or HTTP
  - Optionally: a platform-side webhook service forwarding to a Kubernetes operator proxy
Example flow:
User signs up ➜ Arematics backend creates TenantProvisionJob ➜ CRD created ➜ Job triggers provisioning
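The signup step in this flow reduces to one create call against the CRD API. A minimal sketch using the `client-go` dynamic client with an unstructured object; group/version and the `spec.slug` field are assumptions:

```go
package platform

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// CreateTenantProvisionJob is what the backend runs when a new
// organization is registered, using a service account scoped to the
// infra-system namespace.
func CreateTenantProvisionJob(ctx context.Context, dc dynamic.Interface, slug string) error {
	gvr := schema.GroupVersionResource{
		Group:    "arematics.com", // assumed API group
		Version:  "v1alpha1",      // assumed version
		Resource: "tenantprovisionjobs",
	}
	obj := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "arematics.com/v1alpha1",
		"kind":       "TenantProvisionJob",
		"metadata":   map[string]interface{}{"name": "tenant-" + slug},
		"spec":       map[string]interface{}{"slug": slug},
	}}
	_, err := dc.Resource(gvr).Namespace("infra-system").Create(ctx, obj, metav1.CreateOptions{})
	return err
}
```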
Additional Considerations:
- Audit logs should capture who created each CRD
- Tenants can query infra status via the platform (mirroring CRD `status` fields)
- Tenant deletion should trigger CRD cleanup (finalizers may be needed)
- All job CRDs are idempotent. In case of failure (e.g., partial outage), orphaned DBs/schemas can be deleted and the job retried. Since data creation happens after provisioning completes, there is no critical data loss at this stage.
8. Appendix
8.1 RBAC Table
| Role | Can Migrate | Can Dry Run | Can Retry |
|---|---|---|---|
| Tenant Admin | ✅ | ✅ | ❌ |
| Platform Admin | ✅ | ✅ | ✅ |
8.2 Metrics Overview Table
| Metric | Type | Labels |
|---|---|---|
| tenant_migration_duration_seconds | Histogram | cluster, service, tenant, version |
| tenant_migration_total | Counter | outcome (success/failed) |
| tenant_schema_version | Gauge | tenant, service, schema, version |
| tenant_backup_duration_seconds | Histogram | tenant, cluster, outcome |
| tenant_backup_failed_total | Counter | tenant, cluster |
- Exposed via `/metrics` on the operator Pod.
- Grafana dashboard: Schema Version Heat-map + Migration Latency.
8.3 UI API Endpoint Summary
| Action | How it's triggered |
|---|---|
| Start dry-run | PATCH CRD: dryRun: true |
| Start real migration | PATCH CRD: dryRun: false, set desiredVersion |
| Retry failed migration | PATCH same desiredVersion again |
| Read status and logs | From CRD .status and schema_registry table |
| Check dependency conflicts | Surfaced via .status (blocked) |
Notes & Clarifications
- Schema Initialization: migrations must include DDL and may optionally include initial fixture data via `init_data.sql`.
- Multi-service Version Alignment: managed by the Migration Operator; `autoUpgrade: true` enables automatic minor upgrades.
- Country-Scoped CRDs: a single `App` CRD with `countryScoped: true` controls creation of `tenant_xyz_<country>` DBs. Schemas inside follow the normal service-app structure.
- Migration Sequencing: dependencies must be respected; an optional `migrationOrder` field can define sequencing explicitly.
- Migration Rollback: all SQL files must include a `-- ROLLBACK:` section describing reverse operations. Migrations are executed in isolated transactions.
- Pre-Backup: backups are required before risky migrations. Use `backupBeforeMigration: true` to trigger platform hooks or pg_dump/Velero scripts.
- No Cross-Region Sync: country-level databases are isolated. Any sync is left to higher-level process orchestration.