
Cluster Tenant Handling

Status: Draft | Last updated: 2025-07-15 | Version: v0.0.3


Table of Contents

  1. Overview
  2. Environment Architecture
  3. Provisioning Model
  4. Migration System
  5. Versioning & Coordination
  6. Lifecycle Management
  7. Platform Integration
  8. Appendix

1. Overview

This document describes a Kubernetes-native, operator-driven workflow for Arematics infrastructure provisioning, supporting multi-tenant and geo-aware deployments. The system is modular, secure, and environment-aware, enabling:

  • Automated provisioning of databases and schemas for every tenant, app, and (optional) country.
  • Application of service-versioned SQL migrations embedded within each service container image.
  • Tracking of schema versions per service and support for controlled upgrades (opt-in).
  • Safe operation across test, QA, and prod environments.

Operators are deployed per-cluster but share the same code base. Image tags (e.g., v1.2.3-test) isolate config differences. The system supports both global and country-scoped apps, robust migration and rollback mechanics, and integrates with the platform backend for provisioning and deletion flows.


2. Environment Architecture

2.1 Environment Layout

Env   | K8s cluster | Postgres instance(s) | Vault namespace
test  | test        | shared dev instance  | vault-test
qa    | qa          | dedicated staging    | vault-qa
prod  | prod        | regional HA pairs    | vault-prod

2.2 Data-Isolation Model

Each tenant receives one main Postgres database (e.g., tenant_xyz). Apps with country scoping receive an additional database per country (e.g., tenant_xyz_de). Within each DB, schemas are created per service and per app, named <service>-<app> (e.g., entity-crm).

┌────────────────────────────────────┐
│ DB: tenant_xyz (organization)      │
│ ├── auth         (base service)    │
│ ├── entity       (base service)    │
│ ├── payments     (base service)    │
│ └── entity-crm   (app, no country) │
└────────────────────────────────────┘

┌─────────────────────────────────────────┐
│ DB: tenant_xyz_de (country-specific DB) │
│ ├── entity-crm   (schema)               │
│ └── payments-crm (schema)               │
└─────────────────────────────────────────┘
  • One Postgres DB per organization (tenant)
  • Extra DB per tenant+country (only when an app is country-scoped)
  • Schemas isolate services and apps

2.3 Country-Scope Transitions

  • Downgrade (country ➜ global): Select origin country → copy data into new service-app schema → keep old country DB read-only for 30 days.
  • Upgrade (global ➜ country): Select target origin → split data; other countries start empty.

Cleanup jobs delete obsolete DBs/schemas after retention expiry.


3. Provisioning Model

3.1 Custom Resource Definitions (CRDs) & Operators

CRD                 | Trigger Type       | Key actions
TenantProvisionJob  | K8s Job            | Create DB tenant_<slug>; create base service schemas; run one-time provisioning tasks
AppProvisionJob     | K8s Job            | Resolve target DB; create service-app schemas; run one-time provisioning tasks
TenantSchemaVersion | Migration Operator | Track currentVersion / desiredVersion; handle upgrades, retries, dry-runs

Provisioning jobs are:

  • Executed once and idempotent (safe to re-run)
  • Destroyed or garbage collected after success
  • Reversible via a dedicated deletion job, not via finalizers
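
As a rough sketch of what such an idempotent job can look like (Go with pgx; the connection URL, role, and naming below are illustrative assumptions, not the actual job implementation): the database is only created if it is missing, and schemas use IF NOT EXISTS, so re-running the job after a partial failure is harmless.

// provision_sketch.go — minimal idempotent tenant provisioning sketch (assumed DSN/names).
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jackc/pgx/v5"
)

func provisionTenant(ctx context.Context, baseURL, tenantSlug string, services []string) error {
	// Connect to the maintenance DB to create the tenant DB.
	admin, err := pgx.Connect(ctx, baseURL+"/postgres")
	if err != nil {
		return err
	}
	defer admin.Close(ctx)

	dbName := "tenant_" + tenantSlug

	// CREATE DATABASE has no IF NOT EXISTS, so check pg_database first (keeps re-runs idempotent).
	var exists bool
	if err := admin.QueryRow(ctx,
		"SELECT EXISTS (SELECT 1 FROM pg_database WHERE datname = $1)", dbName).Scan(&exists); err != nil {
		return err
	}
	if !exists {
		if _, err := admin.Exec(ctx, fmt.Sprintf("CREATE DATABASE %q", dbName)); err != nil {
			return err
		}
	}

	// Connect to the tenant DB and create base service schemas; IF NOT EXISTS keeps this re-runnable.
	tenant, err := pgx.Connect(ctx, baseURL+"/"+dbName)
	if err != nil {
		return err
	}
	defer tenant.Close(ctx)

	for _, svc := range services {
		if _, err := tenant.Exec(ctx, fmt.Sprintf("CREATE SCHEMA IF NOT EXISTS %q", svc)); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := provisionTenant(context.Background(),
		"postgres://infra@localhost:5432", "xyz",
		[]string{"auth", "entity", "payments"}); err != nil {
		log.Fatal(err)
	}
}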

Reconciler vs Job Model

  • Tenant and App provisioning use Kubernetes Jobs created from CRDs.
  • A central controller (optional) may monitor Job completion and status for observability.
  • The Migration Operator is a long-lived reconciler for version tracking, upgrades, dry-runs, and dependency checks.

3.2 Vault Secret Handling & Schema Registry

All K8s Jobs used for provisioning (e.g., schema creation, backup) are responsible for writing necessary secrets into Vault (per-tenant path). Secrets are created once and reused unless explicitly rotated.
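
A minimal sketch of the write-once behavior, assuming the HashiCorp Vault Go client and a KV v2 mount at secret/ with a per-tenant path layout (the path and field names are assumptions):

// vault_sketch.go — write tenant DB credentials to Vault only if nothing is stored yet.
package main

import (
	"fmt"
	"log"

	vault "github.com/hashicorp/vault/api"
)

func ensureTenantSecret(client *vault.Client, tenant string, creds map[string]interface{}) error {
	path := fmt.Sprintf("secret/data/tenants/%s/postgres", tenant) // assumed KV v2 path layout

	// "Created once and reused unless explicitly rotated": only write when nothing exists yet.
	existing, err := client.Logical().Read(path)
	if err != nil {
		return err
	}
	if existing != nil && existing.Data != nil {
		return nil // secret already present; leave it untouched
	}

	_, err = client.Logical().Write(path, map[string]interface{}{"data": creds})
	return err
}

func main() {
	client, err := vault.NewClient(vault.DefaultConfig()) // picks up VAULT_ADDR / VAULT_TOKEN
	if err != nil {
		log.Fatal(err)
	}
	if err := ensureTenantSecret(client, "tenant_xyz", map[string]interface{}{
		"username": "tenant_xyz_app",
		"password": "generated-at-provision-time", // placeholder
	}); err != nil {
		log.Fatal(err)
	}
}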

Each tenant DB includes a schema_registry table for tracking migration state:

CREATE TABLE schema_registry (
    service              text,
    schema_name          text,
    current_version      text,
    desired_version      text,
    status               text,        -- pending | migrating | complete | failed
    retention_expires_at timestamptz  -- set when a schema enters a retention window (see 6.1)
);

Operators update this table and CRD status fields in lock-step.
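
A minimal sketch of that lock-step update, assuming a Go operator that uses pgx for the registry row and a dynamic client for the CRD status subresource (the CRD group/version below are assumptions):

// status_sketch.go — keep schema_registry and the CRD status in lock-step.
package operator

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

var tenantSchemaVersionGVR = schema.GroupVersionResource{
	Group:    "arematics.com", // assumed CRD group
	Version:  "v1alpha1",      // assumed version
	Resource: "tenantschemaversions",
}

func setMigrationStatus(ctx context.Context, db *pgx.Conn, dyn dynamic.Interface,
	namespace, crName, service, schemaName, phase string) error {

	// 1) Update the per-tenant registry table.
	if _, err := db.Exec(ctx,
		`UPDATE schema_registry SET status = $1 WHERE service = $2 AND schema_name = $3`,
		phase, service, schemaName); err != nil {
		return err
	}

	// 2) Patch the CRD status subresource so Kubernetes reflects the same phase.
	patch := []byte(fmt.Sprintf(`{"status":{"phase":%q}}`, phase))
	_, err := dyn.Resource(tenantSchemaVersionGVR).Namespace(namespace).
		Patch(ctx, crName, types.MergePatchType, patch, metav1.PatchOptions{}, "status")
	return err
}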

3.3 Service Discovery & Targeting

  • Operators locate service pods for migration by custom labels:
    arematics.com/service-name: <service-name>
    arematics.com/service-version: <semver>
  • Operators exec into the first Ready pod matching label selectors in authorized namespaces (e.g., arematics-*).
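
A minimal sketch of this lookup with client-go, assuming the labels above and that only Ready pods are acceptable exec targets:

// discovery_sketch.go — locate the first Ready pod for a service/version via the documented labels.
package operator

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func findReadyServicePod(ctx context.Context, cs kubernetes.Interface,
	namespace, service, version string) (*corev1.Pod, error) {

	selector := fmt.Sprintf("arematics.com/service-name=%s,arematics.com/service-version=%s",
		service, version)

	pods, err := cs.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return nil, err
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		for _, cond := range pod.Status.Conditions {
			// Exec targets must report Ready=True.
			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
				return pod, nil
			}
		}
	}
	return nil, fmt.Errorf("no Ready pod for %s@%s in %s", service, version, namespace)
}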

4. Migration System

4.1 Migration Delivery & Execution

  • Each service image contains its own migrations under /migrations/<service>/*.sql.
  • The operator execs into the running service Pod, reads the SQL files, and pipes them to Postgres via pgx (see the sketch after this list).
  • Dry-Run Mode: Set spec.dryRun=true on TenantSchemaVersion to execute migrations inside a transaction and roll back, emitting a diff report.
    • Each SQL migration file must contain a -- ROLLBACK: section describing how to undo the changes. Operator runs each file inside a transaction by default.
  • If backupBeforeMigration: true is set, a snapshot job is triggered before applying the migration:
    • PostgreSQL-level dumps via pg_dump to tenant S3 buckets
    • Optional: Velero for volume snapshots
  • Jobs respect a configured timeout (default: 10m) and retry twice on failure before blocking migration with a warning.
  • Operators emit backup completion logs and Prometheus metrics (tenant_backup_duration_seconds, tenant_backup_failed_total).
  • Cluster-wide concurrency limit (default: 3 parallel migrations), configurable via MigrationControllerConfig CRD or environment variable. Operators queue excess jobs and expose this via migration_queue_length metric.
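
A minimal sketch of the exec-and-apply step referenced above, assuming client-go's remotecommand executor and pgx; the path follows the /migrations/<service>/*.sql convention, while error handling and per-file sequencing are simplified:

// exec_sketch.go — read embedded migration files out of a running service pod and apply them via pgx.
package operator

import (
	"bytes"
	"context"
	"fmt"

	"github.com/jackc/pgx/v5"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// readMigrations cats every SQL file under /migrations/<service>/ inside the target pod.
func readMigrations(ctx context.Context, cfg *rest.Config, cs kubernetes.Interface,
	namespace, podName, service string) (string, error) {

	req := cs.CoreV1().RESTClient().Post().
		Resource("pods").Name(podName).Namespace(namespace).SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Command: []string{"sh", "-c", fmt.Sprintf("cat /migrations/%s/*.sql", service)},
			Stdout:  true,
			Stderr:  true,
		}, scheme.ParameterCodec)

	exec, err := remotecommand.NewSPDYExecutor(cfg, "POST", req.URL())
	if err != nil {
		return "", err
	}
	var stdout, stderr bytes.Buffer
	// StreamWithContext needs a reasonably recent client-go.
	if err := exec.StreamWithContext(ctx, remotecommand.StreamOptions{Stdout: &stdout, Stderr: &stderr}); err != nil {
		return "", fmt.Errorf("exec failed: %w (stderr: %s)", err, stderr.String())
	}
	return stdout.String(), nil
}

// applyInTransaction pipes the collected SQL to Postgres inside a transaction, as noted above.
func applyInTransaction(ctx context.Context, conn *pgx.Conn, sql string) error {
	tx, err := conn.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx) // returns a "closed" error after Commit, which is safely ignored
	if _, err := tx.Exec(ctx, sql); err != nil {
		return err
	}
	return tx.Commit(ctx)
}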

Migration Execution (Locking, Rollback, Backup, Metrics)

  • The operator holds a PostgreSQL advisory lock (pg_advisory_lock(hashtext('tenant:service'))) per schema for the duration of the migration phase.
  • The operator gracefully skips or retries if the lock is held by another executor (see the sketch after this list).
  • Migration run updates both schema_registry.status in Postgres and status.phase in the relevant CRD.
  • States: pending, migrating, complete, failed, blocked
  • Failed states are surfaced to Prometheus and UI. Retry requires manual intervention.
  • Retrying a failed migration:
    • Locks the schema
    • Re-runs all SQL steps not yet marked as applied (based on file naming)
    • Transitions state back to migrating → complete on success
  • Admin kubectl plugin may list failed schemas, trigger retry, and view logs.
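
A minimal sketch of the lock handling, using the non-blocking pg_try_advisory_lock variant so the operator can skip or requeue instead of waiting (key derivation mirrors hashtext('tenant:service')):

// lock_sketch.go — take the per-schema advisory lock before migrating; skip gracefully when held elsewhere.
package operator

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5"
)

// tryMigrateWithLock returns (false, nil) when the lock is held by another executor,
// so the caller can requeue instead of failing the migration.
func tryMigrateWithLock(ctx context.Context, conn *pgx.Conn, tenant, service string,
	migrate func(ctx context.Context) error) (bool, error) {

	key := fmt.Sprintf("%s:%s", tenant, service)

	var acquired bool
	// Non-blocking variant; pg_advisory_lock from the list above would block until released.
	if err := conn.QueryRow(ctx,
		"SELECT pg_try_advisory_lock(hashtext($1))", key).Scan(&acquired); err != nil {
		return false, err
	}
	if !acquired {
		return false, nil // lock held by another executor: skip / retry later
	}
	defer conn.Exec(ctx, "SELECT pg_advisory_unlock(hashtext($1))", key)

	return true, migrate(ctx)
}

pg_try_advisory_lock was chosen here so a held lock results in a requeue rather than a blocked worker; the blocking pg_advisory_lock named above is equally valid if the operator prefers to wait.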

4.2 Schema-Registry Table

See Provisioning Model.

4.3 Migration UI & Workflow

Feature Overview

Feature                      | Triggered by | Available in UI? | Automation Option?
List current schema versions | Platform     | ✅ Yes           | N/A
Check for updates            | Operator     | ✅ Yes           | ✅ Background scan
Dry-run migration            | Tenant/Admin | ✅ Yes           | ✅ (via CRD)
Apply migration              | Tenant/Admin | ✅ Yes           | ❌ Manual only
Retry failed migration       | Admin only   | ✅ Yes           | ✅ (admin tools)
View logs/status/history     | Platform     | ✅ Yes           | ✅ via CRD + Prometheus

Migration Journey per Schema

START
  │  (background scan or manual)
  ▼
Check for Upgrade Availability
  ├── No  → Stay on current version
  └── Yes
       ▼
     Dry-Run Option → Show Diff Output
       ▼
     Apply Migration
       ▼
     Track Status: migrating → complete / failed
       ├── Success → Done
       └── Failure → Retry via UI (admin only)

UI Components

  • Schema Overview: Shows schema name, current/desired version, status, last migration time, logs (preview), filter/search, warnings for failed/blocked.
  • Dry-Run Interface: The Try Migration to vX.Y.Z button executes the SQL in a transaction (rolled back) and shows a diff output:

    + ALTER TABLE user ADD COLUMN tfa_enabled BOOLEAN;
    - DROP TABLE old_sessions;

    Actions: Cancel, Apply Migration, Download SQL Diff

  • Apply Migration: Triggers a patch:

      spec:
        desiredVersion: vX.Y.Z
        dryRun: false

    The UI shows progress, logs, and status transitions. If backupBeforeMigration: true, the UI prompts for backup confirmation.
  • Retry Failed Migration: Admin-only; the Retry Migration button re-applies the patch for the same version.

RBAC/Access Control

Role           | Can Migrate | Can Dry Run | Can Retry
Tenant Admin   | ✅          | ✅          | ❌
Platform Admin | ✅          | ✅          | ✅

API/Backend Integration

Action                     | How it's triggered
Start dry-run              | PATCH CRD: dryRun: true
Start real migration       | PATCH CRD: dryRun: false, set desiredVersion
Retry failed migration     | PATCH same desiredVersion again
Read status and logs       | From CRD .status and schema_registry table
Check dependency conflicts | Surfaced via .status (blocked)
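
A minimal sketch of these PATCH calls from the backend side, using a dynamic client and a merge patch (the CRD group/version are assumptions):

// patch_sketch.go — trigger a dry-run or real migration by patching the TenantSchemaVersion spec.
package platform

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

var tenantSchemaVersionGVR = schema.GroupVersionResource{
	Group:    "arematics.com", // assumed CRD group
	Version:  "v1alpha1",      // assumed version
	Resource: "tenantschemaversions",
}

func requestMigration(ctx context.Context, dyn dynamic.Interface,
	namespace, name, desiredVersion string, dryRun bool) error {

	// Merge-patch only the two spec fields the table above mentions.
	patch := []byte(fmt.Sprintf(`{"spec":{"desiredVersion":%q,"dryRun":%t}}`, desiredVersion, dryRun))
	_, err := dyn.Resource(tenantSchemaVersionGVR).Namespace(namespace).
		Patch(ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}

Calling requestMigration with dryRun=true starts a dry-run, calling it with dryRun=false applies the migration, and repeating the call with the same desiredVersion is the retry path.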

5. Versioning & Coordination

5.1 Service Pods per Version

  • The deployment controller keeps ≥1 Pod for each service version still used by at least one tenant.
  • Version Deprecation policy: Notify tenants N days ahead → auto-upgrade or block.

5.2 Multi-Service Coordination & Version Planning

  • Services may depend on specific versions of others (e.g., auth@1.4 requires payments@2.0).
  • Each service version may define dependency rules in dependencies.yml located in /migrations/<service>/.
  • Migration Operator checks rules before applying migrations; if dependency not met:
    • Blocks migration, sets schema status to blocked, emits event + Prometheus alert.
  • Cross-service dependencies must be tracked cluster-wide.

Example:

# /migrations/entity/dependencies.yml
requires:
  - service: auth
    minVersion: v1.2.0
  - service: payments
    minVersion: v2.1.0

  • A central service version registry table (e.g. infra.service_versions) at cluster level tracks which service versions must be deployed to satisfy all tenant needs. This registry can drive GitOps pipelines or Helm values per namespace.
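
A minimal sketch of the dependency check, assuming the dependencies.yml shape above and that the operator already knows which service versions are deployed (that lookup is left abstract):

// deps_sketch.go — block a migration when a required service version is not deployed.
package operator

import (
	"fmt"
	"os"

	"golang.org/x/mod/semver"
	"gopkg.in/yaml.v3"
)

type dependencyFile struct {
	Requires []struct {
		Service    string `yaml:"service"`
		MinVersion string `yaml:"minVersion"`
	} `yaml:"requires"`
}

// checkDependencies returns an error naming the first unmet requirement; the operator would then
// set the schema status to "blocked" and emit an event plus a Prometheus alert.
func checkDependencies(path string, deployed map[string]string) error {
	raw, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	var deps dependencyFile
	if err := yaml.Unmarshal(raw, &deps); err != nil {
		return err
	}
	for _, req := range deps.Requires {
		current, ok := deployed[req.Service]
		if !ok || semver.Compare(current, req.MinVersion) < 0 {
			return fmt.Errorf("dependency not met: %s requires >= %s (deployed: %q)",
				req.Service, req.MinVersion, current)
		}
	}
	return nil
}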

5.3 Version Compatibility & Dependencies

  • Service-A@v1.4 can declare it requires service-B >= v2.0.
  • Dependencies encoded in a YAML file inside the image.
  • Operator blocks migration if dependency not satisfied.

6. Lifecycle Management

6.1 Retention Policies & Schema Lifecycle

Certain transitions (e.g., downgrade from country-level to app-level) require retention of old schemas:

retentionPolicy:
  schemaTransition: 30d      # Retain downgraded country schemas
  deprecatedVersions: 60d    # Keep old versions active for rollbacks or audits

  • Cleanup Jobs scan these policies and delete data beyond expiration.
  • Per-tenant or per-app override via annotations or CRD fields.
  • Policies tracked per schema and surfaced via schema_registry.retention_expires_at.
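
A minimal sketch of such a cleanup pass over schema_registry.retention_expires_at (simplified: a real job would also handle whole country DBs and per-tenant overrides):

// cleanup_sketch.go — drop schemas whose retention window has expired.
package cleanup

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5"
)

func cleanupExpiredSchemas(ctx context.Context, conn *pgx.Conn) error {
	rows, err := conn.Query(ctx,
		`SELECT schema_name FROM schema_registry
		  WHERE retention_expires_at IS NOT NULL AND retention_expires_at < now()`)
	if err != nil {
		return err
	}
	var expired []string
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return err
		}
		expired = append(expired, name)
	}
	rows.Close()
	if err := rows.Err(); err != nil {
		return err
	}

	for _, name := range expired {
		// CASCADE removes the objects inside the schema as well.
		if _, err := conn.Exec(ctx, fmt.Sprintf("DROP SCHEMA IF EXISTS %q CASCADE", name)); err != nil {
			return err
		}
	}
	return nil
}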

6.2 Failure Recovery & Deletion Policy

  • Tenant deletion policies must comply with legal requirements (e.g., GDPR):

      deletionPolicy:
        mode: retained | immediate | manual
        gracePeriod: 30d

    • retained: preserves data for gracePeriod before purging
    • immediate: drops all schemas and databases after the deletion request
    • manual: requires manual intervention to finalize
  • The setting is stored in the CRD metadata and reflected in the cleanup controller.

6.3 Failure Recovery & Migration State Management

  • Every migration run updates both schema_registry.status in Postgres and status.phase in CRD.
  • States: pending, migrating, complete, failed
  • Failed states surfaced to Prometheus + UI. Retry must be triggered manually.
  • Retrying a failed migration locks the schema, re-runs unapplied steps, and transitions state on success.
  • Admin kubectl plugin may list failed schemas, trigger retries, and view logs.

7. Platform Integration

While operators automate infrastructure provisioning inside Kubernetes, they rely on CRDs being created to trigger actions.

To connect the Arematics Platform backend to the infrastructure layer:

  • The Platform backend must create or update:
    • TenantProvisionJob CRDs when a new organization is registered
    • AppProvisionJob CRDs when a user creates or modifies an app
  • Typically performed via:
    • Secure service account with CRD write access in infra-system namespace
    • Kubernetes API via client-go or HTTP
    • Optionally: platform-side webhook service forwarding to Kubernetes operator proxy

Example flow:

User signs up ➜ Arematics backend creates TenantProvisionJob ➜ CRD created ➜ Job triggers provisioning
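
A minimal sketch of that backend call, using a dynamic client with an unstructured object (the CRD group/version and spec fields are assumptions; the authoritative schema lives with the operators):

// create_crd_sketch.go — platform backend creating a TenantProvisionJob to kick off provisioning.
package platform

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

var tenantProvisionJobGVR = schema.GroupVersionResource{
	Group:    "arematics.com", // assumed
	Version:  "v1alpha1",      // assumed
	Resource: "tenantprovisionjobs",
}

func createTenantProvisionJob(ctx context.Context, dyn dynamic.Interface, tenantSlug string) error {
	cr := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "arematics.com/v1alpha1",
		"kind":       "TenantProvisionJob",
		"metadata": map[string]interface{}{
			"name":      "provision-" + tenantSlug,
			"namespace": "infra-system",
			"labels":    map[string]interface{}{"arematics.com/tenant": tenantSlug},
		},
		"spec": map[string]interface{}{
			"tenantSlug":   tenantSlug,
			"baseServices": []interface{}{"auth", "entity", "payments"},
		},
	}}
	// Uses the service account with CRD write access in infra-system mentioned above.
	_, err := dyn.Resource(tenantProvisionJobGVR).Namespace("infra-system").
		Create(ctx, cr, metav1.CreateOptions{})
	return err
}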

Additional Considerations:

  • Audit logs should capture who created each CRD
  • Tenants can query infra status via platform (mirroring CRD status fields)
  • Tenant deletion should trigger CRD cleanup (finalizers may be needed)
  • All job CRDs are idempotent. In case of failure (e.g., partial outage), orphaned DBs/schemas can be deleted and the job retried. Since data creation happens after provisioning completes, there is no critical data loss at this stage.

8. Appendix

8.1 RBAC Table

Role           | Can Migrate | Can Dry Run | Can Retry
Tenant Admin   | ✅          | ✅          | ❌
Platform Admin | ✅          | ✅          | ✅

8.2 Metrics Overview Table

Metric                             | Type      | Labels
tenant_migration_duration_seconds  | Histogram | cluster, service, tenant, version
tenant_migration_total             | Counter   | outcome (success/failed)
tenant_schema_version              | Gauge     | tenant, service, schema, version
tenant_backup_duration_seconds     | Histogram | tenant, cluster, outcome
tenant_backup_failed_total         | Counter   | tenant, cluster

  • Exposed via /metrics on the operator Pod.
  • Grafana dashboard: Schema Version Heat-map + Migration Latency.
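
A minimal sketch of how the operator could register and expose a subset of these metrics with client_golang (label sets copied from the table; buckets and example values are illustrative):

// metrics_sketch.go — register migration metrics and serve them on /metrics.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	migrationDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "tenant_migration_duration_seconds",
		Help:    "Duration of tenant schema migrations.",
		Buckets: prometheus.DefBuckets,
	}, []string{"cluster", "service", "tenant", "version"})

	migrationTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "tenant_migration_total",
		Help: "Completed migrations by outcome.",
	}, []string{"outcome"})

	schemaVersion = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "tenant_schema_version",
		Help: "Currently deployed schema version per tenant/service/schema.",
	}, []string{"tenant", "service", "schema", "version"})
)

func main() {
	prometheus.MustRegister(migrationDuration, migrationTotal, schemaVersion)

	// Example updates the operator would make after a migration run.
	migrationDuration.WithLabelValues("prod", "entity", "tenant_xyz", "v1.5.0").Observe(42.3)
	migrationTotal.WithLabelValues("success").Inc()
	schemaVersion.WithLabelValues("tenant_xyz", "entity", "entity-crm", "v1.5.0").Set(1)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}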

8.3 UI API Endpoint Summary

Action                     | How it's triggered
Start dry-run              | PATCH CRD: dryRun: true
Start real migration       | PATCH CRD: dryRun: false, set desiredVersion
Retry failed migration     | PATCH same desiredVersion again
Read status and logs       | From CRD .status and schema_registry table
Check dependency conflicts | Surfaced via .status (blocked)

Notes & Clarifications

  1. Schema Initialization: Migrations must include DDL and may optionally include initial fixture data via init_data.sql.
  2. Multi-service Version Alignment: Managed by the Migration Operator. autoUpgrade: true enables automatic minor upgrades.
  3. Country-Scoped CRDs: A single App CRD with countryScoped: true controls creation of tenant_xyz_<country> DBs. Schemas inside follow normal service-app structure.
  4. Migration Sequencing: Dependencies must be respected; optional field migrationOrder can define sequencing explicitly.
  5. Migration Rollback: All SQL files must include a -- ROLLBACK: section describing reverse operations. Migrations are executed in isolated transactions.
  6. Pre-Backup: Backups are required before risky migrations. Use backupBeforeMigration: true to trigger platform hooks or pg_dump/Velero scripts.
  7. No Cross-Region Sync: Country-level databases are isolated. Any sync is left to higher-level process orchestration.