Z-Arch Platform Documentation
Table of Contents
I - Understanding Z-Arch
Introduction
Why Z-Arch exists
Application logic is cheap and easy to generate. Backend security architecture is not. A solo developer can produce routes, handlers, and database calls in a day. What they cannot reliably produce with the same momentum is a correct perimeter, a consistent authentication model, and defense-in-depth in a scalable shape that does not leak cost or drift over time. The usual result is either improvised security or reliance on hosted platforms that own your infrastructure, data, and wallet.
Z-Arch closes the gap cleanly by defining a fixed backend architecture in one small configuration file, zarch.yaml. Authentication, perimeter enforcement, and service trust are enforced automatically. This is not another AI code generator or agent “skill”. Nor is it a “codebase security scanner” that looks for problems only after they exist. It is proactive, deterministic software that constrains and guards your application code in the simplest manner possible. You describe your system in zarch.yaml. Z-Arch makes that description safe and guaranteed.
In practice this means:
All public traffic passes through a single enforced ingress plane.
No request is trusted, not even internally.
Requests that reach your business logic have already satisfied the authentication guarantees defined by the platform.
Scale-to-zero is the default economic behavior of every component.
Your entire backend architecture is described in an elegant and concise format.
With the architecture guaranteed, a single developer, or an LLM operating within the constraints of the Z-Arch MCP, can generate a complete and secure backend with ease. Business logic becomes the focus. Your AI saves context; you keep your peace of mind.
The Z-Arch Philosophy
Z-Arch is Post-SaaS software. It does not host your application. It does not store your data. It does not introduce a proprietary runtime you cannot leave. Everything deploys into your own GCP project. You own the infrastructure, the secrets, and the data. The architecture is opinionated by design, because the goal is correctness, repeatability, and security at minimal cost. But unlike alternative services, it never stands in your way. Z-Arch is anti-lock-in.
Platform Architecture
When you bootstrap a project with Z-Arch, you will see a small set of platform-owned resources appear alongside your own services and jobs. These are the stable components Z-Arch uses to apply zarch.yaml, enforce the API perimeter, and keep your project convergent over time.
At a high level, Z-Arch separates the system into three concerns: deployment orchestration, the API perimeter, and your application workloads. Each concern has a dedicated component, and those components keep their responsibilities narrow.
Z-Arch Components
Control Plane
Z-Arch deploys a Control Plane job into your project. This is the execution surface that applies zarch.yaml and performs platform operations in a defined order. It is not part of your runtime request path. It exists so infrastructure and policy are applied deterministically rather than by ad-hoc scripts or manual console work.
Gateway
Z-Arch deploys the Gateway as a Cloud Run service. This is the single API perimeter for your backend. Requests are routed and authenticated here before reaching your services. Your services are not intended to be public ingress points.
Edge Proxy
Z-Arch deploys a Cloudflare Worker, the Edge Proxy, in front of the Gateway. The default edge exists to route traffic to the Gateway, provide cost-effective load balancing, apply rate limiting at the edge, and benefit from Cloudflare’s superior abuse absorption. It can inject client-specific API keys for attestation to the Gateway. The edge proxy is not where Z-Arch’s authentication model is enforced. The Gateway is.
Extension Runner
You may also see an Extension Runner job. Its presence is normal. It exists to execute configured extension lifecycle logic as part of Z-Arch operations.
Where Your Code Runs
Services
Your services are Cloud Run services that implement application logic. They sit behind the Gateway. In the intended topology, they receive requests that have already passed gateway enforcement.
Jobs
Your jobs are Cloud Run jobs for run-to-completion workloads that don’t need to serve HTTP traffic. They are invoked by schedulers, event flows, or directly by services and can interact with other resources according to declared targets.
Topics and Schedulers
Topics and schedulers are the event and time primitives that connect services and jobs without turning everything into direct synchronous calls. They are declared in zarch.yaml and deployed as part of the same convergent system.
Request Flow
Public ingress in the default topology is:
Client -> Edge Proxy -> Gateway -> Service
The operational boundary sequence is:
- Edge proxy validates client API-key posture and forwards traffic to the gateway.
- Gateway matches route and enforces declared auth mode (dual, jwt-only, or public).
- For protected routes, JWT validation and session enforcement complete before proxying.
- Gateway attaches short-lived internal trust assertions to the upstream call.
- Target service verifies S2S trust assertions before accepting the internal call.
Service-to-service calls follow the same trust model: callers mint short-lived assertions and receivers verify caller identity, intended audience, and token freshness.
Container boundary rule: protected traffic does not execute inside an application service container until perimeter checks have passed at the gateway.
Gateway Enforcement Model
The Gateway is the single API perimeter for a Z-Arch system. It exists so authentication and perimeter controls are enforced before any request reaches your application container.
Application services are not ingress points. The Gateway is.
Routing and Request Matching
The Gateway is the only component responsible for mapping incoming requests to backend services.
- Routes are declared in zarch.yaml.
- Path and method evaluation are deterministic.
- Literal and regex route styles are supported.
- Endpoint exposure is controlled per service.
If a route is not declared, it does not exist.
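The matching rules above can be sketched as follows. This is a minimal illustration, not the Gateway's implementation: the `Route` type, the `is_regex` flag, and first-match-in-declaration-order evaluation are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    service: str            # hypothetical: id of the service that owns the route
    method: str             # HTTP method, e.g. "GET"
    path: str               # literal path, or a regex pattern when is_regex is True
    is_regex: bool = False  # hypothetical flag distinguishing the two route styles


def match_route(routes: list[Route], method: str, path: str) -> Optional[Route]:
    """Return the first declared route matching method and path, else None."""
    for route in routes:  # fixed iteration order keeps evaluation deterministic
        if route.method != method.upper():
            continue
        if route.is_regex:
            if re.fullmatch(route.path, path):
                return route
        elif route.path == path:
            return route
    return None  # an undeclared route does not exist


ROUTES = [
    Route("orders", "GET", "/orders"),
    Route("orders", "POST", "/orders"),
    Route("orders", "GET", r"/orders/[0-9]+", is_regex=True),
]

print(match_route(ROUTES, "GET", "/orders/42"))
print(match_route(ROUTES, "DELETE", "/orders"))
```

Returning `None` for anything that is not declared mirrors the rule that undeclared routes are simply absent rather than forwarded by default.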
JWT Verification with OIDC
For authenticated flows, the Gateway validates end-user JWTs against your configured OIDC provider.
- OIDC discovery metadata is configurable.
- JWT signature and audience checks are enforced.
- Validation is based on OIDC/JWKS standards.
Services do not parse or validate end-user JWTs for public traffic. That enforcement occurs at the Gateway.
Session Enforcement and Dual Requirement
For protected routes using the dual auth mode, Z-Arch enforces two independent checks:
- A valid end-user JWT.
- A valid session cookie bound to that JWT.
Both must pass. If either fails, the request is rejected.
This enforcement occurs before traffic is forwarded to the target service. A request never executes inside the same container as your application logic unless it satisfies its declared auth mode.
Route Auth Modes
Auth mode is declared per route.
Available modes:
- dual: JWT and session cookie required.
- jwt-only: JWT required, no session cookie.
- public: no end-user authentication required.
Even public routes remain inside the Gateway perimeter and are not directly exposed as independent services.
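The three modes reduce to a small decision function. The sketch below assumes the JWT and the session cookie have already been cryptographically verified and only models the final allow/deny decision; the `Credentials` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credentials:
    # Subject of an already-verified JWT, or None if no valid JWT was presented.
    jwt_subject: Optional[str] = None
    # Subject that a valid session cookie is bound to, or None if no valid cookie.
    session_subject: Optional[str] = None


def is_allowed(auth_mode: str, creds: Credentials) -> bool:
    """Decide whether a request satisfies the route's declared auth mode."""
    if auth_mode == "public":
        return True
    if auth_mode == "jwt-only":
        return creds.jwt_subject is not None
    if auth_mode == "dual":
        # Both checks must pass, and the session must be bound to the same identity.
        return (
            creds.jwt_subject is not None
            and creds.session_subject == creds.jwt_subject
        )
    raise ValueError(f"unknown auth mode: {auth_mode}")


print(is_allowed("dual", Credentials(jwt_subject="u1", session_subject="u1")))
print(is_allowed("dual", Credentials(jwt_subject="u1")))
```

Raising on an unknown mode, instead of defaulting to allow, keeps the sketch fail-closed.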
CORS and Security Headers
CORS policy and security headers are enforced at the Gateway level.
- Allowed origins, headers, and methods are centrally configured.
- Credential mode and cache windows are applied uniformly.
- Security headers are applied consistently across routes.
Services do not manage CORS logic. This prevents inconsistent browser-facing behavior across services.
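As an illustration of centrally applied CORS policy, the sketch below computes response headers from an explicit origin allowlist. The header names come from the CORS standard; the function name and parameters are hypothetical and do not describe Z-Arch's configuration keys.

```python
from typing import Optional


def cors_headers(
    origin: Optional[str],
    allowed_origins: frozenset[str],
    allow_credentials: bool,
    max_age_secs: int,
    is_preflight: bool = False,
) -> dict[str, str]:
    """Return CORS response headers for a request, or {} if the origin is not allowed."""
    if origin is None or origin not in allowed_origins:
        return {}  # no CORS headers: the browser blocks the cross-origin read
    headers = {
        "Access-Control-Allow-Origin": origin,  # echo the exact allowed origin
        "Vary": "Origin",                       # responses differ per origin
    }
    if allow_credentials:
        headers["Access-Control-Allow-Credentials"] = "true"
    if is_preflight:
        # The cache window only applies to preflight (OPTIONS) responses.
        headers["Access-Control-Max-Age"] = str(max_age_secs)
    return headers


ALLOWED = frozenset({"https://app.example.com"})
print(cors_headers("https://app.example.com", ALLOWED, True, 600, is_preflight=True))
```

Echoing a specific origin rather than `*` is what makes credentialed requests possible, since browsers reject a wildcard origin when credentials are included.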
Request Validation and Rate Controls
The Gateway enforces route-level operational controls:
- Required content type enforcement.
- Per-route rate limit declarations.
- Capability checks where authz is configured.
These controls are applied before forwarding to services. Services do not implement their own rate limiting for public traffic unless explicitly required for domain reasons.
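To make a per-route rate limit concrete, here is a minimal fixed-window counter. The algorithm choice, the class name, and the key format are assumptions for illustration; the document does not state which limiting algorithm the Gateway uses.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each window of `window_secs`."""

    def __init__(self, limit: int, window_secs: float) -> None:
        self.limit = limit
        self.window_secs = window_secs
        # (key, window index) -> requests seen; never pruned in this sketch
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_secs)
        bucket = (key, window)
        count = self._counts.get(bucket, 0)
        if count >= self.limit:
            return False  # over the declared limit for this window
        self._counts[bucket] = count + 1
        return True


limiter = FixedWindowLimiter(limit=2, window_secs=60)
# Hypothetical key: one counter per route.
print([limiter.allow("POST /orders", now=t) for t in (0, 1, 2, 61)])
```

Passing `now` explicitly keeps the limiter deterministic and easy to test; a production limiter would also evict old windows.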
Internal Service Trust Propagation
When the Gateway forwards a request to a service, it attaches a short-lived internal trust assertion.
Services verify this assertion with ZArchAuth.s2s.verify() to ensure the caller is the Gateway and that the request is intended for them.
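The three properties a receiver checks (caller identity, intended audience, freshness) can be illustrated with a self-contained sketch. It uses an HMAC-signed token purely for demonstration: the token format, the shared key, and the 60-second window are assumptions, and this is not how `ZArchAuth.s2s.verify()` is implemented.

```python
import base64
import hashlib
import hmac
import json
import time

MAX_AGE_SECS = 60  # hypothetical freshness window


def mint(key: bytes, caller: str, audience: str, now: float) -> str:
    """Create a signed assertion naming the caller and the intended receiver."""
    claims = json.dumps({"iss": caller, "aud": audience, "iat": int(now)}).encode()
    sig = hmac.new(key, claims, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(claims).decode()
        + "."
        + base64.urlsafe_b64encode(sig).decode()
    )


def verify(key: bytes, token: str, expected_caller: str, audience: str, now: float) -> bool:
    """Check signature, caller identity, intended audience, and token age."""
    try:
        claims_b64, sig_b64 = token.split(".")
        claims_raw = base64.urlsafe_b64decode(claims_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
        claims = json.loads(claims_raw)
    except ValueError:  # malformed token, bad base64, or bad JSON
        return False
    expected_sig = hmac.new(key, claims_raw, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected_sig):
        return False
    if claims.get("iss") != expected_caller or claims.get("aud") != audience:
        return False
    age = now - claims.get("iat", 0)
    return 0 <= age <= MAX_AGE_SECS


key = b"demo-key"  # illustration only
now = time.time()
token = mint(key, caller="gateway", audience="orders", now=now)
print(verify(key, token, "gateway", "orders", now))
print(verify(key, token, "gateway", "payments", now))
```

The audience check is what stops an assertion minted for one service from being replayed against another.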
What You Still Implement in Your Services
The Gateway does not make domain-level authorization decisions.
Services remain responsible for:
- Business authorization rules.
- Domain-specific input validation.
- Application error handling and observability.
- Data-layer constraints and integrity.
The Gateway enforces perimeter authentication and route policy. Application services enforce business logic.
Authentication and Access Model
OIDC Model
Z-Arch validates end-user JWTs using any OIDC-compliant provider.
Primary inputs:
- authn.discovery: OIDC discovery URL.
- authn.client_id: expected JWT audience.
Managed Firebase bootstrap remains available, but runtime JWT validation is provider-agnostic under the same OIDC model.
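Under the OIDC Discovery standard, the discovery URL points at a JSON document whose `issuer` and `jwks_uri` fields tell a verifier who signs tokens and where the signing keys live. The sketch below shows that resolution step on an already-fetched document; the function name is hypothetical and no network call is made.

```python
WELL_KNOWN_SUFFIX = "/.well-known/openid-configuration"


def resolve_jwks_uri(discovery_url: str, document: dict) -> str:
    """Validate a fetched discovery document and return its JWKS endpoint."""
    if not discovery_url.endswith(WELL_KNOWN_SUFFIX):
        raise ValueError("not an OIDC discovery URL")
    expected_issuer = discovery_url[: -len(WELL_KNOWN_SUFFIX)]
    issuer = document.get("issuer")
    # Providers may publish the issuer with a trailing slash; compare without it.
    if not isinstance(issuer, str) or issuer.rstrip("/") != expected_issuer.rstrip("/"):
        raise ValueError("issuer does not match discovery URL")
    jwks_uri = document.get("jwks_uri")
    if not isinstance(jwks_uri, str) or not jwks_uri.startswith("https://"):
        raise ValueError("missing or non-HTTPS jwks_uri")
    return jwks_uri


# Example document with placeholder values.
DOC = {
    "issuer": "https://idp.example.com",
    "jwks_uri": "https://idp.example.com/jwks",
}
print(resolve_jwks_uri("https://idp.example.com/.well-known/openid-configuration", DOC))
```

Checking the issuer against the discovery URL guards against a document that redirects key lookup to an unrelated host.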
JWT Validation and Session Binding
For authenticated routes, gateway policy enforces the configured route auth mode.
- dual: both a valid JWT and a valid session cookie bound to that identity are required.
- jwt-only: a valid JWT is required; no session cookie requirement.
- public: no end-user JWT/session requirement.
Session behavior is governed by gateway/security configuration (same-site policy, TTL, and issued-at skew tolerance).
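A TTL combined with an issued-at skew tolerance can be read as the following check. This is a sketch of one reasonable interpretation; the parameter names are hypothetical and the exact semantics of the Gateway's settings are not specified here.

```python
def session_is_fresh(issued_at: float, now: float, ttl_secs: float, skew_secs: float) -> bool:
    """Accept a session issued no more than `skew_secs` in the future and not older than the TTL."""
    if issued_at > now + skew_secs:
        return False  # issued-at is further ahead than clock skew can explain
    return now - issued_at <= ttl_secs


print(session_is_fresh(issued_at=1000, now=1100, ttl_secs=3600, skew_secs=30))
print(session_is_fresh(issued_at=1000, now=5000, ttl_secs=3600, skew_secs=30))
```

The skew tolerance exists because the issuing and verifying machines rarely agree on the time to the second.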
Client API Keys
Frontend clients are declared in clients and mapped to API-key identities stored in Secret Manager.
Each client includes:
- id
- type (web, ios, android)
- api_key secret reference
- add_to_edge to control edge-side API-key injection on the client segment
Capability Enforcement Status
Z-Arch includes authz capability and role structures with route-level requires checks. This model exists in the contract surface but remains an actively developing area.
Perimeter vs Domain Authorization
Perimeter authentication and route policy are enforced by the edge/gateway boundary. Domain authorization remains a service concern.
Serverless Primitives
Z-Arch breaks backend architecture into explicit primitives.
Services
Services are request/response workloads that can be exposed as API endpoints or kept internal.
Common use cases:
- Product API endpoints.
- Session and account services.
- Internal orchestration APIs.
Example:
services:
  - id: orders
    endpoint: true
    authenticated: true
    routes:
      - path: /orders
        method: GET
      - path: /orders
        method: POST
        auth: dual
    location: /services/orders
    env: {}
    flags: []
    targets: [payments, order-events]
Jobs
Jobs are run-to-completion workloads for asynchronous or batch execution.
Common use cases:
- Daily reconciliation.
- Data cleanup.
- Bulk import/export.
Example:
jobs:
  - id: daily-reconcile
    location: /jobs/daily-reconcile
    env:
      BATCH_SIZE: "500"
    flags:
      - "--task-timeout=1200s"
    targets: [orders, reporting-events]
Topics
Topics are passive event channels for publish/subscribe flows.
Common use cases:
- Event fan-out.
- Workflow decoupling.
- Retry-friendly async processing.
Example:
topics:
  - id: reporting-events
    sub:
      - id: analytics
        mode: push
        ack_deadline_secs: 60
Schedulers
Schedulers are time-based trigger resources that invoke services, run jobs, or publish messages.
Common use cases:
- Hourly sync tasks.
- Nightly cleanup.
- Weekly report generation.
Example:
schedulers:
  - id: hourly-sync
    cadence:
      every: 1
      unit: hour
    timezone: "UTC"
    targets: [daily-reconcile]
Targets and Trust Edges
targets define the declared execution graph for a project.
- Service and job targets define callable relationships.
- Topic targets define event publish permissions.
- Scheduler targets define trigger actions.
Trust edges are explicit and deterministic:
- Short-lived assertions are minted only for declared call paths.
- Receivers validate caller identity and intended audience.
- Undeclared caller/target edges are outside the trust graph and are not accepted as valid internal calls.
This keeps topology reviewable and prevents hidden trust relationships from emerging outside zarch.yaml.
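The declared graph can be pictured as a set of (caller, target) edges derived from each resource's targets list. The sketch below builds that set from a config already loaded into a dict and checks whether a call path is declared; the function names are hypothetical, and the sample data reuses the ids from the examples above.

```python
def build_trust_edges(config: dict) -> set[tuple[str, str]]:
    """Collect every declared (caller id, target id) pair from the config."""
    edges: set[tuple[str, str]] = set()
    for kind in ("services", "jobs", "schedulers"):
        for resource in config.get(kind) or []:  # a block may be absent or null
            for target in resource.get("targets") or []:
                edges.add((resource["id"], target))
    return edges


def is_declared_call(edges: set[tuple[str, str]], caller: str, target: str) -> bool:
    """Only edges present in the declared graph count as valid internal calls."""
    return (caller, target) in edges


CONFIG = {
    "services": [{"id": "orders", "targets": ["payments", "order-events"]}],
    "jobs": [{"id": "daily-reconcile", "targets": ["orders", "reporting-events"]}],
    "schedulers": [{"id": "hourly-sync", "targets": ["daily-reconcile"]}],
}

EDGES = build_trust_edges(CONFIG)
print(is_declared_call(EDGES, "orders", "payments"))
print(is_declared_call(EDGES, "payments", "orders"))
```

Edges are directional: declaring that orders may call payments says nothing about the reverse path.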
II - Operating Guides
Getting Started
Project Bootstrap
Create a new app:
zarch new app
Typical guided flow:
- Choose new vs existing repository mode.
- Select template/source repository.
- Select cloud project and region details.
- Configure domain and edge deployment options.
- Configure authentication options.
- Apply bootstrap and initial deployment steps.
Existing Repository Bootstrap
You can bootstrap into an existing repository/branch instead of generating from a template repository.
zarch new app --not-new-repo
Use this when:
- You already have an active codebase integrated with Z-Arch.
- You want to bootstrap a separate GCP project for each branch (e.g., prod, test).
- You are integrating an existing codebase with Z-Arch for the first time.
New Project from Any Template
You can point Z-Arch to any GitHub template repository that includes a valid zarch.yaml.
zarch new app --template-repo owner/template-repo
This enables:
- Company-standard starter templates.
- Shared community templates.
- Distributing and installing ready-to-deploy applications.
Sharing Templates
Recommended template practice:
- Keep a stable template repository with a clean zarch.yaml baseline.
- Version template changes via standard branch/tag workflows.
- Document template assumptions in repository README.
- Bootstrap new projects by referencing that template repository directly.
Deployment Model
Z-Arch operates on a convergent deployment model. You declare intent in zarch.yaml; deployment applies that intent in a fixed order; repeated deployment converges toward the same runtime shape.
You do not manually orchestrate infrastructure components. The Control Plane applies resources in dependency order.
Deployment Order
When deploying a full project, resources are applied in this order:
- Topics
- Services
- Jobs
- Schedulers
- Gateway
- Edge updates
This ordering ensures:
- Event infrastructure exists before publishers or subscribers.
- Services and jobs exist before schedulers reference them.
- The Gateway reflects the current route definitions.
- Edge routing reflects the current Gateway endpoints.
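The fixed ordering can be expressed as a stable sort by resource kind. This is a minimal sketch; the kind labels and the `plan` function are hypothetical and only mirror the order listed above.

```python
# Order in which resource kinds are applied during a full deployment.
DEPLOY_ORDER = ["topic", "service", "job", "scheduler", "gateway", "edge"]


def plan(resources: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (kind, id) pairs into deployment order, keeping declaration order within a kind."""
    rank = {kind: index for index, kind in enumerate(DEPLOY_ORDER)}
    return sorted(resources, key=lambda resource: rank[resource[0]])  # sorted() is stable


declared = [
    ("gateway", "gateway"),
    ("scheduler", "hourly-sync"),
    ("service", "orders"),
    ("topic", "reporting-events"),
    ("job", "daily-reconcile"),
    ("edge", "edge"),
]
for kind, resource_id in plan(declared):
    print(kind, resource_id)
```

Because the sort is stable, two services declared in a given order are still applied in that order.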
Most workflows use:
zarch deploy all
This applies the full convergence cycle.
Anti-Patterns
With Z-Arch, you do not:
- Manually wire load balancers.
- Manually configure IAM bindings for service trust.
- Manually synchronize route definitions between services and ingress.
- Deploy hidden infrastructure outside zarch.yaml.
Operational changes are expressed in configuration. Deployment enforces alignment.
Convergence Principle
Repeated deployment of the same configuration produces the same infrastructure state.
Drift introduced outside Z-Arch is not part of the intended model. The Control Plane is responsible for re-aligning declared resources with runtime state.
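Convergence means that applying the same declaration twice changes nothing the second time, and that drift in a declared resource is corrected on the next apply. The sketch below models this with plain dicts; the `converge` function and the resource shapes are hypothetical and stand in for real infrastructure calls.

```python
def converge(declared: dict[str, dict], actual: dict[str, dict]) -> list[str]:
    """Bring `actual` in line with `declared`; return the actions that were needed."""
    actions = []
    for resource_id, spec in declared.items():
        if resource_id not in actual:
            actual[resource_id] = dict(spec)
            actions.append(f"create {resource_id}")
        elif actual[resource_id] != spec:
            actual[resource_id] = dict(spec)  # re-align a drifted declared resource
            actions.append(f"update {resource_id}")
    return actions


declared = {"orders": {"endpoint": True}, "daily-reconcile": {"timeout": "1200s"}}
runtime: dict[str, dict] = {}

print(converge(declared, runtime))  # first apply creates everything
print(converge(declared, runtime))  # second apply finds nothing to do
runtime["orders"] = {"endpoint": False}  # simulate drift made outside the tool
print(converge(declared, runtime))
```

The sketch only touches declared resources, matching the statement that the Control Plane re-aligns declared resources with runtime state.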
Devbox
Overview
zarch devbox gives each developer a project-scoped cloud workspace that can be created, paused, resumed, and replaced on demand.
The core value is operational consistency: every developer starts from the same environment profile, tied to the same project controls, with a predictable access pattern.
For teams, devboxes provide:
- Faster onboarding with fewer machine-specific setup issues.
- Lower local environment drift across engineers.
- Better isolation between projects and developers.
- Safer experimentation with limited blast radius.
- Cost control through explicit start/stop lifecycle commands.
Why Ephemeral Environments Matter
Devboxes are intentionally disposable. If an environment becomes unstable, slow, or misconfigured, the preferred recovery path is to replace it quickly instead of spending hours on manual repair.
This model improves:
- Mean time to recovery for developer blockers.
- Reproducibility across engineers.
- Support handoff quality (shared, repeatable recovery workflow).
Quick Start
# 1) Set active project directory
zarch set project /path/to/your/zarch/project
# 2) Connect Cloudflare credentials (project scope recommended)
zarch connect cloudflare --project
# 3) Create devbox
zarch new devbox <username>
# 4) Connect over SSH using the alias printed by the CLI
ssh devbox-<project-id>-<usr3>
Example:
zarch new devbox ram
ssh devbox-myproject-ram
Prerequisites
Before creating or managing devboxes, ensure:
- You are in a valid Z-Arch project directory (zarch.yaml exists).
- Your active GCP project is configured and accessible.
- domain is set in zarch.yaml.
- Cloudflare credentials are connected through Z-Arch:
  - zarch connect cloudflare --project
  - or zarch connect cloudflare --global
Required IAM capabilities typically include:
- Compute Engine instance management
- IAM service account and policy binding management
- Secret Manager secret and policy binding management
- Service API enablement permissions
First Login: Identity and Authorization (Recommended Standard)
After your first SSH login, it is highly recommended that you authenticate gcloud and ADC as your user identity.
Why this matters
The VM service account is intentionally limited for platform runtime operations.
Developer workflows (project administration, many gcloud actions, and parts of Z-Arch usage) should run under the developer’s user identity for correct authorization and auditability.
Run this once per new devbox
gcloud auth login
gcloud auth application-default login
gcloud config set project <project_id>
Verify
gcloud auth list
gcloud auth application-default print-access-token
Expected outcome:
- Your interactive CLI/API calls execute as your user identity.
- You avoid common PERMISSION_DENIED failures caused by relying on the VM service account for user workflows.
Default Tooling Baseline
New devboxes are delivered with a platform baseline of developer tooling.
This baseline combines platform-provisioned tools and Ubuntu image utilities.
Platform baseline tools
The following tools are expected in a newly provisioned devbox:
- zarch (Z-Arch CLI; platform standard)
- gcloud (Google Cloud CLI)
- git
- gh (GitHub CLI)
- curl
- jq
- binutils
- build-essential
- docker and Docker Compose plugin
- Node.js (LTS) and npm
- wrangler
- firebase CLI
- Go (go, installed via golang-go)
- Rust (rustc and cargo)
- codex CLI
- claude CLI
- gemini CLI
- playwright and playwright-mcp
- Miniforge/Conda-based Python environment tools
Ubuntu base image utilities
In addition to platform tooling, Ubuntu provides core operational utilities commonly used for day-to-day engineering tasks, including:
- shell and core GNU tooling (bash, coreutils)
- package management (apt)
- native build and linker tooling (binutils, build-essential)
- language toolchains installed from Ubuntu packages (golang-go, rustc, cargo)
- service management (systemctl/systemd)
- SSL and crypto utilities (ca-certificates, gnupg)
- common Linux networking and process utilities
Validation checklist (recommended)
After first login, validate your baseline:
command -v zarch gcloud git gh curl jq docker node npm wrangler go rustc cargo
If a required tool is missing, follow your team’s custom provisioning approach (see Custom Provisioning with an Appended Startup Script).
Command Reference
| Command | Purpose |
|---|---|
| zarch new devbox [username] | Create and configure a new developer VM. |
| zarch devbox list | List all devboxes in the active project. |
| zarch devbox on <username> | Start a devbox VM by username. |
| zarch devbox off <username> | Stop a devbox VM by username. |
| zarch devbox delete <username> | Delete a devbox VM by username. |
| zarch devbox delete <username> --delete-service-account | Delete VM and associated service account. |
Custom Provisioning with an Appended Startup Script
If your team needs additional packages, internal CLIs, or project-specific setup, you can provide an additional startup script during devbox creation.
How it works
During zarch new devbox <username> (interactive mode), Z-Arch prompts for:
Additional startup script path (optional)
Provide a local script file path. Z-Arch will append and run that script as part of devbox provisioning.
Recommended usage pattern
- Store team customization scripts in source control (for example: devbox/custom-startup.sh).
- Keep scripts idempotent so reruns are safe.
- Use explicit version pins for critical toolchains.
- Log clearly to simplify support and audits.
Example flow
zarch new devbox alice
# When prompted:
# Additional startup script path (optional): ./devbox/custom-startup.sh
Important notes
- The script is applied at provisioning time for that devbox creation.
- To apply a changed script to an existing environment, the recommended pattern is to recreate the devbox:
  - zarch devbox off <username>
  - zarch devbox delete <username>
  - zarch new devbox <username> (with updated script path)
Naming and Access Conventions
For username alice in project my-project:
- VM name: devbox-alice
- Service account: devbox-alice-sa@my-project.iam.gserviceaccount.com
- Forward domain: alice.dev.<domain>
- Local SSH alias format: devbox-<project-id>-<usr3>
Z-Arch updates local SSH configuration on the machine running zarch, enabling alias-based connection.
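The naming conventions above are mechanical, so scripts that audit or clean up devboxes can derive them from a username. The helper below is a hypothetical sketch covering the three names whose format is fully spelled out; the SSH alias is omitted because the `<usr3>` segment is not defined in this section.

```python
def devbox_names(username: str, project_id: str, domain: str) -> dict[str, str]:
    """Derive the resource names documented for a devbox."""
    return {
        "vm": f"devbox-{username}",
        "service_account": f"devbox-{username}-sa@{project_id}.iam.gserviceaccount.com",
        "forward_domain": f"{username}.dev.{domain}",
    }


# example.com is a placeholder for the domain set in zarch.yaml.
for label, value in devbox_names("alice", "my-project", "example.com").items():
    print(f"{label}: {value}")
```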
Ephemeral Operations Model
Standard policy
- One devbox per developer per project.
- Prefer fast replacement over prolonged repair when the environment is degraded.
- Treat devbox state as recoverable and reproducible.
Reset runbook (recommended for broken environments)
zarch devbox off <username>
zarch devbox delete <username>
zarch new devbox <username>
Then reconnect:
ssh devbox-<project-id>-<usr3>
Operational Runbooks
Daily startup / shutdown
Start your environment:
zarch devbox on <username>
Stop when idle:
zarch devbox off <username>
Environment recovery (nuke and recreate)
Use this when troubleshooting exceeds a reasonable threshold:
zarch devbox off <username>
zarch devbox delete <username>
zarch new devbox <username>
Developer offboarding
Remove compute resource:
zarch devbox delete <username>
If identity should also be removed:
zarch devbox delete <username> --delete-service-account
Incident triage flow
- Verify VM state: zarch devbox list
- Verify reachability:
  - SSH alias in ~/.ssh/config
  - Forward domain resolves as expected
- If unresolved after basic checks, escalate to reset runbook.
Team Operating Model
Recommended team practices:
- Use stable, unique usernames per developer.
- Define ownership boundaries: one devbox equals one responsible engineer.
- Use start/stop discipline for cost management.
- Standardize recovery on replace-over-repair for severe drift.
- Use project-scoped credentials wherever possible for governance clarity.
Security and Identity Notes
- Devboxes use least-privilege runtime identity for platform-controlled operations.
- Developer-admin actions should use authenticated user identity (see ADC section).
- Runtime secret access is managed through project controls, not local plaintext artifacts.
- Devbox lifecycle actions do not require adding new fields to
zarch.yaml.
Troubleshooting
| Symptom | Likely Cause | Resolution | Escalation |
|---|---|---|---|
| No valid zarch.yaml found in this project directory. | Wrong working directory or project not set | zarch set project /path/to/project | Confirm repo contains correct zarch.yaml |
| Cloudflare token not found for this project | Cloudflare credentials not connected for active scope | zarch connect cloudflare --project (or --global) | Validate active project context and retry |
| PERMISSION_DENIED running gcloud/Z-Arch in VM | Running as limited VM identity instead of user | Run first-login identity steps (gcloud auth login, gcloud auth application-default login, set project) | Verify org/project IAM grants for user |
| Authenticated as VM service account when user actions are expected | User auth/ADC not initialized | Re-run user auth + ADC commands | Recheck gcloud auth list output |
| SSH alias fails to connect | VM is off, alias missing, or DNS not yet updated | zarch devbox list, zarch devbox on <username>, verify ~/.ssh/config alias | If unresolved, run reset workflow |
| Devbox repeatedly unstable after manual fixes | Environment drift or corrupted local state | Execute reset runbook (delete + recreate) | Escalate with logs and issue context |
Success Criteria
Your devbox adoption is working well when:
- New developers can reach a usable environment quickly.
- Most severe environment failures are resolved by fast recreate cycles.
- Daily operations use on/off predictably for cost control.
- Teams see reduced “works on my machine” inconsistency.
- User identity authorization in devboxes is consistently configured.
Day-2 Operations
Incremental Changes
Z-Arch supports targeted deployments. You do not redeploy the entire system for every change.
Changing a Service
If you modify:
- Service code
- Environment variables
- Declared targets
You redeploy that service:
zarch deploy service <name>
If you modify routes for that service, the Gateway must also be redeployed so route definitions remain aligned.
Changing Gateway Behavior
If you modify:
- Route auth modes
- Session configuration
- CORS configuration
- Security settings
You redeploy the Gateway:
zarch deploy gateway
If rotating session encryption keys:
zarch deploy gateway --rotate-session-key
Session key rotation invalidates existing sessions by design.
Client Identity Changes
Client API keys can be regenerated independently:
zarch client all
zarch deploy edge
To rotate a client’s secret, remove the API key’s name from the api_key field in that client’s config before running zarch client all.
Troubleshooting Guide
Common issues and checks:
- Active project not set.
  - Run zarch set project <path>.
- Cloud project not active.
  - Verify cloud CLI auth and active project selection.
- Deploy command reports missing resource ID.
  - Confirm id exists in the relevant block.
- Route validation errors.
  - Verify path format and duplicate method/path keys.
- CORS failures in browser clients.
  - Verify security.cors.allowed_origins, headers, methods, and credential settings.
- Auth failures on protected endpoints.
  - Check route auth mode and your JWT/session flow assumptions.
Multi-Region Model
Use regions in zarch.yaml to declare multi-region intent, then apply region-aware edge load-balancing settings.
Regions are intended to be convergent. Divergent per-region code paths and undeclared regional drift are outside the intended model.
In practice:
- Deploy intent remains centralized in one zarch.yaml contract.
- Gateway and service shape are expected to remain structurally consistent across declared regions.
- Edge load-balancing policy determines request distribution across those converged regional deployments.
Security Posture and Shared Responsibility
Z-Arch guarantees perimeter authentication enforcement at the edge/gateway boundary according to declared route policy.
Z-Arch does not guarantee domain authorization correctness inside your application services.
Z-Arch does not protect against incorrect business logic, unsafe data handling, or domain-level policy mistakes in service code.
Security best-practice baseline:
- Keep zarch.yaml declarative and free of sensitive values.
- Prefer explicit origin allowlists for CORS.
- Keep protected routes on dual unless there is a deliberate reason to use jwt-only.
- Treat API keys as perimeter credentials and rotate them periodically.
- Keep service targets minimal and intentional.
- Use short-lived internal trust tokens and avoid custom long-lived shared secrets between services.
- Always keep secrets in Secret Manager.
- Call third-party APIs from backend services, not directly from untrusted clients where avoidable.
III - Reference Manuals
CLI Reference
Global Usage
zarch [--non-interactive] [--quiet] <command>
Run with no command to open the interactive console:
zarch
Behavior:
- zarch with no subcommand opens the interactive shell.
- --non-interactive avoids prompts and relies on explicit flags and existing config values.
- --quiet suppresses most non-critical output.
Command groups:
| Group | Purpose |
|---|---|
| set | Active project and Control Plane source pinning |
| new | App bootstrap and Devbox creation |
| deploy | Resource deployments |
| devbox | Devbox lifecycle operations |
| client | API client identity and key generation |
| connect / disconnect | Third-party credential management |
| ext | Extension scaffolding, install, and hook triggering |
| mcp | Local MCP server for controlled config operations and AI-assisted development |
| update | Update the Control Plane to the latest version |
| register | Register your project with a lifetime Z-Arch license |
set
Project and Control Plane pinning commands.
zarch set project [path]
zarch set branch <branch>
zarch set repo <owner/repo>
zarch set defaults
Usage notes:
- set project activates a local directory that contains zarch.yaml.
- set branch changes the branch pinned for Control Plane operations.
- set repo changes the repository pinned for Control Plane operations.
- set defaults is reserved for default preference management.
new
Create new platform resources.
zarch new app [options] [path]
zarch new devbox [username]
new app options:
| Option | Purpose |
|---|---|
| --region | Initial deployment region |
| --project-id-prefix | Prefix for generated cloud project IDs |
| --template-repo | Source template repository (owner/repo) |
| --new-repo | New repository name |
| --private-repo | Create repository as private |
| --signup-mode | Signup policy |
| --auth-methods | Managed auth methods |
| --mfa | MFA mode |
| --billing-account | Billing account override |
| --not-new-repo | Use an existing repository |
Examples:
zarch new app --template-repo my-org/my-template
zarch new app --region us-east1 --project-id-prefix prod-
zarch new app --not-new-repo
zarch new devbox alice
deploy
Deploy one or more declared resources.
zarch deploy gateway [--region] [--rotate-session-key]
zarch deploy service [--region] <name>
zarch deploy job [--region] <name>
zarch deploy topic [--region] <name>
zarch deploy scheduler [--region] <name>
zarch deploy edge
zarch deploy all [--region]
Usage notes:
- If --region is omitted, deploy commands iterate through all regions declared in config.
- deploy all applies the full deployment sequence for declared resources.
- --rotate-session-key on gateway deployment rotates session encryption key material and forces new sessions.
Examples:
zarch deploy service orders
zarch deploy gateway --region us-east1 --rotate-session-key
zarch deploy all
devbox
Manage Devbox lifecycle.
zarch devbox list
zarch devbox on <username>
zarch devbox off <username>
zarch devbox delete <username> [--delete-service-account]
client
Manage API clients and keys.
zarch client new <type> <id> [--no-edge] [--firebase]
zarch client all [--firebase]
Usage notes:
- Client type for validated zarch.yaml configuration is web, ios, or android.
- --no-edge prevents automatic key injection setup at the Edge Proxy layer.
- --firebase provisions provider-side client configuration for managed auth flows.
Examples:
zarch client new web web-app
zarch client all --firebase
connect and disconnect
Manage third-party credentials in bootstrap/global or project scope.
zarch connect github [--project|--global]
zarch connect cloudflare [--project|--global]
zarch disconnect github [--project|--global]
zarch disconnect cloudflare [--project|--global]
Scope behavior:
- --project stores credentials for the active project.
- --global stores bootstrap credentials used as fallback when project-scoped credentials are not set.
ext
Manage Z-Arch Extensions.
zarch ext new <name>
zarch ext install <source> [--all] [--editable]
zarch ext trigger <hook_name> [--extension <name>] [--region <region>]
Usage notes:
- ext new scaffolds a new extension package.
- ext install --all installs all discoverable extensions in the active project.
- ext trigger manually dispatches a lifecycle hook for testing and operations.
mcp
Start the local MCP server.
zarch mcp
Exposed MCP tools are focused on safe, validated config operations and scaffolding workflows. You do not need to manually invoke this command.
update
Update the deployed Control Plane environment.
zarch update
register
Purchase a Z-Arch license. The scope argument selects the license type.
zarch register [scope]
zarch.yaml Reference
zarch.yaml is the source of truth for your platform architecture:
- It defines what exists.
- It defines what is exposed.
- It defines how resources are allowed to interact.
Compressed Architectural Language
zarch.yaml is not just configuration. It is a compact mapping of your backend architecture.
In one machine-legible document, it encodes:
- Identity and authentication boundaries.
- Authorization capabilities and roles.
- IAM-relevant topology via explicit resource targets.
- Infrastructure primitives (Services, Jobs, Topics, Schedulers).
- Gateway exposure and route behavior.
- Extension declarations and deployment controls.
- Operational guardrails and runtime policy defaults.
Core architectural intent is centralized and structured. This compactness is useful for both engineers and LLMs:
- Humans can review architecture intent quickly without chasing hidden dashboard state.
- The grammar is small enough for LLMs to reason over the full backend topology without burning tokens or hallucinating.
Z-Arch is therefore opinionated, structured, and machine-reasonable by design: a deterministic serverless pattern expressed in a concise file that is practical for both production operations and AI-assisted engineering.
Top-Level Contract
Top-level keys:
| Key | Required | Type | Notes |
|---|---|---|---|
| platform | Yes | string | Currently gcp |
| domain | Yes | `string \| null` | |
| clients | Yes | array | Frontend/API clients |
| gateway | Yes | `object \| null` | |
| authn | Yes | `object \| null` | |
| security | Yes | `object \| null` | |
| project_id | No | `string \| null` | |
| firebase_project_id | No | `string \| null` | |
| firebase_tenant_id | No | `string \| null` | |
| regions | No | array[string] | Deployment regions |
| edge | No | `object \| null` | |
| authz | No | `object \| null` | |
| services | No | array | Service resources |
| jobs | No | `array \| null` | |
| topics | No | `array \| null` | |
| schedulers | No | `array \| null` | |
Gateway and Security Blocks
Gateway block:
| Field | Type | Allowed Values / Behavior |
|---|---|---|
| gateway.type | `string \| null` | |
| gateway.version | `string \| null` | |
| gateway.min_instance | integer | 0+ |
| gateway.session.service_id | `string \| null` | |
| gateway.session.stateful | boolean | Stateful session toggle |
Security block:
| Field | Type | Allowed Values / Behavior |
|---|---|---|
| security.session_samesite | string | Strict, Lax, None |
| security.session_ttl_secs | integer | Session lifetime seconds, 0+ |
| security.iat_skew_secs | integer | JWT issued-at skew tolerance, 0+ |
| security.cors.allowed_origins | array[string] | Explicit allowed origins |
| security.cors.allowed_headers | array[string] | Allowed request headers |
| security.cors.allowed_methods | array[string] | GET, POST, PUT, PATCH, DELETE, OPTIONS, HEAD, TRACE, CONNECT |
| security.cors.expose_headers | array[string] | Headers exposed to browser JS |
| security.cors.allow_credentials | boolean | Credentialed browser requests |
| security.cors.max_age_seconds | integer | Preflight cache duration |
Auth Blocks
authn supports two primary models:
- Managed auth bootstrap path (auth_methods, signup_mode, mfa).
- Generic OIDC validation path (discovery, client_id).
authn field values:
| Field | Type | Allowed Values / Behavior |
|---|---|---|
| authn.discovery | `string \| null` | |
| authn.client_id | `string \| null` | |
| authn.auth_methods | array[string] | Email/Password, Email Link, Google, Github, Microsoft, Apple, Facebook |
| authn.signup_mode | `string \| null` | |
| authn.mfa | `string \| null` | |
authz supports:
- capabilities
- roles
- roles.<role>.grants
Use requires on routes for capability checks where applicable.
Resource Blocks
clients:
| Field | Type | Allowed Values / Behavior |
|---|---|---|
| id | string | Unique client ID |
| type | string | web, ios, android |
| api_key | `string \| null` | |
| add_to_edge | boolean | Edge-side key injection toggle |
services:
| Field | Type | Allowed Values / Behavior |
|---|---|---|
| id | string | Unique service ID |
| endpoint | boolean | Expose via Gateway |
| authenticated | boolean | Default route auth requirement |
| routes | array | Route definitions |
| location | `string \| null` | |
| image | `string \| null` | |
| env | object | Runtime env vars |
| flags | array[string] | Additional deploy flags |
| targets | array[string] | Allowed downstream targets |
location and image are mutually exclusive in intended usage.
Service route fields:
| Field | Type | Allowed Values / Behavior |
|---|---|---|
| path | string | Literal (/path) or regex (^/path$) |
| method | string | HTTP verb |
| mode | string | http, stream, webhook, websocket |
| auth | string | dual, jwt-only, public |
| content_type | string | Required request content type |
| rate | integer | Route limit, minimum 1 |
| requires | array[string] | Capability checks |
| health | boolean | Health endpoint designation |
jobs:
| Field | Type | Allowed Values / Behavior |
|---|---|---|
| id | string | Unique job ID |
| location | `string \| null` | |
| image | `string \| null` | |
| env | object | Runtime env vars |
| flags | array[string] | Additional deploy flags |
| targets | array[string] | Callable/publish targets |
topics:
| Field | Type | Allowed Values / Behavior |
|---|---|---|
| id | string | Unique topic ID |
| sub | `array \| null` | |
Subscriber entry forms:
- String form: subscriber ID.
- Object form: id, optional mode (push/pull), optional ack_deadline_secs, optional max_delivery_attempts, optional dead_letter_topic.
schedulers:
| Field | Type | Allowed Values / Behavior |
|---|---|---|
| id | string | Unique scheduler ID |
| cadence | `object \| null` | |
| cron | `string \| null` | |
| paused | boolean | Pause state |
| targets | array[string] | Trigger targets |
Scheduler rule:
- Exactly one of cadence or cron must be set.
Cadence fields:
- every
- unit (minute, hour, day, week, month)
- optional at
- optional on
- optional timezone
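The exactly-one rule is easy to check before a deploy. The sketch below is illustrative only: the helper name check_scheduler is hypothetical and is not part of the Z-Arch CLI or runtime library, and it assumes a scheduler entry has already been loaded from zarch.yaml into a plain dict.

```python
from typing import Any, Dict


def check_scheduler(entry: Dict[str, Any]) -> str:
    """Return which schedule form the entry uses: "cadence" or "cron".

    Hypothetical helper. A missing key and an explicit null are treated alike.
    """
    has_cadence = entry.get("cadence") is not None
    has_cron = entry.get("cron") is not None
    if has_cadence == has_cron:
        # Both set, or neither set: the exactly-one rule is violated.
        raise ValueError(
            f"scheduler {entry.get('id')!r}: set exactly one of cadence or cron"
        )
    return "cadence" if has_cadence else "cron"
```

An entry such as the nightly-sync-scheduler in the full example, which sets only cron, passes this check.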
Full Example
platform: gcp
project_id: my-project
firebase_project_id: my-project
domain: api.example.com
regions:
  - us-east1
edge:
  load_balancer:
    LB_METHOD: DIRECT
  rate_limiter:
    global_lockout: true
    rate: 60
    window: 10
    cooldown: 300
clients:
  - id: web
    type: web
    api_key: API_KEY_WEB
    add_to_edge: true
gateway:
  type: serverless
  min_instance: 0
  session:
    service_id: session
    stateful: false
authn:
  discovery: https://issuer.example.com/.well-known/openid-configuration
  client_id: my-client-id
  auth_methods: []
  signup_mode: null
  mfa: null
authz:
  capabilities:
    - orders:read
    - orders:write
  roles:
    admin:
      grants:
        - orders:read
        - orders:write
security:
  session_samesite: Lax
  session_ttl_secs: 1209600
  iat_skew_secs: 60
  cors:
    allowed_origins:
      - https://app.example.com
    allowed_headers:
      - Authorization
      - Content-Type
      - x-api-key
    allowed_methods:
      - GET
      - POST
      - PUT
      - PATCH
      - DELETE
    expose_headers: []
    allow_credentials: true
    max_age_seconds: 600
services:
  - id: session
    endpoint: true
    authenticated: true
    routes:
      - path: /session
        method: GET
      - path: /session/login
        method: POST
        auth: jwt-only
      - path: /session/logout
        method: POST
      - path: /session/health
        method: GET
        auth: public
        health: true
    location: /services/session
    env: {}
    flags: []
    targets: []
  - id: orders
    endpoint: true
    authenticated: true
    routes:
      - path: /orders
        method: GET
        requires: [orders:read]
      - path: /orders
        method: POST
        requires: [orders:write]
      - path: ^/orders/[a-zA-Z0-9_-]+$
        method: GET
    location: /services/orders
    env: {}
    flags: []
    targets: [order-events]
jobs:
  - id: nightly-sync
    location: /jobs/nightly-sync
    env: {}
    flags:
      - "--task-timeout=1200s"
    targets: [orders]
topics:
  - id: order-events
    sub:
      - id: orders
        mode: push
schedulers:
  - id: nightly-sync-scheduler
    cron: "0 2 * * *"
    paused: false
    targets: [nightly-sync]
Route Semantics
Path Semantics
Path rules:
- Literal paths must start with / and contain no whitespace.
- Regex paths are opt-in and must start with ^/ and end with $.
Method rules:
- Methods must be valid HTTP verbs supported by the config contract.
Uniqueness rules:
- Duplicate route keys are rejected within the same resource for the same method/path kind.
- Resource IDs must be unique across major resource groups.
- authz.roles.*.grants values must exist in authz.capabilities.
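The path and grant rules above can be expressed as two small checks. This is a sketch under stated assumptions, not the validator Z-Arch ships: the function names path_kind and unknown_grants are hypothetical, and compiling the regex to catch malformed patterns is an added assumption that the source does not mention.

```python
import re
from typing import Any, Dict, Set


def path_kind(path: str) -> str:
    """Classify a route path as "literal" or "regex" using the rules above."""
    if path.startswith("^/") and path.endswith("$"):
        re.compile(path)  # assumption: malformed patterns should fail early
        return "regex"
    if path.startswith("/") and not any(ch.isspace() for ch in path):
        return "literal"
    raise ValueError(f"invalid route path: {path!r}")


def unknown_grants(authz: Dict[str, Any]) -> Set[str]:
    """Return role grants that are not declared in authz.capabilities."""
    declared = set(authz.get("capabilities") or [])
    granted = {
        cap
        for role in (authz.get("roles") or {}).values()
        for cap in (role.get("grants") or [])
    }
    return granted - declared
```

An empty set from unknown_grants means every grant refers to a declared capability.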
Auth Modes
Route auth mode is declared per route:
- dual: JWT and session cookie required.
- jwt-only: JWT required, no session cookie.
- public: no end-user authentication required.
public routes remain behind gateway perimeter controls and are not direct service ingress endpoints.
Capability Requirements
Where authz is configured, route requires declarations enforce capability checks as part of gateway policy evaluation.
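As a rough illustration of a capability check, the sketch below resolves a caller's roles to capabilities through authz.roles.<role>.grants and compares them with a route's requires list. It rests on two assumptions the source does not confirm: that the caller's roles are already known to the gateway, and that every listed capability must be granted (all-of semantics). The function names are hypothetical.

```python
from typing import Any, Dict, List, Set


def granted_capabilities(authz: Dict[str, Any], roles: List[str]) -> Set[str]:
    """Union of the grants of every role the caller holds."""
    role_defs = authz.get("roles") or {}
    return {
        cap
        for role in roles
        for cap in ((role_defs.get(role) or {}).get("grants") or [])
    }


def route_allowed(authz: Dict[str, Any], roles: List[str], requires: List[str]) -> bool:
    """Assumed all-of semantics: each required capability must be granted."""
    return set(requires) <= granted_capabilities(authz, roles)
```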
Conditional Requirements (required_when)
Route fields follow conditional requirement semantics in the config contract:
- path and method are required for each route entry.
- auth, content_type, rate, requires, and health are conditional policy modifiers.
- Conditional requirement state is represented in schema metadata as required_when.
Runtime Effect Notes
- Route matching and auth-mode enforcement occur at the gateway before forwarding.
- For protected routes, failed JWT/session checks reject the request before service execution.
- For internal calls, services verify short-lived S2S assertions to enforce caller/target trust constraints.
Edge Proxy Reference
Custom Edge Implementations
You can provide your own Edge Proxy implementation, as long as it satisfies the platform contract expected by deployment and runtime flows.
Deployment Assumptions
A compatible Edge Proxy should:
- Sit in front of all public API ingress.
- Enforce API-key checks for incoming traffic.
- Forward requests to the Z-Arch Gateway endpoint.
- Preserve request method, path, and required auth headers.
- Support forwarding cookies for authenticated browser flows.
- Prevent direct bypass patterns where Gateway endpoints are exposed outside intended routing.
Default Edge Implementation
The Z-Arch Edge Proxy Worker operates at the Cloudflare edge as the central routing, access-control, and traffic moderation layer for all /api/* requests. It performs API-key authentication and configurable load balancing, and includes a built-in per-IP rate-limiting system to protect backend services.
Logic Overview
- Parses incoming /api/* request paths
- Optionally detects an explicit API version segment (/api/v1/...)
- Resolves the client API key:
  - If a client segment is present and add_to_edge is enabled, injects the corresponding client API key
  - Otherwise expects the client to supply x-api-key directly
- Strips the client segment from the path (edge-only concern)
- Preserves the API version segment if present
- Applies configurable rate limits before forwarding
- Determines backend origin using the selected load-balancing mode
- Proxies the request to the chosen Z-Arch Gateway
Environment Variables
| Variable | Description |
|---|---|
| GW_URL | Default backend gateway URL used in DIRECT mode |
| API_KEY_<CLIENT> | Client-specific API keys (e.g. API_KEY_WEB, API_KEY_ADMIN) |
| LB_METHOD | Load-balancing mode: DIRECT, RANDOM, ROUND_ROBIN, REGION, or PING |
| LB_TARGETS | Comma-separated gateway URLs used in RANDOM, ROUND_ROBIN, or PING modes |
| LB_REGION_MAP | JSON map of country or colo codes to gateway URLs for REGION mode |
| LOADBALANCER_KV | KV binding used for stateful load-balancer operations (round-robin counter, latency cache) |
| RATELIMIT_KV | Optional KV binding for global lockout flags when global_lockout is enabled |
| RL_CONFIG | JSON configuration object defining rate-limiting behavior (see below) |
| PROJECT_NAME | Project identifier for metadata/logging (optional) |
Rate Limiting
Overview
The Worker includes a built-in per-IP rate limiter that executes before any other logic. In all cases, request counting is in-memory (per isolate / per PoP). Optionally, KV can be used only for global lockout flags.
| Mode | Counter Storage | Global Lockout Storage | Description |
|---|---|---|---|
| Local-only (global_lockout: false) | Cloudflare isolate memory | None | Fast local burst/cooldown enforcement without any KV reads or writes. |
| Local + global lockout (global_lockout: true) | Cloudflare isolate memory | Cloudflare KV (RATELIMIT_KV) flag | Checks a global throttle flag before counting and writes a throttle flag on local limit trip. |
Configuration
The rate limiter is controlled via the RL_CONFIG environment variable:
{
  "global_lockout": true,
  "rate": 60,
  "cooldown": 300,
  "window": 10
}
| Field | Type | Description |
|---|---|---|
| global_lockout | boolean | Enable (true) or disable (false) KV global lockout flags. Counting remains in-memory either way. |
| rate | integer | Requests per time window allowed per IP. Default: 60. |
| cooldown | integer | Duration (seconds) for which an IP is throttled after exceeding limit. Default: 300. |
| window | integer | Size of the rate-limiting window in seconds. Default: 10. Shorter windows (10–30s) make the limiter more responsive to bursts. |
Behavior Summary
- Each request checks the IP’s usage bucket, grouped by the configured window interval.
- Request counting always happens in isolate memory.
- If global_lockout is true, the Worker checks RATELIMIT_KV for a throttle flag before local counting.
- If local counting trips and global_lockout is true, the Worker writes a KV throttle flag with TTL = cooldown.
- If global_lockout is false, no rate-limiter KV reads/writes are performed.
- Exceeding the limit returns HTTP 429 Too Many Requests.
- All limiter operations are wrapped in try/catch and never block normal execution on error.
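The local counting behavior can be pictured with a small in-memory model. This is a Python sketch of the idea only, not the Worker's code: the class name LocalRateLimiter is hypothetical, fixed windows are assumed from the description of buckets grouped by window, and the optional KV global lockout is left out.

```python
import time
from typing import Dict, Optional, Tuple


class LocalRateLimiter:
    """In-memory per-IP limiter: fixed windows plus a cooldown after a trip."""

    def __init__(self, rate: int = 60, window: int = 10, cooldown: int = 300) -> None:
        self.rate = rate
        self.window = window
        self.cooldown = cooldown
        self._counts: Dict[Tuple[str, int], int] = {}  # (ip, window index) -> hits
        self._blocked_until: Dict[str, float] = {}     # ip -> time the cooldown ends

    def check(self, ip: str, now: Optional[float] = None) -> int:
        """Return the HTTP status the limiter would produce: 200 or 429."""
        now = time.time() if now is None else now
        if self._blocked_until.get(ip, 0.0) > now:
            return 429  # still cooling down
        bucket = (ip, int(now // self.window))
        self._counts[bucket] = self._counts.get(bucket, 0) + 1
        if self._counts[bucket] > self.rate:
            self._blocked_until[ip] = now + self.cooldown
            return 429
        return 200  # old buckets are never evicted in this sketch
```

With rate=2, window=10, and cooldown=30, the third request inside one window returns 429 and the IP stays throttled for the next 30 seconds, while other IPs are unaffected.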
Load-Balancing Behavior
If rate limiting passes, traffic is distributed using one of the following strategies:
| Mode | Description | Required Variables |
|---|---|---|
| DIRECT | Sends all traffic to the primary gateway (GW_URL). | GW_URL |
| RANDOM | Randomly selects one origin from LB_TARGETS. | LB_TARGETS |
| ROUND_ROBIN | Sequentially rotates through origins, maintaining index in LOADBALANCER_KV. | LB_TARGETS, LOADBALANCER_KV |
| REGION | Routes requests based on Cloudflare’s cf-ipcountry or PoP (cf.colo). | LB_REGION_MAP |
| PING | Probes origins to measure latency, caching the fastest per Cloudflare PoP in LOADBALANCER_KV. | LB_TARGETS, LOADBALANCER_KV |
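The selection logic for the simpler modes can be sketched as follows. This is an illustration under assumptions rather than the Worker's implementation: pick_origin is a hypothetical name, the counter argument stands in for the index kept in LOADBALANCER_KV, the DEFAULT fallback key is taken from the example region map in this document, and PING is omitted because it depends on live latency probes.

```python
import json
import random
from typing import Dict, Optional


def pick_origin(
    method: str,
    env: Dict[str, str],
    counter: int = 0,
    country: Optional[str] = None,
) -> str:
    """Choose a gateway URL for one request (PING mode not modeled)."""
    targets = [t for t in env.get("LB_TARGETS", "").split(",") if t]
    if method == "DIRECT":
        return env["GW_URL"]
    if method == "RANDOM":
        return random.choice(targets)
    if method == "ROUND_ROBIN":
        # counter stands in for the rotating index stored in KV
        return targets[counter % len(targets)]
    if method == "REGION":
        region_map = json.loads(env["LB_REGION_MAP"])
        # assumption: unknown regions fall back to the DEFAULT entry
        return region_map.get(country, region_map.get("DEFAULT"))
    raise ValueError(f"unsupported LB_METHOD in this sketch: {method}")
```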
Example Configurations
1. Round-Robin Example
GW_URL=https://us-east1-gateway.example.net
API_VERSION=v1
LB_METHOD=ROUND_ROBIN
LB_TARGETS=https://us-east1-gateway.example.net,https://europe-west1-gateway.example.net
2. Region-Based Example
LB_METHOD=REGION
LB_REGION_MAP={"US":"https://us-east1-gateway.example.net","EU":"https://europe-west1-gateway.example.net","DEFAULT":"https://us-east1-gateway.example.net"}
3. Latency-Based Example (PING)
LB_METHOD=PING
LB_TARGETS=https://us-east1-gateway.example.net,https://europe-west1-gateway.example.net,https://asia-southeast1-gateway.example.net
4. Rate Limiter Example (Global lockout enabled)
RL_CONFIG={"global_lockout":true,"rate":60,"cooldown":300,"window":30}
Client API Keys
Each client defined in zarch.yaml has a dedicated API key stored in Secret Manager.
Client API keys are exposed to the edge proxy only when add_to_edge: true is set for that client.
Example:
API_KEY_WEB="key-for-web-client"
API_KEY_ADMIN="key-for-admin-client"
If add_to_edge is disabled for a client, its API key is not exposed to the edge proxy. In that case, the client must supply its API key directly via the x-api-key request header.
Updates take effect immediately via the Cloudflare API; no redeploy is required.
Behavior Summary
| Request Path | API Key Source | Forwarded Path |
|---|---|---|
| /api/web/session | Injected (WEB) | /api/session |
| /api/v1/web/session | Injected (WEB) | /api/v1/session |
| /api/session | Client-supplied | /api/session |
| /api/v1/session | Client-supplied | /api/v1/session |
Client Segment Semantics
The client path segment exists solely to support edge-side API key injection.
Rules:
- A client segment (/api/{client}/...) is required only when the edge proxy is injecting the API key (e.g. add_to_edge: true).
- When present, the client segment is stripped before forwarding to the gateway.
- The backend never receives or routes on the client segment.
- If the edge is not injecting, the client segment must not appear in the path and the API key must be supplied via x-api-key.
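The rewrite rules can be sketched in a few lines. The following is a hedged Python model, not the Worker source: rewrite_request and edge_keys are hypothetical names, and recognizing a client segment by matching it against the IDs of clients with add_to_edge enabled is an assumption about how the segment is detected.

```python
import re
from typing import Dict, Tuple


def rewrite_request(
    path: str, headers: Dict[str, str], edge_keys: Dict[str, str]
) -> Tuple[str, Dict[str, str]]:
    """Strip an injected-client segment and set x-api-key for it.

    edge_keys maps client IDs that have add_to_edge enabled to their API keys.
    """
    parts = path.split("/")  # "/api/v1/web/x" -> ["", "api", "v1", "web", "x"]
    if len(parts) < 3 or parts[1] != "api":
        raise ValueError(f"not an /api/ path: {path!r}")
    idx = 2
    if re.fullmatch(r"v\d+", parts[idx]):
        idx += 1  # keep the version segment, look just after it
    out_headers = dict(headers)
    if idx < len(parts) and parts[idx] in edge_keys:
        out_headers["x-api-key"] = edge_keys[parts[idx]]
        del parts[idx]  # the gateway never sees the client segment
    return "/".join(parts), out_headers
```

Called with edge_keys = {"web": "..."}, this maps /api/v1/web/session to /api/v1/session with an injected key, and leaves /api/session and its client-supplied header untouched.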
Summary
The Z-Arch Edge Proxy provides a secure, scalable, and configurable edge layer that:
- Authenticates API requests using global or per-client keys
- Rewrites requests with API versioning
- Distributes traffic via multiple load-balancing modes (DIRECT, RANDOM, ROUND_ROBIN, REGION, PING)
- Enforces per-IP rate limiting (RL_CONFIG) with in-memory counters and optional global KV lockout flags
- Allows configurable time windows (window) for precise control over limiter sensitivity
- Uses KV bindings (LOADBALANCER_KV, optional RATELIMIT_KV) for efficient, stateful coordination
- Operates entirely at the Cloudflare edge for maximum speed, safety, and zero backend coupling
Z-Arch Runtime Library
Z-Arch Runtime Library (zarch) provides the authentication primitives (ZArchAuth) used by services running within the Z-Arch architecture. It exposes a stable Python API for encrypted session cookies, service-to-service trust, and the ZArchExtension interface with lifecycle hooks used during bootstrap and deployment workflows with the Z-Arch CLI.
Quick Start: ZArchAuth
Use ZArchAuth in real services to:
- run the session service endpoints (/session, /session/login, /session/logout, /session/verify)
- let the Z-Arch Gateway own end-user auth verification in normal Z-Arch deployments
- add optional session hooks for revocation, backend session control, and /session response enrichment
from zarch import ZArchAuth
auth = ZArchAuth()
# Session service entrypoint.
# In a standard Z-Arch deployment, the gateway validates JWT + session cookie
# before protected traffic reaches your business services.
app = auth.session.start()
Common deployment pattern:
- Keep the session service separate from business services.
- Let Z-Arch Gateway enforce end-user auth; app services focus on business logic.
- Use ZArchAuth.s2s.sign(...) and ZArchAuth.s2s.verify(...) for internal service trust.
- Use direct ZArchAuth.session.verify(...) in application code only for custom/non-standard topologies.
Session Mode: Stateless by Default
Session cookies are stateless by default. If no hooks are registered, cryptographic cookie validation is enough and /session/verify defaults to valid after payload checks.
To enable stateful behavior (revocation, server-side deny lists, tenant-specific controls) and enrich the /session response, register these hooks:
- on_login(sid, uid, tenant, iat, exp) to persist session state
- on_logout(sid, uid, tenant) to revoke state
- on_verify(sid, uid, tenant, iat, exp) -> bool to allow/deny each session
- on_session(uid, email, tenant, sid, iat, exp, claims) -> dict | None to add or override fields returned by /session
Any dict returned by on_session is merged after the default uid/email/tenant fields, so matching keys override the built-in values.
For on_session, sid, iat, and exp come from the current encrypted session cookie when it is available and valid; otherwise they are None.
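The merge order described above amounts to a dict update applied after the defaults. The sketch below only illustrates that ordering; build_session_response is a hypothetical name and this is not the library's internal code.

```python
from typing import Any, Dict, Optional


def build_session_response(
    uid: str,
    email: Optional[str],
    tenant: Optional[str],
    hook_result: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Start from the default fields, then let the hook's dict win on conflicts."""
    body: Dict[str, Any] = {"uid": uid, "email": email, "tenant": tenant}
    if hook_result:
        body.update(hook_result)  # matching keys override the built-in values
    return body
```

A hook that returns {"email": "alias@example.com"} therefore replaces the default email, while a hook that returns None leaves the default response unchanged.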
Real-world pattern:
- hash sid before storage
- persist session records on login
- mark revoked_at on logout (idempotent)
- deny in on_verify when revoked, missing, or expired
- decorate /session with app-specific profile metadata when needed
from zarch import ZArchAuth
from google.cloud import firestore
from datetime import datetime, timezone
import hashlib
import time
auth = ZArchAuth()
db = firestore.Client()
def _hash_sid(sid: str) -> str:
    return hashlib.sha256(sid.encode()).hexdigest()
def on_login(sid: str, uid: str, tenant: str | None, iat: int, exp: int) -> None:
    db.collection("zarch_sessions").document(_hash_sid(sid)).set({
        "uid": uid,
        "tenant": tenant,
        "created_at": datetime.fromtimestamp(iat, tz=timezone.utc),
        "expires_at": datetime.fromtimestamp(exp, tz=timezone.utc),
        "revoked_at": None,
    })
def on_logout(sid: str, uid: str, tenant: str | None) -> None:
    db.collection("zarch_sessions").document(_hash_sid(sid)).set({
        "revoked_at": datetime.now(tz=timezone.utc),
    }, merge=True)
def on_verify(sid: str, uid: str, tenant: str | None, iat: int, exp: int) -> bool:
    doc = db.collection("zarch_sessions").document(_hash_sid(sid)).get()
    if not doc.exists:
        return False
    data = doc.to_dict()
    if data.get("revoked_at") is not None:
        return False
    expires_at = data.get("expires_at")
    return bool(expires_at and expires_at.timestamp() >= time.time())
def on_session(
    uid: str,
    email: str | None,
    tenant: str | None,
    sid: str | None,
    iat: int | None,
    exp: int | None,
    claims: dict,
) -> dict | None:
    return {
        "display_name": claims.get("name") or email or uid,
        "has_active_session": bool(sid),
    }
auth.session.register_hook("on_login", on_login)
auth.session.register_hook("on_logout", on_logout)
auth.session.register_hook("on_verify", on_verify)
auth.session.register_hook("on_session", on_session)
app = auth.session.start()
Service-to-Service (S2S) Mechanics
ZArchAuth.s2s exists so internal calls are explicitly authorized at the application-policy layer, not just network-reachable.
When you call auth.s2s.sign(req, target, url=...):
- Z-Arch always adds x-zarch-s2s-token: <jwt> (short-lived Ed25519 JWT with iss, aud, iat, exp, typ).
- On GCP (ZARCH_PLATFORM=gcp) and when url is provided, it also adds Authorization: Bearer <google-id-token>.
Why both are used in GCP deployments:
- The Google ID token is the stronger platform authentication mechanism (Cloud Run/IAM identity boundary).
- The Z-Arch token is a service-authorization policy mechanism (enforces caller identity, audience, and Z-Arch trust graph rules).
- Using both gives layered control: Google proves caller identity to the platform, Z-Arch enforces project policy at the service layer.
The Z-Arch token can also be used by itself in non-GCP or non-IAM topologies. In that mode, services still get signed caller identity and audience/policy checks without requiring Google-authenticated services.
Caller example (adds both headers on GCP):
import json
import urllib.request
from zarch import ZArchAuth
auth = ZArchAuth()
def call_orders_service() -> dict:
    req = urllib.request.Request(
        "https://orders-abc-uc.a.run.app/internal/create",
        data=json.dumps({"sku": "A-100", "qty": 1}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    # Always injects x-zarch-s2s-token.
    # On GCP + url provided, also injects Authorization: Bearer <google-id-token>.
    auth.s2s.sign(req, target="orders", url="https://orders-abc-uc.a.run.app")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
Receiver example (verifies Z-Arch policy token):
from flask import Flask, jsonify, request
from zarch import ZArchAuth
app = Flask(__name__)
auth = ZArchAuth()
@app.post("/internal/create")
def create_order():
    # Verifies x-zarch-s2s-token signature, audience, issuer trust, and freshness.
    claims = auth.s2s.verify(request)
    caller_service = claims["iss"]
    # Cloud Run IAM / Google token auth (if enabled) is evaluated by platform/gateway.
    return jsonify({"ok": True, "caller": caller_service}), 200
S2S verification data is deployment-derived and local at runtime:
- SERVICE_ID: current service identity.
- S2S_PUBLIC_KEYS_JSON: trusted caller public keys by service ID.
- S2S_ALLOWED_TARGETS: mint-time policy for where this service may call.
Security Model Summary
- Session cookies are encrypted and stateless by default; stateful controls are added explicitly via on_login, on_logout, and on_verify hooks, while /session response enrichment is opt-in via on_session.
- In the intended Z-Arch platform flow, gateway/session components handle end-user auth so application services do not need to implement cookie auth logic directly.
- ZArchAuth.s2s.sign(...) and ZArchAuth.s2s.verify(...) enforce short-lived signed service-to-service trust with explicit caller/target validation.
- Auth helpers fail closed: invalid, expired, tampered, or unauthorized credentials raise errors that should map to 401/403.
- Secret material (cookie encryption keys, S2S keys, API credentials) should come from secure secret management and never be hardcoded or logged.
Z-Arch Extensions: Project Context Interface
This document describes the interface exposed to extensions through the project_context argument passed into lifecycle hooks. This is the stable API extensions should use. It is intentionally narrow, safe, and versioned by Z-Arch.
If you are authoring an extension, you should only access functionality via project_context (not internal modules).
Quick Start
A minimal extension looks like:
from typing import Any, Dict
from zarch.extensions.base import ZArchExtension
class Extension(ZArchExtension):
    def claim(self, extension_name: str, extension_block: Dict[str, Any]) -> bool:
        return extension_block.get("type") == "example"
    def on_post_deploy(self, project_context, extension_configuration: Dict[str, Any]) -> None:
        project_context.log("Hello from my extension!")
The project_context object is your primary tool. It provides:
- project metadata (project ID, region, repo path)
- config accessors
- safe prompt helpers
- GCP helpers (secrets, service URLs, env vars, service accounts)
- GitHub and Cloudflare helpers
Lifecycle Hooks (from ZArchExtension)
Extensions can implement any subset of these methods. Apart from claim, each hook receives project_context and the extension-specific configuration block.
claim(extension_name, extension_block) -> bool
Return True if your extension should handle this extension block in zarch.yaml.
Example
def claim(self, extension_name: str, extension_block: Dict[str, Any]) -> bool:
    return extension_block.get("type") == "my-extension"
pre_project_bootstrap(project_context, extension_configuration)
Runs before initial project bootstrap, but after prompting and repo cloning.
Example
def pre_project_bootstrap(self, project_context, extension_configuration):
    project_context.log("Preparing custom bootstrap")
post_project_bootstrap(project_context, extension_configuration)
Runs after initial project bootstrap.
Example
def post_project_bootstrap(self, project_context, extension_configuration):
    domain = project_context.config_get("domain")
    project_context.log(f"Project domain is {domain}")
pre_service_deploy(project_context, extension_configuration)
Runs before a Cloud Run service is deployed.
Example
def pre_service_deploy(self, project_context, extension_configuration):
    project_context.log("Preparing service deployment")
post_service_ensureSA(project_context, extension_configuration)
Runs immediately after the service runtime service account has been ensured/created.
Example
def post_service_ensureSA(self, project_context, extension_configuration):
    event = project_context.get_event_data() or {}
    sa = ((event.get("payload") or {}).get("service_account") or {}).get("email")
    project_context.log(f"Service SA ready: {sa}")
post_service_deploy(project_context, extension_configuration)
Runs after a Cloud Run service has been deployed.
Example
def post_service_deploy(self, project_context, extension_configuration):
    project_context.log("Service deployed successfully")
pre_gateway_deploy(project_context, extension_configuration)
Runs before the Z-Arch gateway is deployed.
Example
def pre_gateway_deploy(self, project_context, extension_configuration):
    project_context.log("Preparing gateway deployment")
post_gateway_ensureSA(project_context, extension_configuration)
Runs immediately after the gateway service account has been ensured/created.
Example
def post_gateway_ensureSA(self, project_context, extension_configuration):
    payload = (project_context.get_event_data() or {}).get("payload") or {}
    project_context.log(f"Gateway SA: {payload.get('service_account', {}).get('email')}")
post_gateway_deploy(project_context, extension_configuration)
Runs after the Z-Arch gateway has been deployed.
Example
def post_gateway_deploy(self, project_context, extension_configuration):
    project_context.log("Gateway deployed successfully")
pre_job_deploy(project_context, extension_configuration)
Runs before a Cloud Run job is deployed.
Example
def pre_job_deploy(self, project_context, extension_configuration):
    project_context.log("Preparing job deployment")
post_job_ensureSA(project_context, extension_configuration)
Runs immediately after the job runtime service account has been ensured/created.
Example
def post_job_ensureSA(self, project_context, extension_configuration):
    payload = (project_context.get_event_data() or {}).get("payload") or {}
    project_context.log(f"Job SA: {payload.get('service_account', {}).get('email')}")
post_job_deploy(project_context, extension_configuration)
Runs after a Cloud Run job has been deployed.
Example
def post_job_deploy(self, project_context, extension_configuration):
    job_id = project_context.config_get("jobs[0].id")
    project_context.log(f"Job {job_id} deployed successfully")
pre_scheduler_deploy(project_context, extension_configuration)
Runs before a Cloud Scheduler job is deployed.
Example
def pre_scheduler_deploy(self, project_context, extension_configuration):
    project_context.log("Preparing scheduler deployment")
post_scheduler_ensureSA(project_context, extension_configuration)
Runs immediately after the scheduler service account has been ensured/created.
Example
def post_scheduler_ensureSA(self, project_context, extension_configuration):
    payload = (project_context.get_event_data() or {}).get("payload") or {}
    principal = payload.get("principal", {}).get("id")
    project_context.log(f"Scheduler principal with SA ready: {principal}")
post_scheduler_deploy(project_context, extension_configuration)
Runs after a Cloud Scheduler job has been deployed.
Example
def post_scheduler_deploy(self, project_context, extension_configuration):
    scheduler_id = project_context.config_get("schedulers[0].id")
    project_context.log(f"Scheduler {scheduler_id} deployed successfully")
pre_topic_deploy(project_context, extension_configuration)
Runs before a Pub/Sub topic is deployed.
Example
def pre_topic_deploy(self, project_context, extension_configuration):
    project_context.log("Preparing topic deployment")
post_topic_deploy(project_context, extension_configuration)
Runs after a Pub/Sub topic has been deployed.
Example
def post_topic_deploy(self, project_context, extension_configuration):
    topic_id = project_context.config_get("topics[0].id")
    project_context.log(f"Topic {topic_id} deployed successfully")
Hook Payload Matrix
Lifecycle event payloads are additive schema-v1 summaries. Use .get(...) and tolerate unknown keys.
| Hook | Key payload fields (summary) |
|---|---|
| pre_project_bootstrap | project_id, module, principal, repo, create_gcp_project, regions, domain, edge_proxy, userbase |
| post_project_bootstrap | status, repo, domain, edge_proxy, userbase, clients, control_plane_ready, gateway_deployed |
| pre_service_deploy | principal, resource_type, source, endpoint, authenticated, flags, routes, targets, env, schema, control_plane_args (wrapper) |
| post_service_ensureSA | principal, service_account, resource_type, source, endpoint, authenticated, flags, targets, routes, env, schema |
| post_service_deploy | deployment, inbound_callers, outbound_targets, s2s, env, endpoint, authenticated, targets |
| pre_gateway_deploy | principal, rotate_session_key, min_instance, auth_profile, session, trial_mode, control_plane_args (wrapper) |
| post_gateway_ensureSA | principal, service_account, rotate_session_key, min_instance, auth_profile, session, trial_mode |
| post_gateway_deploy | deployment, gateway, session, s2s, env |
| pre_job_deploy | principal, source, flags, targets, env, control_plane_args (wrapper) |
| post_job_ensureSA | principal, service_account, source, targets, flags, env |
| post_job_deploy | deployment, targets, target_summary, s2s, env |
| pre_scheduler_deploy | principal, schedule_mode, schedule, timezone, paused, targets, target_count |
| post_scheduler_ensureSA | principal, service_account, schedule_mode, schedule, timezone, paused, targets, target_count |
| post_scheduler_deploy | service_account, schedule_mode, schedule, timezone, paused, targets, target_summary, created_scheduler_job_ids |
| pre_topic_deploy | principal, subscribers, subscriber_ids, subscriber_count, publisher_candidates, publisher_candidate_count |
| post_topic_deploy | principal, subscribers, subscriber_ids, subscriber_count, publishers, publisher_ids, publisher_count |
All payloads avoid secret values (for example: env var values, session keys, gateway URL/suffix secrets).
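As a minimal sketch of the .get(...) guidance above, the helper below summarizes a post_topic_deploy payload without assuming any field is present. The function name summarize_topic_payload and the sample values are hypothetical. Only the field names come from the matrix; the value types (lists and integers) are assumptions.

```python
from typing import Any, Mapping, Optional


def summarize_topic_payload(event: Optional[Mapping[str, Any]]) -> str:
    """Build a log-friendly summary from a post_topic_deploy event (hypothetical helper)."""
    payload = (event or {}).get("payload") or {}
    # Field names come from the matrix; their types are assumed here.
    subscriber_ids = payload.get("subscriber_ids") or []
    publisher_ids = payload.get("publisher_ids") or []
    # Fall back to the list length if a count field is missing.
    subscriber_count = payload.get("subscriber_count", len(subscriber_ids))
    publisher_count = payload.get("publisher_count", len(publisher_ids))
    return f"subscribers={subscriber_count} publishers={publisher_count}"


# Sample event with an unknown key, which is simply ignored.
sample = {"payload": {"subscriber_ids": ["a", "b"], "publisher_count": 1, "future_field": True}}
print(summarize_topic_payload(sample))
print(summarize_topic_payload(None))
```

Because every lookup has a fallback, the same helper keeps working if a later schema revision adds fields or a dispatch carries no payload at all.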
Manual Hook Triggering
Use the CLI to manually dispatch lifecycle hooks for configured extensions:
zarch ext trigger pre_service_deploy
zarch ext trigger post_service_ensureSA --extension my-extension
zarch ext trigger post_service_deploy --extension my-extension
zarch ext trigger post_gateway_deploy --extension audit --extension cache
- hook_name must be one of the lifecycle hooks defined by ZArchExtension.
- --extension is optional and can be repeated. Values must match extension block names under extensions: in zarch.yaml.
- Without --extension, all configured extension blocks are considered, and only installed extensions that claim those blocks are invoked.
- Dispatch follows the normal hook execution policy (local vs remote) used by live deployments.
- Manual dispatches include minimal event metadata where source is "manual" and payload.extension_names lists any explicitly selected extension blocks.
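An extension may want to behave differently when it is triggered by hand. The sketch below shows one way to detect that from the event metadata. The helper names are hypothetical, and it assumes payload.extension_names is a list of block names that may be absent.

```python
from typing import Any, Mapping, Optional


def is_manual_dispatch(event: Optional[Mapping[str, Any]]) -> bool:
    """Return True when the event envelope marks the dispatch as manual."""
    return (event or {}).get("source") == "manual"


def explicitly_selected(event: Optional[Mapping[str, Any]], block_name: str) -> bool:
    """Return True when block_name was passed via --extension on a manual dispatch."""
    if not is_manual_dispatch(event):
        return False
    payload = (event or {}).get("payload") or {}
    # extension_names is assumed to be a list of block names; it may be absent.
    return block_name in (payload.get("extension_names") or [])


manual_event = {"source": "manual", "payload": {"extension_names": ["audit", "cache"]}}
live_event = {"source": "live", "payload": {}}
print(explicitly_selected(manual_event, "audit"))
print(explicitly_selected(live_event, "audit"))
```

Inside a hook, the event argument would come from project_context.get_event_data().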
project_context Interface
The sections below describe all available attributes and methods exposed to extensions. Use them as the primary API surface.
Core Attributes
These attributes represent the current project state in a safe, read-only form.
project_context.id (str)
- The active GCP project ID.
- Example: "my-gcp-project"
Example
project_id = project_context.id
project_context.log(f"Deploying project {project_id}")
project_context.region (str)
- The active region for this deployment run.
- Example: "us-east1"
Example
region = project_context.region
project_context.log(f"Active region: {region}")
project_context.project_root_path (pathlib.Path)
- Absolute path to the project root directory.
Example
root = project_context.project_root_path
project_context.log(f"Root path: {root}")
project_context.non_interactive (bool)
- True if Z-Arch is running in non-interactive mode.
Example
if project_context.non_interactive:
    project_context.log("Running non-interactively")
project_context.config (zarch_cli.helpers.config.Config)
- The loaded Z-Arch config object.
- Most extensions should use the config_get, config_set, and config_save helpers instead of accessing config directly.
Example
cfg = project_context.config
project_context.log(f"Config loaded from: {cfg.root}")
Event Metadata
get_event_data() -> dict[str, Any] | None
Read optional metadata for the lifecycle hook currently being dispatched.
- This may be None when metadata is unavailable.
- Keys are additive and may grow over time; extensions should tolerate unknown keys.
- Known envelope fields include:
  - schema_version (integer)
  - source ("live" or "manual")
  - hook (hook name)
  - timestamp (UTC ISO-8601)
  - resource (e.g. kind/id/region)
  - payload (hook-specific details, may be empty)
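The envelope fields listed above can be wrapped in a small typed view so that hook code does not repeat the None and missing-key handling. This is a sketch only: EventEnvelope and parse_event are hypothetical names, not part of Z-Arch, and the field types are assumptions based on the descriptions above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class EventEnvelope:
    """Hypothetical typed view over the documented envelope fields."""
    schema_version: Optional[int] = None
    source: Optional[str] = None
    hook: Optional[str] = None
    timestamp: Optional[str] = None
    resource: Dict[str, Any] = field(default_factory=dict)
    payload: Dict[str, Any] = field(default_factory=dict)


def parse_event(raw: Optional[Mapping[str, Any]]) -> EventEnvelope:
    """Convert get_event_data() output into an EventEnvelope, ignoring unknown keys."""
    data = raw or {}
    return EventEnvelope(
        schema_version=data.get("schema_version"),
        source=data.get("source"),
        hook=data.get("hook"),
        timestamp=data.get("timestamp"),
        resource=dict(data.get("resource") or {}),
        payload=dict(data.get("payload") or {}),
    )


envelope = parse_event({"schema_version": 1, "hook": "post_job_deploy", "extra": "ignored"})
print(envelope.hook, envelope.payload)
```

Passing None yields an envelope with empty defaults, so callers can read envelope.payload without a separate check.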
Example
event = project_context.get_event_data() or {}
payload = event.get("payload") or {}
principal = payload.get("principal") or {}
service_account = payload.get("service_account") or {}
project_context.log(
    f"Hook={event.get('hook')} principal={principal.get('kind')}:{principal.get('id')} "
    f"sa={service_account.get('email')}"
)
Logging
log(message: str, level: str | None = None) -> None
Write a styled message to the Z-Arch console.
- level is optional and used only to tag the message (e.g. “INFO”, “WARN”).
Example
project_context.log("Preparing extension steps", level="info")
Command Execution
run_command(command_parts: list[str]) -> tuple[str, int]
Run a local shell command. Returns (stdout, exit_code).
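Since the exit code is returned alongside stdout, callers decide how to treat failures. The sketch below wraps the call so that a non-zero exit code becomes an exception. StubContext is a stand-in for project_context that only mimics the documented return shape, and run_checked is a hypothetical helper.

```python
import subprocess
import sys
from typing import List, Tuple


class StubContext:
    """Stand-in for project_context (hypothetical); only mimics the documented return shape."""

    def run_command(self, command_parts: List[str]) -> Tuple[str, int]:
        result = subprocess.run(command_parts, capture_output=True, text=True)
        return result.stdout, result.returncode


def run_checked(ctx, command_parts: List[str]) -> str:
    """Run a command and raise if the exit code is non-zero (hypothetical helper)."""
    out, code = ctx.run_command(command_parts)
    if code != 0:
        raise RuntimeError(f"{command_parts[0]} exited with code {code}")
    return out.strip()


ctx = StubContext()
print(run_checked(ctx, [sys.executable, "-c", "print('hello')"]))
```

In a real hook, ctx would be the project_context argument, and the same wrapper could be applied to gcloud(...), which returns the same tuple shape.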
Example
out, code = project_context.run_command(["echo", "hello"])
if code == 0:
    project_context.log(out.strip())
gcloud(command_parts: list[str]) -> tuple[str, int]
Run a gcloud command using the embedded or system gcloud binary. Returns (stdout, exit_code).
Example
out, code = project_context.gcloud(["projects", "list", "--format=value(projectId)"])
if code == 0:
    project_context.log("Projects:\n" + out)
Config Access
config_get(key: str, default: Any = None) -> Any
Fetch a config value using dotted path notation.
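Dotted path notation can be pictured as a walk through nested mappings, one key per segment. The sketch below illustrates that idea only; it is not Z-Arch's implementation. The function name get_by_path is hypothetical, and it handles plain dotted keys without list indices.

```python
from typing import Any, Dict


def get_by_path(config: Dict[str, Any], key: str, default: Any = None) -> Any:
    """Resolve a dotted key such as 'gateway.session.stateful' in nested dicts."""
    current: Any = config
    for part in key.split("."):
        # Stop as soon as the path leaves the nested-dict structure.
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


config = {"domain": "example.com", "gateway": {"session": {"stateful": True}}}
print(get_by_path(config, "gateway.session.stateful"))
print(get_by_path(config, "gateway.missing", "fallback"))
```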
Example
domain = project_context.config_get("domain", "")
project_context.log(f"Domain: {domain}")
config_set(key: str, value: Any) -> None
Set a config value in memory (does not write to disk).
Example
project_context.config_set("gateway.session.stateful", False)
config_save() -> None
Persist config changes to zarch.yaml.
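Because config_set only changes the in-memory value, a change is lost unless config_save follows it. The sketch below pairs the two and skips the save when the value is already current. InMemoryContext is a hypothetical stand-in that uses flat keys and performs no file I/O; set_and_persist is likewise a hypothetical helper.

```python
from typing import Any, Dict


class InMemoryContext:
    """Stand-in for project_context (hypothetical); flat keys only, no real file I/O."""

    def __init__(self, values: Dict[str, Any]) -> None:
        self.values = dict(values)
        self.saved: Dict[str, Any] = {}
        self.save_calls = 0

    def config_get(self, key: str, default: Any = None) -> Any:
        return self.values.get(key, default)

    def config_set(self, key: str, value: Any) -> None:
        self.values[key] = value  # in memory only

    def config_save(self) -> None:
        self.saved = dict(self.values)  # simulates writing zarch.yaml
        self.save_calls += 1


def set_and_persist(ctx, key: str, value: Any) -> bool:
    """Write key=value and save, skipping the save when nothing changed."""
    if ctx.config_get(key) == value:
        return False
    ctx.config_set(key, value)
    ctx.config_save()
    return True


ctx = InMemoryContext({"gateway.session.stateful": True})
print(set_and_persist(ctx, "gateway.session.stateful", False))
print(set_and_persist(ctx, "gateway.session.stateful", False))
```

The second call returns False because the value is already set, so the stand-in records a single save.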
Example
project_context.config_set("gateway.session.stateful", False)
project_context.config_save()
Prompts
These are safe wrappers around Z-Arch’s prompt system.
ask(message: str, default: str | None = None, required: bool = True, validate: Callable | None = None) -> str
Prompt the user for a string value.
Example
name = project_context.ask("What is the service name?", default="session")
choice(message: str, choices: list[str], default: str | None = None, sub_prompt: str = "") -> str
Prompt the user to select a single option.
Example
region = project_context.choice("Select region", ["us-east1", "us-west1"], default="us-east1")
multichoice(message: str, choices: list[str], default: list[str] | None = None, sub_prompt: str = "(space to toggle, enter to confirm)") -> list[str]
Prompt the user to select multiple options.
Example
features = project_context.multichoice("Enable features", ["cdn", "auth", "logging"])
yes_no(message: str, default: bool = True, sub_prompt: str = "") -> bool
Prompt the user for a yes/no response.
Example
confirm = project_context.yes_no("Proceed with cleanup?", default=False)
review_and_confirm() -> None
Render the config and ask the user to confirm. Useful before sensitive operations.
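How the prompt helpers behave when project_context.non_interactive is True is not described here, so an extension may prefer to guard prompts itself. The sketch below is one such guard. StubContext and confirm_cleanup are hypothetical, and falling back to a conservative default is an assumption about what a non-interactive run should do.

```python
class StubContext:
    """Stand-in for project_context (hypothetical) with a scripted yes_no answer."""

    def __init__(self, non_interactive: bool, answer: bool) -> None:
        self.non_interactive = non_interactive
        self._answer = answer
        self.prompts = 0

    def yes_no(self, message: str, default: bool = True, sub_prompt: str = "") -> bool:
        self.prompts += 1
        return self._answer


def confirm_cleanup(ctx, default: bool = False) -> bool:
    """Ask before a destructive step; use the default when prompting is not possible."""
    if ctx.non_interactive:
        # No prompt in non-interactive runs: use the conservative default.
        return default
    return ctx.yes_no("Proceed with cleanup?", default=default)


print(confirm_cleanup(StubContext(non_interactive=True, answer=True)))
print(confirm_cleanup(StubContext(non_interactive=False, answer=True)))
```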
Example
project_context.review_and_confirm()
GCP Helpers
These helpers wrap common GCP operations and automatically use the project context’s id and region where applicable.
ensure_service_account(service_account_name: str, **kwargs) -> str
Ensure a service account exists, creating it if missing.
- service_account_name can be either a full email (name@project.iam.gserviceaccount.com) or just the short name.
- Optional kwargs:
  - project_id (str): override the current project ID
  - display_name (str): override the display name
Example
sa = project_context.ensure_service_account("zarch-ext")
project_context.log(f"Service account: {sa}")
secret_exists(secret_name: str) -> bool
Check if a Secret Manager secret exists in the current project.
Example
if not project_context.secret_exists("my-secret"):
    project_context.log("Secret does not exist")
store_secret(secret_name: str, secret_value: str) -> None
Create or update a Secret Manager secret with a new version.
Example
project_context.store_secret("my-secret", "super-secure-token")
get_secret(secret_name: str) -> str
Fetch the latest version of a Secret Manager secret.
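The three secret helpers combine naturally into a create-once pattern: check, store a generated value if missing, then read. The sketch below assumes that pattern. FakeSecretsContext is a hypothetical stand-in that keeps values in a dict instead of Secret Manager, and ensure_secret is a hypothetical helper.

```python
import secrets
from typing import Dict


class FakeSecretsContext:
    """Stand-in for project_context (hypothetical); stores secrets in a dict."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def secret_exists(self, secret_name: str) -> bool:
        return secret_name in self._store

    def store_secret(self, secret_name: str, secret_value: str) -> None:
        self._store[secret_name] = secret_value

    def get_secret(self, secret_name: str) -> str:
        return self._store[secret_name]


def ensure_secret(ctx, secret_name: str) -> str:
    """Return the secret, generating a random value only if it does not exist yet."""
    if not ctx.secret_exists(secret_name):
        ctx.store_secret(secret_name, secrets.token_urlsafe(32))
    return ctx.get_secret(secret_name)


ctx = FakeSecretsContext()
first = ensure_secret(ctx, "edge-api-key")
second = ensure_secret(ctx, "edge-api-key")
print(first == second)  # the value is created once and then reused
```

The example prints only a comparison, never the secret itself, in line with the guidance against logging secrets.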
Example
token = project_context.get_secret("my-secret")
get_service_url(service_name: str) -> str
Fetch the Cloud Run service URL for a named service in the current region.
Example
session_url = project_context.get_service_url("session")
project_context.log(f"Session URL: {session_url}")
get_env_var(service_name: str, env_var_key: str) -> str
Read a specific environment variable from a deployed service or function.
Example
public_key = project_context.get_env_var("zarch-gateway", "S2S_PUBLIC_KEY")
set_env_vars(service_name: str, env_vars: dict[str, str]) -> None
Set or update environment variables on a deployed service or function.
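A common use of these helpers is to publish one service's URL to another service through an environment variable. The sketch below shows that flow under stated assumptions: FakeRunContext is a hypothetical stand-in backed by dicts, the "orders" service and SESSION_URL key are illustrative, and returning an empty string for a missing variable is an assumption about get_env_var.

```python
from typing import Dict


class FakeRunContext:
    """Stand-in for project_context (hypothetical); URLs and env vars live in dicts."""

    def __init__(self, urls: Dict[str, str]) -> None:
        self._urls = urls
        self._env: Dict[str, Dict[str, str]] = {}

    def get_service_url(self, service_name: str) -> str:
        return self._urls[service_name]

    def get_env_var(self, service_name: str, env_var_key: str) -> str:
        # Assumed behavior: empty string when the variable is not set.
        return self._env.get(service_name, {}).get(env_var_key, "")

    def set_env_vars(self, service_name: str, env_vars: Dict[str, str]) -> None:
        self._env.setdefault(service_name, {}).update(env_vars)


def link_services(ctx, source: str, target: str, env_key: str) -> bool:
    """Expose source's URL to target as env_key; return True if an update was made."""
    url = ctx.get_service_url(source)
    if ctx.get_env_var(target, env_key) == url:
        return False  # already up to date, skip the update
    ctx.set_env_vars(target, {env_key: url})
    return True


ctx = FakeRunContext({"session": "https://session.example.invalid"})
print(link_services(ctx, "session", "orders", "SESSION_URL"))
print(link_services(ctx, "session", "orders", "SESSION_URL"))
```

Checking the current value first avoids a redundant update when the hook runs repeatedly.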
Example
project_context.set_env_vars("session", {"SESSION_TTL": "1209600"})
GitHub
github()
Return an authenticated GitHub client (the PyGithub-style client used internally by Z-Arch).
Example
gh = project_context.github()
user = gh.get_user()
project_context.log(f"GitHub user: {user.login}")
get_connected_repo() -> tuple[str, str]
Return the connected repository’s full name and branch as ("owner/repo", "branch").
Example
repo, branch = project_context.get_connected_repo()
project_context.log(f"Connected repo: {repo} ({branch})")
Cloudflare
These helpers manage Cloudflare Workers and Pages as used by Z-Arch.
update_edge_proxy(project_name: str | None = None) -> None
Update the edge proxy worker for the project.
- If project_name is omitted, it is inferred from the connected repo name.
Example
project_context.update_edge_proxy()
set_edge_proxy_envs(env_vars: dict[str, str], project_name: str | None = None) -> bool
Set environment variables on the edge proxy worker.
- Returns True on success, False on failure.
Example
ok = project_context.set_edge_proxy_envs({"API_VERSION": "v1"})
if not ok:
    project_context.log("Failed to update edge envs", level="warn")
deploy_cf_worker(script_name: str, repo_root_dir: str, repo_full: str | None = None, branch: str | None = None, domain: str | None = None) -> None
Deploy a Cloudflare Worker from the connected repo.
- script_name: Worker script identifier
- repo_root_dir: Root path in the repo to deploy
- repo_full: Optional owner/repo override
- branch: Optional branch override
- domain: Optional custom domain
Example
project_context.deploy_cf_worker(
    script_name="my-worker",
    repo_root_dir="services/edge",
    branch="main",
)
set_worker_route(script_name: str, domain: str, route: str = "/api/*") -> None
Attach a route to a worker script.
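Deploying a worker and attaching routes are separate calls, so an extension will usually sequence them. The sketch below shows one ordering. RecordingContext is a hypothetical stand-in that records calls instead of contacting Cloudflare, deploy_worker_with_routes is a hypothetical helper, the "/auth/*" route is illustrative, and calling set_worker_route once per route is an assumption.

```python
from typing import List, Optional, Tuple


class RecordingContext:
    """Stand-in for project_context (hypothetical); records calls only."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple]] = []

    def deploy_cf_worker(self, script_name: str, repo_root_dir: str,
                         repo_full: Optional[str] = None, branch: Optional[str] = None,
                         domain: Optional[str] = None) -> None:
        self.calls.append(("deploy_cf_worker", (script_name, repo_root_dir, branch)))

    def set_worker_route(self, script_name: str, domain: str, route: str = "/api/*") -> None:
        self.calls.append(("set_worker_route", (script_name, domain, route)))


def deploy_worker_with_routes(ctx, script_name: str, repo_root_dir: str,
                              domain: str, routes: List[str]) -> None:
    """Deploy the worker first, then attach each route to it."""
    ctx.deploy_cf_worker(script_name=script_name, repo_root_dir=repo_root_dir)
    for route in routes:
        ctx.set_worker_route(script_name, domain, route)


ctx = RecordingContext()
deploy_worker_with_routes(ctx, "my-worker", "services/edge", "example.com", ["/api/*", "/auth/*"])
print([name for name, _ in ctx.calls])
```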
Example
project_context.set_worker_route("my-worker", "example.com", "/api/*")
deploy_cf_pages(domain: str, project_name: str | None = None, repo_full: str | None = None, branch: str | None = None) -> None
Deploy a Cloudflare Pages project from the connected repo.
Example
project_context.deploy_cf_pages("example.com")
End-to-End Example
A realistic extension that uses multiple helpers:
from typing import Any, Dict
from zarch.extensions.base import ZArchExtension
class Extension(ZArchExtension):
    def claim(self, extension_name: str, extension_block: Dict[str, Any]) -> bool:
        return extension_block.get("type") == "my-ext"
    def post_service_deploy(self, project_context, extension_configuration: Dict[str, Any]) -> None:
        project_context.log("Post-deploy hook starting")
        # Read config
        domain = project_context.config_get("domain", "")
        if not domain:
            project_context.log("No domain configured", level="warn")
            return
        # Ensure a secret exists
        if not project_context.secret_exists("edge-api-key"):
            project_context.store_secret("edge-api-key", "replace-me")
        # Update edge proxy envs
        project_context.set_edge_proxy_envs({"API_VERSION": "v1"})
        # Deploy pages site
        project_context.deploy_cf_pages(domain)
        project_context.log("Post-deploy hook complete")
zarch.yaml
extensions:
  {extension_name}:
    type: "{extension_name}"
    required_roles: []
    config:
      example_key: example_value
Add each extension to zarch.yaml or it will not run, even if it is installed. The extensions block is a dictionary of objects keyed by each extension’s name. type is the extension’s name. List in required_roles every GCP IAM role required by the service account that runs the extension. Values under config: are available to the extension code at runtime.
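The shape described above (a mapping keyed by extension name, each entry carrying type, required_roles, and config) can be checked before deployment. The sketch below is a hypothetical checker, not something Z-Arch provides. It treats required_roles and config as optional, which is an assumption, and the sample names and roles are illustrative.

```python
from typing import Any, Dict, List


def validate_extensions(extensions: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an extensions mapping (hypothetical checker)."""
    problems: List[str] = []
    for name, block in extensions.items():
        if not isinstance(block, dict):
            problems.append(f"{name}: block must be a mapping")
            continue
        if not isinstance(block.get("type"), str) or not block.get("type"):
            problems.append(f"{name}: type must be a non-empty string")
        if not isinstance(block.get("required_roles", []), list):
            problems.append(f"{name}: required_roles must be a list")
        if not isinstance(block.get("config", {}), dict):
            problems.append(f"{name}: config must be a mapping")
    return problems


# Equivalent of a parsed zarch.yaml extensions section (values are illustrative).
parsed = {
    "audit": {"type": "audit", "required_roles": ["roles/logging.logWriter"], "config": {"level": "info"}},
    "broken": {"required_roles": "roles/viewer"},
}
for problem in validate_extensions(parsed):
    print(problem)
```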
Notes and Best Practices
- Prefer config_get/config_set over accessing project_context.config directly.
- Use log() for all extension output to stay consistent with Z-Arch UX.
- Avoid raw shell calls unless absolutely necessary; use provided helpers first.
- Never log secrets or gateway URL suffixes.
If you need additional helpers, consider filing a request rather than importing internal modules directly.
