Z-Arch Platform Documentation

Table of Contents

  • I - Understanding Z-Arch
  • II - Operating Guides
  • III - Reference Manuals

I - Understanding Z-Arch

Introduction

Why Z-Arch exists

Application logic is cheap and easy to generate. Backend security architecture is not. A solo developer can produce routes, handlers, and database calls in a day. What they cannot reliably produce with the same momentum is a correct perimeter, a consistent authentication model, and defense-in-depth in a scalable shape that does not leak cost or drift over time. The usual result is either improvised security or reliance on hosted platforms that own your infrastructure, data, and wallet.

Z-Arch closes the gap cleanly by defining a fixed backend architecture in one small configuration file, zarch.yaml. Authentication, perimeter enforcement, and service trust are enforced automatically. This is not another AI code generator or agent “skill”. Nor is it a “codebase security scanner” that looks for problems only after they exist. It is proactive, deterministic software that constrains and guards your application code in the simplest manner possible. You describe your system in zarch.yaml. Z-Arch makes that description safe and guaranteed.

In practice this means:

  • All public traffic passes through a single enforced ingress plane.

  • No request is trusted, not even internally.

  • Requests that reach your business logic have already satisfied the authentication guarantees defined by the platform.

  • Scale-to-zero is the default economic behavior of every component.

  • Your entire backend architecture is described in an elegant and concise format.

With the architecture guaranteed, a single developer, or an LLM operating within the constraints of the Z-Arch MCP, can generate a complete and secure backend with ease. Business logic becomes the focus. Your AI saves context, and you gain peace of mind.

The Z-Arch Philosophy

Z-Arch is Post-SaaS software. It does not host your application. It does not store your data. It does not introduce a proprietary runtime you cannot leave. Everything deploys into your own GCP project. You own the infrastructure, the secrets, and the data. The architecture is opinionated by design, because the goal is correctness, repeatability, and security at minimal cost. But unlike alternative services, it never stands in your way. Z-Arch is anti lock-in.


Platform Architecture

When you bootstrap a project with Z-Arch, you will see a small set of platform-owned resources appear alongside your own services and jobs. These are the stable components Z-Arch uses to apply zarch.yaml, enforce the API perimeter, and keep your project convergent over time.

At a high level, Z-Arch separates the system into three concerns: deployment orchestration, the API perimeter, and your application workloads. Each concern has a dedicated component, and those components keep their responsibilities narrow.


Z-Arch Components

Control Plane

Z-Arch deploys a Control Plane job into your project. This is the execution surface that applies zarch.yaml and performs platform operations in a defined order. It is not part of your runtime request path. It exists so infrastructure and policy are applied deterministically rather than by ad-hoc scripts or manual console work.


Gateway

Z-Arch deploys the Gateway as a Cloud Run service. This is the single API perimeter for your backend. Requests are routed and authenticated here before reaching your services. Your services are not intended to be public ingress points.


Edge Proxy

Z-Arch deploys a Cloudflare Worker, the Edge Proxy, in front of the Gateway. The default edge exists to route traffic to the Gateway, provide cost-effective load balancing, apply rate limiting at the edge, and absorb abusive traffic with Cloudflare’s edge capacity. It can also inject client-specific API keys for attestation to the Gateway. The Edge Proxy is not where Z-Arch’s authentication model is enforced. The Gateway is.


Extension Runner

You may also see an Extension Runner job. Its presence is normal. It exists to execute configured extension lifecycle logic as part of Z-Arch operations.


Where Your Code Runs

Services

Your services are Cloud Run services that implement application logic. They sit behind the Gateway. In the intended topology, they receive requests that have already passed gateway enforcement.


Jobs

Your jobs are Cloud Run jobs for run-to-completion workloads that don’t need to serve HTTP traffic. They are invoked by schedulers, event flows, or directly by services and can interact with other resources according to declared targets.


Topics and Schedulers

Topics and schedulers are the event and time primitives that connect services and jobs without turning everything into direct synchronous calls. They are declared in zarch.yaml and deployed as part of the same convergent system.


Request Flow

Public ingress in the default topology is:

Client -> Edge Proxy -> Gateway -> Service

The operational boundary sequence is:

  1. Edge proxy validates client API-key posture and forwards traffic to the gateway.
  2. Gateway matches route and enforces declared auth mode (dual, jwt-only, or public).
  3. For protected routes, JWT validation and session enforcement complete before proxying.
  4. Gateway attaches short-lived internal trust assertions to the upstream call.
  5. Target service verifies S2S trust assertions before accepting the internal call.

Service-to-service calls follow the same trust model: callers mint short-lived assertions and receivers verify caller identity, intended audience, and token freshness.

Container boundary rule: protected traffic does not execute inside an application service container until perimeter checks have passed at the gateway.


Gateway Enforcement Model

The Gateway is the single API perimeter for a Z-Arch system. It exists so authentication and perimeter controls are enforced before any request reaches your application container.

Application services are not ingress points. The Gateway is.


Routing and Request Matching

The Gateway is the only component responsible for mapping incoming requests to backend services.

  • Routes are declared in zarch.yaml.
  • Path and method evaluation are deterministic.
  • Literal and regex route styles are supported.
  • Endpoint exposure is controlled per service.

If a route is not declared, it does not exist.
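As a sketch, a declaration of this shape (using the route fields from the services example later in this manual; the regex path syntax shown is illustrative, not confirmed schema) exposes exactly two operations:

```yaml
services:
  - id: orders
    endpoint: true
    routes:
      - path: /orders          # literal route
        method: GET
      - path: /orders/[0-9]+   # regex-style route; exact syntax may differ
        method: GET
```

Any request that matches neither declaration is rejected at the Gateway.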


JWT Verification with OIDC

For authenticated flows, the Gateway validates end-user JWTs against your configured OIDC provider.

  • OIDC discovery metadata is configurable.
  • JWT signature and audience checks are enforced.
  • Validation is based on OIDC/JWKS standards.

Services do not parse or validate end-user JWTs for public traffic. That enforcement occurs at the Gateway.


Session Enforcement and Dual Requirement

For protected routes using the dual auth mode, Z-Arch enforces two independent checks:

  • A valid end-user JWT.
  • A valid session cookie bound to that JWT.

Both must pass. If either fails, the request is rejected.

This enforcement occurs before traffic is forwarded to the target service. A request never executes inside the same container as your application logic unless it satisfies its declared auth mode.


Route Auth Modes

Auth mode is declared per route.

Available modes:

  • dual — JWT and session cookie required.
  • jwt-only — JWT required, no session cookie.
  • public — no end-user authentication required.

Even public routes remain inside the Gateway perimeter and are not directly exposed as independent services.
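A hedged sketch of per-route declarations, combining the auth field from the services example with the three modes above (the paths are hypothetical):

```yaml
routes:
  - path: /profile
    method: GET
    auth: dual       # valid JWT and a session cookie bound to it
  - path: /reports
    method: GET
    auth: jwt-only   # valid JWT, no session cookie
  - path: /health
    method: GET
    auth: public     # no end-user authentication; still behind the Gateway
```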


CORS and Security Headers

CORS policy and security headers are enforced at the Gateway level.

  • Allowed origins, headers, and methods are centrally configured.
  • Credential mode and cache windows are applied uniformly.
  • Security headers are applied consistently across routes.

Services do not manage CORS logic. This prevents inconsistent browser-facing behavior across services.
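As an illustration, a centralized CORS block might look like the following. Only security.cors.allowed_origins is confirmed elsewhere in this manual; the other key names are assumptions made for the sketch:

```yaml
security:
  cors:
    allowed_origins:
      - https://app.example.com   # explicit allowlist, never "*"
    allowed_headers: [Authorization, Content-Type]
    allowed_methods: [GET, POST]
```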


Request Validation and Rate Controls

The Gateway enforces route-level operational controls:

  • Required content type enforcement.
  • Per-route rate limit declarations.
  • Capability checks where authz is configured.

These controls are applied before forwarding to services. Services do not implement their own rate limiting for public traffic unless explicitly required for domain reasons.


Internal Service Trust Propagation

When the Gateway forwards a request to a service, it attaches a short-lived internal trust assertion to the upstream call.

Services verify this assertion with ZArchAuth.s2s.verify() to ensure the caller is the Gateway and that the request is intended for them.


What You Still Implement in Your Services

The Gateway does not make domain-level authorization decisions.

Services remain responsible for:

  • Business authorization rules.
  • Domain-specific input validation.
  • Application error handling and observability.
  • Data-layer constraints and integrity.

The Gateway enforces perimeter authentication and route policy. Application services enforce business logic.


Authentication and Access Model


OIDC Model

Z-Arch validates end-user JWTs using any OIDC-compliant provider.

Primary inputs:

  • authn.discovery: OIDC discovery URL.
  • authn.client_id: expected JWT audience.

Managed Firebase bootstrap remains available, but runtime JWT validation is provider-agnostic under the same OIDC model.
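The two documented inputs are enough to sketch a provider-agnostic block (the URL and client ID are placeholders):

```yaml
authn:
  discovery: https://accounts.example.com/.well-known/openid-configuration
  client_id: my-web-app   # expected JWT audience
```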


JWT Validation and Session Binding

For authenticated routes, gateway policy enforces the configured route auth mode.

  • dual: both a valid JWT and a valid session cookie bound to that identity are required.
  • jwt-only: a valid JWT is required; no session cookie requirement.
  • public: no end-user JWT/session requirement.

Session behavior is governed by gateway/security configuration (same-site policy, TTL, and issued-at skew tolerance).


Client API Keys

Frontend clients are declared in clients and mapped to API-key identities stored in secret manager.

Each client includes:

  • id
  • type (web, ios, android)
  • api_key secret reference
  • add_to_edge to control edge-side API-key injection on the client segment
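Putting the four documented fields together, a clients block might look like this (the secret reference format is an assumption):

```yaml
clients:
  - id: web-app
    type: web
    api_key: web-app-key   # Secret Manager reference
    add_to_edge: true      # edge injects this key on the client segment
  - id: mobile
    type: ios
    api_key: mobile-key
    add_to_edge: false
```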

Capability Enforcement Status

Z-Arch includes authz capability and role structures with route-level requires checks. This model exists in the contract surface but remains an actively developing area.


Perimeter vs Domain Authorization

Perimeter authentication and route policy are enforced by the edge/gateway boundary. Domain authorization remains a service concern.


Serverless Primitives

Z-Arch breaks backend architecture into explicit primitives.

Services

Services are request/response workloads that can be exposed as API endpoints or kept internal.

Common use cases:

  • Product API endpoints.
  • Session and account services.
  • Internal orchestration APIs.

Example:

services:
  - id: orders
    endpoint: true
    authenticated: true
    routes:
      - path: /orders
        method: GET
      - path: /orders
        method: POST
        auth: dual
    location: /services/orders
    env: {}
    flags: []
    targets: [payments, order-events]

Jobs

Jobs are run-to-completion workloads for asynchronous or batch execution.

Common use cases:

  • Daily reconciliation.
  • Data cleanup.
  • Bulk import/export.

Example:

jobs:
  - id: daily-reconcile
    location: /jobs/daily-reconcile
    env:
      BATCH_SIZE: "500"
    flags:
      - "--task-timeout=1200s"
    targets: [orders, reporting-events]

Topics

Topics are passive event channels for publish/subscribe flows.

Common use cases:

  • Event fan-out.
  • Workflow decoupling.
  • Retry-friendly async processing.

Example:

topics:
  - id: reporting-events
    sub:
      - id: analytics
        mode: push
        ack_deadline_secs: 60

Schedulers

Schedulers are time-based trigger resources that invoke services, run jobs, or publish messages.

Common use cases:

  • Hourly sync tasks.
  • Nightly cleanup.
  • Weekly report generation.

Example:

schedulers:
  - id: hourly-sync
    cadence:
      every: 1
      unit: hour
      timezone: "UTC"
    targets: [daily-reconcile]

Targets and Trust Edges

targets define the declared execution graph for a project.

  • Service and job targets define callable relationships.
  • Topic targets define event publish permissions.
  • Scheduler targets define trigger actions.

Trust edges are explicit and deterministic:

  • Short-lived assertions are minted only for declared call paths.
  • Receivers validate caller identity and intended audience.
  • Undeclared caller/target edges are outside the trust graph and are not accepted as valid internal calls.

This keeps topology reviewable and prevents hidden trust relationships from emerging outside zarch.yaml.
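For example, using the targets field from the primitives examples below, a one-way orders-to-payments trust edge can be read directly from the config (the service IDs are hypothetical):

```yaml
services:
  - id: orders
    targets: [payments, order-events]  # orders may call payments and publish to order-events
  - id: payments
    targets: []                        # payments initiates no internal calls; the edge is one-way
```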


II - Operating Guides

Getting Started

Project Bootstrap

Create a new app:

zarch new app

Typical guided flow:

  1. Choose new vs existing repository mode.
  2. Select template/source repository.
  3. Select cloud project and region details.
  4. Configure domain and edge deployment options.
  5. Configure authentication options.
  6. Apply bootstrap and initial deployment steps.

Existing Repository Bootstrap

You can bootstrap into an existing repository/branch instead of generating from a template repository.

zarch new app --not-new-repo

Use this when:

  • You already have an active codebase integrated with Z-Arch.
  • You want to bootstrap a separate GCP project for different branches (e.g., prod, test).
  • You are integrating an existing codebase with Z-Arch for the first time.

New Project from Any Template

You can point Z-Arch to any GitHub template repository that includes a valid zarch.yaml.

zarch new app --template-repo owner/template-repo

This enables:

  • Company-standard starter templates.
  • Shared community templates.
  • Distributing and installing ready-to-deploy applications.

Sharing Templates

Recommended template practice:

  1. Keep a stable template repository with a clean zarch.yaml baseline.
  2. Version template changes via standard branch/tag workflows.
  3. Document template assumptions in repository README.
  4. Bootstrap new projects by referencing that template repository directly.

Deployment Model

Z-Arch operates on a convergent deployment model. You declare intent in zarch.yaml; deployment applies that intent in a fixed order; repeated deployment converges toward the same runtime shape.

You do not manually orchestrate infrastructure components. The Control Plane applies resources in dependency order.

Deployment Order

When deploying a full project, resources are applied in this order:

  1. Topics
  2. Services
  3. Jobs
  4. Schedulers
  5. Gateway
  6. Edge updates

This ordering ensures:

  • Event infrastructure exists before publishers or subscribers.
  • Services and jobs exist before schedulers reference them.
  • The Gateway reflects the current route definitions.
  • Edge routing reflects the current Gateway endpoints.

Most workflows use:

zarch deploy all

This applies the full convergence cycle.


Anti-Patterns

Avoid the following:

  • Manually wiring load balancers.
  • Manually configuring IAM bindings for service trust.
  • Manually synchronizing route definitions between services and ingress.
  • Deploying hidden infrastructure outside zarch.yaml.

Operational changes are expressed in configuration. Deployment enforces alignment.


Convergence Principle

Repeated deployment of the same configuration produces the same infrastructure state.

Drift introduced outside Z-Arch is not part of the intended model. The Control Plane is responsible for re-aligning declared resources with runtime state.


Devbox

Overview

zarch devbox gives each developer a project-scoped cloud workspace that can be created, paused, resumed, and replaced on demand.
The core value is operational consistency: every developer starts from the same environment profile, tied to the same project controls, with a predictable access pattern.

For teams, devboxes provide:

  • Faster onboarding with fewer machine-specific setup issues.
  • Lower local environment drift across engineers.
  • Better isolation between projects and developers.
  • Safer experimentation with limited blast radius.
  • Cost control through explicit start/stop lifecycle commands.

Why Ephemeral Environments Matter

Devboxes are intentionally disposable. If an environment becomes unstable, slow, or misconfigured, the preferred recovery path is to replace it quickly instead of spending hours on manual repair.

This model improves:

  • Mean time to recovery for developer blockers.
  • Reproducibility across engineers.
  • Support handoff quality (shared, repeatable recovery workflow).

Quick Start

# 1) Set active project directory
zarch set project /path/to/your/zarch/project

# 2) Connect Cloudflare credentials (project scope recommended)
zarch connect cloudflare --project

# 3) Create devbox
zarch new devbox <username>

# 4) Connect over SSH using the alias printed by the CLI
ssh devbox-<project-id>-<usr3>

Example:

zarch new devbox ram
ssh devbox-myproject-ram

Prerequisites

Before creating or managing devboxes, ensure:

  • You are in a valid Z-Arch project directory (zarch.yaml exists).
  • Your active GCP project is configured and accessible.
  • domain is set in zarch.yaml.
  • Cloudflare credentials are connected through Z-Arch:
    • zarch connect cloudflare --project
    • or zarch connect cloudflare --global

Required IAM capabilities typically include:

  • Compute Engine instance management
  • IAM service account and policy binding management
  • Secret Manager secret and policy binding management
  • Service API enablement permissions

After your first SSH login, it is highly recommended that you authenticate gcloud and ADC as your user identity.

Why this matters

The VM service account is intentionally limited for platform runtime operations.
Developer workflows (project administration, many gcloud actions, and parts of Z-Arch usage) should run under the developer’s user identity for correct authorization and auditability.

Run this once per new devbox

gcloud auth login
gcloud auth application-default login
gcloud config set project <project_id>

Verify

gcloud auth list
gcloud auth application-default print-access-token

Expected outcome:

  • Your interactive CLI/API calls execute as your user identity.
  • You avoid common PERMISSION_DENIED failures caused by relying on the VM service account for user workflows.

Default Tooling Baseline

New devboxes are delivered with a platform baseline of developer tooling.
This baseline combines platform-provisioned tools and Ubuntu image utilities.

Platform baseline tools

The following tools are expected in a newly provisioned devbox:

  • zarch (Z-Arch CLI; platform standard)
  • gcloud (Google Cloud CLI)
  • git
  • gh (GitHub CLI)
  • curl
  • jq
  • binutils
  • build-essential
  • docker and Docker Compose plugin
  • Node.js (LTS) and npm
  • wrangler
  • firebase CLI
  • Go (go, installed via golang-go)
  • Rust (rustc and cargo)
  • codex CLI
  • claude CLI
  • gemini CLI
  • playwright and playwright-mcp
  • Miniforge/Conda-based Python environment tools

Ubuntu base image utilities

In addition to platform tooling, Ubuntu provides core operational utilities commonly used for day-to-day engineering tasks, including:

  • shell and core GNU tooling (bash, coreutils)
  • package management (apt)
  • native build and linker tooling (binutils, build-essential)
  • language toolchains installed from Ubuntu packages (golang-go, rustc, cargo)
  • service management (systemctl/systemd)
  • SSL and crypto utilities (ca-certificates, gnupg)
  • common Linux networking and process utilities

After first login, validate your baseline:

command -v zarch gcloud git gh curl jq docker node npm wrangler go rustc cargo

If a required tool is missing, follow your team’s custom provisioning approach in the next section.

Command Reference

Command                                                    Purpose
zarch new devbox [username]                                Create and configure a new developer VM.
zarch devbox list                                          List all devboxes in the active project.
zarch devbox on <username>                                 Start a devbox VM by username.
zarch devbox off <username>                                Stop a devbox VM by username.
zarch devbox delete <username>                             Delete a devbox VM by username.
zarch devbox delete <username> --delete-service-account    Delete the VM and its associated service account.

Custom Provisioning with an Appended Startup Script

If your team needs additional packages, internal CLIs, or project-specific setup, you can provide an additional startup script during devbox creation.

How it works

During zarch new devbox <username> (interactive mode), Z-Arch prompts for:

  • Additional startup script path (optional)

Provide a local script file path. Z-Arch will append and run that script as part of devbox provisioning.

Recommended practices:

  1. Store team customization scripts in source control (for example: devbox/custom-startup.sh).
  2. Keep scripts idempotent so reruns are safe.
  3. Use explicit version pins for critical toolchains.
  4. Log clearly to simplify support and audits.

Example flow

zarch new devbox alice
# When prompted:
# Additional startup script path (optional): ./devbox/custom-startup.sh

Important notes

  • The script is applied at provisioning time for that devbox creation.
  • To apply a changed script to an existing environment, the recommended pattern is recreate:
    • zarch devbox off <username>
    • zarch devbox delete <username>
    • zarch new devbox <username> (with updated script path)

Naming and Access Conventions

For username alice in project my-project:

  • VM name: devbox-alice
  • Service account: devbox-alice-sa@my-project.iam.gserviceaccount.com
  • Forward domain: alice.dev.<domain>
  • Local SSH alias format: devbox-<project-id>-<usr3>

Z-Arch updates local SSH configuration on the machine running zarch, enabling alias-based connection.

Ephemeral Operations Model

Standard policy

  • One devbox per developer per project.
  • Prefer fast replacement over prolonged repair when the environment is degraded.
  • Treat devbox state as recoverable and reproducible.

Replacement flow:

zarch devbox off <username>
zarch devbox delete <username>
zarch new devbox <username>

Then reconnect:

ssh devbox-<project-id>-<usr3>

Operational Runbooks

Daily startup / shutdown

Start your environment:

zarch devbox on <username>

Stop when idle:

zarch devbox off <username>

Environment recovery (nuke and recreate)

Use this when troubleshooting exceeds a reasonable threshold:

zarch devbox off <username>
zarch devbox delete <username>
zarch new devbox <username>

Developer offboarding

Remove compute resource:

zarch devbox delete <username>

If identity should also be removed:

zarch devbox delete <username> --delete-service-account

Incident triage flow

  1. Verify VM state:
zarch devbox list
  2. Verify reachability:
  • SSH alias in ~/.ssh/config
  • Forward domain resolves as expected
  3. If unresolved after basic checks, escalate to the reset runbook.

Team Operating Model

Recommended team practices:

  • Use stable, unique usernames per developer.
  • Define ownership boundaries: one devbox equals one responsible engineer.
  • Use start/stop discipline for cost management.
  • Standardize recovery on replace-over-repair for severe drift.
  • Use project-scoped credentials wherever possible for governance clarity.

Security and Identity Notes

  • Devboxes use least-privilege runtime identity for platform-controlled operations.
  • Developer-admin actions should use authenticated user identity (see ADC section).
  • Runtime secret access is managed through project controls, not local plaintext artifacts.
  • Devbox lifecycle actions do not require adding new fields to zarch.yaml.

Troubleshooting

Symptom: No valid zarch.yaml found in this project directory.
  Likely cause: Wrong working directory or project not set.
  Resolution: zarch set project /path/to/project
  Escalation: Confirm the repo contains the correct zarch.yaml.

Symptom: Cloudflare token not found for this project.
  Likely cause: Cloudflare credentials not connected for the active scope.
  Resolution: zarch connect cloudflare --project (or --global)
  Escalation: Validate the active project context and retry.

Symptom: PERMISSION_DENIED running gcloud/Z-Arch in the VM.
  Likely cause: Running as the limited VM identity instead of your user.
  Resolution: Run the first-login identity steps (gcloud auth login, gcloud auth application-default login, set project).
  Escalation: Verify org/project IAM grants for the user.

Symptom: Authenticated as the VM service account when user actions are expected.
  Likely cause: User auth/ADC not initialized.
  Resolution: Re-run the user auth and ADC commands.
  Escalation: Recheck gcloud auth list output.

Symptom: SSH alias fails to connect.
  Likely cause: VM is off, alias missing, or DNS not yet updated.
  Resolution: zarch devbox list, zarch devbox on <username>, verify the ~/.ssh/config alias.
  Escalation: If unresolved, run the reset workflow.

Symptom: Devbox repeatedly unstable after manual fixes.
  Likely cause: Environment drift or corrupted local state.
  Resolution: Execute the reset runbook (delete and recreate).
  Escalation: Escalate with logs and issue context.

Success Criteria

Your devbox adoption is working well when:

  • New developers can reach a usable environment quickly.
  • Most severe environment failures are resolved by fast recreate cycles.
  • Daily operations use on/off predictably for cost control.
  • Teams see reduced “works on my machine” inconsistency.
  • User identity authorization in devboxes is consistently configured.

Day-2 Operations

Incremental Changes

Z-Arch supports targeted deployments. You do not redeploy the entire system for every change.


Changing a Service

If you modify:

  • Service code
  • Environment variables
  • Declared targets

You redeploy that service:

zarch deploy service <name>

If you modify routes for that service, the Gateway must also be redeployed so route definitions remain aligned.


Changing Gateway Behavior

If you modify:

  • Route auth modes
  • Session configuration
  • CORS configuration
  • Security settings

You redeploy the Gateway:

zarch deploy gateway

If rotating session encryption keys:

zarch deploy gateway --rotate-session-key

Session rotation invalidates existing sessions by design.


Client Identity Changes

Client API keys can be regenerated independently:

zarch client all
zarch deploy edge

To rotate a client’s secret, remove the key name from that client’s api_key field before running zarch client all; a new secret will be generated.


Troubleshooting Guide

Common issues and checks:

  • Active project not set.
    • Run zarch set project <path>.
  • Cloud project not active.
    • Verify cloud CLI auth and active project selection.
  • Deploy command reports missing resource ID.
    • Confirm id exists in the relevant block.
  • Route validation errors.
    • Verify path format and duplicate method/path keys.
  • CORS failures in browser clients.
    • Verify security.cors.allowed_origins, headers, methods, and credential settings.
  • Auth failures on protected endpoints.
    • Check route auth mode and your JWT/session flow assumptions.

Multi-Region Model

Use regions in zarch.yaml to declare multi-region intent, then apply region-aware edge load-balancing settings.

Regions are intended to be convergent. Divergent per-region code paths and undeclared regional drift are outside the intended model.

In practice:

  • Deploy intent remains centralized in one zarch.yaml contract.
  • Gateway and service shape are expected to remain structurally consistent across declared regions.
  • Edge load-balancing policy determines request distribution across those converged regional deployments.
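A minimal sketch, assuming regions takes a list of region identifiers (the exact shape of the key is not specified in this manual):

```yaml
regions:
  - us-east1
  - europe-west1
```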

Security Posture and Shared Responsibility

Z-Arch guarantees perimeter authentication enforcement at the edge/gateway boundary according to declared route policy.

Z-Arch does not guarantee domain authorization correctness inside your application services.

Z-Arch does not protect against incorrect business logic, unsafe data handling, or domain-level policy mistakes in service code.

Security best-practice baseline:

  • Keep zarch.yaml declarative and free of sensitive values.
  • Prefer explicit origin allowlists for CORS.
  • Keep protected routes on dual unless there is a deliberate reason to use jwt-only.
  • Treat API keys as perimeter credentials and rotate them periodically.
  • Keep service targets minimal and intentional.
  • Use short-lived internal trust tokens and avoid custom long-lived shared secrets between services.
  • Always keep secrets in Secret Manager.
  • Call third-party APIs from backend services, not directly from untrusted clients where avoidable.

III - Reference Manuals

CLI Reference

Global Usage

zarch [--non-interactive] [--quiet] <command>

Run with no command to open the interactive console:

zarch

Behavior:

  • zarch with no subcommand opens the interactive shell.
  • --non-interactive avoids prompts and relies on explicit flags and existing config values.
  • --quiet suppresses most non-critical output.

Command groups:

Group                   Purpose
set                     Active project and Control Plane source pinning
new                     App bootstrap and Devbox creation
deploy                  Resource deployments
devbox                  Devbox lifecycle operations
client                  API client identity and key generation
connect / disconnect    Third-party credential management
ext                     Extension scaffolding, install, and hook triggering
mcp                     Local MCP server for controlled config operations and AI-assisted development
update                  Update the Control Plane to the latest version
register                Register your project with a lifetime Z-Arch license

set

Project and Control Plane pinning commands.

zarch set project [path]
zarch set branch <branch>
zarch set repo <owner/repo>
zarch set defaults

Usage notes:

  • set project activates a local directory that contains zarch.yaml.
  • set branch changes the branch pinned for Control Plane operations.
  • set repo changes the repository pinned for Control Plane operations.
  • set defaults is reserved for default preference management.

new

Create new platform resources.

zarch new app [options] [path]
zarch new devbox [username]

new app options:

Option                 Purpose
--region               Initial deployment region
--project-id-prefix    Prefix for generated cloud project IDs
--template-repo        Source template repository (owner/repo)
--new-repo             New repository name
--private-repo         Create the repository as private
--signup-mode          Signup policy
--auth-methods         Managed auth methods
--mfa                  MFA mode
--billing-account      Billing account override
--not-new-repo         Use an existing repository

Examples:

zarch new app --template-repo my-org/my-template
zarch new app --region us-east1 --project-id-prefix prod-
zarch new app --not-new-repo
zarch new devbox alice

deploy

Deploy one or more declared resources.

zarch deploy gateway [--region] [--rotate-session-key]
zarch deploy service [--region] <name>
zarch deploy job [--region] <name>
zarch deploy topic [--region] <name>
zarch deploy scheduler [--region] <name>
zarch deploy edge
zarch deploy all [--region]

Usage notes:

  • If --region is omitted, deploy commands iterate through all regions declared in config.
  • deploy all applies the full deployment sequence for declared resources.
  • --rotate-session-key on gateway deployment rotates session encryption key material and forces new sessions.

Examples:

zarch deploy service orders
zarch deploy gateway --region us-east1 --rotate-session-key
zarch deploy all

devbox

Manage Devbox lifecycle.

zarch devbox list
zarch devbox on <username>
zarch devbox off <username>
zarch devbox delete <username> [--delete-service-account]

client

Manage API clients and keys.

zarch client new <type> <id> [--no-edge] [--firebase]
zarch client all [--firebase]

Usage notes:

  • Valid client types for zarch.yaml configuration are web, ios, and android.
  • --no-edge prevents automatic key injection setup at the Edge Proxy layer.
  • --firebase provisions provider-side client configuration for managed auth flows.

Examples:

zarch client new web web-app
zarch client all --firebase

connect and disconnect

Manage third-party credentials in bootstrap/global or project scope.

zarch connect github [--project|--global]
zarch connect cloudflare [--project|--global]

zarch disconnect github [--project|--global]
zarch disconnect cloudflare [--project|--global]

Scope behavior:

  • --project stores credentials for the active project.
  • --global stores bootstrap credentials used as fallback when project-scoped credentials are not set.

ext

Manage Z-Arch Extensions.

zarch ext new <name>
zarch ext install <source> [--all] [--editable]
zarch ext trigger <hook_name> [--extension <name>] [--region <region>]

Usage notes:

  • ext new scaffolds a new extension package.
  • ext install --all installs all discoverable extensions in the active project.
  • ext trigger manually dispatches a lifecycle hook for testing and operations.

mcp

Start the local MCP server.

zarch mcp

Exposed MCP tools are focused on safe, validated config operations and scaffolding workflows. You do not need to manually invoke this command.


update

Update the deployed Control Plane environment.

zarch update

register

Purchase a Z-Arch license. The optional scope argument selects the license type.

zarch register [scope]

zarch.yaml Reference

zarch.yaml is the source of truth for your platform architecture:

  • It defines what exists.
  • It defines what is exposed.
  • It defines how resources are allowed to interact.

Compressed Architectural Language

zarch.yaml is not just configuration. It is a compact mapping of your backend architecture.

In one machine-legible document, it encodes:

  • Identity and authentication boundaries.
  • Authorization capabilities and roles.
  • IAM-relevant topology via explicit resource targets.
  • Infrastructure primitives (Services, Jobs, Topics, Schedulers).
  • Gateway exposure and route behavior.
  • Extension declarations and deployment controls.
  • Operational guardrails and runtime policy defaults.

Core architectural intent is centralized and structured. This compactness is useful for both engineers and LLMs:

  • Humans can review architecture intent quickly without chasing hidden dashboard state.
  • The grammar is small enough for LLMs to reason over the full backend topology without burning tokens or hallucinating.

Z-Arch is therefore opinionated, structured, and machine-reasonable by design: a deterministic serverless pattern expressed in a concise file that is practical for both production operations and AI-assisted engineering.


Top-Level Contract

Top-level keys:

Key                  Required  Type            Notes
platform             Yes       string          Currently gcp
domain               Yes       string or null
clients              Yes       array           Frontend/API clients
gateway              Yes       object or null
authn                Yes       object or null
security             Yes       object or null
project_id           No        string or null
firebase_project_id  No        string or null
firebase_tenant_id   No        string or null
regions              No        array[string]   Deployment regions
edge                 No        object or null
authz                No        object or null
services             No        array           Service resources
jobs                 No        array or null
topics               No        array or null
schedulers           No        array or null

Gateway and Security Blocks

Gateway block:

Field                       Type            Allowed Values / Behavior
gateway.type                string or null
gateway.version             string or null
gateway.min_instance        integer         0+
gateway.session.service_id  string or null
gateway.session.stateful    boolean         Stateful session toggle

Security block:

Field                            Type            Allowed Values / Behavior
security.session_samesite        string          Strict, Lax, None
security.session_ttl_secs        integer         Session lifetime in seconds, 0+
security.iat_skew_secs           integer         JWT issued-at skew tolerance, 0+
security.cors.allowed_origins    array[string]   Explicit allowed origins
security.cors.allowed_headers    array[string]   Allowed request headers
security.cors.allowed_methods    array[string]   GET, POST, PUT, PATCH, DELETE, OPTIONS, HEAD, TRACE, CONNECT
security.cors.expose_headers     array[string]   Headers exposed to browser JS
security.cors.allow_credentials  boolean         Credentialed browser requests
security.cors.max_age_seconds    integer         Preflight cache duration

Auth Blocks

authn supports two primary models:

  • Managed auth bootstrap path (auth_methods, signup_mode, mfa).
  • Generic OIDC validation path (discovery, client_id).

authn field values:

Field               Type            Allowed Values / Behavior
authn.discovery     string or null
authn.client_id     string or null
authn.auth_methods  array[string]   Email/Password, Email Link, Google, Github, Microsoft, Apple, Facebook
authn.signup_mode   string or null
authn.mfa           string or null

authz supports:

  • capabilities
  • roles
  • roles.<role>.grants

Use requires on routes for capability checks where applicable.


Resource Blocks

clients:

Field        Type            Allowed Values / Behavior
id           string          Unique client ID
type         string          web, ios, android
api_key      string or null
add_to_edge  boolean         Edge-side key injection toggle

services:

Field          Type            Allowed Values / Behavior
id             string          Unique service ID
endpoint       boolean         Expose via Gateway
authenticated  boolean         Default route auth requirement
routes         array           Route definitions
location       string or null
image          string or null
env            object          Runtime env vars
flags          array[string]   Additional deploy flags
targets        array[string]   Allowed downstream targets

location and image are mutually exclusive in intended usage.

Service route fields:

Field         Type            Allowed Values / Behavior
path          string          Literal (/path) or regex (^/path$)
method        string          HTTP verb
mode          string          http, stream, webhook, websocket
auth          string          dual, jwt-only, public
content_type  string          Required request content type
rate          integer         Route limit, minimum 1
requires      array[string]   Capability checks
health        boolean         Health endpoint designation

jobs:

Field     Type            Allowed Values / Behavior
id        string          Unique job ID
location  string or null
image     string or null
env       object          Runtime env vars
flags     array[string]   Additional deploy flags
targets   array[string]   Callable/publish targets

topics:

Field  Type           Allowed Values / Behavior
id     string         Unique topic ID
sub    array or null

Subscriber entry forms:

  • String form: subscriber ID.
  • Object form: id, optional mode (push/pull), optional ack_deadline_secs, optional max_delivery_attempts, optional dead_letter_topic.
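The two forms can be normalized into one shape before further processing. The helper below is an illustrative sketch, not platform code; it leaves undeclared optional fields as None rather than guessing platform defaults:

```python
def normalize_subscriber(entry):
    """Normalize a topic subscriber entry (string or object form) into the
    object form. Optional fields stay None when not declared, so the
    platform's own defaults apply downstream."""
    if isinstance(entry, str):
        entry = {"id": entry}  # string form is shorthand for {"id": ...}
    return {
        "id": entry["id"],
        "mode": entry.get("mode"),  # push or pull
        "ack_deadline_secs": entry.get("ack_deadline_secs"),
        "max_delivery_attempts": entry.get("max_delivery_attempts"),
        "dead_letter_topic": entry.get("dead_letter_topic"),
    }
```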

schedulers:

Field    Type            Allowed Values / Behavior
id       string          Unique scheduler ID
cadence  object or null
cron     string or null
paused   boolean         Pause state
targets  array[string]   Trigger targets

Scheduler rule:

  • Exactly one of cadence or cron must be set.

Cadence fields:

  • every
  • unit (minute, hour, day, week, month)
  • optional at
  • optional on
  • optional timezone

Full Example

platform: gcp
project_id: my-project
firebase_project_id: my-project
domain: api.example.com
regions:
  - us-east1

edge:
  load_balancer:
    LB_METHOD: DIRECT
  rate_limiter:
    global_lockout: true
    rate: 60
    window: 10
    cooldown: 300

clients:
  - id: web
    type: web
    api_key: API_KEY_WEB
    add_to_edge: true

gateway:
  type: serverless
  min_instance: 0
  session:
    service_id: session
    stateful: false

authn:
  discovery: https://issuer.example.com/.well-known/openid-configuration
  client_id: my-client-id
  auth_methods: []
  signup_mode: null
  mfa: null

authz:
  capabilities:
    - orders:read
    - orders:write
  roles:
    admin:
      grants:
        - orders:read
        - orders:write

security:
  session_samesite: Lax
  session_ttl_secs: 1209600
  iat_skew_secs: 60
  cors:
    allowed_origins:
      - https://app.example.com
    allowed_headers:
      - Authorization
      - Content-Type
      - x-api-key
    allowed_methods:
      - GET
      - POST
      - PUT
      - PATCH
      - DELETE
    expose_headers: []
    allow_credentials: true
    max_age_seconds: 600

services:
  - id: session
    endpoint: true
    authenticated: true
    routes:
      - path: /session
        method: GET
      - path: /session/login
        method: POST
        auth: jwt-only
      - path: /session/logout
        method: POST
      - path: /session/health
        method: GET
        auth: public
        health: true
    location: /services/session
    env: {}
    flags: []
    targets: []

  - id: orders
    endpoint: true
    authenticated: true
    routes:
      - path: /orders
        method: GET
        requires: [orders:read]
      - path: /orders
        method: POST
        requires: [orders:write]
      - path: ^/orders/[a-zA-Z0-9_-]+$
        method: GET
    location: /services/orders
    env: {}
    flags: []
    targets: [order-events]

jobs:
  - id: nightly-sync
    location: /jobs/nightly-sync
    env: {}
    flags:
      - "--task-timeout=1200s"
    targets: [orders]

topics:
  - id: order-events
    sub:
      - id: orders
        mode: push

schedulers:
  - id: nightly-sync-scheduler
    cron: "0 2 * * *"
    paused: false
    targets: [nightly-sync]

Route Semantics

Path Semantics

Path rules:

  • Literal paths must start with / and contain no whitespace.
  • Regex paths are opt-in and must start with ^/ and end with $.
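These path rules can be checked mechanically. The sketch below mirrors the documented contract only; it is not the platform's actual validator:

```python
import re

def classify_route_path(path: str) -> str:
    """Classify a route path as "literal" or "regex" per the rules above,
    raising ValueError when neither rule is satisfied."""
    if path.startswith("^"):
        if not path.startswith("^/") or not path.endswith("$"):
            raise ValueError("regex paths must start with ^/ and end with $")
        re.compile(path)  # reject patterns that are not valid regular expressions
        return "regex"
    if not path.startswith("/") or any(ch.isspace() for ch in path):
        raise ValueError("literal paths must start with / and contain no whitespace")
    return "literal"
```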

Method rules:

  • Methods must be valid HTTP verbs supported by the config contract.

Uniqueness rules:

  • Duplicate route keys are rejected within the same resource for the same method/path kind.
  • Resource IDs must be unique across major resource groups.
  • authz.roles.*.grants values must exist in authz.capabilities.
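The last rule lends itself to a simple pre-flight check. This helper is a hypothetical sketch of the documented rule, not part of the Z-Arch toolchain:

```python
def check_authz_block(authz: dict) -> list[str]:
    """Return a violation message for every roles.*.grants value that does
    not exist in authz.capabilities; an empty list means the block is valid."""
    capabilities = set(authz.get("capabilities") or [])
    errors = []
    for role, body in (authz.get("roles") or {}).items():
        for grant in body.get("grants", []):
            if grant not in capabilities:
                errors.append(f"role {role!r} grants unknown capability {grant!r}")
    return errors
```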

Auth Modes

Route auth mode is declared per route:

  • dual - JWT and session cookie required.
  • jwt-only - JWT required, no session cookie.
  • public - no end-user authentication required.

public routes remain behind gateway perimeter controls and are not direct service ingress endpoints.


Capability Requirements

Where authz is configured, route requires declarations enforce capability checks as part of gateway policy evaluation.


Conditional Requirements (required_when)

Route fields follow conditional requirement semantics in the config contract:

  • path and method are required for each route entry.
  • auth, content_type, rate, requires, and health are conditional policy modifiers.
  • Conditional requirement state is represented in schema metadata as required_when.

Runtime Effect Notes

  • Route matching and auth-mode enforcement occur at the gateway before forwarding.
  • For protected routes, failed JWT/session checks reject the request before service execution.
  • For internal calls, services verify short-lived S2S assertions to enforce caller/target trust constraints.

Edge Proxy Reference

Custom Edge Implementations

You can provide your own Edge Proxy implementation, as long as it satisfies the platform contract expected by deployment and runtime flows.

Deployment Assumptions

A compatible Edge Proxy should:

  • Sit in front of all public API ingress.
  • Enforce API-key checks for incoming traffic.
  • Forward requests to the Z-Arch Gateway endpoint.
  • Preserve request method, path, and required auth headers.
  • Support forwarding cookies for authenticated browser flows.
  • Prevent direct bypass patterns where Gateway endpoints are exposed outside intended routing.
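As an illustration of this contract, the sketch below validates an API key and builds a gateway-bound request that preserves the method, path, and auth-relevant headers. The key store, gateway origin, and function name are hypothetical; a real edge implementation resolves these from configuration:

```python
from urllib.request import Request

# Hypothetical stand-ins: a real edge resolves keys and the gateway origin
# from its own configuration (e.g. secrets such as API_KEY_WEB).
VALID_API_KEYS = {"key-for-web-client"}
GATEWAY_ORIGIN = "https://gateway.example.net"

def build_forward(method: str, path: str, headers: dict) -> Request:
    """Reject requests without a valid API key, then build a request to the
    gateway preserving method, path, and auth-relevant headers (including
    cookies for authenticated browser flows)."""
    if headers.get("x-api-key") not in VALID_API_KEYS:
        raise PermissionError("missing or invalid API key")
    keep = {k: v for k, v in headers.items()
            if k.lower() in {"authorization", "cookie", "x-api-key", "content-type"}}
    return Request(GATEWAY_ORIGIN + path, method=method, headers=keep)
```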

Default Edge Implementation

The Z-Arch Edge Proxy Worker operates at the Cloudflare edge as the central routing, access-control, and traffic moderation layer for all /api/* requests. It performs authentication, configurable load-balancing, and includes a built-in per-IP rate-limiting system to protect backend services.


Logic Overview

  1. Parses incoming /api/* request paths
  2. Optionally detects an explicit API version segment (/api/v1/...)
  3. Resolves the client API key:
    • If a client segment is present and add_to_edge is enabled, injects the corresponding client API key
    • Otherwise expects the client to supply x-api-key directly
  4. Strips the client segment from the path (edge-only concern)
  5. Preserves the API version segment if present
  6. Applies configurable rate limits before forwarding
  7. Determines backend origin using the selected load-balancing mode
  8. Proxies the request to the chosen Z-Arch Gateway

Environment Variables

Variable          Description
GW_URL            Default backend gateway URL used in DIRECT mode
API_KEY_<CLIENT>  Client-specific API keys (e.g. API_KEY_WEB, API_KEY_ADMIN)
LB_METHOD         Load-balancing mode: DIRECT, RANDOM, ROUND_ROBIN, REGION, or PING
LB_TARGETS        Comma-separated gateway URLs used in RANDOM, ROUND_ROBIN, or PING modes
LB_REGION_MAP     JSON map of country or colo codes to gateway URLs for REGION mode
LOADBALANCER_KV   KV binding used for stateful load-balancer operations (round-robin counter, latency cache)
RATELIMIT_KV      Optional KV binding for global lockout flags when global_lockout is enabled
RL_CONFIG         JSON configuration object defining rate-limiting behavior (see below)
PROJECT_NAME      Project identifier for metadata/logging (optional)

Rate Limiting

Overview

The Worker includes a built-in per-IP rate limiter that executes before any other logic. In all cases, request counting is in-memory (per isolate / per PoP). Optionally, KV can be used only for global lockout flags.

Mode                                           Counter Storage            Global Lockout Storage             Description
Local-only (global_lockout: false)             Cloudflare isolate memory  None                               Fast local burst/cooldown enforcement without any KV reads or writes.
Local + global lockout (global_lockout: true)  Cloudflare isolate memory  Cloudflare KV (RATELIMIT_KV) flag  Checks a global throttle flag before counting and writes a throttle flag on local limit trip.

Configuration

The rate limiter is controlled via the RL_CONFIG environment variable:

{
  "global_lockout": true,
  "rate": 60,
  "cooldown": 300,
  "window": 10
}

Field           Type     Description
global_lockout  boolean  Enable (true) or disable (false) KV global lockout flags. Counting remains in-memory either way.
rate            integer  Requests per time window allowed per IP. Default: 60.
cooldown        integer  Duration (seconds) for which an IP is throttled after exceeding the limit. Default: 300.
window          integer  Size of the rate-limiting window in seconds. Default: 10. Shorter windows (10–30s) make the limiter more responsive to bursts.

Behavior Summary

  • Each request checks the IP’s usage bucket, grouped by the configured window interval.
  • Request counting always happens in isolate memory.
  • If global_lockout is true, the Worker checks RATELIMIT_KV for a throttle flag before local counting.
  • If local counting trips and global_lockout is true, the Worker writes a KV throttle flag with TTL = cooldown.
  • If global_lockout is false, no rate-limiter KV reads/writes are performed.
  • Exceeding the limit returns HTTP 429 Too Many Requests.
  • All limiter operations are wrapped in try/catch and never block normal execution on error.

Load-Balancing Behavior

If rate limiting passes, traffic is distributed using one of the following strategies:

Mode         Description                                                                              Required Variables
DIRECT       Sends all traffic to the primary gateway (GW_URL).                                       GW_URL
RANDOM       Randomly selects one origin from LB_TARGETS.                                             LB_TARGETS
ROUND_ROBIN  Sequentially rotates through origins, maintaining index in LOADBALANCER_KV.              LB_TARGETS, LOADBALANCER_KV
REGION       Routes requests based on Cloudflare’s cf-ipcountry or PoP (cf.colo).                     LB_REGION_MAP
PING         Probes origins to measure latency, caching the fastest per Cloudflare PoP in LOADBALANCER_KV.  LB_TARGETS, LOADBALANCER_KV
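Origin selection for the first three modes can be sketched as follows; `state` stands in for the LOADBALANCER_KV round-robin counter, and REGION/PING are omitted because they depend on Cloudflare request metadata and live latency probes:

```python
import random

def pick_origin(method: str, env: dict, state: dict) -> str:
    """Select a gateway origin for one request. Illustrative sketch of the
    documented DIRECT, RANDOM, and ROUND_ROBIN modes, not the Worker code."""
    if method == "DIRECT":
        return env["GW_URL"]
    targets = env["LB_TARGETS"].split(",")  # comma-separated gateway URLs
    if method == "RANDOM":
        return random.choice(targets)
    if method == "ROUND_ROBIN":
        i = state.get("rr_index", 0)        # persisted counter (KV in production)
        state["rr_index"] = (i + 1) % len(targets)
        return targets[i % len(targets)]
    raise ValueError(f"unsupported mode: {method}")
```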

Example Configurations

1. Round-Robin Example

GW_URL=https://us-east1-gateway.example.net
API_VERSION=v1
LB_METHOD=ROUND_ROBIN
LB_TARGETS=https://us-east1-gateway.example.net,https://europe-west1-gateway.example.net

2. Region-Based Example

LB_METHOD=REGION
LB_REGION_MAP={"US":"https://us-east1-gateway.example.net","EU":"https://europe-west1-gateway.example.net","DEFAULT":"https://us-east1-gateway.example.net"}

3. Latency-Based Example (PING)

LB_METHOD=PING
LB_TARGETS=https://us-east1-gateway.example.net,https://europe-west1-gateway.example.net,https://asia-southeast1-gateway.example.net

4. Rate Limiter Example (Global lockout enabled)

RL_CONFIG={"global_lockout":true,"rate":60,"cooldown":300,"window":30}

Client API Keys

Each client defined in zarch.yaml has a dedicated API key stored in Secret Manager.

Client API keys are exposed to the edge proxy only when add_to_edge: true is set for that client.

Example:

API_KEY_WEB="key-for-web-client"
API_KEY_ADMIN="key-for-admin-client"

If add_to_edge is disabled for a client, its API key is not exposed to the edge proxy. In that case, the client must supply its API key directly via the x-api-key request header.


Updates take effect immediately via the Cloudflare API; no redeploy is required.


Behavior Summary

Request Path         API Key Source   Forwarded Path
/api/web/session     Injected (WEB)   /api/session
/api/v1/web/session  Injected (WEB)   /api/v1/session
/api/session         Client-supplied  /api/session
/api/v1/session      Client-supplied  /api/v1/session
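The rows above double as test vectors for the rewrite step. A sketch of the documented behavior (not the Worker's code):

```python
def rewrite_edge_path(path: str, clients: set) -> tuple:
    """Resolve an /api/* request path the way the edge proxy does: detect an
    optional version segment, strip a known client segment (an edge-only
    concern), and return (forwarded_path, client_or_None)."""
    parts = path.strip("/").split("/")          # e.g. ["api", "v1", "web", "session"]
    assert parts and parts[0] == "api"
    rest = parts[1:]
    version = None
    if rest and rest[0].startswith("v") and rest[0][1:].isdigit():
        version = rest.pop(0)                   # preserve the /api/v1 prefix
    client = None
    if rest and rest[0] in clients:
        client = rest.pop(0)                    # never forwarded to the backend
    prefix = ["api"] + ([version] if version else [])
    return "/" + "/".join(prefix + rest), client
```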

Client Segment Semantics

The client path segment exists solely to support edge-side API key injection.

Rules:

  • A client segment (/api/{client}/...) is required only when the edge proxy is injecting the API key (e.g. add_to_edge: true).
  • When present, the client segment is stripped before forwarding to the gateway.
  • The backend never receives or routes on the client segment.
  • If the edge is not injecting, the client segment must not appear in the path and the API key must be supplied via x-api-key.

Summary

The Z-Arch Edge Proxy provides a secure, scalable, and configurable edge layer that:

  • Authenticates API requests using global or per-client keys
  • Rewrites requests with API versioning
  • Distributes traffic via multiple load-balancing modes (DIRECT, RANDOM, ROUND_ROBIN, REGION, PING)
  • Enforces per-IP rate limiting (RL_CONFIG) with in-memory counters and optional global KV lockout flags
  • Allows configurable time windows (window) for precise control over limiter sensitivity
  • Uses KV bindings (LOADBALANCER_KV, optional RATELIMIT_KV) for efficient, stateful coordination
  • Operates entirely at the Cloudflare edge for maximum speed, safety, and zero backend coupling

Z-Arch Runtime Library

Z-Arch Runtime Library (zarch) provides the authentication primitives (ZArchAuth) used by services running within the Z-Arch architecture. It exposes a stable Python API for encrypted session cookies and service-to-service trust, plus the ZArchExtension interface whose lifecycle hooks the Z-Arch CLI invokes during bootstrap and deployment workflows.

Quick Start: ZArchAuth

Use ZArchAuth in real services to:

  • run the session service endpoints (/session, /session/login, /session/logout, /session/verify)
  • let the Z-Arch Gateway own end-user auth verification in normal Z-Arch deployments
  • add optional session hooks for revocation, backend session control, and /session response enrichment

from zarch import ZArchAuth

auth = ZArchAuth()

# Session service entrypoint.
# In a standard Z-Arch deployment, the gateway validates JWT + session cookie
# before protected traffic reaches your business services.
app = auth.session.start()

Common deployment pattern:

  • Keep the session service separate from business services.
  • Let Z-Arch Gateway enforce end-user auth; app services focus on business logic.
  • Use ZArchAuth.s2s.sign(...) and ZArchAuth.s2s.verify(...) for internal service trust.
  • Use direct ZArchAuth.session.verify(...) in application code only for custom/non-standard topologies.

Session Mode: Stateless by Default

Session cookies are stateless by default. If no hooks are registered, cryptographic cookie validation is enough and /session/verify defaults to valid after payload checks.

To enable stateful behavior (revocation, server-side deny lists, tenant-specific controls) and enrich the /session response, register these hooks:

  • on_login(sid, uid, tenant, iat, exp) to persist session state
  • on_logout(sid, uid, tenant) to revoke state
  • on_verify(sid, uid, tenant, iat, exp) -> bool to allow/deny each session
  • on_session(uid, email, tenant, sid, iat, exp, claims) -> dict | None to add or override fields returned by /session

Any dict returned by on_session is merged after the default uid/email/tenant fields, so matching keys override the built-in values.

For on_session, sid, iat, and exp come from the current encrypted session cookie when it is available and valid; otherwise they are None.

Real-world pattern:

  • hash sid before storage
  • persist session records on login
  • mark revoked_at on logout (idempotent)
  • deny in on_verify when revoked, missing, or expired
  • decorate /session with app-specific profile metadata when needed

from zarch import ZArchAuth
from google.cloud import firestore
from datetime import datetime, timezone
import hashlib
import time

auth = ZArchAuth()
db = firestore.Client()

def _hash_sid(sid: str) -> str:
    return hashlib.sha256(sid.encode()).hexdigest()

def on_login(sid: str, uid: str, tenant: str | None, iat: int, exp: int) -> None:
    db.collection("zarch_sessions").document(_hash_sid(sid)).set({
        "uid": uid,
        "tenant": tenant,
        "created_at": datetime.fromtimestamp(iat, tz=timezone.utc),
        "expires_at": datetime.fromtimestamp(exp, tz=timezone.utc),
        "revoked_at": None,
    })

def on_logout(sid: str, uid: str, tenant: str | None) -> None:
    db.collection("zarch_sessions").document(_hash_sid(sid)).set({
        "revoked_at": datetime.now(tz=timezone.utc),
    }, merge=True)

def on_verify(sid: str, uid: str, tenant: str | None, iat: int, exp: int) -> bool:
    doc = db.collection("zarch_sessions").document(_hash_sid(sid)).get()
    if not doc.exists:
        return False
    data = doc.to_dict()
    if data.get("revoked_at") is not None:
        return False
    expires_at = data.get("expires_at")
    return bool(expires_at and expires_at.timestamp() >= time.time())

def on_session(
    uid: str,
    email: str | None,
    tenant: str | None,
    sid: str | None,
    iat: int | None,
    exp: int | None,
    claims: dict,
) -> dict | None:
    return {
        "display_name": claims.get("name") or email or uid,
        "has_active_session": bool(sid),
    }

auth.session.register_hook("on_login", on_login)
auth.session.register_hook("on_logout", on_logout)
auth.session.register_hook("on_verify", on_verify)
auth.session.register_hook("on_session", on_session)
app = auth.session.start()

Service-to-Service (S2S) Mechanics

ZArchAuth.s2s exists so internal calls are explicitly authorized at the application-policy layer, not just network-reachable.

When you call auth.s2s.sign(req, target, url=...):

  • Z-Arch always adds x-zarch-s2s-token: <jwt> (short-lived Ed25519 JWT with iss, aud, iat, exp, typ).
  • On GCP (ZARCH_PLATFORM=gcp) and when url is provided, it also adds Authorization: Bearer <google-id-token>.

Why both are used in GCP deployments:

  • The Google ID token is the stronger platform authentication mechanism (Cloud Run/IAM identity boundary).
  • The Z-Arch token is a service-authorization policy mechanism (enforces caller identity, audience, and Z-Arch trust graph rules).
  • Using both gives layered control: Google proves caller identity to the platform, Z-Arch enforces project policy at the service layer.

The Z-Arch token can also be used by itself in non-GCP or non-IAM topologies. In that mode, services still get signed caller identity and audience/policy checks without requiring Google-authenticated services.

Caller example (adds both headers on GCP):

import json
import urllib.request
from zarch import ZArchAuth

auth = ZArchAuth()

def call_orders_service() -> dict:
    req = urllib.request.Request(
        "https://orders-abc-uc.a.run.app/internal/create",
        data=json.dumps({"sku": "A-100", "qty": 1}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )

    # Always injects x-zarch-s2s-token.
    # On GCP + url provided, also injects Authorization: Bearer <google-id-token>.
    auth.s2s.sign(req, target="orders", url="https://orders-abc-uc.a.run.app")

    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

Receiver example (verifies Z-Arch policy token):

from flask import Flask, jsonify, request
from zarch import ZArchAuth

app = Flask(__name__)
auth = ZArchAuth()

@app.post("/internal/create")
def create_order():
    # Verifies x-zarch-s2s-token signature, audience, issuer trust, and freshness.
    claims = auth.s2s.verify(request)
    caller_service = claims["iss"]

    # Cloud Run IAM / Google token auth (if enabled) is evaluated by platform/gateway.
    return jsonify({"ok": True, "caller": caller_service}), 200

S2S verification data is deployment-derived and local at runtime:

  • SERVICE_ID: current service identity.
  • S2S_PUBLIC_KEYS_JSON: trusted caller public keys by service ID.
  • S2S_ALLOWED_TARGETS: mint-time policy for where this service may call.

Security Model Summary

  • Session cookies are encrypted and stateless by default; stateful controls are added explicitly via on_login, on_logout, and on_verify hooks, while /session response enrichment is opt-in via on_session.
  • In the intended Z-Arch platform flow, gateway/session components handle end-user auth so application services do not need to implement cookie auth logic directly.
  • ZArchAuth.s2s.sign(...) and ZArchAuth.s2s.verify(...) enforce short-lived signed service-to-service trust with explicit caller/target validation.
  • Auth helpers fail closed: invalid, expired, tampered, or unauthorized credentials raise errors that should map to 401/403.
  • Secret material (cookie encryption keys, S2S keys, API credentials) should come from secure secret management and never be hardcoded or logged.

Z-Arch Extensions: Project Context Interface

This document describes the extension-facing interface exposed to extensions through the project_context argument passed into lifecycle hooks. This is the stable API extensions should use. It is intentionally narrow, safe, and versioned by Z-Arch.

If you are authoring an extension, you should only access functionality via project_context (not internal modules).


Quick Start

A minimal extension looks like:

from typing import Any, Dict
from zarch.extensions.base import ZArchExtension

class Extension(ZArchExtension):
    def claim(self, extension_name: str, extension_block: Dict[str, Any]) -> bool:
        return extension_block.get("type") == "example"

    def on_post_deploy(self, project_context, extension_configuration: Dict[str, Any]) -> None:
        project_context.log("Hello from my extension!")

The project_context object is your primary tool. It provides:

  • project metadata (project ID, region, repo path)
  • config accessors
  • safe prompt helpers
  • GCP helpers (secrets, service URLs, env vars, service accounts)
  • GitHub and Cloudflare helpers

Lifecycle Hooks (from ZArchExtension)

Extensions can implement any subset of these methods. Each hook receives project_context and the extension-specific configuration block.

claim(extension_name, extension_block) -> bool

Return True if your extension should handle this extension block in zarch.yaml.

Example

    def claim(self, extension_name: str, extension_block: Dict[str, Any]) -> bool:
        return extension_block.get("type") == "my-extension"

pre_project_bootstrap(project_context, extension_configuration)

Runs before initial project bootstrap, but after prompting and repo cloning.

Example

    def pre_project_bootstrap(self, project_context, extension_configuration):
        project_context.log("Preparing custom bootstrap")

post_project_bootstrap(project_context, extension_configuration)

Runs after initial project bootstrap.

Example

    def post_project_bootstrap(self, project_context, extension_configuration):
        domain = project_context.config_get("domain")
        project_context.log(f"Project domain is {domain}")

pre_service_deploy(project_context, extension_configuration)

Runs before a Cloud Run service is deployed.

Example

    def pre_service_deploy(self, project_context, extension_configuration):
        project_context.log("Preparing service deployment")

post_service_ensureSA(project_context, extension_configuration)

Runs immediately after the service runtime service account has been ensured/created.

Example

    def post_service_ensureSA(self, project_context, extension_configuration):
        event = project_context.get_event_data() or {}
        sa = ((event.get("payload") or {}).get("service_account") or {}).get("email")
        project_context.log(f"Service SA ready: {sa}")

post_service_deploy(project_context, extension_configuration)

Runs after a Cloud Run service has been deployed.

Example

    def post_service_deploy(self, project_context, extension_configuration):
        project_context.log("Service deployed successfully")

pre_gateway_deploy(project_context, extension_configuration)

Runs before the Z-Arch gateway is deployed.

Example

    def pre_gateway_deploy(self, project_context, extension_configuration):
        project_context.log("Preparing gateway deployment")

post_gateway_ensureSA(project_context, extension_configuration)

Runs immediately after the gateway service account has been ensured/created.

Example

    def post_gateway_ensureSA(self, project_context, extension_configuration):
        payload = (project_context.get_event_data() or {}).get("payload") or {}
        project_context.log(f"Gateway SA: {payload.get('service_account', {}).get('email')}")

post_gateway_deploy(project_context, extension_configuration)

Runs after the Z-Arch gateway has been deployed.

Example

    def post_gateway_deploy(self, project_context, extension_configuration):
        project_context.log("Gateway deployed successfully")

pre_job_deploy(project_context, extension_configuration)

Runs before a Cloud Run job is deployed.

Example

    def pre_job_deploy(self, project_context, extension_configuration):
        project_context.log("Preparing job deployment")

post_job_ensureSA(project_context, extension_configuration)

Runs immediately after the job runtime service account has been ensured/created.

Example

    def post_job_ensureSA(self, project_context, extension_configuration):
        payload = (project_context.get_event_data() or {}).get("payload") or {}
        project_context.log(f"Job SA: {payload.get('service_account', {}).get('email')}")

post_job_deploy(project_context, extension_configuration)

Runs after a Cloud Run job has been deployed.

Example

    def post_job_deploy(self, project_context, extension_configuration):
        job_id = project_context.config_get("jobs[0].id")
        project_context.log(f"Job {job_id} deployed successfully")

pre_scheduler_deploy(project_context, extension_configuration)

Runs before a Cloud Scheduler job is deployed.

Example

    def pre_scheduler_deploy(self, project_context, extension_configuration):
        project_context.log("Preparing scheduler deployment")

post_scheduler_ensureSA(project_context, extension_configuration)

Runs immediately after the scheduler service account has been ensured/created.

Example

    def post_scheduler_ensureSA(self, project_context, extension_configuration):
        payload = (project_context.get_event_data() or {}).get("payload") or {}
        principal = payload.get("principal", {}).get("id")
        project_context.log(f"Scheduler principal with SA ready: {principal}")

post_scheduler_deploy(project_context, extension_configuration)

Runs after a Cloud Scheduler job has been deployed.

Example

    def post_scheduler_deploy(self, project_context, extension_configuration):
        scheduler_id = project_context.config_get("schedulers[0].id")
        project_context.log(f"Scheduler {scheduler_id} deployed successfully")

pre_topic_deploy(project_context, extension_configuration)

Runs before a Pub/Sub topic is deployed.

Example

    def pre_topic_deploy(self, project_context, extension_configuration):
        project_context.log("Preparing topic deployment")

post_topic_deploy(project_context, extension_configuration)

Runs after a Pub/Sub topic has been deployed.

Example

    def post_topic_deploy(self, project_context, extension_configuration):
        topic_id = project_context.config_get("topics[0].id")
        project_context.log(f"Topic {topic_id} deployed successfully")

Hook Payload Matrix

Lifecycle event payloads are additive schema-v1 summaries. Use .get(...) and tolerate unknown keys.

Hook                     Key payload fields (summary)
pre_project_bootstrap    project_id, module, principal, repo, create_gcp_project, regions, domain, edge_proxy, userbase
post_project_bootstrap   status, repo, domain, edge_proxy, userbase, clients, control_plane_ready, gateway_deployed
pre_service_deploy       principal, resource_type, source, endpoint, authenticated, flags, routes, targets, env, schema, control_plane_args (wrapper)
post_service_ensureSA    principal, service_account, resource_type, source, endpoint, authenticated, flags, targets, routes, env, schema
post_service_deploy      deployment, inbound_callers, outbound_targets, s2s, env, endpoint, authenticated, targets
pre_gateway_deploy       principal, rotate_session_key, min_instance, auth_profile, session, trial_mode, control_plane_args (wrapper)
post_gateway_ensureSA    principal, service_account, rotate_session_key, min_instance, auth_profile, session, trial_mode
post_gateway_deploy      deployment, gateway, session, s2s, env
pre_job_deploy           principal, source, flags, targets, env, control_plane_args (wrapper)
post_job_ensureSA        principal, service_account, source, targets, flags, env
post_job_deploy          deployment, targets, target_summary, s2s, env
pre_scheduler_deploy     principal, schedule_mode, schedule, timezone, paused, targets, target_count
post_scheduler_ensureSA  principal, service_account, schedule_mode, schedule, timezone, paused, targets, target_count
post_scheduler_deploy    service_account, schedule_mode, schedule, timezone, paused, targets, target_summary, created_scheduler_job_ids
pre_topic_deploy         principal, subscribers, subscriber_ids, subscriber_count, publisher_candidates, publisher_candidate_count
post_topic_deploy        principal, subscribers, subscriber_ids, subscriber_count, publishers, publisher_ids, publisher_count

All payloads avoid secret values (for example: env var values, session keys, gateway URL/suffix secrets).

Manual Hook Triggering

Use the CLI to manually dispatch lifecycle hooks for configured extensions:

zarch ext trigger pre_service_deploy
zarch ext trigger post_service_ensureSA --extension my-extension
zarch ext trigger post_service_deploy --extension my-extension
zarch ext trigger post_gateway_deploy --extension audit --extension cache
  • The hook name (first positional argument) must be one of the lifecycle hooks defined by ZArchExtension.
  • --extension is optional and can be repeated. Values must match extension block names under extensions: in zarch.yaml.
  • Without --extension, all configured extension blocks are considered, and only installed extensions that claim those blocks are invoked.
  • Dispatch follows the normal hook execution policy (local vs remote) used by live deployments.
  • Manual dispatches include minimal event metadata where source is "manual" and payload.extension_names lists any explicitly selected extension blocks.

project_context Interface

The sections below describe all available attributes and methods exposed to extensions. Use them as the primary API surface.

Core Attributes

These attributes represent the current project state in a safe, read-only form.

  • project_context.id (str)

    • The active GCP project ID.
    • Example: "my-gcp-project"

    Example

    project_id = project_context.id
    project_context.log(f"Deploying project {project_id}")
  • project_context.region (str)

    • The active region for this deployment run.
    • Example: "us-east1"

    Example

    region = project_context.region
    project_context.log(f"Active region: {region}")
  • project_context.project_root_path (pathlib.Path)

    • Absolute path to the project root directory.

    Example

    root = project_context.project_root_path
    project_context.log(f"Root path: {root}")
  • project_context.non_interactive (bool)

    • True if Z-Arch is running in non-interactive mode.

    Example

    if project_context.non_interactive:
        project_context.log("Running non-interactively")
  • project_context.config (zarch_cli.helpers.config.Config)

    • The loaded Z-Arch config object.
    • Most extensions should use the config_get, config_set, and config_save helpers instead of accessing config directly.

    Example

    cfg = project_context.config
    project_context.log(f"Config loaded from: {cfg.root}")

Event Metadata

get_event_data() -> dict[str, Any] | None

Read optional metadata for the lifecycle hook currently being dispatched.

  • This may be None when metadata is unavailable.
  • Keys are additive and may grow over time; extensions should tolerate unknown keys.
  • Known envelope fields include:
    • schema_version (integer)
    • source ("live" or "manual")
    • hook (hook name)
    • timestamp (UTC ISO-8601)
    • resource (e.g. kind/id/region)
    • payload (hook-specific details, may be empty)

Example

event = project_context.get_event_data() or {}
payload = event.get("payload") or {}
principal = payload.get("principal") or {}
service_account = payload.get("service_account") or {}

project_context.log(
    f"Hook={event.get('hook')} principal={principal.get('kind')}:{principal.get('id')} "
    f"sa={service_account.get('email')}"
)

Logging

log(message: str, level: str | None = None) -> None

Write a styled message to the Z-Arch console.

  • level is optional and used only to tag the message (e.g. “INFO”, “WARN”).

Example

project_context.log("Preparing extension steps", level="info")

Command Execution

run_command(command_parts: list[str]) -> tuple[str, int]

Run a local command, given as an argv-style list of parts. Returns (stdout, exit_code).

Example

out, code = project_context.run_command(["echo", "hello"])
if code == 0:
    project_context.log(out.strip())

gcloud(command_parts: list[str]) -> tuple[str, int]

Run a gcloud command using the embedded or system gcloud binary. Returns (stdout, exit_code).

Example

out, code = project_context.gcloud(["projects", "list", "--format=value(projectId)"])
if code == 0:
    project_context.log("Projects:\n" + out)

Config Access

config_get(key: str, default: Any = None) -> Any

Fetch a config value using dotted path notation.

Example

domain = project_context.config_get("domain", "")
project_context.log(f"Domain: {domain}")

config_set(key: str, value: Any) -> None

Set a config value in memory (does not write to disk).

Example

project_context.config_set("gateway.session.stateful", False)

config_save() -> None

Persist config changes to zarch.yaml.

Example

project_context.config_set("gateway.session.stateful", False)
project_context.config_save()

Prompts

These are safe wrappers around Z-Arch’s prompt system.

ask(message: str, default: str | None = None, required: bool = True, validate: Callable | None = None) -> str

Prompt the user for a string value.

Example

name = project_context.ask("What is the service name?", default="session")

choice(message: str, choices: list[str], default: str | None = None, sub_prompt: str = "") -> str

Prompt the user to select a single option.

Example

region = project_context.choice("Select region", ["us-east1", "us-west1"], default="us-east1")

multichoice(message: str, choices: list[str], default: list[str] | None = None, sub_prompt: str = "(space to toggle, enter to confirm)") -> list[str]

Prompt the user to select multiple options.

Example

features = project_context.multichoice("Enable features", ["cdn", "auth", "logging"])

yes_no(message: str, default: bool = True, sub_prompt: str = "") -> bool

Prompt the user for a yes/no response.

Example

confirm = project_context.yes_no("Proceed with cleanup?", default=False)

review_and_confirm() -> None

Render the config and ask the user to confirm. Useful before sensitive operations.

Example

project_context.review_and_confirm()

GCP Helpers

These helpers wrap common GCP operations and automatically use the project context’s id and region where applicable.

ensure_service_account(service_account_name: str, **kwargs) -> str

Ensure a service account exists, creating it if missing.

  • service_account_name can be either a full email (name@project.iam.gserviceaccount.com) or just the short name.
  • Optional kwargs:
    • project_id (str) override the current project ID
    • display_name (str) override the display name

Example

sa = project_context.ensure_service_account("zarch-ext")
project_context.log(f"Service account: {sa}")

secret_exists(secret_name: str) -> bool

Check if a Secret Manager secret exists in the current project.

Example

if not project_context.secret_exists("my-secret"):
    project_context.log("Secret does not exist")

store_secret(secret_name: str, secret_value: str) -> None

Create or update a Secret Manager secret with a new version.

Example

project_context.store_secret("my-secret", "super-secure-token")

get_secret(secret_name: str) -> str

Fetch the latest version of a Secret Manager secret.

Example

token = project_context.get_secret("my-secret")
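The three secret helpers compose into a common read-or-initialize step: create the secret with a seed value only if it is missing, then always read the latest version (a sketch; ensure_secret is a hypothetical wrapper, not part of the project_context API):

```python
def ensure_secret(project_context, secret_name: str, default_value: str) -> str:
    """Idempotently read a secret, seeding it on first use.

    Combines secret_exists, store_secret, and get_secret so repeated
    runs never overwrite a value that an operator has since rotated.
    """
    if not project_context.secret_exists(secret_name):
        project_context.store_secret(secret_name, default_value)
    return project_context.get_secret(secret_name)
```

On the first run this seeds and returns default_value; on later runs it returns whatever the latest stored version is.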

get_service_url(service_name: str) -> str

Fetch the Cloud Run service URL for a named service in the current region.

Example

session_url = project_context.get_service_url("session")
project_context.log(f"Session URL: {session_url}")

get_env_var(service_name: str, env_var_key: str) -> str

Read a specific environment variable from a deployed service or function.

Example

public_key = project_context.get_env_var("zarch-gateway", "S2S_PUBLIC_KEY")

set_env_vars(service_name: str, env_vars: dict[str, str]) -> None

Set or update environment variables on a deployed service or function.

Example

project_context.set_env_vars("session", {"SESSION_TTL": "1209600"})
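Together with get_service_url, this supports a common wiring pattern: point one deployed service at another by injecting its URL into the consumer's environment (a sketch; link_services is a hypothetical helper built on the two documented calls):

```python
def link_services(project_context, producer: str, consumer: str,
                  env_key: str) -> str:
    """Write the producer's Cloud Run URL into the consumer's env.

    Returns the URL that was set, so callers can log or verify it.
    """
    url = project_context.get_service_url(producer)
    project_context.set_env_vars(consumer, {env_key: url})
    return url
```

For example, `link_services(project_context, "session", "api", "SESSION_URL")` would make the session service reachable from the api service via its environment.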

GitHub

github()

Return an authenticated GitHub client (PyGitHub-style client used internally by Z-Arch).

Example

gh = project_context.github()
user = gh.get_user()
project_context.log(f"GitHub user: {user.login}")

get_connected_repo() -> tuple[str, str]

Return the connected repo full name and branch as ("owner/repo", "branch").

Example

repo, branch = project_context.get_connected_repo()
project_context.log(f"Connected repo: {repo} ({branch})")

Cloudflare

These helpers manage Cloudflare workers and pages as used by Z-Arch.

update_edge_proxy(project_name: str | None = None) -> None

Update the edge proxy worker for the project.

  • If project_name is omitted, it is inferred from the connected repo name.

Example

project_context.update_edge_proxy()

set_edge_proxy_envs(env_vars: dict[str, str], project_name: str | None = None) -> bool

Set environment variables on the edge proxy worker.

  • Returns True on success, False on failure.

Example

ok = project_context.set_edge_proxy_envs({"API_VERSION": "v1"})
if not ok:
    project_context.log("Failed to update edge envs", level="warn")

deploy_cf_worker(script_name: str, repo_root_dir: str, repo_full: str | None = None, branch: str | None = None, domain: str | None = None) -> None

Deploy a Cloudflare Worker from the connected repo.

  • script_name: Worker script identifier
  • repo_root_dir: Root path in the repo to deploy
  • repo_full: Optional owner/repo override
  • branch: Optional branch override
  • domain: Optional custom domain

Example

project_context.deploy_cf_worker(
    script_name="my-worker",
    repo_root_dir="services/edge",
    branch="main",
)

set_worker_route(script_name: str, domain: str, route: str = "/api/*") -> None

Attach a route to a worker script.

Example

project_context.set_worker_route("my-worker", "example.com", "/api/*")
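Deploying a worker and attaching its route usually go together; a small wrapper can keep the two steps in order (a sketch; deploy_worker_with_route is a hypothetical helper, not part of the project_context API):

```python
def deploy_worker_with_route(project_context, script_name: str,
                             repo_dir: str, domain: str,
                             route: str = "/api/*") -> None:
    """Deploy a Cloudflare Worker from the connected repo, then bind it
    to a route on the given domain."""
    project_context.deploy_cf_worker(
        script_name=script_name,
        repo_root_dir=repo_dir,
        domain=domain,
    )
    project_context.set_worker_route(script_name, domain, route)
```

Keeping the route binding after the deploy ensures traffic is only routed to a script that exists.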

deploy_cf_pages(domain: str, project_name: str | None = None, repo_full: str | None = None, branch: str | None = None) -> None

Deploy a Cloudflare Pages project from the connected repo.

Example

project_context.deploy_cf_pages("example.com")

End-to-End Example

A realistic extension that uses multiple helpers:

from typing import Any, Dict
from zarch.extensions.base import ZArchExtension

class Extension(ZArchExtension):
    def claim(self, extension_name: str, extension_block: Dict[str, Any]) -> bool:
        return extension_block.get("type") == "my-ext"

    def post_service_deploy(self, project_context, extension_configuration: Dict[str, Any]) -> None:
        project_context.log("Post-deploy hook starting")

        # Read config
        domain = project_context.config_get("domain", "")
        if not domain:
            project_context.log("No domain configured", level="warn")
            return

        # Ensure a secret exists
        if not project_context.secret_exists("edge-api-key"):
            project_context.store_secret("edge-api-key", "replace-me")

        # Update edge proxy envs
        project_context.set_edge_proxy_envs({"API_VERSION": "v1"})

        # Deploy pages site
        project_context.deploy_cf_pages(domain)

        project_context.log("Post-deploy hook complete")

zarch.yaml

    extensions:
      {extension_name}:
        type: "{extension_name}"
        required_roles: []
        config:
          example_key: example_value

Add each extension to zarch.yaml or it will not run, even if it is installed. The extensions: block is a dictionary keyed by extension name; type identifies the extension that should claim the block. List every GCP IAM role required by the service account that runs the extension in required_roles. Values under config: are available to the extension code at runtime.


Notes and Best Practices

  • Prefer config_get/config_set over accessing project_context.config directly.
  • Use log() for all extension output to stay consistent with Z-Arch UX.
  • Avoid raw shell calls unless absolutely necessary; use provided helpers first.
  • Never log secrets or gateway URL suffixes.

If you need additional helpers, consider filing a request rather than importing internal modules directly.