#005 Replacing Authentik with Zitadel

As I build a multi-tenant, ERP-like application and get clearer about my own auth requirements, I keep hitting architectural crossroads. For a long time, Authentik was my go-to choice. It’s powerful, flexible, and feature-rich. However, as my application became better defined, I found myself looking for something designed from the ground up for multi-tenancy.


Why ZITADEL? The Multi-Tenant Advantage

Authentik is a fantastic general-purpose IAM. You can build almost anything with its "Flows" and "Stages," but that flexibility comes with management overhead, especially when you start adding dozens (or hundreds) of independent customers (tenants).

My application will effectively be an ERP for businesses. Each customer needs their own isolated environment, their own users, and eventually, the ability to bring their own Identity Provider (IdP) for SSO.

ZITADEL won me over for three reasons:

  1. Virtual Instances (Isolation by Default): ZITADEL is built on the concept of "Virtual Instances." Each tenant feels like they have their own completely separate IAM server, but it’s all managed by a single backend.
  2. API-First Design: Everything in ZITADEL is built to be automated via API or gRPC. For an ERP where tenant onboarding should be seamless, this is a must.
  3. Human-Centric Permissions: The way ZITADEL handles organizations, projects, and roles mapped more naturally to my application's structure than Authentik’s more generic grouping system.

The Infrastructure Challenge: Scaling Down to Scale Up

Most "Production-Ready" guides for ZITADEL assume you have a Kubernetes cluster or at least a beefy VPS. I’m running a leaner operation on an 8GB RAM VPS, alongside other live apps like Ghost and custom Data Managers, all routed through Traefik.

Moving ZITADEL into a Docker container in this environment presented several unique challenges.

1. The Resource Tightrope

ZITADEL is a Go application, which makes it efficient at runtime, but its initialization and database migrations are resource-intensive. I quickly learned that starting multiple Postgres instances and a ZITADEL setup process simultaneously would spike CPU and RAM, occasionally killing my SSH connection mid-deployment.

The Fix: I had to optimize the Postgres containers specifically for a smaller footprint:


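command = [
  "postgres",
  "-c", "shared_buffers=128MB", # keep Postgres's shared memory cache small
  "-c", "max_connections=20",   # cap the number of concurrent client connections
]

Capping the shared buffers and the maximum number of connections up front kept the VPS stable enough to complete the setup.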

2. The gRPC/HTTP2 Multiplexing Puzzle

ZITADEL uses a single port (8080) for everything: the UI, the REST API, and the gRPC API. Traefik is great at handling web traffic, but it needs a nudge to handle this kind of multiplexing correctly when the backend serves plain, unencrypted traffic with no certificates of its own.

The magic label that solved my connectivity issues was:

traefik.http.services.zitadel.loadbalancer.server.scheme=h2c

This tells Traefik to talk to ZITADEL over cleartext HTTP/2 (h2c). gRPC runs on HTTP/2, so without this setting the gRPC API cannot function behind the proxy.
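
For context, here is a minimal sketch of how that label might sit alongside the rest of the routing labels. It assumes the container is managed through Terraform's kreuzwerker/docker provider; the router and service name `zitadel`, the `websecure` entrypoint, and the `zitadel_image` variable are placeholders for illustration, not details taken from my actual stack.

```hcl
# Hypothetical variable: set this to whichever ZITADEL image you deploy.
variable "zitadel_image" {
  type = string
}

locals {
  # Traefik reads its routing configuration from container labels.
  zitadel_labels = {
    "traefik.enable"                           = "true"
    "traefik.http.routers.zitadel.rule"        = "Host(`auth.qcomb.com`)"
    "traefik.http.routers.zitadel.entrypoints" = "websecure" # assumed entrypoint name
    "traefik.http.routers.zitadel.tls"         = "true"

    # ZITADEL serves the UI, REST, and gRPC on one port.
    "traefik.http.services.zitadel.loadbalancer.server.port" = "8080"
    # Talk to the backend over cleartext HTTP/2 so gRPC works.
    "traefik.http.services.zitadel.loadbalancer.server.scheme" = "h2c"
  }
}

resource "docker_container" "zitadel" {
  name  = "zitadel"
  image = var.zitadel_image

  # The docker provider expects one labels block per label.
  dynamic "labels" {
    for_each = local.zitadel_labels
    content {
      label = labels.key
      value = labels.value
    }
  }
}
```

Keeping the labels in a map makes the `h2c` line easy to spot in a diff, which helps when a proxy issue needs debugging later.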


The "Not Found" Mystery: Fighting the Documentation

The most frustrating part of the migration was hitting a cryptic {"code":5,"message":"Not Found"} error every time I tried to access my auth portal at https://auth.qcomb.com/ui/console/.

ZITADEL was running, the logs said it was healthy, but the UI was gone.

After some deep-diving into GitHub issues, I discovered a major change in ZITADEL v4: the Login UI was split into a separate Next.js application called "Login V2," and it’s enabled by default. Thanks a million to this GitHub ticket.

Because the official documentation hadn't quite caught up for non-Docker-Compose deployments, my standalone ZITADEL container was trying to redirect users to a Login V2 service that didn't exist in my stack.

The Solution? Revert to Login V1. Once I set ZITADEL_DEFAULTINSTANCE_FEATURES_LOGINV2_REQUIRED=false and re-initialized the database, ZITADEL fell back to its stable, built-in Login V1 UI. Instant success.
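
As a rough sketch, the variable can be kept in a small list and handed to the ZITADEL container. The `zitadel_env` local is a hypothetical name, and the `env` argument assumes the kreuzwerker/docker provider's `docker_container` resource; adapt it to however you define your container.

```hcl
locals {
  # Environment for the ZITADEL container, as "KEY=value" strings.
  zitadel_env = [
    # Do not require the separate Login V2 app; use the built-in Login V1.
    "ZITADEL_DEFAULTINSTANCE_FEATURES_LOGINV2_REQUIRED=false",
  ]
}

# In the container resource, pass the list through:
#   env = local.zitadel_env
```

Since this is a default-instance setting, it presumably only takes effect when the instance is first created, which would explain why I had to re-initialize the database before the change showed up.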


Lessons Learned

Rebuilding your identity stack isn't easy, but it’s worth it if the tool fits your business model. Here are my top takeaways from this migration:

  • Check the Version Defaults: Major version bumps (like v3 to v4) often change architectural assumptions. If you see a "Not Found" error on a healthy service, check if the UI has been decoupled.
  • Targeted Applies are Life-Savers: When your VPS is resource-constrained, don't let Terraform try to refresh every live container at once. Use -target to deploy your changes incrementally.

ZITADEL is now the core of my tenant management system, and despite the "Not Found" detours, the result is a much cleaner, more scalable auth flow for my customers.