How I shipped a multi-model AI pipeline, a freemium billing engine, and a self-hosted CI/CD stack from scratch.
The Problem That Started It All
More than 75% of resumes are rejected by Applicant Tracking Systems before a human recruiter ever reads them. This isn't because candidates lack the right experience. It's because their resumes don't mirror the exact vocabulary, keyword density, and structure that ATS software scores against. A qualified candidate applying for a "Senior Operations Manager" role with a resume that calls their work "managing daily logistics" instead of "directing cross-functional operations" can get filtered out automatically, even though the underlying experience is identical.
The common solution is to manually rewrite your resume for every job you apply to. That takes anywhere from 30 to 60 minutes per application. For someone applying to 10 or 15 roles a week, that math is brutal. And most people skip it, submitting a generic resume and hoping for the best.
I built Orbinix to close that gap. The idea is straightforward: upload your complete, honest work history as a master CV, paste a job description, and let the platform intelligently rewrite, reorder, and optimize your resume to match. What made it technically interesting is the constraint I set for myself from the start. The system must never fabricate experience. Every output bullet has to trace back to something the user actually provided. That single requirement shaped every design decision in the project.
What Orbinix Does
The core user flow looks like this:
Upload Master CV -> Add Job Description -> AI Tailors -> Interactive Review -> Export ATS-ready PDF
Users upload a PDF or DOCX master CV, which gets parsed server-side and stored. They paste a job description into a text field. They click "Tailor My CV" and the platform analyzes the job description, identifies hard skills, soft skills, required qualifications, industry terminology, and tone. It then rewrites and reorders the CV content to emphasize the most relevant experience using the employer's own vocabulary.
After tailoring, the user sees a side-by-side editor with the original and rewritten versions. They can accept, reject, or manually edit any bullet. They can reorder sections, re-tailor from scratch, or use the gap analysis feature to surgically apply missing skills. When they're satisfied, they export an ATS-friendly PDF in seconds.
The platform launched with four tiers: Free (3 tailors per month), Pay-as-you-go (credits-based), Pro (unlimited), and Career (enterprise-facing). Billing is handled through Stripe subscriptions and a credit ledger system I built from scratch.
Architecture at a Glance
The project is a full-stack TypeScript monorepo built on Next.js 15 with the App Router. Here's how the layers fit together:
- Frontend: Next.js 15, React 19, Tailwind CSS v4, shadcn/ui, Radix UI primitives
- API Layer: Next.js Route Handlers running as serverless-compatible Node.js functions
- AI Tier: Four distinct model integrations across three providers (more on this below)
- Database: Supabase PostgreSQL with Row-Level Security policies on every table
- Auth: NextAuth v5 for session management and middleware, with Supabase Auth handling OAuth and JWT claims for storage RLS
- File Storage: Supabase Storage for uploaded CVs and generated PDFs
- Billing: Stripe subscriptions, webhook handlers, and a custom credit ledger
- CMS: Contentful for the blog and marketing content, consumed via the Contentful Delivery API
- Infra: GitHub Actions, rsync, PM2, and Nginx on a self-hosted Azure VM
- Automation: A separate Docker Compose stack running n8n, SearXNG, and Crawl4AI
The architecture is deliberate about where to run things. Long-running AI requests that can take 20 to 40 seconds cannot live on Vercel's free tier, which enforces a hard function timeout. Hosting on a self-managed VM removes that constraint entirely and keeps infrastructure costs predictable.
Deep Dive 1: The Multi-Model AI Tailoring Pipeline
The hardest engineering work in this project was building an AI pipeline that produces reliable, structured output from a fundamentally non-deterministic system.
The tailoring endpoint lives in src/app/api/applications/[id]/tailor/route.ts, which grew to 1,219 lines. It accepts a model parameter that routes the request to one of three model tiers:
- Standard: Azure-hosted Kimi K2.5, accessed via an OpenAI-compatible endpoint. Good quality at low cost, used for most requests.
- Premium: Anthropic Claude Sonnet, accessed through the Anthropic Messages API with tool-use output. Better at nuanced writing and complex sections.
- GLM-5: An experimental third integration available for testing.
Each model gets the same carefully engineered system prompt, which enforces the core constraints:
```
RULES:
- NEVER fabricate or invent experience, skills, projects, or education
- Preserve all structural headings exactly as they appear
- Use keywords from the job description where they genuinely apply
- PUNCTUATION RULE: Do NOT use unicode em dash or en dash. Use plain ASCII hyphen-minus only.
```

The system prompt also specifies a strict JSON output schema with a tailored_sections array, where each section includes original_bullets, tailored_bullets, and line_types arrays that must all have the same length. This index-aligned structure is what allows the frontend to render a precise diff between the original and tailored versions.
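To make that contract concrete, here's roughly what the schema looks like as TypeScript types. The tailored_sections, original_bullets, tailored_bullets, and line_types names come straight from the prompt; the title field and the exact line-type values are my shorthand for illustration:

```ts
// Sketch of the JSON contract the system prompt enforces.
// The exact TypeScript shape is illustrative, not the production types.
type LineType = "heading" | "bullet" | "paragraph";

interface TailoredSection {
  title: string;                  // section heading, preserved verbatim
  original_bullets: string[];     // source lines from the master CV
  tailored_bullets: string[];     // rewritten lines, index-aligned with the originals
  line_types: LineType[];         // render hint per line, same length as both arrays
}

interface TailorResponse {
  tailored_sections: TailoredSection[];
}
```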
The problem is that language models don't always return clean, schema-compliant JSON. In practice, a model might merge a Summary section and a Core Competencies section into one blob, return headings as bullet items, or produce line_types arrays of the wrong length. Trusting the raw output would corrupt the database and break the UI.
So I built a multi-stage normalization layer that runs after every AI response. The postProcessTailoredSections function does the following:
- Deduplicates sections by canonicalizing their titles (so "Professional Experience" and "Work Experience" both map to a single section).
- Detects when a summary block contains embedded competency content and splits it into separate sections automatically.
- Applies an inferLineType function that determines whether each line should render as a heading, bullet, or paragraph, using a combination of section name, pipe-delimiter heuristics, and date-range regex patterns.
- Normalizes contact line delimiters to consistent pipe-separated formatting.
- Pads or truncates original_bullets, tailored_bullets, and line_types arrays to equal length before any of them touch the database (see the sketch below).
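A minimal sketch of that last step, reusing the TailoredSection shape from the schema sketch above (the padding defaults are my assumption):

```ts
// Force the three parallel arrays to one shared length:
// extend short arrays with safe defaults, trim anything longer.
function alignSectionArrays(section: TailoredSection): TailoredSection {
  const len = Math.max(
    section.original_bullets.length,
    section.tailored_bullets.length,
    section.line_types.length,
  );
  const pad = <T>(arr: T[], fill: T): T[] =>
    arr.length >= len
      ? arr.slice(0, len)
      : [...arr, ...Array<T>(len - arr.length).fill(fill)];
  return {
    ...section,
    original_bullets: pad(section.original_bullets, ""),
    tailored_bullets: pad(section.tailored_bullets, ""),
    line_types: pad(section.line_types, "paragraph"),
  };
}
```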
The database write also uses an optimistic retry pattern. Because the tailored_cvs table has a unique constraint on (application_id, version), concurrent requests can race to write version 1. The insert function retries once on a 23505 uniqueness violation, re-fetching the latest version number before retrying.
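A minimal sketch of that pattern, assuming a Supabase client; column names beyond application_id and version are illustrative:

```ts
import type { SupabaseClient } from "@supabase/supabase-js";

// Insert the next version of a tailored CV. If a concurrent request
// wins the race to the same (application_id, version) pair, Postgres
// raises a 23505 unique violation; re-read the latest version and retry once.
async function insertTailoredCv(
  db: SupabaseClient,
  applicationId: string,
  content: unknown,
): Promise<number> {
  for (let attempt = 0; attempt < 2; attempt++) {
    const { data: latest } = await db
      .from("tailored_cvs")
      .select("version")
      .eq("application_id", applicationId)
      .order("version", { ascending: false })
      .limit(1)
      .maybeSingle();

    const nextVersion = (latest?.version ?? 0) + 1;
    const { error } = await db
      .from("tailored_cvs")
      .insert({ application_id: applicationId, version: nextVersion, content });

    if (!error) return nextVersion;
    if (error.code !== "23505" || attempt > 0) throw new Error(error.message);
  }
  throw new Error("unreachable");
}
```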
The result is a pipeline that produces the same well-shaped data structure regardless of which model generated it or how messy the raw output was.
Deep Dive 2: A Freemium Credit Ledger Built Like a Bank Account
Billing for an AI product is not a simple row counter. Every call to an LLM costs real money, and the usage tracking layer needs to be defensively correct. I built Orbinix's access control and billing system with that in mind.
The entry point is assertCanTailor in src/lib/account-access.ts. Before any tailoring request touches an AI model, this function runs a usage check. It reads the user's profile row, which stores their account tier (free, pay_as_you_go, pro, or career), their subscription status, and their credit balance.
The interesting part is how it calculates usage. Rather than relying solely on the tailor_credits_used_this_month column in the profiles table, it independently counts the number of successful rows in the llm_logs table for that user in the current calendar month. Then it takes the maximum of the two values.
```ts
tailorsUsedThisMonth = Math.max(
  tailorsUsedThisMonth,
  llmSuccessCountThisMonth
);
```

This dual-verification pattern protects against a specific failure mode. If the profile counter fails to update due to a database error after a successful tailor run, the llm_logs count acts as an independent floor. The user can't accidentally get free extra usage because of a write failure.
When a tailor request succeeds, recordSuccessfulTailorUsage runs atomically. For pay-as-you-go users it decrements the tailor_credits_balance column and writes an append-only entry to the credit_ledger table. The ledger records the delta, the reason, and the account tier at the time of the transaction. This structure supports retroactive auditing, dispute resolution, and usage analytics without any destructive updates.
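Since supabase-js doesn't batch statements into a transaction client-side, a natural way to keep the decrement and the ledger insert atomic is a Postgres function called over RPC. A sketch, with the RPC name and parameters as assumptions:

```ts
import type { SupabaseClient } from "@supabase/supabase-js";

// Decrement tailor_credits_balance and append to credit_ledger in one
// transaction. "consume_tailor_credit" is a hypothetical Postgres
// function; only the table and column names come from the article.
async function recordSuccessfulTailorUsage(
  db: SupabaseClient,
  userId: string,
  tier: "free" | "pay_as_you_go" | "pro" | "career",
) {
  const { error } = await db.rpc("consume_tailor_credit", {
    p_user_id: userId,
    p_delta: -1,                  // the delta recorded in the ledger
    p_reason: "tailor_success",
    p_tier: tier,                 // account tier at transaction time
  });
  if (error) throw new Error(error.message);
}
```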
Stripe handles the purchasing side. When a user pays for a credit pack or subscribes to Pro, Stripe sends a webhook to /api/billing/webhook. The handler verifies the signature, identifies the event type, and calls billing-sync.ts to synchronize the subscription status and replenish the credit balance. Credits are never simply set to a value. They're calculated by adding the purchased amount to the existing balance, which means concurrent purchases compose correctly instead of overwriting each other.
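Here's roughly what that handler looks like as a Next.js route handler. stripe.webhooks.constructEvent is the real Stripe SDK call; syncBillingState stands in for the billing-sync.ts logic, and the event-type list is a common Stripe set, not confirmed from the codebase:

```ts
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// Stand-in for billing-sync.ts: syncs subscription status and adds
// purchased credits to the existing balance (never overwrites it).
declare function syncBillingState(event: Stripe.Event): Promise<void>;

// POST /api/billing/webhook
export async function POST(req: Request) {
  const payload = await req.text(); // raw body, required for the signature check
  const signature = req.headers.get("stripe-signature");
  if (!signature) return new Response("Missing signature", { status: 400 });

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(
      payload,
      signature,
      process.env.STRIPE_WEBHOOK_SECRET!,
    );
  } catch {
    // Reject anything that doesn't carry a valid Stripe signature.
    return new Response("Invalid signature", { status: 400 });
  }

  switch (event.type) {
    case "checkout.session.completed":
    case "customer.subscription.updated":
    case "customer.subscription.deleted":
      await syncBillingState(event);
      break;
  }
  return new Response("ok", { status: 200 });
}
```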
The function that gates access returns a typed TailorAccessDecision object with a machine-readable code field:
```ts
return {
  allowed: false,
  statusCode: 403,
  code: "FREE_TIER_LIMIT_REACHED",
  reason: "Free includes 3 CV tailors per month. Buy credits or upgrade to Pro.",
  snapshot,
};
```

The frontend reads the code field to decide exactly which upgrade prompt to show. This is a much cleaner pattern than parsing error message strings.
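On the client, consuming the decision can be as simple as a switch on code. Only FREE_TIER_LIMIT_REACHED is a value confirmed above; the others are hypothetical:

```ts
// Map machine-readable denial codes to the right upgrade prompt.
function upgradePromptFor(code: string): string {
  switch (code) {
    case "FREE_TIER_LIMIT_REACHED":
      return "You've used your 3 free tailors this month. Buy credits or upgrade to Pro.";
    case "OUT_OF_CREDITS": // hypothetical code value
      return "You're out of credits. Top up to keep tailoring.";
    default:
      return "Your current plan doesn't allow this action.";
  }
}
```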
Deep Dive 3: Targeted Gap Rewrite with the Right Model for the Job
One of the more product-forward features I built is the gap analysis rewrite. After the initial tailoring, the JD analysis surfaces skills and requirements from the job description that aren't addressed anywhere in the user's CV. These are "gaps." The question is: what should a user do with them?
A brute-force approach would be to re-tailor the entire CV with instructions to address the gaps. But that's expensive, slow, and replaces edits the user already accepted. Instead, I built a targeted endpoint at /api/applications/:id/gap-rewrite that takes a list of selected gap skills and the current state of the CV sections, then asks the AI to rewrite specific existing bullets to more naturally reflect those skills, without creating new content or altering unrelated bullets.
The response from the AI is a list of precise edit objects:
```ts
type RewriteEdit = {
  section_index: number;
  bullet_index: number;
  skill: string;
  rewritten_text: string;
  reason?: string;
};
```

The frontend receives these edits and applies them surgically to the current editor state. Only the targeted bullets change. Everything else the user has already reviewed and accepted stays intact.
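Applying them is then a pure function over the editor state. A sketch, reusing the TailoredSection shape from earlier:

```ts
// Replace only the bullets named by the edits; every other bullet,
// including ones the user already accepted or hand-edited, is untouched.
function applyRewriteEdits(
  sections: TailoredSection[],
  edits: RewriteEdit[],
): TailoredSection[] {
  const next = sections.map((s) => ({
    ...s,
    tailored_bullets: [...s.tailored_bullets],
  }));
  for (const edit of edits) {
    const section = next[edit.section_index];
    // Defensively skip edits that point outside the current state.
    if (!section || edit.bullet_index >= section.tailored_bullets.length) continue;
    section.tailored_bullets[edit.bullet_index] = edit.rewritten_text;
  }
  return next;
}
```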
For this endpoint I chose the lightweight Azure OpenAI o3-mini deployment rather than the heavier models used for full tailoring. Speed matters here because users are actively waiting in the editor, and the targeted task doesn't require the same level of nuanced writing ability. The code also handles a quirk in the o3 model family: these deployments reject the temperature parameter that other models accept. I handle that with a runtime regex check on the deployment name:
```ts
if (!/^o\d/i.test(standardDeployment)) {
  rewriteRequest.temperature = 0.25;
}
```

It's a small thing, but it matters in production. Sending temperature to an o3 endpoint causes the request to fail entirely.
Every AI call, whether it succeeds or fails, writes a row to the llm_logs table with the model name, status, prompt token count, and completion token count. This gives a live cost-monitoring feed without requiring any third-party observability tool.
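The logging call itself is a single insert; the column names are my guess at the schema implied above:

```ts
import type { SupabaseClient } from "@supabase/supabase-js";

// One row per AI call, success or failure, for live cost monitoring.
async function logLlmCall(
  db: SupabaseClient,
  entry: {
    userId: string;
    model: string;
    status: "success" | "error";
    promptTokens: number;
    completionTokens: number;
  },
) {
  const { error } = await db.from("llm_logs").insert({
    user_id: entry.userId,
    model: entry.model,
    status: entry.status,
    prompt_tokens: entry.promptTokens,
    completion_tokens: entry.completionTokens,
  });
  // Logging must never take down the request path, so don't throw.
  if (error) console.error("llm_logs insert failed:", error.message);
}
```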
Deep Dive 4: Self-Hosted CI/CD Without Vercel or Docker
Most portfolio projects end at the code. Deployment is an afterthought, or it's a one-click Vercel deploy that hides the ops complexity entirely. For Orbinix, I built a real deployment pipeline on a self-hosted Azure VM because I needed it, not just to demonstrate it.
The pipeline has two workflows in .github/workflows/:
- ci.yml runs on every pull request. It installs dependencies, runs ESLint, and runs next build. If any of these fail, the PR is blocked.
- cd-self-hosted.yml runs on every merge to main. It connects to the Azure VM over SSH, rsyncs the application source (with .env* files explicitly excluded from the transfer), then runs npm ci, npm run build, and pm2 startOrReload.
The secret isolation is intentional. Environment variables, including Stripe keys, AI API keys, and database credentials, are never sent over the wire by the deployment process. They live only in .env.production on the server, maintained by hand. PM2 starts the application through scripts/pm2-start.sh, which explicitly sources that file before running next start. This means that even if GitHub Actions credentials were somehow compromised, an attacker would find no production secrets to retrieve from them.
Nginx sits in front of the Node process on port 3000, handling TLS termination and forwarding the correct X-Forwarded-* headers so the application sees real client IPs.
Rollback is straightforward because the deployment model is simple. Reverting a bad deploy means reverting the Git commit, pushing to main, and letting the CD workflow redeploy the previous version. There's no Docker image registry, no Kubernetes rollout to manage. The approach trades some sophistication for operational transparency.
Content Layer: Headless CMS with Contentful
One of the less obvious complexity layers in Orbinix is how much of the product surface is driven by a headless CMS rather than hardcoded React components. The entire marketing site and blog are powered by Contentful, accessed via the Contentful Delivery API on the server side and the Preview API in draft mode.
The homepage alone fetches from ten separate content types in parallel on every request:
```ts
const [hero, logos, featureSection, featureTabs, steps,
       testimonials, pricingPlans, cta, seo, homepageConfig] =
  await Promise.all([
    getHeroSection(preview),
    getLogoBarItems(preview),
    getFeatureSection(preview),
    getFeatureTabs(preview),
    getHowItWorksSteps(preview),
    getTestimonials(preview),
    getPricingPlans(preview),
    getCtaSection(preview),
    getSeoMeta(preview),
    getHomepageConfig(preview),
  ]);
```

This means the hero headline, feature tab copy, testimonial quotes, pricing plan details, and CTA text can all be updated by a non-developer without touching any code or triggering a deploy. Next.js ISR (Incremental Static Regeneration) with a one-hour revalidation window keeps the page fast while staying fresh.
The blog system is more involved. Each article entry in Contentful resolves linked Author and Category entries at depth 2, pulls a featured image from Contentful's Media library, and carries an optional linked SEO meta entry for custom Open Graph tags per post. The body field is a Contentful Rich Text Document, rendered through a custom RichTextRenderer component that maps each node type to styled React elements.
The blog page also generates structured JSON-LD schema at the article level, including TechArticle, BreadcrumbList, and FAQPage types derived from the article body content. This was a deliberate SEO investment. Search engines reward structured data with rich results, and wiring it automatically from the CMS means every article gets it without any manual markup work.
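A sketch of the TechArticle portion, with the field mapping as an assumption:

```ts
// Build TechArticle JSON-LD from a blog entry (illustrative field names).
// The result is rendered into a <script type="application/ld+json"> tag.
function buildArticleJsonLd(article: {
  title: string;
  publishedAt: string;
  authorName: string;
  url: string;
}) {
  return {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    headline: article.title,
    datePublished: article.publishedAt,
    author: { "@type": "Person", name: article.authorName },
    mainEntityOfPage: article.url,
  };
}
```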
Draft preview mode is wired through Next.js draftMode(). When enabled, the Contentful client switches its host from cdn.contentful.com to preview.contentful.com and uses the Preview API token instead of the Delivery token, returning unpublished drafts. This gives content editors a live preview URL to review articles before publishing without any additional tooling.
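The client selection boils down to a few lines. The host values and the draftMode() call are as described above; the env var names are assumptions:

```ts
import { createClient } from "contentful";
import { draftMode } from "next/headers";

// Return a Preview or Delivery client depending on Next.js draft mode.
export async function getContentfulClient() {
  const { isEnabled } = await draftMode(); // async in Next.js 15
  return createClient({
    space: process.env.CONTENTFUL_SPACE_ID!,
    accessToken: isEnabled
      ? process.env.CONTENTFUL_PREVIEW_TOKEN!   // Preview API token
      : process.env.CONTENTFUL_DELIVERY_TOKEN!, // Delivery API token
    host: isEnabled ? "preview.contentful.com" : "cdn.contentful.com",
  });
}
```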
The FAQ page uses the same pattern, fetching items from a faqItem content type filterable by category and sorted by an explicit order field. All of this is strongly typed end to end, with TypeScript interfaces defined for every Contentful content model.
The practical benefit is separation of concerns. The engineering layer handles rendering, performance, and data safety. The content layer handles copy, imagery, and SEO metadata. Neither has to wait on the other.
Key Decisions and Trade-offs
Every significant technical decision in Orbinix was made deliberately. Here's a summary of the most important ones:
| Decision | Rationale |
|---|---|
| Multi-model routing (standard, premium, cover letter) | Different features have different quality and cost requirements. A lightweight model on the gap rewrite is faster and cheaper. Claude Sonnet on premium tailoring produces noticeably better prose. |
| Supabase over self-managed Postgres | Row-Level Security policies automatically enforce data isolation at the database layer. Every user can only query their own rows without application-level checks on every query. |
| Self-hosted over Vercel | AI tailoring can take 20 to 40 seconds. Vercel's hobby tier enforces a 10-second function timeout. Self-hosting removes that constraint entirely and gives cost predictability. |
| Credit ledger as append-only log | Audit trail for disputes, refunds, and fraud detection. The same pattern financial systems use because it's correct. |
| Custom AI output normalization | Language models produce non-deterministic output. Trusting raw output and writing it directly to the database would produce corrupted records. Validate first, always. |
| NextAuth v5 + Supabase Auth dual-layer | NextAuth manages server-side session cookies and middleware. Supabase Auth issues JWT tokens used to enforce RLS on the database and storage. Each layer handles what it's best at. |
The Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 15, React 19, Tailwind CSS v4, shadcn/ui, Radix UI |
| Language | TypeScript 5 |
| AI Models | Kimi K2.5 (Azure), Claude Sonnet (Anthropic), Llama 4 Maverick (Azure), o3-mini (Azure) |
| Auth | NextAuth v5, Supabase Auth, Google OAuth |
| Database | Supabase PostgreSQL with Row-Level Security |
| Storage | Supabase Storage |
| Billing | Stripe (subscriptions, webhooks, credit packs) |
| PDF Export | @react-pdf/renderer |
| CV Parsing | unpdf (PDF), mammoth (DOCX) |
| CMS | Contentful |
| CI/CD | GitHub Actions, rsync, PM2, Nginx |
| Infrastructure | Azure VM, Cloudflare DNS |
| Automation | n8n, SearXNG, Crawl4AI on Docker Compose |
What Shipped
Here's an honest accounting of where the project stands:
- All six MVP phases shipped: Foundation, Core Ingestion, AI Engine, Interactive Editor, PDF Export, and Polish
- Full freemium billing pipeline is live, including Stripe webhooks, subscription sync, and credit pack purchases
- Four AI model integrations across four distinct use cases (tailoring, premium tailoring, gap rewrite, cover letter)
- Self-hosted CI/CD with secret isolation and a documented rollback procedure
- Contentful-powered blog and marketing content layer
- Cover letter generation, which was originally a post-MVP item, shipped during the polish phase
Still in progress: word-level inline diff highlighting in the editor, dynamic match score recalculation as users make edits, and a full unit and integration test suite. These are the next items on the roadmap.
What I'd Do Differently
The biggest thing I'd change is building the test suite earlier. Unit tests for the CV parsing logic and integration tests for the AI pipeline would have caught a handful of normalization edge cases before they reached the main branch. I treated testing as a polish-phase task and paid for that with some late debugging sessions.
I'd also think harder about the PDF export architecture earlier in the project. The @react-pdf/renderer approach works well for ATS-friendly single-column documents, but it has real limitations around dynamic font loading and complex layout flexibility. If I were starting over, I'd evaluate a headless Chrome approach from the beginning rather than switching mid-build.
On the prompt engineering side, I'd invest more time upfront in a structured evaluation framework. The current system prompt for CV tailoring was refined through iteration and manual testing. A more systematic approach, comparing outputs across a diverse set of CV types before locking in the prompt, would have produced a more robust result faster.
Those lessons aside, Orbinix is the project I'm most proud of because it forced me to think across the full stack at once. The AI layer alone wasn't interesting. The billing alone wasn't interesting. The CI/CD setup alone wasn't interesting. What made it a good engineering challenge was making all of those layers work together reliably, in production, with real money and real AI costs on the line.
Orbinix is actively developed. The codebase, architecture decisions, and prompt engineering approach all reflect production trade-offs from building an AI SaaS in early 2026.
