Production-Ready · Senior

TOPIA — Job Aggregator Pipeline

Production-deployed full-stack job aggregation pipeline that collects, normalizes, deduplicates, and serves remote job listings using a distributed architecture with MongoDB Atlas, FastAPI, and a locally scheduled scraper running on a residential IP to bypass cloud IP bans.

Description

Building a fault-tolerant, idempotent data pipeline that separates write and read paths while handling external API inconsistencies and cloud deployment limitations.
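One way to absorb external API inconsistencies is to map every raw listing onto a single schema and derive a stable deduplication key from it. The sketch below is illustrative only: the field names (`position`, `company_name`, `apply_url`, and so on), the `normalize_job` helper, and the choice of SHA-256 over title, company, and URL are assumptions, not the project's actual schema.

```python
import hashlib
from typing import Any


def normalize_job(raw: dict[str, Any], source: str) -> dict[str, str]:
    """Map one raw listing onto a single schema (hypothetical field names)."""

    # Different sources may name the same field differently; take the first present.
    def first(*keys: str) -> str:
        for key in keys:
            value = raw.get(key)
            if value:
                return str(value).strip()
        return ""

    job = {
        "source": source,
        "title": first("title", "position", "job_title"),
        "company": first("company", "company_name"),
        "url": first("url", "apply_url", "link"),
    }
    # Stable key: the same listing hashes identically regardless of casing or spacing.
    basis = "|".join(
        " ".join(job[k].lower().split()) for k in ("title", "company", "url")
    )
    job["dedup_key"] = hashlib.sha256(basis.encode("utf-8")).hexdigest()
    return job
```

Two payloads that describe the same listing with different field names and casing would then share one `dedup_key`, which is what lets a later write step treat them as the same record.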

Architecture

Local Scheduler (Windows Task Scheduler) -> Python Scraper (Residential IP) -> MongoDB Atlas (Cloud DB) -> FastAPI (Railway - Read Only) -> React Frontend (Vercel). The architecture keeps concerns fully separated: the API is stateless, and all persistent state lives in an external database.

Tech Stack

Python, FastAPI, MongoDB Atlas, React, Vite, TypeScript, Railway, Vercel, Windows Task Scheduler

Tags

Distributed Data Pipeline, Read-Optimized Backend System, Production deployed, Stateless backend, Idempotent pipeline, Externalized database, Fault-tolerant design, Read/write separation

Scalability

Stateless backend, Idempotent pipeline, Fault-tolerant design, Read-Optimized Backend System

The system is designed for fault tolerance and workload separation. Because the write path (the residential scraper) is detached from the read path (the ephemeral cloud API), API instances can scale independently while writes remain strictly idempotent against a single centralized Atlas cluster.
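Idempotent writes of this kind are commonly implemented as upserts keyed on a deduplication field. The following is a minimal sketch under stated assumptions: `InMemoryCollection`, `write_batch`, and the `dedup_key` and `first_seen` fields are hypothetical, and the in-memory class only stands in for a real collection so the example runs without a database.

```python
from typing import Any


class InMemoryCollection:
    """Tiny stand-in for a database collection so the sketch runs without a server."""

    def __init__(self) -> None:
        self.docs: dict[str, dict[str, Any]] = {}

    def upsert(self, key: str, fields: dict[str, Any], first_seen: str) -> None:
        # Insert when the key is new; otherwise only overwrite the supplied fields.
        doc = self.docs.setdefault(key, {"dedup_key": key, "first_seen": first_seen})
        doc.update(fields)


def write_batch(
    collection: InMemoryCollection, jobs: list[dict[str, Any]], run_date: str
) -> int:
    """Apply one scraper run; re-applying the same batch leaves the state unchanged."""
    for job in jobs:
        fields = {k: v for k, v in job.items() if k != "dedup_key"}
        collection.upsert(job["dedup_key"], fields, first_seen=run_date)
    return len(collection.docs)
```

Against MongoDB, the same behaviour could plausibly be expressed with PyMongo's `update_one({"dedup_key": key}, {"$set": fields, "$setOnInsert": {"first_seen": run_date}}, upsert=True)`, ideally backed by a unique index on `dedup_key`. Whether the project does exactly this is not stated in the source.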

Architecture Breakdown

System Architecture

Local Scheduler (Windows Task Scheduler) [Write Path]
Python Scraper (Residential IP) [Write Path]
MongoDB Atlas (Cloud DB)
FastAPI (Railway - Read Only) [Read Path]
React Frontend (Vercel) [Read Path]
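The read path listed above can be pictured as a pure function that filters and paginates without holding any state between calls. This is only a sketch: `list_jobs`, its parameters, and the response shape are hypothetical, and in the deployed system the equivalent logic would sit behind a FastAPI route and query MongoDB Atlas rather than a Python list.

```python
from typing import Any


def list_jobs(
    docs: list[dict[str, Any]], query: str = "", page: int = 1, page_size: int = 20
) -> dict[str, Any]:
    """Pure read handler: filter by title, paginate, return. Never mutates docs."""
    needle = query.strip().lower()
    matches = [d for d in docs if needle in str(d.get("title", "")).lower()]
    current = max(page, 1)  # clamp invalid page numbers to the first page
    start = (current - 1) * page_size
    return {
        "total": len(matches),
        "page": current,
        "items": matches[start:start + page_size],
    }
```

Because the handler keeps nothing in memory between requests, any number of API instances can serve the same data, which is the property the stateless read path relies on.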

Engineering Decisions

  • Architecture Transformations: SQLite → MongoDB Atlas (fixes data loss)
  • Architecture Transformations: Cloud scraping → local scraping (fixes IP bans)
  • Architecture Transformations: Monolithic flow → separated write and read pipeline

Production Readiness

In place:
Production deployed
Stateless backend
Idempotent pipeline
Externalized database
Fault-tolerant design
Read/write separation

Not yet in place:
No Redis caching
No CI/CD pipeline
Local scraper dependency
No APM metrics
