TOPIA — Job Aggregator Pipeline
A production-deployed, full-stack job-aggregation pipeline that collects, normalizes, deduplicates, and serves remote job listings. It uses a distributed architecture built on MongoDB Atlas and FastAPI, with a locally scheduled scraper running on a residential IP to bypass cloud IP bans.
Description
A full-stack pipeline that collects remote job listings from external sources, normalizes and deduplicates them into MongoDB Atlas, and serves them through a read-only FastAPI service to a React frontend.
Problem Solved
Building a fault-tolerant, idempotent data pipeline that separates write and read paths while handling external API inconsistencies and cloud deployment limitations.
Architecture
Local Scheduler (Windows Task Scheduler) → Python Scraper (Residential IP) → MongoDB Atlas (Cloud DB) → FastAPI (Railway, read-only) → React Frontend (Vercel). The architecture highlights full separation of concerns, a stateless API, and a persistent external database.
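The normalize → deduplicate stages of the scraper can be sketched as below. This is a minimal stdlib-only sketch: the field names (`title`, `company`, `url`) and the key derivation are illustrative assumptions, not the project's actual schema.

```python
import hashlib


def normalize(raw: dict) -> dict:
    """Coerce an inconsistent external listing into a canonical shape.
    Field names here are illustrative assumptions."""
    return {
        "title": (raw.get("title") or "").strip().lower(),
        "company": (raw.get("company") or "").strip().lower(),
        "url": (raw.get("url") or "").strip(),
    }


def job_key(job: dict) -> str:
    """Stable key derived from normalized fields, so identical listings
    seen on different scrape runs hash to the same key."""
    basis = f'{job["company"]}|{job["title"]}|{job["url"]}'
    return hashlib.sha256(basis.encode()).hexdigest()


def dedupe(raw_listings: list[dict]) -> dict[str, dict]:
    """Normalize every listing and keep one record per stable key."""
    unique: dict[str, dict] = {}
    for raw in raw_listings:
        job = normalize(raw)
        unique[job_key(job)] = job
    return unique


listings = [
    {"title": "Backend Engineer ", "company": "Acme", "url": "https://a.co/1"},
    {"title": "backend engineer", "company": "ACME ", "url": "https://a.co/1"},
]
print(len(dedupe(listings)))  # → 1: the two variants collapse into one record
```

Deriving the key from normalized fields (rather than the raw payload) is what absorbs the external APIs' formatting inconsistencies before anything reaches the database.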
Tech Stack
- Python (scraper, local write path)
- FastAPI on Railway (read-only API)
- MongoDB Atlas (persistent cloud database)
- React on Vercel (frontend)
- Windows Task Scheduler (local scheduling)
Scalability
Designed for fault tolerance and distributed workload segregation. By detaching the write path (residential scraper) from the read path (ephemeral cloud API), the system scales API instances independently while maintaining strict idempotency against a centralized Atlas cluster.
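The idempotency claim can be illustrated with an upsert keyed on a stable job ID. In the real system this would be a MongoDB upsert (e.g. pymongo's `update_one(..., upsert=True)`); here a plain dict stands in for the Atlas collection so the retry-safety property is easy to see. The key and record shape are illustrative assumptions.

```python
def upsert_job(store: dict, key: str, job: dict) -> None:
    """Upsert keyed on a stable job ID: insert if absent, overwrite if
    present. Replaying the same scrape leaves the record count unchanged,
    which is what makes the write path safe to retry after a failure.
    Against MongoDB this would be roughly:
    collection.update_one({"_id": key}, {"$set": job}, upsert=True)"""
    store[key] = job


store: dict = {}
job = {"title": "backend engineer", "company": "acme"}
for _ in range(3):  # three identical scrape runs
    upsert_job(store, "acme|backend engineer", job)
print(len(store))  # → 1: replays do not duplicate records
```

Because every write converges on the same key, the scraper can crash mid-run and simply rerun, and extra API instances can be added on the read side without any coordination with the writer.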
Architecture Breakdown
Engineering Decisions
Architecture Transformations:
- SQLite → MongoDB Atlas (fixed data loss)
- Cloud scraping → local scraping (fixed cloud IP bans)
- Monolithic flow → separated pipeline
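The local write path is triggered by Windows Task Scheduler. A registration along these lines would run the scraper daily from the residential machine (task name, script path, and start time are illustrative placeholders, not the project's actual values):

```shell
schtasks /Create /SC DAILY /TN "TopiaScraper" ^
  /TR "python C:\topia\scraper.py" /ST 06:00
```

Scheduling on the local machine rather than in the cloud is what keeps the scraper on a residential IP while the API and frontend stay fully stateless.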