Building OpenStatus: A Deep Dive into Our Infrastructure Architecture
Dec 12, 2024
Infrastructure Overview
OpenStatus is a synthetic monitoring platform designed with resilience, scalability, and efficiency in mind. Our infrastructure is a carefully orchestrated ecosystem of multiple applications and services, each playing a crucial role in our system.
Application Landscape
Our platform comprises several interconnected applications, each serving a specific purpose:
- Frontend Ecosystem: The core of our user interaction is built on a multi-faceted frontend architecture. We chose technologies that give us good performance and a good developer experience:
  - A Next.js application, hosted on Vercel, that powers our marketing site, user dashboard, and status page.
  - An Astro + Starlight documentation site, hosted on Cloudflare Pages, so that our users have comprehensive, easily navigable documentation.
- Backend Infrastructure: All our backend services are hosted on Fly.io.
  - API server: our public API and our alerting engine.
  - Probes/Checker: a Go app deployed globally to monitor your service.
  - Screenshot app: a service that uses Playwright to take a screenshot of your website when we detect downtime.
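To make the division of labor concrete, here is a minimal sketch of what a single probe check and its follow-up could look like. It is written in Python for brevity (the real checker is a Go app), and every name in it (`CheckResult`, `run_check`, `follow_up_actions`, the "2xx/3xx means healthy" rule) is an assumption made for illustration, not our actual implementation.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    url: str
    region: str
    status: Optional[int]  # None when the request itself failed
    latency_ms: float
    ok: bool


def run_check(
    url: str,
    region: str,
    fetch: Callable[[str], int],
    clock: Callable[[], float] = time.monotonic,
) -> CheckResult:
    """Request `url` once and record the status code and latency.

    `fetch` is a hypothetical callable returning an HTTP status code;
    injecting it keeps the sketch independent of any HTTP library.
    """
    start = clock()
    try:
        status: Optional[int] = fetch(url)
    except Exception:
        # Timeouts, DNS errors, refused connections: all count as a failure.
        status = None
    latency_ms = (clock() - start) * 1000
    # Assumption: any 2xx or 3xx response counts as healthy.
    ok = status is not None and 200 <= status < 400
    return CheckResult(url, region, status, latency_ms, ok)


def follow_up_actions(result: CheckResult) -> List[str]:
    """Hypothetical hand-off: a failed check triggers alerting and a screenshot."""
    return [] if result.ok else ["alert", "screenshot"]
```

In this sketch the probe only measures and reports. Alerting and screenshots are separate follow-up steps, which mirrors the split between the checker, the API server, and the screenshot app described above.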
Managed Services
We also rely heavily on managed services so that we don't have to run everything ourselves. Here are some of the services we use:
Scheduling and Job Management
Because monitoring is critical, we rely heavily on managed services for scheduling and job management:
- Cron Jobs: Currently using Vercel Cron, with plans to migrate to Google Cron for an enhanced user experience.
Queue Architecture
Every check is pushed to a queue and processed by our probes. The probes are responsible for checking the status of your service.
- Job Queue: Google Task Queues provide our distributed task management, with strategically segmented queues for different check frequencies
We've implemented a granular queue system to ensure efficient task processing:
- Separate queues for frontend services
- Dedicated queues for API server and alerting engine
- Specialized queues for probes and screenshot services
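The flow from a scheduled tick to queued checks can be sketched roughly as follows. This is an in-memory illustration only: it does not call any Google Cloud API, and the frequencies, queue names, and the `Monitor`/`Task` types are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical check frequencies and queue names, for illustration only.
QUEUE_BY_FREQUENCY: Dict[str, str] = {
    "30s": "checks-30s",
    "1m": "checks-1m",
    "10m": "checks-10m",
}


@dataclass(frozen=True)
class Monitor:
    id: int
    url: str
    frequency: str
    regions: Tuple[str, ...]


@dataclass(frozen=True)
class Task:
    queue: str
    monitor_id: int
    url: str
    region: str


def build_tasks(monitors: Sequence[Monitor], frequency: str) -> List[Task]:
    """Create one task per (monitor, region) for monitors due at this frequency.

    A scheduled job would call this once per tick of the given frequency
    and push the resulting tasks to the matching queue.
    """
    queue = QUEUE_BY_FREQUENCY[frequency]
    return [
        Task(queue, m.id, m.url, region)
        for m in monitors
        if m.frequency == frequency
        for region in m.regions
    ]
```

Keeping one queue per frequency means a backlog of slow ten-minute checks cannot delay the thirty-second ones, which is the main reason to segment queues this way.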
Hosting Strategy
Our multi-cloud approach ensures flexibility and optimal performance:
- Frontend: Hosted on Vercel for seamless deployment and edge networking
- Probes: Currently on Fly.io, with plans to add more providers for our global monitoring system.
- Queue Management: Leveraging Google Cloud Platform (benefiting from Google credits)
Data Infrastructure
We don't want to run our data infrastructure ourselves either, so we rely on managed services here too:
- Primary Database: Turso, providing a cost-efficient data storage solution.
- Analytics Database: Tinybird, enabling complex analytical queries and insights.
Design Philosophy
Our infrastructure design is driven by several key principles:
- Resilience: Ensuring high availability and fault tolerance
- Scalability: Architectural choices that allow seamless growth
- Cost-Efficiency: Leveraging managed services and cloud credits
- Performance: Optimizing each component for maximum efficiency.
How much does it cost us?
Our current monthly cost is around $319. This includes:
- Vercel: $40
- Fly.io: $150 (36*4 for all our probes + 10 for the API server)
- Google Cloud Platform: $0 (We are still using the free credits, but we expect to pay around $50 for the queue)
- Tinybird: $100
- Turso: $29
- Cloudflare: $5
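As a rough sanity check, the line items can be tallied with a few lines of Python. The figures are taken exactly as listed above; the variable names are only illustrative. Taken at face value the items add up to a few dollars more than the headline number, so that number is best read as approximate.

```python
# Monthly line items in USD, copied from the list above.
costs = {
    "Vercel": 40,
    "Fly.io": 150,
    "Google Cloud Platform": 0,  # still covered by free credits
    "Tinybird": 100,
    "Turso": 29,
    "Cloudflare": 5,
}

# Expected queue cost once the Google credits run out (from the list above).
EXPECTED_GCP_AFTER_CREDITS = 50


def monthly_total(items: dict) -> int:
    """Sum all monthly line items."""
    return sum(items.values())


total_now = monthly_total(costs)
total_after_credits = total_now + EXPECTED_GCP_AFTER_CREDITS
fly_breakdown = 36 * 4 + 10  # probes plus API server, as itemized above

print(total_now)            # 324
print(total_after_credits)  # 374
print(fly_breakdown)        # 154
```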
Conclusion
Building a resilient synthetic monitoring platform requires a thoughtful approach to infrastructure design.
The drawback of this approach is that relying on so many managed services makes it harder to offer an easy self-hosted setup.