Creating a searchable archive and API of the Dutch public broadcaster NOS
Until 2024, the NOS had a great archive: you could choose a date within a category and see what happened on that day. At the end of 2023, the NOS announced that old articles would only be accessible through its search function. The problem is that this search function is terrible. You cannot filter by date or category, so looking for a topic that has been covered in several articles is like looking for a needle in a haystack. Using what I learned in my first (failed) project, I scraped the Internet Archive's Wayback Machine back to 2010, then categorised and indexed every article I could retrieve. Since June 2024, I have also been ingesting rich data for new articles going forward. To make the archive even more useful than the original, I added variable archive windows (day/week/month) and categories, AI summarisation, search, and a public API.
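The day/week/month archive windows come down to computing an inclusive date range around a chosen day. The following is a minimal Python sketch, not the project's actual code. `archive_window` is a hypothetical helper, and it assumes calendar-aligned windows with ISO weeks running Monday to Sunday.

```python
import calendar
from datetime import date, timedelta
from typing import Tuple


def archive_window(day: date, window: str) -> Tuple[date, date]:
    """Return the inclusive (start, end) dates of the archive window containing `day`.

    Hypothetical helper: assumes calendar-aligned windows and ISO weeks
    (Monday to Sunday).
    """
    if window == "day":
        return day, day
    if window == "week":
        # date.weekday() is 0 for Monday, so step back to that week's Monday.
        start = day - timedelta(days=day.weekday())
        return start, start + timedelta(days=6)
    if window == "month":
        # monthrange() returns (weekday of the 1st, number of days in the month).
        days_in_month = calendar.monthrange(day.year, day.month)[1]
        return day.replace(day=1), day.replace(day=days_in_month)
    raise ValueError(f"unknown archive window: {window!r}")
```

For example, `archive_window(date(2016, 2, 17), "month")` returns 1 to 29 February 2016 because 2016 is a leap year. A range like this could then feed a `BETWEEN` clause in MySQL or a `range` filter in Elasticsearch.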
Goals
Backend Architecture
Built with the Flask framework, using blueprints for a modular design. Uses the SQLAlchemy ORM for database operations, with schema migrations handled by Alembic.
Frontend Design
Responsive UI built with the Bootstrap CSS framework. Jinja2 templates provide server-side rendering, with minimal JavaScript for enhanced interactions.
Deployment Infrastructure
Deployed on a VPS, with the application served by the Gunicorn HTTP server behind NGINX as a reverse proxy. Data is served through a MySQL instance coupled with Elasticsearch.
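Elasticsearch is what enables the date and category filtering that the NOS search function lacks. Below is a minimal sketch of how such a query body might be built. It assumes an index with hypothetical `title`, `body`, `category` (keyword) and `published_at` (date) fields, since the real index mapping is not described here. The `bool` query from the Elasticsearch Query DSL combines a relevance-scored `multi_match` with unscored `term` and `range` filters.

```python
from __future__ import annotations

from datetime import date
from typing import Any, Optional


def build_search_query(text: str, category: Optional[str] = None,
                       start: Optional[date] = None, end: Optional[date] = None,
                       size: int = 20) -> dict[str, Any]:
    """Return an Elasticsearch Query DSL body for a filtered full-text search.

    Field names (title, body, category, published_at) are assumed, not taken
    from the project's real index mapping.
    """
    filters: list[dict[str, Any]] = []
    if category is not None:
        # Exact match on a keyword field; filters do not affect the score.
        filters.append({"term": {"category": category}})
    if start is not None or end is not None:
        bounds: dict[str, str] = {}
        # Date math "||/d" rounds to whole days: gte rounds down to the
        # start of the day, lte rounds up to the end of the day.
        if start is not None:
            bounds["gte"] = f"{start.isoformat()}||/d"
        if end is not None:
            bounds["lte"] = f"{end.isoformat()}||/d"
        filters.append({"range": {"published_at": bounds}})
    return {
        "size": size,
        "query": {
            "bool": {
                # Relevance-scored text match; hits in the title count double.
                "must": [{"multi_match": {"query": text,
                                          "fields": ["title^2", "body"]}}],
                "filter": filters,
            }
        },
    }
```

The returned dict is the JSON body of a `POST /<index>/_search` request. Keeping category and date in `filter` rather than `must` means they narrow the results without changing how matches are ranked.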
Status
Start Date
2025-10-07
Features
| Feature | Status |
|---|---|
| Historic data back to 2010 | Completed |
| Automatic data ingestion | Completed |
| Categorisation & labelling | Completed |
| Search engine | Completed |
| Day/Week/Month archive | Completed |
| AI summarisation | Completed |
| Public rate-limited API | Completed |
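A common way to rate-limit a public API is to keep a token bucket for each client. The sketch below assumes that approach, although the project does not say which algorithm it uses. `TokenBucket` and `RateLimiter` are hypothetical names, and the injectable `clock` exists only to make the behaviour testable.

```python
import threading
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so a new client can burst
        self.updated = clock()
        self._lock = threading.Lock()

    def allow(self) -> bool:
        with self._lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


class RateLimiter:
    """Keeps one token bucket per client key (e.g. an API key or IP address).

    Buckets are never evicted in this sketch.
    """

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}
        self._lock = threading.Lock()

    def allow(self, key: str) -> bool:
        # Guarded so two threads cannot create separate buckets for one client.
        with self._lock:
            bucket = self._buckets.get(key)
            if bucket is None:
                bucket = TokenBucket(self.rate, self.capacity, self.clock)
                self._buckets[key] = bucket
        return bucket.allow()
```

In a Flask app, `allow(api_key)` could be checked in a `before_request` hook, returning HTTP 429 when it fails. Because the buckets live in process memory, each Gunicorn worker keeps its own counts. A global limit across workers would need shared storage, which extensions such as Flask-Limiter provide through backends like Redis.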