Macaroni

From Spreadsheets to SaaS

I joined Blink SEO as the only technical team member, with the brief: "use data to improve our processes." When I arrived, the SEO team was spending most of their time on manual data engineering. They'd pull data from Google Analytics, Shopify, Search Console, and site crawls, then clean and analyse it all in spreadsheets. Rinse and repeat for every client, every month.

I built a Python backend to automate the ingestion pipeline, wrapping several web APIs and streaming the cleaned data into a BigQuery warehouse with a schema designed around SEO workflows. Views and stored procedures made it easy to query key metrics without writing SQL from scratch each time.
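To illustrate the kind of cleaning step such a pipeline involves, here is a minimal sketch that flattens a Search Console Search Analytics response into warehouse-ready records. The response shape (`rows`, each with `keys`, `clicks`, `impressions`, `ctr`, `position`) follows the public API; the function name and the `client_id` column are hypothetical, and the actual warehouse schema is not described here.

```python
from __future__ import annotations

from typing import Any


def flatten_search_console_rows(
    response: dict[str, Any],
    dimensions: list[str],
    client_id: str,
) -> list[dict[str, Any]]:
    """Turn API rows into flat dicts, one per row, keyed by dimension name.

    `client_id` is a hypothetical column used to separate clients
    in a shared table.
    """
    records = []
    for row in response.get("rows", []):
        record: dict[str, Any] = {"client_id": client_id}
        # The API returns dimension values positionally in `keys`,
        # in the same order as the requested dimensions.
        record.update(zip(dimensions, row["keys"]))
        record["clicks"] = int(row["clicks"])
        record["impressions"] = int(row["impressions"])
        record["ctr"] = float(row["ctr"])
        record["position"] = float(row["position"])
        records.append(record)
    return records
```

Records in this shape can be handed to a warehouse loader as newline-delimited JSON or as row dictionaries.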

Once the data engineering was automated, I started adding ML features: keyword clustering to identify content gaps, classification algorithms to suggest site taxonomy improvements, and eventually LLM integrations to generate draft content. The frontend evolved from Looker Studio dashboards to a custom Retool app with interactive Plotly visualisations.

The result was a 20x productivity increase. Campaigns that used to take a year could now be delivered in weeks. Management saw the potential and decided to spin it out as a SaaS product: Macaroni Software.

The Stack

The system ran on GCP Compute Engine, processing 50M+ data points daily from external APIs. I wrote the PyGoogalytics library to standardise data ingestion from Google Analytics, Search Console, and Google Ads. Client onboarding, data imports, and ML tasks ran asynchronously via JobMaster, the PostgreSQL-based job queue I built for this purpose.
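JobMaster's internals are not shown here, but the core idea of a database-backed job queue can be sketched briefly: workers claim the oldest pending row by flipping its status in a single statement, so two workers cannot pick up the same job. The sketch below uses SQLite from the standard library purely so it runs anywhere; the table layout, column names, and function names are all assumptions. In PostgreSQL a common way to do the claim step is `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
import sqlite3
import uuid

# Hypothetical schema; SQLite stands in for PostgreSQL in this sketch.
SCHEMA = """
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    task TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    claimed_by TEXT
)
"""


def enqueue(conn, task):
    """Add a pending job and return its id."""
    with conn:
        cur = conn.execute("INSERT INTO jobs (task) VALUES (?)", (task,))
    return cur.lastrowid


def claim_next(conn):
    """Claim the oldest pending job; return (id, task) or None if none remain."""
    token = uuid.uuid4().hex  # unique marker for this claim
    with conn:
        # A single UPDATE selects and marks the job in one statement.
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', claimed_by = ? "
            "WHERE id = (SELECT id FROM jobs WHERE status = 'pending' "
            "ORDER BY id LIMIT 1)",
            (token,),
        )
        if cur.rowcount == 0:
            return None
        # Look the claimed row up by the token we just wrote.
        return conn.execute(
            "SELECT id, task FROM jobs WHERE claimed_by = ?", (token,)
        ).fetchone()


def complete(conn, job_id):
    """Mark a job as finished."""
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

A worker process would loop on `claim_next`, run the task, and call `complete`; a real queue would also need failure handling and retries, which are omitted here.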

Layer            Technologies
Data ingestion   Python, PyGoogalytics, Shopify GraphQL API, web scraping
Storage          BigQuery (warehouse), PostgreSQL (job queue)
ML / NLP         scikit-learn, NLTK, Hugging Face
Frontend         Retool, Plotly (JavaScript), Looker Studio
Infrastructure   GCP Compute Engine, Docker, Git

ML Features

The ML components were built through constant iteration with the SEO team. I'd sit with them, watch how they worked, and figure out where automation could help most.

  • Keyword clustering: Combined NLP-based semantic similarity with quantitative metrics (clicks, impressions, rankings) to group thousands of keywords into actionable topics. Work that took days now finished in minutes.
  • Content gap analysis: Cross-referenced clustered keywords against existing site content to surface opportunities the team would have missed manually.
  • Taxonomy suggestions: Classification algorithms proposed site structure improvements based on how keywords naturally grouped together.
  • LLM content generation: Integrated Hugging Face models to draft content suggestions, refined in consultation with Blink's copywriters.
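The keyword clustering idea can be sketched in a few lines. This is a deliberately simplified illustration, not the production approach: token overlap (Jaccard similarity) stands in for NLP-based semantic similarity, and a greedy pass stands in for a proper clustering algorithm. Click counts play the role of the quantitative signal by deciding which keyword seeds each cluster and how clusters are ranked. All names and the 0.5 threshold are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Keyword:
    text: str
    clicks: int


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity in [0, 1]; a crude stand-in for semantic similarity."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_keywords(
    keywords: list[Keyword], threshold: float = 0.5
) -> list[list[Keyword]]:
    """Greedy clustering: a keyword joins the first cluster whose seed is similar enough."""
    # Process high-click keywords first so they become the cluster seeds.
    ordered = sorted(keywords, key=lambda k: k.clicks, reverse=True)
    clusters: list[list[Keyword]] = []
    for kw in ordered:
        tokens = set(kw.text.lower().split())
        for cluster in clusters:
            seed_tokens = set(cluster[0].text.lower().split())
            if jaccard(tokens, seed_tokens) >= threshold:
                cluster.append(kw)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([kw])
    # Rank clusters by total clicks so the biggest topics surface first.
    clusters.sort(key=lambda c: sum(k.clicks for k in c), reverse=True)
    return clusters
```

Swapping `jaccard` for cosine similarity over sentence embeddings would move this closer to a semantic approach while leaving the rest of the structure unchanged.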

Scaling Up

When Macaroni became a product, we hired a data engineer, frontend developer, and product manager. I onboarded the new team and handed over parts of the codebase, freeing up time to focus on the ML features. We narrowed the product scope to Shopify stores, which let us integrate directly with the Shopify GraphQL API for better data quality and real-time catalogue updates.
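For a sense of what the Shopify integration involves, here is a hedged sketch of cursor-based pagination over a product catalogue. The query uses the `products` connection with `edges`, `node`, and `pageInfo` as in Shopify's Admin GraphQL API, but the selected fields, page size, and function names are illustrative assumptions and should be checked against the API version in use. The HTTP call is left to an injected `execute` function (hypothetical), so the sketch runs without network access.

```python
from __future__ import annotations

from typing import Any, Callable, Iterator

# Illustrative query; the fields selected here are an assumption.
PRODUCTS_QUERY = """
query Products($cursor: String) {
  products(first: 50, after: $cursor) {
    edges { node { id title handle } }
    pageInfo { hasNextPage endCursor }
  }
}
"""


def iter_products(
    execute: Callable[[str, dict[str, Any]], dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield product nodes page by page until the connection is exhausted.

    `execute(query, variables)` is expected to POST to the shop's GraphQL
    endpoint and return the decoded JSON response.
    """
    cursor = None
    while True:
        data = execute(PRODUCTS_QUERY, {"cursor": cursor})
        connection = data["data"]["products"]
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        # Resume after the last item of the page just read.
        cursor = page_info["endCursor"]
```

Keeping the transport separate makes the pagination logic easy to test with canned responses, and rate limiting can be handled inside `execute`.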

I stayed involved with the Blink SEO delivery team throughout, running ad-hoc analyses and building visualisations for client presentations and investor meetings.

Summary

  • Full-stack ML platform built from scratch over 3 years
  • 20x productivity improvement for SEO workflows
  • 50M+ data points processed daily
  • Spun out as SaaS product (Macaroni Software)
  • Open-sourced PyGoogalytics library