Hey! I’ve been experimenting with LLMs and wanted to share a use case where they perform exceptionally well: data extraction 🤖 📊
I’m a big fan of Substack, where I follow many different writers, such as Jorge Bosch (Cosas de Freelance), Sara Enrique (La Psicoletter), and Daniel Primo (Web Reactiva).
In each edition, there are many recommended resources: people to follow, interesting tools, podcasts, etc.
Sometimes, I take notes and save those resources in Notion, but other times I just quickly read the posts while traveling or waiting for something ⏳.
So I needed to automate this instead of re-reading the posts to collect all the resources 🤖.
That’s when I started thinking about how LLMs could help extract this data and present it in a more accessible way 📝💡.
This is something I built to learn while solving a problem. You just need to input the link to a Substack post, and all the relevant data will be extracted and displayed so you can easily access the resources.
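The core of the extraction can be sketched with OpenAI function calling: you describe the shape of the data you want as a JSON schema, and force the model to return it as a tool call. This is a minimal sketch, not the exact code from the project; the tool name `save_resources`, the field names, and the model choice are my own assumptions.

```python
import json

# Hypothetical tool schema: the model "calls" save_resources with every
# resource it finds in the newsletter text (tools, podcasts, people, etc.).
EXTRACT_RESOURCES_TOOL = {
    "type": "function",
    "function": {
        "name": "save_resources",
        "description": "Save the resources recommended in a newsletter post.",
        "parameters": {
            "type": "object",
            "properties": {
                "resources": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "url": {"type": "string"},
                            "category": {
                                "type": "string",
                                "description": "e.g. tool, podcast, person, article",
                            },
                        },
                        "required": ["name", "category"],
                    },
                }
            },
            "required": ["resources"],
        },
    },
}


def extract_resources(post_text: str, client) -> list[dict]:
    """Extract recommended resources from a post's text.

    `client` is an OpenAI client instance (openai.OpenAI()); it is passed in
    so this sketch stays self-contained. `tool_choice` forces the model to
    answer via the tool, so the reply is always structured JSON.
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is illustrative
        messages=[
            {"role": "system", "content": "Extract every recommended resource."},
            {"role": "user", "content": post_text},
        ],
        tools=[EXTRACT_RESOURCES_TOOL],
        tool_choice={"type": "function", "function": {"name": "save_resources"}},
    )
    arguments = response.choices[0].message.tool_calls[0].function.arguments
    return json.loads(arguments)["resources"]
```

From there it's just a matter of fetching the post's HTML from the link, stripping it down to text, and rendering the returned list.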
This could easily be turned into a Chrome extension to extract the data directly from the email, instead of having to enter the link manually.
It can also be integrated with other tools, such as Notion, to keep track of the resources.
This small side project doesn’t just use the OpenAI API at the application level. I also used Cursor IDE for development and Vercel v0 to create the components, which made me significantly more productive.
It also allowed me to focus on what I really wanted to learn: Python, LLMs, prompting, function calling, the OpenAI API, etc.
Have a lovely day! 🌅