Finding Common Connections: Building a Tim Ferriss Knowledge Graph

For a while now I have had a project in mind to explore connections across my favorite podcasts, going beyond individual episode summaries. I wanted to know which books were mentioned most, which topics recurred, and which guests appeared multiple times.

I started with The Tim Ferriss Show. Here’s what I did:

  1. Acquiring Transcripts – Tim Ferriss publishes transcripts of all his episodes on his website. Because the show has run for so many years, they come in a mix of PDF and HTML formats. I collected the links and used scripts to download the files and convert them to text. In hindsight, YouTube transcripts might be an easier starting point for future projects.
  2. Extraction of Key Data – I used OpenAI's GPT-3.5 Turbo 16k model to process the transcripts and identify attributes of each episode, such as its guests, topics, and the books mentioned. Keeping the model focused on the transcript text reduced inaccuracies. The model returned its results in a parseable JSON format. Next time, I might use LangChain to split the transcripts into chunks instead of writing my own method. Processing all episodes cost about $40. GPT-4 might give better results, but it would likely cost more.
  3. Data Post-Processing – The extracted data needed some cleanup. For instance, topics appeared with inconsistent capitalization, or as near-duplicates such as “investing” and “investments.” I’d like to automate more of this in the future, possibly by mapping variant terms to a single master term.
  4. Data Presentation – With around 10,000 rows linking episodes to topics, guests, books, and more, the main challenge was presentation. I initially considered Python tools like Streamlit or Flask, but simplicity won and I used JavaScript. I tried several approaches, including a D3.js graph, before settling on a CSS-based hierarchy from a previous project. This view always shows how an episode connects to other elements. The links can be circular, but you choose which path to follow. I’m considering a tabular format for the next version.
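
For step 1, the HTML half of the conversion can be done with Python's standard library alone. This is a minimal sketch, not the original script: it assumes the transcript text sits in ordinary tags such as `<p>`, and the names `TranscriptTextExtractor` and `html_to_text` are hypothetical. PDF transcripts would need a separate PDF text-extraction library and are not covered here.

```python
from html.parser import HTMLParser


class TranscriptTextExtractor(HTMLParser):
    """Collect visible text from an HTML transcript page, skipping script and style content."""

    SKIP_TAGS = {"script", "style"}
    BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "li"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            # Start a new line for block-level elements so paragraphs stay separate.
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = "".join(self._parts).splitlines()
        cleaned = (" ".join(line.split()) for line in lines)
        return "\n".join(line for line in cleaned if line)


def html_to_text(html):
    """Return the readable text of an HTML transcript, one paragraph per line."""
    parser = TranscriptTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Each downloaded page can then be passed through `html_to_text` and saved as a plain-text file alongside the converted PDFs.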
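
Step 2 mentions a home-grown chunking method. One simple approach, assuming word counts are an acceptable rough stand-in for tokens, is to split each transcript into overlapping windows of words so that every chunk fits in the 16k context alongside the prompt and the reply. By OpenAI's rough rule of thumb of about 75 words per 100 tokens, 3,000 words is around 4,000 tokens. The function name `chunk_words` and its default sizes are hypothetical and would need tuning.

```python
def chunk_words(text, max_words=3000, overlap_words=100):
    """Split text into chunks of at most max_words words.

    Consecutive chunks share overlap_words words, so passages near a boundary
    are not lost. Word counts only approximate model tokens.
    """
    if not 0 <= overlap_words < max_words:
        raise ValueError("overlap_words must be at least 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk reaches the end, so no chunk consists only of overlap.
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap means a passage near a boundary that is shorter than the overlap appears whole in at least one chunk, at the cost of some duplicate extractions that have to be removed when the per-chunk results are merged.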
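
Because each chunk gets its own reply, the JSON from step 2 has to be parsed and merged per episode. This sketch assumes a hypothetical schema with `guests`, `books`, and `topics` lists; the real prompt and field names are not described in this post. It tolerates a reply wrapped in a Markdown code fence and skips replies that are not valid JSON instead of failing the whole episode.

```python
import json

FIELDS = ("guests", "books", "topics")  # hypothetical schema for the model's reply
FENCE = "`" * 3  # three backticks, i.e. a Markdown code fence


def parse_chunk_response(raw):
    """Parse one model reply into {field: [str, ...]}; return None if it is not usable JSON."""
    text = raw.strip()
    # Strip a surrounding Markdown code fence (with an optional "json" tag) if present.
    if text.startswith(FENCE):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    result = {}
    for field in FIELDS:
        values = data.get(field, [])
        # Keep only string entries; treat a malformed field as empty.
        result[field] = [v for v in values if isinstance(v, str)] if isinstance(values, list) else []
    return result


def merge_results(chunk_results):
    """Combine per-chunk results, skipping None entries and case-insensitive duplicates."""
    merged = {field: [] for field in FIELDS}
    seen = {field: set() for field in FIELDS}
    for result in chunk_results:
        if result is None:
            continue
        for field in FIELDS:
            for value in result.get(field, []):
                cleaned = value.strip()
                key = cleaned.lower()
                if cleaned and key not in seen[field]:
                    seen[field].add(key)
                    merged[field].append(cleaned)
    return merged
```

The first spelling seen is kept, so the deeper normalization of step 3 still has work to do afterwards.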
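
For step 3, a first pass at automatic cleanup could group terms by a rough comparison key and pick the most frequent spelling in each group as the master term. The suffix stripping below is deliberately crude and is an assumption, not part of the original pipeline; a proper stemmer or a hand-maintained alias list would be safer, since crude rules can merge unrelated words. The names `match_key` and `build_master_terms` are hypothetical.

```python
from collections import Counter, defaultdict

# Deliberately crude suffix rules (an assumption, not a real stemmer), longest first.
SUFFIXES = ("ments", "ment", "ing", "s")


def match_key(term):
    """Reduce a term to a rough comparison key: lowercase, collapsed spaces, crude suffix stripping."""
    stemmed = []
    for word in term.lower().split():
        for suffix in SUFFIXES:
            # Strip only if at least four characters remain, to spare short words.
            if word.endswith(suffix) and len(word) - len(suffix) >= 4:
                word = word[: -len(suffix)]
                break
        stemmed.append(word)
    return " ".join(stemmed)


def build_master_terms(terms):
    """Map each spelling to a master term: the most frequent spelling sharing its key."""
    groups = defaultdict(Counter)
    for term in terms:
        cleaned = " ".join(term.split())
        if cleaned:
            groups[match_key(cleaned)][cleaned] += 1
    mapping = {}
    for spellings in groups.values():
        # Counter.most_common orders equal counts by first occurrence.
        master = spellings.most_common(1)[0][0]
        for spelling in spellings:
            mapping[spelling] = master
    return mapping
```

Applying the result is then a dictionary lookup, such as `mapping.get(topic, topic)` for each whitespace-cleaned topic, and the mapping can be reviewed by hand before it is trusted.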
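
For step 4, the roughly 10,000 rows could be turned into two lookup tables that a plain JavaScript page loads as JSON: what each episode links to, and which episodes share a given topic, guest, or book. Following one lookup into the other is what makes the links circular. The `(episode, kind, value)` row shape, the `"kind:value"` key format, and the name `build_index` are assumptions for illustration, not the format the page actually uses.

```python
import json
from collections import defaultdict


def build_index(rows):
    """Build two-way lookups from (episode, kind, value) rows.

    by_episode: episode -> kind -> sorted values (what an episode links to)
    by_entity:  "kind:value" -> sorted episodes (where else that entity appears)
    """
    by_episode = defaultdict(lambda: defaultdict(set))
    by_entity = defaultdict(set)
    for episode, kind, value in rows:
        by_episode[episode][kind].add(value)
        by_entity[f"{kind}:{value}"].add(episode)
    # Convert sets to sorted lists so the result is JSON-serializable and stable.
    return {
        "by_episode": {
            ep: {kind: sorted(values) for kind, values in kinds.items()}
            for ep, kinds in by_episode.items()
        },
        "by_entity": {key: sorted(eps) for key, eps in by_entity.items()},
    }


if __name__ == "__main__":
    sample_rows = [  # hypothetical sample data
        ("Episode A", "topic", "investing"),
        ("Episode B", "topic", "investing"),
    ]
    print(json.dumps(build_index(sample_rows), indent=2))
```

Writing the index out with `json.dump` gives the front end a single file to fetch, and duplicate rows collapse automatically because the values are collected in sets.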

Below is a preview of the page, and you can visit it here.

A screenshot of the page showing the relationships between topics, guests, books, and more across The Tim Ferriss Show episodes.
