How to Scrape (Almost) Anything

Welcome back to Edition #2 of the GTM Cookbook 👨🏼‍🍳

This week, I want to outline a problem that just about every GTM operator has faced at some point: scraping the exact leads you want to aggregate.

Not every lead list is in Apollo, ZoomInfo, 6Sense, or any of the other traditional data providers. Sometimes, they’re on niche websites, in government databases, and across the internet in a semi-structured way. This guide is going to show you how we scrape leads for our clients from wherever they want us to scrape.

At The Kiln, we recently scraped the entire FMCSA database for a client using these methods, which resulted in more than 1 million records.

In short, how simple or complex a scrape is depends on two things: how structured the data is, and how hard the website tries to stop you from scraping it. We'll list every method and when you should use it, just to make things easier.

Also, our very own Ankit Singh put the entire technical side of this guide together. He’s our in-house scraping expert, a technical wizard, and someone you should definitely follow if you’re interested in learning about the deeply technical side of GTM.

Without further ado, here’s the guide to scraping just about anything:

1. Clay Chrome Extension

Yep, Clay has a Chrome extension that allows you to scrape websites and import the data straight into Clay. In my early days at Clay, I was tasked with mapping out hundreds of pages to help users get more out of the tool. It's quite simple to use, and it's helpful for basic lists like Y Combinator's company directory. It's also free, so it's worth a try before moving on to more complex methods.

More Information: Clay for Chrome

2. Phantombuster

Phantombuster is a great tool for scraping specialized things such as LinkedIn followers and engagement. They have individual scrapers called "phantoms," each built for a specific scraping task: you pick one, follow the connection instructions, and let it run. It's super easy to use and great for specific kinds of scraping tasks. I highly recommend you check out their Phantoms List to see if it could help with your use case.

More Information: Phantombuster

3. Apify

Apify is another marketplace of scrapers that allows you to complete specific scraping tasks. They call these "Actors," and it's very easy to run them as well as connect them to Clay via their native integration. Check out their store here, or you can create your own.

More Information: Apify

4. Octoparse

Octoparse is the first true scraping tool on this list, capable of handling complex scrapes on essentially any website. It lets you build custom scraping workflows that can get past barriers such as 2FA, unusual clicking patterns, and more. I highly recommend checking out Ankit's Loom below for a quick rundown on how to use it.

5. Python (Selenium & Beautiful Soup)

When all else fails, a well-executed Python script can almost always do the trick.

Use Case: Maximum flexibility in web scraping, with full control over:

  • Automating browser interactions using Selenium.

  • Parsing HTML content with Beautiful Soup.

  • Running ChromeDriver to handle dynamic, JavaScript-rendered content.

Example: Scraping GSMA.com for structured data extraction.

More Information:

  • Selenium: Selenium

  • Beautiful Soup: Beautiful Soup Documentation

Loom: https://www.loom.com/share/e7136d5ab9a943c6bf428132feed8c91?sid=f8a650fc-400c-4746-b04b-20bbb1fbaf88
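To make the pattern concrete, here's a minimal sketch of the Selenium + Beautiful Soup workflow: render the page in a headless browser, then parse the resulting HTML. The URL, CSS selectors, and field names below are hypothetical placeholders (not from any specific site), so swap in the real selectors for whatever page you're scraping.

```python
# Minimal Selenium + Beautiful Soup sketch. Selectors are hypothetical --
# inspect your target page and adapt them.
from bs4 import BeautifulSoup


def fetch_rendered_html(url: str) -> str:
    """Load a page in headless Chrome so JavaScript-rendered content appears."""
    # Imported lazily so the parsing half of this sketch runs without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def parse_companies(html: str) -> list[dict]:
    """Pull name/website pairs out of the HTML with Beautiful Soup."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.select("div.company-card"):  # hypothetical selector
        name = card.select_one("h3")
        link = card.select_one("a")
        results.append({
            "name": name.get_text(strip=True) if name else None,
            "website": link["href"] if link else None,
        })
    return results


# Quick check against a static snippet (no browser needed):
sample = """
<div class="company-card"><h3>Acme Co</h3><a href="https://acme.example">site</a></div>
<div class="company-card"><h3>Globex</h3><a href="https://globex.example">site</a></div>
"""
print(parse_companies(sample))
```

In practice you'd call `fetch_rendered_html(...)` on the target URL and feed the result into the parser; keeping the two steps separate makes the parsing logic easy to test on saved HTML before you ever launch a browser.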

I hope this added some value for you, and feel free to reach out with any questions!

And to wrap things up: if you're looking for the influencers template I posted about two days ago, here you go! → https://app.clay.com/shared-table/share_F9RWGq4bDbhB?via=b8a689