This week’s assignment is to grab text from a web page, so I tried to grab all the article titles from TechCrunch’s popular page: https://techcrunch.com/popular/

I opened the page and found that all the news titles sit inside h2 tags with the class post-title, so the CSS selector would be “h2.post-title a”. Here is the code:

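from urllib.request import urlopen

from bs4 import BeautifulSoup

start_url = 'https://techcrunch.com/popular/'
# Download the raw HTML of the page (Python 3: urlopen lives in urllib.request)
html = urlopen(start_url).read()

# Parse it with Python's built-in HTML parser
soup = BeautifulSoup(html, 'html.parser')

# Every link inside an <h2 class="post-title"> holds one headline
titles = soup.select('h2.post-title a')

for title in titles:
    print(title.text)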
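To check what the selector “h2.post-title a” actually picks up without hitting the live site, here is a small sketch run against a made-up HTML snippet. The markup (class names other than post-title, the URLs and the headline text) is hypothetical, modelled on the h2/post-title structure described above rather than copied from TechCrunch, whose real page may differ and change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only
sample_html = """
<div class="river">
  <h2 class="post-title"><a href="/a">First headline</a></h2>
  <h2 class="post-title"><a href="/b">Second headline</a></h2>
  <h2 class="sidebar-title"><a href="/c">Not a post title</a></h2>
  <p class="post-title"><a href="/d">Wrong tag</a></p>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# "h2.post-title a" matches any <a> nested inside an <h2> whose class is "post-title"
links = soup.select('h2.post-title a')

# get_text(strip=True) drops surrounding whitespace from each headline
titles = [link.get_text(strip=True) for link in links]

print(titles)
```

Only the first two links match: the third sits in an h2 with a different class, and the fourth has the right class but on a p tag instead of an h2.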


And the results:
