About
This is part of a series of informal talks hosted by the Electronics and Technical Arts studio. The talks are a way for attendees to get a taste of a new subject. Admission is free! If there is enough interest, some of these talks may become starting points for future classes.
Title: Introduction To Web Scraping - Make Your Computer Go To Work For You!
Speaker: Ehren J. Brav
Description: Web scraping is the art of automatically collecting information from the internet. Much of humanity's collective knowledge is available online, and being able to gather and analyze it automatically is remarkably powerful (and not that hard). So make your computer go to work for you!
This talk will teach the basics of web scraping and demo two applications I created:
The first automatically searches the Seattle Mountaineers web page for trips that satisfy a couple of criteria (climbs of Rainier or Baker, trips led by leaders I like, etc.), and emails me an alert if it finds one. Putting my computer to work thus eliminated the need for me to remember to check the site!
The second I used (with permission) to systematically download every recording of North American bird calls hosted on the Xeno-canto website. It ran over three consecutive nights and mirrored their database of tens of thousands of recordings. I put these into a local database so I could run machine learning applications on them.
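To give a flavor of the first app's idea, here is a minimal sketch. It is not the code from the talk. The HTML snippet, the `data-leader` attribute, the leader names, and the email address are all made up for illustration, and it uses Python's built-in `html.parser` instead of BeautifulSoup so that it runs without installing anything.

```python
from email.message import EmailMessage
from html.parser import HTMLParser

# Hypothetical markup: a real trip listing page will be structured differently.
SAMPLE_HTML = """
<ul>
  <li class="trip" data-leader="Alice">Mount Rainier - Emmons Glacier</li>
  <li class="trip" data-leader="Bob">Mount Si - Day Hike</li>
  <li class="trip" data-leader="Carol">Mount Baker - Easton Glacier</li>
</ul>
"""


class TripParser(HTMLParser):
    """Collects (title, leader) pairs from <li class="trip"> elements."""

    def __init__(self):
        super().__init__()
        self.trips = []
        self._leader = None  # not None only while inside a trip <li>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "trip":
            self._leader = attrs.get("data-leader", "")

    def handle_data(self, data):
        if self._leader is not None and data.strip():
            self.trips.append((data.strip(), self._leader))

    def handle_endtag(self, tag):
        if tag == "li":
            self._leader = None


def find_matches(html, peaks, leaders):
    """Return trips that mention a wanted peak or are led by a liked leader."""
    parser = TripParser()
    parser.feed(html)
    parser.close()
    return [
        (title, leader)
        for title, leader in parser.trips
        if any(p.lower() in title.lower() for p in peaks) or leader in leaders
    ]


def build_alert(matches, to_addr):
    """Build (but do not send) an email summarizing the matching trips."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(matches)} matching trip(s) found"
    msg["To"] = to_addr
    msg.set_content("\n".join(f"{t} (leader: {l})" for t, l in matches))
    return msg


# Hypothetical criteria and address, for illustration only.
matches = find_matches(SAMPLE_HTML, peaks=["Rainier", "Baker"], leaders={"Dana"})
alert = build_alert(matches, "me@example.com")
```

In a real script the HTML would be fetched from the live page, and the message could be handed to `smtplib.SMTP(...).send_message(alert)` to actually send it. Both steps are left out here so the sketch has no network dependency.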
Agenda:
- Introduction to web scraping in Python
- The "With great power comes great responsibility" talk
- Walkthrough of the two demo apps - step-by-step how they work
- Tools for web scraping: Beautiful Soup, PostgreSQL, smtplib, mechanize
Familiarity with Python is a plus but not absolutely necessary.