PhantomJS and Selenium to the Rescue
First we’ll set up our environment: a Python installation with the Selenium package, plus the PhantomJS binary.
Let’s create a Python file named scraper.py to hold our script.
NB! For this script to work, we must tell Selenium where to find PhantomJS. The phantomjs file can be downloaded from here. Pick the version that matches your operating system and unpack it; the file named ‘phantomjs’ is in the bin folder. Copy that file into the same folder as the scraper.py script.
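As a rough illustration of the steps above, here is a minimal sketch of what scraper.py could look like. It assumes an older Selenium release that still ships the PhantomJS driver (newer releases dropped it), and that the phantomjs file sits in the same folder as the script. The function names, the URL and the CSS selector are hypothetical placeholders, not the ones from the original script.

```python
import os


def phantomjs_path(script_dir="."):
    """Return the expected location of the phantomjs binary.

    Assumes the binary was copied into `script_dir`, the folder that
    also holds scraper.py (hypothetical helper, not part of Selenium).
    """
    return os.path.join(os.path.abspath(script_dir), "phantomjs")


def fetch_first_match_html(url, css_selector, script_dir="."):
    """Load `url` in PhantomJS and return the outer HTML of the first
    element matching `css_selector` (both arguments are placeholders)."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver

    # Older Selenium API: point the driver at the phantomjs binary.
    driver = webdriver.PhantomJS(executable_path=phantomjs_path(script_dir))
    try:
        # Give dynamically loaded elements up to 10 seconds to appear.
        driver.implicitly_wait(10)
        driver.get(url)
        element = driver.find_element_by_css_selector(css_selector)
        return element.get_attribute("outerHTML")
    finally:
        # Always shut the headless browser down, even on errors.
        driver.quit()
```

Calling `fetch_first_match_html("http://example.com/matches", ".match")` with the real page address and selector would then return the markup of the first matching element. Keeping the path logic in its own helper makes it easy to change where the phantomjs binary lives.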
Now let’s run our script (python scraper.py).
It prints the HTML for the first match, which was played between Djokovic and Federer.
As we saw, it’s relatively easy to get dynamically loaded content, although it’s a bit slower than the traditional method of just firing a request to the server, because a full headless browser has to load the page and run its JavaScript. You can clone the GitHub repository from here.