Using Firefox for scraping
Here is a list of tips and advice on using Firefox for scraping, along with a list of useful Firefox add-ons to ease the scraping process.
Caveats with inspecting the live browser DOM
Since Firefox add-ons operate on a live browser DOM, what you’ll actually see
when inspecting the page source is not the original HTML, but a modified version
produced after the browser has cleaned up the markup and executed JavaScript code.
Firefox, in particular, is known for adding
<tbody> elements to tables. Scrapy, on
the other hand, does not modify the original page HTML, so you won’t be able to
extract any data if you use
<tbody> in your XPath expressions.
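The mismatch can be reproduced without a browser. The following is a minimal sketch, not Scrapy code: it assumes a small, hypothetical, well-formed page and uses the standard library's xml.etree.ElementTree in place of Scrapy selectors, since both evaluate paths against the markup as downloaded rather than against the browser's DOM.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw HTML, as a server might send it: no <tbody> in the table.
RAW_HTML = """
<html><body>
  <table id="prices">
    <tr><td>Widget</td><td>9.99</td></tr>
    <tr><td>Gadget</td><td>19.99</td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(RAW_HTML)

# Path as it would be copied from the browser inspector, which shows
# a <tbody> element that the browser inserted.
with_tbody = root.findall("./body/table/tbody/tr/td")

# Same path written against the markup that was actually downloaded.
without_tbody = root.findall("./body/table/tr/td")

print(len(with_tbody))     # no match: the raw markup has no <tbody>
print(len(without_tbody))  # matches the four cells
```

The first query comes back empty because the element it names exists only in the browser's version of the page.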
Therefore, you should keep in mind the following things when working with Firefox and XPath:
- Never use full XPath paths; use relative and clever ones based on attributes (such as id, class, width, etc.) or any other identifying features
- Never include <tbody> elements in your XPath expressions unless you really know what you’re doing
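To illustrate why attribute-based paths hold up better than full ones, here is a short sketch under stated assumptions: the two page versions, the id value "prices" and the extract helper are all hypothetical, and xml.etree.ElementTree again stands in for Scrapy selectors.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same page; the second adds a wrapper <div>.
PAGE_V1 = '<html><body><table id="prices"><tr><td>9.99</td></tr></table></body></html>'
PAGE_V2 = '<html><body><div><table id="prices"><tr><td>9.99</td></tr></table></div></body></html>'

FULL_PATH = "./body/table/tr/td"                # tied to the exact page layout
RELATIVE_PATH = ".//table[@id='prices']/tr/td"  # anchored on the id attribute


def extract(html, path):
    """Return the text of every element matched by path (hypothetical helper)."""
    return [el.text for el in ET.fromstring(html).findall(path)]


# The full path stops matching once the layout changes.
print(extract(PAGE_V1, FULL_PATH), extract(PAGE_V2, FULL_PATH))

# The attribute-based path matches in both versions.
print(extract(PAGE_V1, RELATIVE_PATH), extract(PAGE_V2, RELATIVE_PATH))
```

Anchoring on an attribute keeps the expression independent of how many wrapper elements sit between the document root and the table.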
Useful Firefox add-ons for scraping
Firebug is a widely known tool among web developers, and it’s also very useful for scraping. In particular, its Inspect Element feature comes in very handy when you need to construct the XPaths for extracting data, because it lets you view the HTML code of each page element while moving your mouse over it.
See Using Firebug for scraping for a detailed guide on how to use Firebug with Scrapy.