Today I read some data out of an HTML table using Python. The tools I ended up using to fetch the page were:
    import urllib

    f = urllib.urlopen(<URL_string>)
    # Read from the object, storing the page's contents in 's'.
    s = f.read()
    f.close()
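The snippet above uses the Python 2 `urllib` module. On Python 3, `urlopen` lives in `urllib.request`, so a rough equivalent might look like the sketch below. The `fetch` helper name and the `data:` URL are only illustrative assumptions (the `data:` URL lets the example run without network access); a real page URL would go in its place.

```python
from urllib.request import urlopen


def fetch(url):
    """Return the raw bytes of the resource at 'url' (hypothetical helper)."""
    # The 'with' block closes the response, replacing the explicit f.close().
    with urlopen(url) as f:
        return f.read()


# Placeholder data: URL standing in for a real page address.
s = fetch("data:text/html,<p>hello</p>")
```

Note that `f.read()` returns bytes on Python 3; lxml accepts those directly.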
Then, to parse the tables in the HTML, I used lxml, which can read HTML and provides an ElementTree-like API.
    from lxml import etree

    html = etree.HTML(s)
I was then able to do everything I needed by calling list() on an element to get its children, and by using the .tag attribute to identify each element and the .text attribute to pull the text out of table cells and links as desired.
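As a rough illustration of that approach, here is a minimal sketch that walks a table using only `list()`, `.tag`, and `.text`, plus `iter()` to find the rows. The sample HTML and the `cell_text` helper are made up for the example and stand in for whatever the real page contained.

```python
from lxml import etree

# Hypothetical sample page standing in for the contents read into 's'.
s = """
<html><body>
<table>
  <tr><th>Name</th><th>Link</th></tr>
  <tr><td>lxml</td><td><a href="https://lxml.de/">site</a></td></tr>
  <tr><td>Python</td><td><a href="https://www.python.org/">site</a></td></tr>
</table>
</body></html>
"""

html = etree.HTML(s)


def cell_text(cell):
    """Return the text of the cell's first <a> child, else the cell's own text."""
    for child in list(cell):
        if child.tag == "a":
            return child.text
    return cell.text


rows = []
# iter("tr") finds rows whether or not they are wrapped in a <tbody>.
for row in html.iter("tr"):
    # list(row) gives the row's direct children; keep only the cells.
    cells = [c for c in list(row) if c.tag in ("td", "th")]
    rows.append([cell_text(c) for c in cells])

print(rows)
```

Each entry in `rows` is a list of cell strings, with link cells contributing the link's text rather than its `href`.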