On Tue, 12 Aug 2014 15:44:58 -0700 (PDT), Simon Evans wrote:
[snip]
> Dear Programmers, Thank you for your responses. I have installed
> 'Beautiful Soup' and I have the 'Getting Started in Beautiful Soup'
> book, but can't seem to make any progress with it, I am too thick to
> make much use of it. I was hoping I could scrape specified stuff off
> Web pages without using it.

I've only used BeautifulSoup a little bit, and am no expert, but with
it one can do wonderfully complex things with simple code. Perhaps you
can find some examples online; this newsgroup sometimes has awesome
demonstrations of BS prowess.

At the risk of embarrassing myself in public, I'll show you some code I
wrote that scrapes data from a web page containing a description of a
drug. The drug's web page contains the desired data in tags that look
like this:

    <input id="form-widgets-minconcentration"
           name="form.widgets.minconcentration"
           class="text-widget float-field" value="1.0" type="text" />

The following code finds all these tags and builds a dict in which you
can look up the "value" for any given "name".

    import urllib2
    from BeautifulSoup import BeautifulSoup as BS
    ...
    def dump_drug_data(url):
        """Fetch data from one drug's URL and print selected fields in columns.
        """
        contents = urllib2.urlopen(url=url).read()
        soup = BS(contents)
        inputs = soup.findAll("input")
        # Map each <input> tag's "name" attribute to its "value" attribute.
        input_dict = dict((i.get("name"), i.get("value")) for i in inputs)
        print(" ".join(f.format(input_dict[n]) for f, n in (
            ("{0:5s}", "form.widgets.absorption_halflife"),
            ("{0:5s}", "form.widgets.elimination_halflife"),
            ("{0:5s}", "form.widgets.minconcentration"),
            ("{0:5s}", "form.widgets.maxconcentration"),
            ("{0:13s}", "form.widgets.title"),
            )))

Try giving a more specific picture of your quest, and it's very likely
that people smarter than me will give you good help.

-- 
To email me, substitute nowhere->spamcop, invalid->net.
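
P.S. On the hope of scraping without BeautifulSoup: the same name-to-value lookup can be sketched with only the standard library's html.parser module. This is a rough Python 3 sketch, not the code from the post above. The names InputCollector, input_values and SAMPLE are made up for illustration, and it is only meant for simple markup like the <input> tag quoted earlier.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name -> value for every <input> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags (<input ... />) here.
        if tag == "input":
            attr_map = dict(attrs)
            name = attr_map.get("name")
            if name is not None:
                self.values[name] = attr_map.get("value")


def input_values(html):
    """Return a dict mapping each <input> tag's name to its value."""
    parser = InputCollector()
    parser.feed(html)
    parser.close()
    return parser.values


# Sample markup copied from the tag quoted in the post.
SAMPLE = ('<input id="form-widgets-minconcentration" '
          'name="form.widgets.minconcentration" '
          'class="text-widget float-field" value="1.0" type="text" />')

print(input_values(SAMPLE)["form.widgets.minconcentration"])
```

For real pages you would pass the decoded page text to input_values() instead of SAMPLE. BeautifulSoup tends to cope better with messy markup, so this is a fallback, not a replacement.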