Just trying to think the logic through here. If I set the scrape up so the results are:

SB9, Anderson, 1
SB9, Andes, 1
SB9, Brown, 2

Where SB9 stands for Senate Bill 9, the middle field is the voter's last name (or last name + first initial if two people share a last name), and the last field is 1 for a yes, 2 for a no, and 3 for not voting.

Then, assuming I set up the models like this (generic setup):

PERSON
    first name
    last name
    party

BILL
    title
    description
    full content

VOTES
    person
    bill
    vote (vote_choices = yes, no, no-vote)

I'm trying to figure out the best way to link everything up. Any suggestions?

Chris

On Mar 28, 9:52 pm, [EMAIL PROTECTED] wrote:
> James,
>
> Thnx. I would prefer scraping it into a CSV as well. I had a scraper
> that got NCAA football scores from a site and output them in CSV to
> drop into a db, it was in PHP though and scraped .html files.
>
> Also, love your blog, a lot of great stuff there.
>
> Thnx again,
>
> C
>
> On Mar 28, 9:34 pm, "James Bennett" <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Mar 28, 2008 at 10:15 PM, <[EMAIL PROTECTED]> wrote:
> > > our state legislature has all their reports online in PDF format, i
> > > was hoping to scrape 'em and get them and use them with django to
> > > create something similar to what adrian did with the w-p and others
> > > have done.
> >
> > There are a couple freely-available libraries that can scrape PDF;
> > pyPdf [1], for example, is BSD licensed and seems to be actively
> > maintained, and can read the text out of a PDF for you. From there you
> > can pretty easily fiddle with the text; the Python Cookbook has a
> > recipe [2] for reading the text from a PDF programmatically, for
> > example.
> >
> > For getting data from PDF into a database, I (personally) generally
> > convert to an intermediate format like CSV, which has the advantage of
> > also working in a lot of spreadsheet tools for people to browse while
> > you're getting the DB import going.
> >
> > [1]http://pybrary.net/pyPdf/
> > [2]http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511465
> >
> > --
> > "Bureaucrat Conrad, you are technically correct -- the best kind of
> > correct."
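
For reference, here is a minimal sketch of how the scraped "bill, name, code" lines could be turned into linked records before they go anywhere near the database. It uses only the standard library. The names Vote, parse_votes, VOTE_CHOICES and SAMPLE are made up for illustration, and the 1/2/3 mapping is the one described at the top of this message. In Django, the VOTES model would presumably carry a ForeignKey to PERSON and one to BILL, with the choice stored alongside them.

```python
import csv
import io
from dataclasses import dataclass

# Assumed mapping from the scrape's numeric codes to vote choices.
VOTE_CHOICES = {"1": "yes", "2": "no", "3": "no-vote"}


@dataclass(frozen=True)
class Vote:
    bill: str    # bill identifier, e.g. "SB9"
    person: str  # voter's last name (plus first initial on a clash)
    choice: str  # one of the VOTE_CHOICES values


def parse_votes(text):
    """Turn scraped 'bill, name, code' lines into Vote records."""
    votes = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # skip blank lines
        bill, person, code = (field.strip() for field in row)
        votes.append(Vote(bill, person, VOTE_CHOICES[code]))
    return votes


# Hypothetical sample, copied from the example rows in this message.
SAMPLE = """SB9, Anderson, 1
SB9, Andes, 1
SB9, Brown, 2
"""

votes = parse_votes(SAMPLE)

# Distinct bills and people: these would become the BILL and PERSON rows
# that each VOTES row points at.
bills = sorted({v.bill for v in votes})
people = sorted({v.person for v in votes})
```

The idea is that each distinct bill and name is created once, and every vote row just refers to the pair it belongs to, so the same shape should carry over to a look-up-or-create step in the import script.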