Just trying to think the logic through here.

If I set the scrape up so the results are:

SB9, Anderson, 1
SB9, Andes, 1
SB9, Brown, 2

Where SB9 stands for Senate Bill 9, the middle field is the voter's
last name (or last name + first initial if two people have the same
last name), and the last field is 1 for a yes, 2 for a no, and 3 for
not voting.
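
For illustration, here is a minimal sketch of reading rows in that
layout with Python's standard csv module. The names parse_votes and
VOTE_CHOICES are made up for this example, and the sample text is just
the three rows shown above.

```python
import csv
import io

# Vote codes as described above: 1 = yes, 2 = no, 3 = did not vote.
# VOTE_CHOICES is a hypothetical name chosen for this sketch.
VOTE_CHOICES = {"1": "yes", "2": "no", "3": "no-vote"}


def parse_votes(text):
    """Return a list of (bill_code, voter_name, vote) tuples from CSV text."""
    rows = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for fields in reader:
        if not fields:
            continue  # csv.reader yields an empty list for blank lines
        bill_code, voter_name, code = (field.strip() for field in fields)
        rows.append((bill_code, voter_name, VOTE_CHOICES[code]))
    return rows


sample = "SB9, Anderson, 1\nSB9, Andes, 1\nSB9, Brown, 2\n"
votes = parse_votes(sample)
```

An unknown vote code raises KeyError and a row with the wrong number of
fields raises ValueError, which is probably what you want while the
scraper is still being debugged.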

Then assuming I set up the models like this (generic setup):

PERSON
first name
last name
party

BILL
title
description
full content

VOTES
person
bill
vote (vote_choices = yes, no, no-vote)

I'm trying to figure out the best way to link everything up. Any
suggestions?
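
One possible shape for the linking, sketched with the standard sqlite3
module so it runs without a Django project: VOTES holds one foreign key
to PERSON and one to BILL. In Django the same idea would be two
ForeignKey fields on the vote model. The bill "code" column (e.g. SB9),
the load_votes helper, and matching people on last name alone are all
assumptions made for this example.

```python
import sqlite3

# Table layout mirrors the PERSON / BILL / VOTES outline above.
# The "code" column on bill is a hypothetical addition so that the
# scraped "SB9" value has something to match against.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL DEFAULT '',
    last_name TEXT NOT NULL,
    party TEXT NOT NULL DEFAULT ''
);
CREATE TABLE bill (
    id INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE,
    title TEXT NOT NULL DEFAULT '',
    description TEXT NOT NULL DEFAULT '',
    full_content TEXT NOT NULL DEFAULT ''
);
CREATE TABLE vote (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    bill_id INTEGER NOT NULL REFERENCES bill(id),
    vote TEXT NOT NULL CHECK (vote IN ('yes', 'no', 'no-vote')),
    UNIQUE (person_id, bill_id)
);
"""


def load_votes(conn, rows):
    """Insert (bill_code, last_name, vote) rows, creating people and bills as needed."""
    for bill_code, last_name, vote in rows:
        conn.execute("INSERT OR IGNORE INTO bill (code) VALUES (?)", (bill_code,))
        bill_id = conn.execute(
            "SELECT id FROM bill WHERE code = ?", (bill_code,)
        ).fetchone()[0]
        # Simplification: people are matched on last name only.
        person = conn.execute(
            "SELECT id FROM person WHERE last_name = ?", (last_name,)
        ).fetchone()
        if person is None:
            person_id = conn.execute(
                "INSERT INTO person (last_name) VALUES (?)", (last_name,)
            ).lastrowid
        else:
            person_id = person[0]
        conn.execute(
            "INSERT INTO vote (person_id, bill_id, vote) VALUES (?, ?, ?)",
            (person_id, bill_id, vote),
        )


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
load_votes(conn, [("SB9", "Anderson", "yes"), ("SB9", "Andes", "yes"), ("SB9", "Brown", "no")])

# Everyone who voted yes on SB9, found by following both foreign keys.
yes_voters = [
    row[0]
    for row in conn.execute(
        "SELECT p.last_name FROM vote v "
        "JOIN person p ON p.id = v.person_id "
        "JOIN bill b ON b.id = v.bill_id "
        "WHERE b.code = ? AND v.vote = 'yes' ORDER BY p.last_name",
        ("SB9",),
    )
]
```

The UNIQUE (person_id, bill_id) constraint keeps a re-run of the import
from recording the same person's vote on the same bill twice.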

Chris

On Mar 28, 9:52 pm, [EMAIL PROTECTED] wrote:
> James,
>
> Thnx. I would prefer scraping it into a CSV as well. I had a scraper
> that got NCAA football scores from a site and output them in CSV to
> drop into a db, it was in PHP though and scraped .html files.
>
> Also, love your blog, a lot of great stuff there.
>
> Thnx again,
>
> C
>
> On Mar 28, 9:34 pm, "James Bennett" <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Mar 28, 2008 at 10:15 PM,  <[EMAIL PROTECTED]> wrote:
> > >  our state legislature has all their reports online in PDF format, i
> > >  was hoping to scrape 'em and get them and use them with django to
> > >  create something similar to what adrian did with the w-p and others
> > >  have done.
>
> > There are a couple freely-available libraries that can scrape PDF;
> > pyPdf [1], for example, is BSD licensed and seems to be actively
> > maintained, and can read the text out of a PDF for you. From there you
> > can pretty easily fiddle with the text; the Python Cookbook has a
> > recipe [2] for reading the text from a PDF programmatically, for
> > example.
>
> > For getting data from PDF into a database, I (personally) generally
> > convert to an intermediate format like CSV, which has the advantage of
> > also working in a lot of spreadsheet tools for people to browse while
> > you're getting the DB import going.
>
> > [1] http://pybrary.net/pyPdf/
> > [2] http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511465
>
> > --
> > "Bureaucrat Conrad, you are technically correct -- the best kind of 
> > correct."