Frederick Cheung wrote:
>
> Does Scraper need to be an activerecord class at all ? you could pass
> to it the class whose table needs to be updated ie
>
> def do_something(some_klass)
> some_klass.update_all(...)
> end
>
> or perhaps you might want to couple things a little more loosely
>
> def do_something(some_klass)
> some_klass.handle_scraper_data(...)
> end
>
> Fred
Hi Fred:
Here's what I managed to do on my own (believe it or not, lol):
My Rake Task:
It basically just calls the RushingOffense class from models.
desc "Parse Rushing Offenses data from ncaa.org"
task :parse_rushing_offenses => :environment do
  update_rushing = RushingOffense.new
  update_rushing.scrape
end
My Model for Rushing Offense:
I created a "scrape" method here that pulls in the data using the
Scraper class. Since this model inherits from ActiveRecord::Base,
it should be able to update the table...
class RushingOffense < ActiveRecord::Base
  def scrape
    offensive_rushing =
      Scraper.new('http://web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org',
                  'table', 'statstable', '//tr')
    offensive_rushing.scrape_data
    offensive_rushing.clean_celldata
    for i in 0..offensive_rushing.numrows-1
      puts "Updating Team Name = #{offensive_rushing.rows[i][1]}."
      RushingOffense.update_all(:name => offensive_rushing.rows[i][1],
                                :games => offensive_rushing.rows[i][2])
    end
  end
end
Then finally, I have my scraper.rb file
#== Scraper Version 1.0
#
#*Created By:* _Elricstorm_
#
# _Special thanks to Soledad Penades for his initial parse idea which I
# worked with to create the Scraper program.
# His article is located at
# http://www.iterasi.net/openviewer.aspx?sqrlitid=wd5wiad-hkgk93aw8zidbw_
#
require 'hpricot'
require 'open-uri'

# This class is used to parse and collect data out of an html element
class Scraper #< ActiveRecord::Base
  attr_accessor :url, :element_type, :clsname, :childsearch, :doc,
                :numrows, :rows

  # Define what the url is, what element type and class name we want to
  # parse and open the url.
  def initialize(url, element_type, clsname, childsearch)
    @url = url
    @element_type = element_type
    @clsname = clsname
    @childsearch = childsearch
    @doc = Hpricot(open(url))
    @numrows = 0 # nothing scraped yet
    @rows = []
  end

  # Scrape data based on the type of element, its class name, and define
  # the child element that contains our data
  def scrape_data
    @rows = []
    (doc/"#{@element_type}.#{@clsname}#{@childsearch}").each do |row|
      cells = []
      (row/"td").each do |cell|
        if (cell/"span.s").length > 0
          values = (cell/"span.s").inner_html.split('<br />').collect { |str|
            pair = str.strip.split('=').collect { |val| val.strip }
            Hash[pair[0], pair[1]]
          }
          if values.length == 1
            cells << cell.inner_text.strip
          else
            cells << values # an Array of Hashes; Array has no strip method
          end
        else
          cells << cell.inner_text.strip
        end
      end
      @rows << cells
    end
    # Shifting removes the row containing the <th> table header elements.
    @rows.shift
    @rows.delete([]) # Remove any empty rows in our array of arrays.
    @numrows = @rows.length
  end

  # Overwrite the first cell of the last row with 120
  def clean_celldata
    @rows[@numrows-1][0] = 120
  end

  # Print a joined list by row to see our results
  def print_values
    puts "Number of rows = #{numrows}."
    for i in 0..@numrows-1
      puts @rows[i].join(', ')
    end
  end
end
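As a rough illustration of the two post-processing steps in scrape_data, here is a plain-Ruby sketch that runs without Hpricot or a network connection. The helper names parse_pairs and clean_rows, and the sample strings, are made up for this example and are not part of scraper.rb.

```ruby
# Hypothetical helpers for illustration only; not part of scraper.rb.

# Split a "key=value<br />key=value" string into an array of one-pair
# hashes, the same way scrape_data treats the contents of a span.s cell.
def parse_pairs(inner_html)
  inner_html.split('<br />').collect do |str|
    pair = str.strip.split('=').collect { |val| val.strip }
    Hash[pair[0], pair[1]]
  end
end

# Drop the leading header row and any empty rows, then count what is left.
def clean_rows(rows)
  cleaned = rows.dup
  cleaned.shift       # the first row came from the <th> header, so it has no <td> cells
  cleaned.delete([])  # remove every remaining empty row
  [cleaned, cleaned.length]
end

# Made-up sample data.
pairs = parse_pairs('G = 12<br />Yds = 3000')
rows, numrows = clean_rows([[], ['1', 'Team A'], [], ['2', 'Team B']])

puts pairs.inspect
puts "Number of rows = #{numrows}."
```

Note that shift removes whatever happens to be first, so it only drops the header if the header row really is the first match.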
--------------------------------
The only problem I have now is that when I run the rake task, I don't get
any errors, and I see the puts output for each team as it's being updated
(or supposed to be updated). So it's looping over each row as I expected.
I only tried to update 2 fields as a test, but no data is showing up
in the database.
Any ideas about what I might be doing wrong?
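One thing that may be relevant, offered as a guess rather than a diagnosis: as far as I understand it, update_all issues a single UPDATE against rows that already exist, and without conditions it touches every row, so it never inserts anything. The plain-Ruby sketch below models a table as an array of hashes to contrast that with a per-team find-or-create. update_all_rows and upsert_by_name are made-up names for illustration, not ActiveRecord methods, and the team names are sample data.

```ruby
# A table modelled as an array of hashes; no ActiveRecord, no database.

# Mimics an UPDATE with no WHERE clause: every existing row gets the new
# values and no row is ever added. Returns the number of rows touched.
def update_all_rows(table, attrs)
  table.each { |row| row.merge!(attrs) }
  table.length
end

# Hypothetical alternative: find the row for this team, or add one.
def upsert_by_name(table, attrs)
  row = table.find { |r| r[:name] == attrs[:name] }
  if row
    row.merge!(attrs)
  else
    table << attrs.dup
  end
  table
end

# Made-up sample data standing in for the scraped rows.
scraped = [{ :name => 'Team A', :games => 12 },
           { :name => 'Team B', :games => 13 }]

empty_table = []
touched = scraped.map { |attrs| update_all_rows(empty_table, attrs) }
puts "update-all style touched #{touched.inspect} rows per call"

filled_table = []
scraped.each { |attrs| upsert_by_name(filled_table, attrs) }
puts "find-or-create style left #{filled_table.length} rows"
```

If that reading is right, an empty table would stay empty, and a populated one would end up with every row holding the last team's values.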
This has still been a great day because even though I've seen tons of
errors, I'm learning.