If the website serves RDF, you can use Plaza:
https://github.com/antoniogarrote/clj-plaza

Just from a casual browse of the UM Online Library, some of their pages
are in RDFa.
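
For pages that don't carry RDFa, here is a rough sketch of the mapping
asked about below, working on the text of a single record rather than on
the HTML. It is not Plaza's API; it only uses clojure.string. The names
sample-record, label->keyword and parse-record are made up for
illustration, and the layout rule (first line is the title, lines up to
the first "Label: value" line are the description, the rest are fields)
is an assumption based on the one record quoted below.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample: one record as plain text, shortened from the
;; example quoted in the original message.
(def sample-record
  (str/join "\n"
            ["Poetry Here and Then"
             "A sampling of the papers of Michigan poets."
             "Format: Image Collections"
             "Access: public"
             "Sponsor: Digital Library Production Service"]))

(defn label->keyword
  "Turns a label such as \"Search within group\" into :search-within-group."
  [label]
  (-> label
      str/trim
      str/lower-case
      (str/replace #"\s+" "-")
      keyword))

(defn parse-record
  "Parses one record's text into a map. Assumes line 1 is the title,
  the lines before the first \"Label: value\" line form the description,
  and every remaining line is a \"Label: value\" field."
  [text]
  (let [lines  (->> (str/split-lines text)
                    (map str/trim)
                    (remove str/blank?))
        ;; Heuristic: a field line is letters/spaces, a colon, then a value.
        field? #(re-matches #"[A-Za-z ]+: .+" %)
        [title & more] lines
        [desc fields]  (split-with (complement field?) more)]
    (into {:title       title
           :description (str/join " " desc)}
          (for [line fields
                :let [[label value] (str/split line #":\s*" 2)]]
            [(label->keyword label) value]))))

(parse-record sample-record)
```

For the sample this gives a map with the keys :title, :description,
:format, :access and :sponsor. A map seemed more convenient than the flat
list in the question, since fields can be looked up by key. Getting the
rendered text out of the page, and splitting it into records, is a
separate step that this sketch does not cover.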

Ambrose

On Mon, Mar 7, 2011 at 12:48 AM, Jonathan Mitchem <jmitc...@gmail.com> wrote:

> Are there any libraries that can extract structured web content as
> represented visually in the browser?
>
> I realize I could write regexes and extract using the HTML, but I was
> wondering if there was something that worked with the browser-rendered
> representation.  I.e., something a tad more human-readable: I'd like to have
> a simple syntax to represent the input pattern as presented in the browser
> and have it map into a structured list/map/array.
>
> For instance (from the UMichigan Online Library):
>
> Poetry Here and Then
> A sampling of the papers of Michigan poets from various collections housed
> at the Bentley Historical Library, featuring handwritten and typed
> manuscripts, letters and essays as well as photographs, sketches,
> certificates and other personal items.
> Format: Image Collections
> Access: public
> Search within group: University of Michigan Collections
> Sponsor: Digital Library Production Service
> Statistics Detail: statistics detail
>
>
> [Next record, same format]
>
>
> That presentation is standardized, and repeated for a hundred+ items.
>
> I'd like to be able to easily turn it into something like:
>
> '(:title "Poetry Here and Then" :description "A sampling..."
>   :format "Image Collections" :access "public" ... etc.)
>
>
> Again, I know I can do it with regex parsing of the HTML itself, but I was
> wondering if there were any libraries to make that process smoother.
>
>
> Thanks,
> Jonathan
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
