If the website serves RDF, you can use Plaza:
https://github.com/antoniogarrote/clj-plaza

Just from a casual browse of the UM Online Library, some of their pages are in RDFa.

Ambrose

On Mon, Mar 7, 2011 at 12:48 AM, Jonathan Mitchem <jmitc...@gmail.com> wrote:
> Are there any libraries that can extract structured web content as
> represented visually in the browser?
>
> I realize I could write regexes and extract using the HTML, but I was
> wondering if there was something that worked with the browser-rendered
> representation. I.e., something a tad more human-readable: I'd like to have
> a simple syntax to represent the input pattern as presented in the browser
> and have it map into a structured list/map/array.
>
> For instance, (from the UMichigan Online Library)
>
> Poetry Here and Then
> A sampling of the papers of Michigan poets from various collections housed
> at the Bentley Historical Library, featuring handwritten and typed
> manuscripts, letters and essays as well as photographs, sketches,
> certificates and other personal items.
> Format: Image Collections
> Access: public
> Search within group: University of Michigan Collections
> Sponsor: Digital Library Production Service
> Statistics Detail: statistics detail
>
>
> [Next record, same format]
>
>
> That presentation is standardized, and repeated for a hundred+ items.
>
> I'd like to be able to easily turn it into something like:
>
> '(:title "Poetry Here and Then" :description "A sampling..." :format "Image
> Collections" :access "public" ... etc.)
>
>
> Again, I know I can do it with regex parsing of the HTML itself, but I was
> wondering if there were any libraries to make that process smoother.
>
>
> Thanks,
> Jonathan
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
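
For pages that turn out not to carry usable RDFa, a plain-text fallback might look roughly like the sketch below. It uses only clojure.string and assumes the visible text of each record has already been pulled out of the page by some other means, laid out as in the quoted example: a title line, some description lines, then "Label: value" lines. The names parse-record, parse-records, label->keyword and sample are made up for illustration; they are not part of Plaza or any other library.

```clojure
(ns record-sketch
  (:require [clojure.string :as str]))

;; Sample text copied from the quoted message; indentation is trimmed later.
(def sample
  "Poetry Here and Then
   A sampling of the papers of Michigan poets from various collections housed
   at the Bentley Historical Library, featuring handwritten and typed
   manuscripts, letters and essays as well as photographs, sketches,
   certificates and other personal items.
   Format: Image Collections
   Access: public
   Search within group: University of Michigan Collections
   Sponsor: Digital Library Production Service
   Statistics Detail: statistics detail")

(defn label->keyword
  "Turn a visible label such as \"Search within group\" into a keyword."
  [label]
  (-> label
      str/trim
      str/lower-case
      (str/replace #"\s+" "-")
      keyword))

(defn parse-record
  "Assumed layout: first line is the title, the lines up to the first
   `Label: value` line form the description, the rest are fields."
  [text]
  (let [lines  (->> (str/split-lines text)
                    (map str/trim)
                    (remove str/blank?))
        field? #(re-matches #"[A-Z][A-Za-z ]*: .+" %)
        [title & more] lines
        [desc fields]  (split-with (complement field?) more)]
    (into {:title       title
           :description (str/join " " desc)}
          (for [line fields
                :when (field? line)
                :let  [[label value] (str/split line #":\s*" 2)]]
            [(label->keyword label) value]))))

(defn parse-records
  "Split a page's visible text on blank lines and parse each chunk."
  [text]
  (->> (str/split text #"\n\s*\n")
       (remove str/blank?)
       (map parse-record)))

(:format (parse-record sample))
;=> "Image Collections"
```

This returns a map per record rather than the flat quoted list in the original message, since keyword lookup on a map tends to be more convenient. A description line that itself looks like "Label: value" would be misread as a field, so the heuristic would need tightening for messier pages.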