Are there any libraries that can extract structured web content as represented visually in the browser?
I realize I could write regexes and extract using the HTML, but I was wondering if there was something that worked with the browser-rendered representation. I.e., something a tad more human-readable: I'd like to have a simple syntax to represent the input pattern as presented in the browser and have it map into a structured list/map/array.

For instance (from the UMichigan Online Library):

Poetry Here and Then
A sampling of the papers of Michigan poets from various collections housed at the Bentley Historical Library, featuring handwritten and typed manuscripts, letters and essays as well as photographs, sketches, certificates and other personal items.
Format: Image Collections
Access: public
Search within group: University of Michigan Collections
Sponsor: Digital Library Production Service
Statistics Detail: statistics detail

[Next record, same format]

That presentation is standardized, and repeated for a hundred+ items. I'd like to be able to easily turn it into something like:

'(:title "Poetry Here and Then" :description "A sampling..." :format "Image Collections" :access "public" ... etc.)

Again, I know I can do it with regex parsing of the HTML itself, but I was wondering if there were any libraries to make that process smoother.

Thanks,
Jonathan
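
To make the desired mapping concrete, here is a minimal sketch of the target transformation. It does not address how to obtain the browser-rendered text. It assumes that text is already available as a plain string with one field per line: the title first, the description second, and then lines of the form "Label: value". The names label->keyword, parse-record and sample are hypothetical, chosen only for illustration, and the sample description is abbreviated.

```clojure
(require '[clojure.string :as str])

(defn label->keyword
  "Turns a label such as \"Search within group\" into :search-within-group."
  [label]
  (-> label
      str/trim
      str/lower-case
      (str/replace #"\s+" "-")
      keyword))

(defn parse-record
  "Parses the rendered text of one record into a map.
  Assumption: line 1 is the title, line 2 the description, and every
  remaining line looks like \"Label: value\". Lines without a colon
  are ignored."
  [text]
  (let [[title description & fields] (->> (str/split-lines text)
                                          (map str/trim)
                                          (remove str/blank?))]
    (reduce (fn [record line]
              ;; Split on the first colon only, so values may contain colons.
              (let [[label value] (str/split line #":\s*" 2)]
                (if value
                  (assoc record (label->keyword label) value)
                  record)))
            {:title title :description description}
            fields)))

;; Abbreviated sample record, laid out one field per line (an assumption).
(def sample
  (str/join "\n"
            ["Poetry Here and Then"
             "A sampling of the papers of Michigan poets."
             "Format: Image Collections"
             "Access: public"
             "Search within group: University of Michigan Collections"]))

(parse-record sample)
```

Under those assumptions, (parse-record sample) yields a map with the keys :title, :description, :format, :access and :search-within-group. A run of a hundred-plus records could then be handled by splitting the page text into per-record chunks and mapping parse-record over them, though how the records are delimited in the rendered text would need checking against the real page.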