There are a number of options, depending on your needs:

- the standard JRE libraries for xml parsing / xpath (javax.xml.*). These have the benefit of having seen wide usage (outside of clojure), and would allow you to migrate existing xpaths over unchanged.
- clojure.xml - a more clojuresque way of parsing and working with xml
- clojure.zip - which can take the xml from above (in addition to many other things) and provides a functional way of traversing and editing the resulting tree of elements.
- clojure.contrib.zip_filter.xml - provides a means to extract data from clojure.xml structures using a syntax loosely similar to xpath.

For working with html, I've had good experiences with clojure.xml / clojure.contrib.zip_filter.xml, using tagsoup (http://home.ccil.org/~cowan/XML/tagsoup/) as the SAXParser in order to deal with non-xml compliant documents.

If performance is your aim, you might want to investigate the clojure / saxon library (http://github.com/pjt/saxon/tree/master), possibly combined with tagsoup again to deal with dodgy html. Your message implies that you mainly want to retrieve documents and extract a set of data from each using relatively static expressions (presumably the bulk of your business logic deals with processing this data). If this is indeed the case, then you could use saxon to load the documents returned by your http client and execute the xpaths, which I would imagine will be faster than using zippers.

You could also, of course, simply use the javax.xml.* libraries above directly to load the document and evaluate the xpath.

-DTH

On Aug 23, 2:02 am, dmix <liftedme...@gmail.com> wrote:
> I am planning on migrating an app from ruby to clojure (for
> performance and to learn clojure) and before I proceed I wanted to
> make sure a few libraries are available.
>
> One crucial part of the app is parsing a URL to return the pages HTML
> (<html><body>...etc). Then I need to grab a certain element off the
> page using an xpath. For example a specific images src=" ".
>
> I found an http client on github but I haven't found any HTML parser,
> does anyone know if one exists?
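For the javax.xml.* route mentioned in the reply, a minimal sketch might look like the following. It is written in Java rather than Clojure, since those libraries are plain JRE classes and can be called from Clojure through ordinary interop. The class name XPathSketch, the helper evaluate, and the sample markup are made up for illustration. It also assumes the input is already well-formed; a real page would probably need to go through something like tagsoup first.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library named in the thread.
class XPathSketch {

    // Parses a well-formed document from a string and returns the string
    // value of the first node matched by the given XPath expression.
    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the HTML an http client would return (assumed well-formed).
        String page =
            "<html><body><img id=\"logo\" src=\"/images/logo.png\"/></body></html>";
        // Grab the src attribute of one specific image.
        System.out.println(evaluate(page, "//img[@id='logo']/@src"));
    }
}
```

The same calls translate more or less one-to-one into Clojure interop forms, so existing xpath strings should carry over unchanged.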