I've had good success in Clojure projects with the Java TextExtractor code from this Stack Overflow question: http://stackoverflow.com/questions/10250617/java-apache-poi-can-i-get-clean-text-from-ms-word-doc-files. You can either drop it into your project as Java source or port it to Clojure. You'll probably want to depend on the tika-parsers jar rather than the tika-app jar, though.
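
If you go the Clojure route, a minimal sketch might look something like the following. To be clear, this is not the TextExtractor code from that answer; it just calls Tika's `org.apache.tika.Tika` facade through Java interop. The namespace, the function name `extract-text`, the file name, and the dependency version are all made up for illustration.

```clojure
;; Sketch only. Assumes a Leiningen dependency along the lines of
;;   [org.apache.tika/tika-parsers "1.4"]
;; (the version number is an assumption; use whatever is current).
(ns doc-text.core                       ; hypothetical namespace
  (:require [clojure.java.io :as io])
  (:import [org.apache.tika Tika]))

(defn extract-text
  "Returns the plain text Tika extracts from the file at `path`.
  Tika detects the format itself, so .doc, .pdf and .odt files all
  go through the same call."
  [path]
  (let [tika (Tika.)]
    ;; parseToString truncates at 100,000 characters by default;
    ;; -1 removes that limit.
    (.setMaxStringLength tika -1)
    (.parseToString tika (io/file path))))

(comment
  ;; Hypothetical file name.
  (println (extract-text "report.doc")))
```

The reason I'd reach for tika-parsers here is that tika-app is the standalone runnable bundle, which is more than a library dependency needs.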
Brendan

On Thursday, January 2, 2014 6:33:11 PM UTC-5, Ron Toland wrote:
>
> If all you need is the text, you could use Apache Tika to extract it:
> http://tika.apache.org/
>
> There's a simple clojure lib to get you started:
> https://github.com/alexott/clj-tika
>
> I've used it to pull text out of .doc, .pdf, and .odt files.
>
> Ron
>
> On Wednesday, January 1, 2014 11:49:30 PM UTC-8, Joshua Mendoza wrote:
>>
>> Hi!,
>>
>> I've been looking for libraries or resources to read MS .doc files in
>> Clojure, but found none. Does anyone have tried, used, encountered or
>> witnessed such a thing to read them?
>>
>> I found a lot of info publicly available by the government in .doc files
>> but I want to process them automatically with Clojure.
>>
>> The closest thing I know is using Incanter but to read XLS files, which
>> is not useful at all for this...
>>
>> Well, any help would be great.
>>
>> Thank you!
>>
>

--
--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en