Divya,

Here's a simple example that extracts text from an input stream (you can 
open any file as one):

(ns sample.tika
  (:require [clj-tika.core :as tika]))

(defn extract-text
  "Extracts the text from the input stream"
  [input-stream]
  (tika/parse input-stream))
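
To get an input stream from a file on disk, something along these lines 
should work. This is only a minimal sketch using clojure.java.io from the 
standard library; the namespace sample.file-stream, the helper 
with-file-stream, and the file name in the usage note are made up for 
illustration and are not part of clj-tika.

```clojure
(ns sample.file-stream
  (:require [clojure.java.io :as io]))

;; Hypothetical helper, not part of clj-tika.
(defn with-file-stream
  "Opens the file at `path` as an input stream, calls `(f stream)`,
  and closes the stream once `f` returns."
  [path f]
  (with-open [in (io/input-stream path)]
    (f in)))
```

With the extract-text function above in scope, a call might look like 
(with-file-stream "report.pdf" extract-text). The stream is closed as soon 
as the function returns, so the function has to finish reading before then.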


Ron 



On Friday, December 5, 2014 at 2:32 AM, Divya Shravanthi wrote:

> Hi Ron,
> 
> Could you please share an example of how to pull simple text from pdf/doc 
> files? I couldn't find a proper tutorial for clj-tika. 
> 
> Thanks
> 
> On Friday, 3 January 2014 05:03:11 UTC+5:30, Ron Toland wrote:
> > If all you need is the text, you could use Apache Tika to extract it: 
> > http://tika.apache.org/
> > 
> > There's a simple clojure lib to get you started: 
> > https://github.com/alexott/clj-tika
> > 
> > I've used it to pull text out of .doc, .pdf, and .odt files.
> > 
> > Ron
> > 
> > On Wednesday, January 1, 2014 11:49:30 PM UTC-8, Joshua Mendoza wrote:
> > > Hi!,
> > > 
> > > I've been looking for libraries or resources to read MS .doc files in 
> > > Clojure, but found none. Has anyone tried, used, encountered or 
> > > witnessed such a thing to read them?
> > > 
> > > I found a lot of info publicly available by the government in .doc files 
> > > but I want to process them automatically with Clojure.
> > > 
> > > The closest thing I know is using Incanter but to read XLS files, which 
> > > is not useful at all for this...
> > > 
> > > Well, any help would be great.
> > > 
> > > Thank you! 

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
