Re: Can lucene index tokenized files?

2014-09-14 Thread Sachin Kulkarni
Hi Uwe,

Thank you. I do not have the tokens serialized, so that saves one step. I am reading the javadocs and will try it the way you described.

Regards,
Sachin

On Sun, Sep 14, 2014 at 5:11 PM, Uwe Schindler wrote:
> Hi,
>
> If you have the serialized tokens in a file, you can write a custom

RE: Can lucene index tokenized files?

2014-09-14 Thread Uwe Schindler
Hi,

If you have the serialized tokens in a file, you can write a custom TokenStream that deserializes them and feeds them to IndexWriter as a Field instance in a Document instance. Please read the javadocs on how to write your own TokenStream implementation and pass it using "new TextField(name, y
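A minimal sketch of that idea follows. It assumes each file holds plain whitespace-separated tokens that are already stemmed and stopword-filtered; the class name PreTokenizedStream, the helpers readTokens and addPreTokenized, and the whitespace file format are all hypothetical, chosen only for illustration. The stream emits one term per stored token without any further analysis, and passing it to the TextField(String, TokenStream) constructor means the IndexWriter's analyzer is not used for that field.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

// Hypothetical TokenStream that replays tokens which were produced earlier,
// so no Lucene tokenizer, stemmer or stopword filter runs on them again.
final class PreTokenizedStream extends TokenStream {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final List<String> tokens;
  private Iterator<String> it;

  PreTokenizedStream(List<String> tokens) {
    this.tokens = tokens;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    it = tokens.iterator(); // start replaying from the first token
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!it.hasNext()) {
      return false; // no more tokens in this document
    }
    clearAttributes(); // position increment falls back to its default of 1
    termAtt.setEmpty().append(it.next());
    return true;
  }

  // Assumption: one document per file, tokens separated by whitespace.
  static List<String> readTokens(Path file) throws IOException {
    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
    if (text.isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.asList(text.split("\\s+"));
  }

  // Wraps the tokens in an unstored TextField and adds the document.
  static void addPreTokenized(IndexWriter writer, String field, List<String> tokens)
      throws IOException {
    Document doc = new Document();
    doc.add(new TextField(field, new PreTokenizedStream(tokens)));
    writer.addDocument(doc);
  }
}
```

Keep in mind that queries then have to be stemmed and stopword-filtered the same way the dataset was, because Lucene never saw the original text.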

Can lucene index tokenized files?

2014-09-14 Thread Sachin Kulkarni
Hi,

I have a dataset whose files consist of tokens: the original data has already been tokenized, stemmed, and stopword-filtered. Is it possible to skip the Lucene analyzers and index this dataset in Lucene? So far the datasets I have dealt with were raw, and I used Lucene's tokenization and stemming schem