Hi,

If you have the serialized tokens in a file, you can write a custom TokenStream 
that deserializes them and feeds them to IndexWriter as a Field instance in a 
Document instance. Please read the javadocs on how to write your own TokenStream 
implementation and pass it using "new TextField(name, yourTokenStream)".
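
As a rough sketch (assuming your tokens are stored one per line in a plain 
text file; the class name PreTokenizedStream and the file format are just 
illustrative, adjust the parsing to whatever serialization you actually use), 
such a TokenStream could look like this:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Sketch: exposes pre-tokenized text (one token per line) as a TokenStream.
// Note that TokenStream subclasses must be final (or have a final
// incrementToken()), and consumers call reset()/incrementToken()/end()/close().
public final class PreTokenizedStream extends TokenStream {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final BufferedReader reader;

  public PreTokenizedStream(Reader in) {
    this.reader = new BufferedReader(in);
  }

  @Override
  public boolean incrementToken() throws IOException {
    clearAttributes();
    String token = reader.readLine(); // one serialized token per line (assumption)
    if (token == null) {
      return false; // end of stream
    }
    termAtt.setEmpty().append(token);
    return true;
  }

  @Override
  public void close() throws IOException {
    super.close();
    reader.close();
  }
}

You would then index it roughly like this (field name "contents" and the 
reader are placeholders):

Document doc = new Document();
doc.add(new TextField("contents", new PreTokenizedStream(new FileReader(tokenFile))));
writer.addDocument(doc);

Because the default position increment is 1, the tokens are indexed as a 
normal sequential stream; if your serialization also stores positions or 
offsets, you would additionally set PositionIncrementAttribute / 
OffsetAttribute in incrementToken().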

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Sachin Kulkarni [mailto:kulk...@hawk.iit.edu]
> Sent: Sunday, September 14, 2014 10:06 PM
> To: java-user@lucene.apache.org
> Subject: Can lucene index tokenized files?
> 
> Hi,
> 
> I have a dataset whose files are already in token form: the original data
> has been tokenized, stemmed, and stop-word filtered.
> 
> Is it possible to skip the Lucene analyzers and index this dataset in Lucene?
> 
> So far the datasets I have dealt with were raw, and I used Lucene's
> tokenization and stemming schemes.
> 
> Thank you.
> 
> Regards,
> Sachin


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org