Hi,

If you have the serialized tokens in a file, you can write a custom TokenStream that deserializes them and feeds them to IndexWriter through a Field instance in a Document instance. Please read the javadocs on how to write your own TokenStream implementation, and pass it using "new TextField(name, yourTokenStream)".
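For illustration, here is a minimal sketch of such a TokenStream. The class name PreTokenizedStream is made up for this example, and it assumes you have already parsed your file into a list of token strings; IndexWriter calls reset() before consuming the stream, so the iterator is initialized there:

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Replays an existing list of token strings as a Lucene TokenStream.
public final class PreTokenizedStream extends TokenStream {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final List<String> source;
  private Iterator<String> tokens;

  public PreTokenizedStream(List<String> source) {
    this.source = source;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    tokens = source.iterator(); // IndexWriter calls reset() before consuming
  }

  @Override
  public boolean incrementToken() {
    if (tokens == null || !tokens.hasNext()) {
      return false; // end of stream
    }
    clearAttributes();
    termAtt.setEmpty().append(tokens.next()); // emit the next pre-analyzed token
    return true;
  }
}

You would then index it like this (assuming "writer" is your IndexWriter and "tokenList" holds the tokens read from one file):

Document doc = new Document();
doc.add(new TextField("body", new PreTokenizedStream(tokenList)));
writer.addDocument(doc);

Because the field's tokens bypass the analyzer entirely, make sure your query-time analysis applies the same stemming and stopword handling that was used to produce the files, or the query terms will not match.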
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Sachin Kulkarni [mailto:kulk...@hawk.iit.edu]
> Sent: Sunday, September 14, 2014 10:06 PM
> To: java-user@lucene.apache.org
> Subject: Can lucene index tokenized files?
>
> Hi,
>
> I have a dataset whose files are already in the form of tokens: the original
> data has been tokenized, stemmed, and stopworded.
>
> Is it possible to skip the Lucene analyzers and index this dataset in Lucene?
>
> So far the datasets I have dealt with were raw, and I used Lucene's
> tokenization and stemming schemes.
>
> Thank you.
>
> Regards,
> Sachin