How are they delimited? If they're just a text stream, it seems all you need is a whitespace tokenizer. Won'
How are you going to search them though? Is your query submission
process going to _also_ do the transformations or will you have to
construct a query-time analysis chain that mimics the
pre-tokenization you have at index time?

Best,
Erick

On Sun, Sep 14, 2014 at 8:34 PM, Sachin Kulkarni <kulk...@hawk.iit.edu> wrote:

> Hi Uwe,
>
> Thank you.
> I do not have the tokens serialized, so that reduces one step.
> I am reading the javadocs and will try it the way you mentioned.
>
> Regards,
> Sachin
>
> On Sun, Sep 14, 2014 at 5:11 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Hi,
>>
>> If you have the serialized tokens in a file, you can write a custom
>> TokenStream that unserializes them and feeds them to IndexWriter as a Field
>> instance in a Document instance. Please read the javadocs how to write your
>> own TokenStream implementation and pass it using "new TextField(name,
>> yourTokenStream)".
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: Sachin Kulkarni [mailto:kulk...@hawk.iit.edu]
>> > Sent: Sunday, September 14, 2014 10:06 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Can lucene index tokenized files?
>> >
>> > Hi,
>> >
>> > I have a dataset which has files in the form of tokens where the original data
>> > has been tokenized, stemmed, stopworded.
>> >
>> > Is it possible to skip the lucene analyzers and index this dataset in Lucene?
>> >
>> > So far the dataset I have dealt with was raw and used Lucene's tokenization
>> > and stemming schemes.
>> >
>> > Thank you.
>> >
>> > Regards,
>> > Sachin
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
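
To make the search question in this thread concrete, here is a minimal sketch in plain Java with no Lucene dependency. It only illustrates why the query side has to repeat whatever was done to the files offline. Everything in it is an assumption for illustration: the class name PreTokenizedIndexSketch, the stopword list, and the plural-stripping "stemmer" are hypothetical stand-ins, not Lucene APIs and not the pipeline used for the dataset. A real setup would plug in the same stemmer and stopword list that produced the token files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy stand-in for an index built from pre-tokenized files; this is not Lucene code. */
class PreTokenizedIndexSketch {
    // Hypothetical stopword list; it must match whatever the offline pipeline removed.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a", "of"));

    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Index time: tokens are already processed, so splitting on whitespace is enough. */
    void addPreTokenized(int docId, String tokenLine) {
        for (String term : tokenLine.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Query time: hypothetical mirror of the offline steps (lowercase, stopwords, toy stemmer). */
    static List<String> analyzeQuery(String rawQuery) {
        List<String> terms = new ArrayList<>();
        for (String word : rawQuery.trim().split("\\s+")) {
            String term = word.toLowerCase(Locale.ROOT);
            if (term.isEmpty() || STOPWORDS.contains(term)) {
                continue;
            }
            // Toy "stemmer": strip one trailing 's'. A real one must equal the offline stemmer.
            if (term.length() > 3 && term.endsWith("s")) {
                term = term.substring(0, term.length() - 1);
            }
            terms.add(term);
        }
        return terms;
    }

    /** Returns ids of documents that contain every given term (AND semantics). */
    Set<Integer> search(List<String> terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        PreTokenizedIndexSketch index = new PreTokenizedIndexSketch();
        index.addPreTokenized(1, "quick brown fox jump");
        index.addPreTokenized(2, "lazy dog sleep");

        String query = "The Dogs";
        // Raw query terms do not match the stemmed, stopworded index terms.
        System.out.println(index.search(Arrays.asList(query.split("\\s+")))); // prints []
        // After mirroring the offline transformations, the query finds document 2.
        System.out.println(index.search(analyzeQuery(query)));                // prints [2]
    }
}
```

The index side needs nothing beyond whitespace splitting, which is the easy half. The query side is where the work is: if users type raw text, each step applied to the files offline has to be reproduced in the same order before lookup, otherwise terms like "Dogs" never meet the indexed "dog".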