How are they delimited? If they're just a text stream, it seems
all you need is a whitespace tokenizer. Won'

How are you going to search them though? Is your query submission
process going to _also_ do the transformations or will you have
to construct a query-time analysis chain that mimics the pre-tokenization
you have at index time?
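
To make the index-time/query-time question concrete, here is a minimal
sketch in plain Java. It is not Lucene code: the class
PreTokenizedIndexSketch, its in-memory postings map, and toyNormalize
(a stand-in for whatever stemmer produced your files) are all
hypothetical names made up for illustration. It only assumes that your
files are whitespace-delimited and already stemmed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for an index; illustrates the idea only, not Lucene's API. */
final class PreTokenizedIndexSketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Index time: the text is already tokenized and stemmed, so just split on whitespace. */
    static List<String> splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    void addDocument(int docId, String preTokenizedText) {
        for (String term : splitOnWhitespace(preTokenizedText)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Exact term lookup: no analysis happens here. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    /** Assumption: a toy replacement for the external stemmer (lowercase, strip "ing"). */
    static String toyNormalize(String rawTerm) {
        String lower = rawTerm.toLowerCase(Locale.ROOT);
        return lower.endsWith("ing") ? lower.substring(0, lower.length() - 3) : lower;
    }

    public static void main(String[] args) {
        PreTokenizedIndexSketch index = new PreTokenizedIndexSketch();
        index.addDocument(1, "index token file");    // already stemmed upstream
        index.addDocument(2, "search token stream");

        // Raw user input does not match the stemmed terms in the index.
        System.out.println(index.search("Indexing"));               // prints []
        // The same input, pushed through the same normalization, does.
        System.out.println(index.search(toyNormalize("Indexing"))); // prints [1]
    }
}
```

The point of the sketch is the last two lines: if the query side skips
the transformations that were applied to the files, lookups quietly
return nothing.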

Best,
Erick

On Sun, Sep 14, 2014 at 8:34 PM, Sachin Kulkarni <kulk...@hawk.iit.edu> wrote:
> Hi Uwe,
>
> Thank you.
> I do not have the tokens serialized, so that reduces one step.
> I am reading the javadocs and will try it the way you mentioned.
>
> Regards,
> Sachin
>
> On Sun, Sep 14, 2014 at 5:11 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Hi,
>>
>> If you have the serialized tokens in a file, you can write a custom
>> TokenStream that deserializes them and feeds them to IndexWriter as a Field
>> instance in a Document instance. Please read the javadocs on how to write
>> your own TokenStream implementation and pass it using "new TextField(name,
>> yourTokenStream)".
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: Sachin Kulkarni [mailto:kulk...@hawk.iit.edu]
>> > Sent: Sunday, September 14, 2014 10:06 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Can lucene index tokenized files?
>> >
>> > Hi,
>> >
>> > I have a dataset which has files in the form of tokens where the
>> > original data has been tokenized, stemmed, stopworded.
>> >
>> > Is it possible to skip the lucene analyzers and index this dataset in
>> > Lucene?
>> >
>> > So far the dataset I have dealt with was raw and used Lucene's
>> > tokenization and stemming schemes.
>> >
>> > Thank you.
>> >
>> > Regards,
>> > Sachin
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>

