Thanks!
On Tue, Jul 9, 2013 at 2:34 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> You can replace the terms by their hash directly in the analyzer chain.
> Just write a custom TermToBytesRefAttribute that hashes the term to a
> constant-length byte[] (using an AttributeFactory)! :-) This would give you
> all features of hashed, constant-length terms, but you would lose prefix
> and wildcard queries. In fact, NumericTokenStream does this for numeric
> terms!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Adrien Grand [mailto:jpou...@gmail.com]
> > Sent: Tuesday, July 09, 2013 11:25 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: posting list strings
> >
> > Hi,
> >
> > Lucene stores the string because it may need it to run prefix or range
> > queries. We don't have a hash-based terms dictionary right now, but I know
> > some people wrote one since they don't need support for these queries; see
> > for instance the Earlybird paper[1]. Then, if you can find a perfect
> > hashing function, you can just replace your terms by their hash.
> >
> > [1] http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
> >
> > --
> > Adrien
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
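
For illustration, here is a minimal sketch of the hashing step described in the quoted reply: mapping each term to a constant-length byte[]. It uses only the JDK and no Lucene classes. The class name TermHasher, the method hashTerm, the 8-byte length, and the choice of a truncated SHA-256 digest are all assumptions made for this example, not anything prescribed in the thread. A truncated digest is not a perfect hash, so distinct terms could in principle collide.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper: not part of Lucene.
public class TermHasher {
    // Assumed fixed length for every hashed term, in bytes.
    static final int HASH_LENGTH = 8;

    // Hash a term to a constant-length byte[] by truncating its SHA-256 digest.
    static byte[] hashTerm(String term) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(term.getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(digest, HASH_LENGTH);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] first = hashTerm("lucene");
        byte[] second = hashTerm("lucene");
        // The same term always yields the same bytes, so exact-match lookups still work.
        System.out.println(first.length + " bytes, stable: " + Arrays.equals(first, second));
    }
}
```

In a real analyzer chain, bytes like these would be what a custom TermToBytesRefAttribute implementation hands to the indexer. Because the hash destroys the ordering and prefix structure of the original strings, prefix, wildcard, and range queries stop working, as noted in the thread.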