Well, it seems that this may be a solution for me too. But I'm afraid that someone one day will change this string. And then my app will not work anymore...
> -----Original Message----- > From: Adrian Smith [mailto:[EMAIL PROTECTED] > Sent: Freitag, 15. Februar 2008 13:02 > To: java-user@lucene.apache.org > Subject: Re: Design questions > > Hi, > > I have a similar sitaution. I also considered using $. But > for the sake of > not running into (potential) problems with Tokenisers, I just > defined a > string in a config file which for sure is never going to > occur in a document > and will never be searched for, e.g. > > dfgjkjrkruigduhfkdgjrugr > > Cheers, Adrian > -- > Java Software Developer > http://www.databasesandlife.com/ > > > > On 15/02/2008, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > > > > > I haven't really been following this thread that closely, but... > > > > : Why not just use $$$$$$$$? Check to insure that it makes > > > > : it through whatever analyzer you choose though. For instance, > > : LetterTokenizer will remove it... > > > > > > 1) i'm 99% sure you can do something like this... > > > > Document doc = new Document() > > for (int i = 0; i < pages.length; i++) { > > doc.add(new Field("text", pages[i], Field.Store.NO, > > Field.Index.TOKENIZED)); > > doc.add(new Field("text", "$$", Field.Store.NO, > > Field.Index.UN_TOKENIZED)); > > } > > > > ...and you'll get your magic token regardless of whether it > would normally > > make it through your analyzer. In fact: you want it to be > something your > > analyzer could never produce, even if it appears in the > orriginal text, so > > you don't get false boundaries (ie: if you use an Analzeer > that lowercases > > everything, then "A" makes a perfectly fine boundary token. > > > > 2) if your goal is just to be able to make sure you can > query for phrases > > without crossing page boundaries, it's a lot simpler just to use are > > really big positionIncimentGap with your analyzer (and add > each page as a > > seperate Field instance). boundary tokens like these are > relaly only > > neccessary if you want more complex queries (like "find X and Y on > > the same page but not in the same sentence") > > > > > > > > > > -Hoss > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]