Hi, I have a similar situation. I also considered using $, but to avoid (potential) problems with tokenisers, I instead defined a string in a config file that is guaranteed never to occur in a document and never to be searched for, e.g.
dfgjkjrkruigduhfkdgjrugr

Cheers, Adrian

--
Java Software Developer
http://www.databasesandlife.com/

On 15/02/2008, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> I haven't really been following this thread that closely, but...
>
> : Why not just use $$$$$$$$? Check to ensure that it makes
> : it through whatever analyzer you choose, though. For instance,
> : LetterTokenizer will remove it...
>
> 1) I'm 99% sure you can do something like this...
>
>     Document doc = new Document();
>     for (int i = 0; i < pages.length; i++) {
>         doc.add(new Field("text", pages[i], Field.Store.NO,
>                 Field.Index.TOKENIZED));
>         doc.add(new Field("text", "$$", Field.Store.NO,
>                 Field.Index.UN_TOKENIZED));
>     }
>
> ...and you'll get your magic token regardless of whether it would normally
> make it through your analyzer. In fact, you want it to be something your
> analyzer could never produce, even if it appears in the original text, so
> you don't get false boundaries (i.e. if you use an analyzer that lowercases
> everything, then "A" makes a perfectly fine boundary token).
>
> 2) If your goal is just to be able to query for phrases without crossing
> page boundaries, it's a lot simpler to use a really big
> positionIncrementGap with your analyzer (and add each page as a separate
> Field instance). Boundary tokens like these are really only necessary if
> you want more complex queries (like "find X and Y on the same page but
> not in the same sentence").
>
> -Hoss
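And for completeness, the simpler positionIncrementGap route from point 2
needs only a tiny analyzer subclass. Again just a sketch against the same
2.x-era API as the loop above; the name PageGapAnalyzer and the gap of
10000 are illustrative choices of mine, not anything from Hoss's mail:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    // The gap returned here is added between successive Field instances
    // that share a name, so token positions on page N+1 start at least
    // 10000 past the last token of page N, and no phrase (or sloppy
    // phrase) query can match across the boundary.
    public class PageGapAnalyzer extends StandardAnalyzer {
        public int getPositionIncrementGap(String fieldName) {
            return 10000; // anything larger than your longest page, in tokens
        }
    }

Index with new IndexWriter(dir, new PageGapAnalyzer(), true) and add each
page as its own "text" Field, exactly as in the loop above but without
the "$$" marker.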