I haven't really been following this thread that closely, but...

: Why not just use $$$$$$$$?  Check to ensure that it makes
: it through whatever analyzer you choose though.  For instance,
: LetterTokenizer will remove it...
1) i'm 99% sure you can do something like this...

     Document doc = new Document();
     for (int i = 0; i < pages.length; i++) {
       doc.add(new Field("text", pages[i], Field.Store.NO, Field.Index.TOKENIZED));
       // UN_TOKENIZED bypasses the analyzer, so the boundary token
       // gets indexed verbatim after each page
       doc.add(new Field("text", "$$", Field.Store.NO, Field.Index.UN_TOKENIZED));
     }

...and you'll get your magic token regardless of whether it would
normally make it through your analyzer.  In fact: you want it to be
something your analyzer could never produce, even if it appears in the
original text, so you don't get false boundaries (i.e., if you use an
Analyzer that lowercases everything, then "A" makes a perfectly fine
boundary token).

2) if your goal is just to make sure you can query for phrases without
crossing page boundaries, it's a lot simpler to use a really big
positionIncrementGap with your analyzer (and add each page as a
separate Field instance) -- see the sketch after my sig.  boundary
tokens like these are really only necessary if you want more complex
queries (like "find X and Y on the same page but not in the same
sentence").

-Hoss
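Here's a minimal sketch of option #2, written against the same
1.9/2.x-era API as the snippet in #1.  The PageGapAnalyzer name, the
StandardAnalyzer delegate, and the gap of 1000 are illustrative
choices of mine, not anything settled in this thread:

     import java.io.Reader;
     import org.apache.lucene.analysis.Analyzer;
     import org.apache.lucene.analysis.TokenStream;
     import org.apache.lucene.analysis.standard.StandardAnalyzer;

     // wraps a real analyzer and reports a huge position gap between
     // successive Field instances that share the same field name
     public class PageGapAnalyzer extends Analyzer {
       private final Analyzer delegate = new StandardAnalyzer();

       // delegate actual tokenization to the wrapped analyzer
       public TokenStream tokenStream(String fieldName, Reader reader) {
         return delegate.tokenStream(fieldName, reader);
       }

       // gap added between one page's last token and the next page's
       // first token; pick anything bigger than any phrase slop
       // you'll ever query with
       public int getPositionIncrementGap(String fieldName) {
         return 1000;
       }
     }

Index each page as its own Field instance (as in #1, but without the
"$$" fields) and construct your IndexWriter with this analyzer; a
PhraseQuery (even a sloppy one) can then never match across a page
boundary, because the token positions jump by 1000 at each page break.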