> Document doc = new Document();
> for (int i = 0; i < pages.length; i++) {
>   doc.add(new Field("text", pages[i], Field.Store.NO,
>           Field.Index.TOKENIZED));
>   doc.add(new Field("text", "$$", Field.Store.NO,
>           Field.Index.UN_TOKENIZED));
> }
UN_TOKENIZED. Nice idea! I will check this out.

> 2) if your goal is just to be able to make sure you can query for phrases
> without crossing page boundaries, it's a lot simpler just to use a really
> big positionIncrementGap with your analyzer (and add each page as a
> separate Field instance). boundary tokens like these are really only
> necessary if you want more complex queries (like "find X and Y on the
> same page but not in the same sentence")

Hm. This is what Erik already recommended. I would have to store the field with TermVector.WITH_POSITIONS, right? But I do not know the maximum number of terms per page, and I do not know the maximum number of pages. I have already had documents with more than 50,000 pages (A4) and documents with 1 page but 100 MB of data. How many terms can 100 MB have? Hm... Since positions are stored as an int, I could have a maximum of about 40,000 terms per page (50,000 pages * 40,000 terms -> nearly Integer.MAX_VALUE).
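For what it's worth, here is roughly what that positionIncrementGap variant would look like with the same Lucene API as the snippet above. The class name PageGapAnalyzer and the gap of 10000 are just illustrative choices on my side, not anything prescribed in this thread:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Wraps StandardAnalyzer and leaves a big position gap between
// successive values of the same field, so a PhraseQuery (or a
// SpanNearQuery with modest slop) cannot match across pages.
class PageGapAnalyzer extends Analyzer {
    private final Analyzer delegate = new StandardAnalyzer();

    public TokenStream tokenStream(String fieldName, Reader reader) {
        return delegate.tokenStream(fieldName, reader);
    }

    public int getPositionIncrementGap(String fieldName) {
        // hypothetical gap value; pick something larger than any slop you query with
        return 10000;
    }
}

// Each page then gets added as its own Field instance, no boundary tokens:
// Document doc = new Document();
// for (int i = 0; i < pages.length; i++) {
//     doc.add(new Field("text", pages[i], Field.Store.NO,
//             Field.Index.TOKENIZED));
// }
// IndexWriter writer = new IndexWriter(directory, new PageGapAnalyzer(), true);
// writer.addDocument(doc);

Note that the gap itself also consumes position space: the total of all terms plus gap * (pages - 1) still has to stay under Integer.MAX_VALUE for the field, which is the same arithmetic worry as above.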