RE: Design questions

2008-02-15 Thread spring
> You need to watch both the positionIncrementGap (which, as I remember,
> gets added for each new field of the same name you add to the document).
> Make it 0 rather than whatever it is currently. You may have to create a
> new analyzer by subclassing your favorite analyzer and overriding the

Re: Design questions

2008-02-15 Thread Erick Erickson
You need to watch both the positionIncrementGap (which, as I remember, gets added for each new field of the same name you add to the document). Make it 0 rather than whatever it is currently. You may have to create a new analyzer by subclassing your favorite analyzer and overriding the getPositionIncrementGap() method.
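A minimal sketch of that subclass-and-override idea, assuming the Lucene 2.x-era Analyzer API current when this thread ran; the class name is illustrative, and StandardAnalyzer stands in for whatever analyzer you prefer:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    import java.io.Reader;

    // Wraps a favorite analyzer and overrides getPositionIncrementGap()
    // so no extra position gap is inserted between successive instances
    // of the same field name in one document.
    public class NoGapAnalyzer extends Analyzer {
        private final Analyzer delegate = new StandardAnalyzer();

        public TokenStream tokenStream(String fieldName, Reader reader) {
            return delegate.tokenStream(fieldName, reader);
        }

        // Called once per additional field instance with the same name;
        // returning 0 makes consecutive pages positionally adjacent.
        public int getPositionIncrementGap(String fieldName) {
            return 0;
        }
    }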

RE: Design questions

2008-02-15 Thread spring
> To: java-user@lucene.apache.org
> Subject: Re: Design questions
>
> Hi,
>
> I have a similar situation. I also considered using $. But for the sake
> of not running into (potential) problems with tokenisers, I just defined
> a string in a config file which for sure is never going to occur in a
> document

RE: Design questions

2008-02-15 Thread spring
> Document doc = new Document()
> for (int i = 0; i < pages.length; i++) {
>   doc.add(new Field("text", pages[i], Field.Store.NO,
>       Field.Index.TOKENIZED));
>   doc.add(new Field("text", "$$", Field.Store.NO,
>       Field.Index.UN_TOKENIZED));
> }

UN_TOKENIZED. Nice idea! I will check this.
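A self-contained version of the quoted fragment, assuming Lucene 2.x APIs and a hypothetical pages[] array holding each page's extracted text. UN_TOKENIZED keeps the "$$" separator out of the analyzer's reach (in later Lucene versions these constants became ANALYZED/NOT_ANALYZED):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class PageSeparatorIndexer {
        // Indexes all pages into one document, with an un-tokenized "$$"
        // marker between consecutive pages.
        public static void addPagedDocument(IndexWriter writer, String[] pages)
                throws Exception {
            Document doc = new Document();
            for (int i = 0; i < pages.length; i++) {
                doc.add(new Field("text", pages[i], Field.Store.NO,
                        Field.Index.TOKENIZED));
                doc.add(new Field("text", "$$", Field.Store.NO,
                        Field.Index.UN_TOKENIZED));
            }
            writer.addDocument(doc);
        }
    }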

RE: Design questions

2008-02-15 Thread spring
> Why not just use $?

Because nearly every analyzer removes it (SimpleAnalyzer, German, Russian, French...). Just tested it with Luke in the search dialog.
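The same experiment done in code rather than in Luke; a sketch against the Lucene 2.x-era TokenStream API, checking whether a candidate separator survives a given analyzer:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;

    import java.io.StringReader;

    public class SeparatorSurvivalCheck {
        // Returns true if the analyzer emits the separator unchanged.
        public static boolean survives(Analyzer analyzer, String separator)
                throws Exception {
            TokenStream ts = analyzer.tokenStream("text", new StringReader(separator));
            for (Token t = ts.next(); t != null; t = ts.next()) {
                if (separator.equals(t.termText())) {
                    return true;
                }
            }
            return false; // removed or altered by the analyzer
        }

        public static void main(String[] args) throws Exception {
            // SimpleAnalyzer keeps only letters, so "$" never comes through:
            System.out.println(survives(new SimpleAnalyzer(), "$")); // false
        }
    }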

Re: Design questions

2008-02-15 Thread Adrian Smith
Hi, I have a similar situation. I also considered using $. But for the sake of not running into (potential) problems with tokenisers, I just defined a string in a config file which for sure is never going to occur in a document and will never be searched for, e.g. dfgjkjrkruigduhfkdgjrugr. Cheers

Re: Design questions

2008-02-14 Thread Chris Hostetter
I haven't really been following this thread that closely, but...

: Why not just use $? Check to ensure that it makes it through whatever
: analyzer you choose though. For instance, LetterTokenizer will remove it...

1) i'm 99% sure you can do something like this...

Document doc = new

Re: Design questions

2008-02-14 Thread Erick Erickson
Why not just use $? Check to ensure that it makes it through whatever analyzer you choose though. For instance, LetterTokenizer will remove it... Erick

On Thu, Feb 14, 2008 at 4:41 PM, <[EMAIL PROTECTED]> wrote:

> > Rather than index one doc per page, you could index a special
> > token between pages.

RE: Design questions

2008-02-14 Thread spring
> Rather than index one doc per page, you could index a special token
> between pages. Say you index $ as the special token.

I have decided to use this version, but... What token can I use? It must be a token which is never removed by an analyzer, or altered in a way that makes it no longer unique.

RE: Design questions

2008-01-24 Thread spring
> Or, you could just do things twice. That is, send your text through
> a TokenStream, then call next() and count. Then send it all
> through doc.add().

Hm. This means reading the content twice, no matter whether I use my own analyzer or override/wrap the main analyzer. Is there anywhere a hook ...
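One possible shape for such a hook (an illustration, not a built-in Lucene facility): wrap the main analyzer so a TokenFilter counts tokens as the indexer consumes them, and the text is read only once. Class and method names are invented for the sketch:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    import java.io.IOException;
    import java.io.Reader;

    public class CountingAnalyzer extends Analyzer {
        private final Analyzer delegate;
        private int tokenCount = 0;

        public CountingAnalyzer(Analyzer delegate) {
            this.delegate = delegate;
        }

        public TokenStream tokenStream(String fieldName, Reader reader) {
            return new TokenFilter(delegate.tokenStream(fieldName, reader)) {
                public Token next() throws IOException {
                    Token t = input.next();
                    if (t != null) {
                        tokenCount++; // counted as indexing pulls tokens
                    }
                    return t;
                }
            };
        }

        public int getTokenCount() {
            return tokenCount;
        }
    }

Read getTokenCount() after writer.addDocument() returns; note this simple version keeps one running total and is not thread-safe.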

RE: Design questions

2008-01-24 Thread spring
> -----Original Message-----
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 24 January 2008 20:56
> To: java-user@lucene.apache.org
> Subject: Re: Design questions
>
> I think you'll have to implement your own Analyzer and count.
> That is, every

Re: Design questions

2008-01-24 Thread Erick Erickson
> > From: Erick Erickson [mailto:[EMAIL PROTECTED]
> > Sent: Friday, 11 January 2008 16:16
> > To: java-user@lucene.apache.org
> > Subject: Re: Design questions
>
> > But you could also vary this scheme by simply storing in your document
> > the offsets for the be

RE: Design questions

2008-01-24 Thread spring
> -----Original Message-----
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Friday, 11 January 2008 16:16
> To: java-user@lucene.apache.org
> Subject: Re: Design questions
>
> But you could also vary this scheme by simply storing in your document
> the offsets for
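A sketch of that store-the-offsets variant, with a hypothetical field name and encoding (comma-separated character offsets); the field is stored but not indexed, since it is retrieval-only metadata:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class PageOffsetsField {
        // Records where each page starts so pages can be reconstructed
        // from the original text at retrieval time.
        public static void addOffsets(Document doc, int[] pageOffsets) {
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < pageOffsets.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(pageOffsets[i]);
            }
            doc.add(new Field("pageOffsets", sb.toString(),
                    Field.Store.YES, Field.Index.NO));
        }
    }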

RE: Design questions

2008-01-13 Thread spring
OK, thank you! I will try this out.

Re: Design questions

2008-01-11 Thread Erick Erickson
See below

On Jan 11, 2008 9:36 AM, <[EMAIL PROTECTED]> wrote:

> Hi,
>
> > You could even store all of the page offsets in your meta-data document
> > in a special field if you wanted, then lazy-load that field rather than
> > dynamically counting.
>
> How can I lazy load a field?

See
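For the lazy-load question: in Lucene 2.x this is done with a FieldSelector passed to IndexReader.document(). A sketch, with "pageOffsets" as the assumed field name:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.FieldSelector;
    import org.apache.lucene.document.FieldSelectorResult;
    import org.apache.lucene.index.IndexReader;

    public class LazyLoadExample {
        public static String loadOffsets(IndexReader reader, int docId)
                throws Exception {
            // Mark "pageOffsets" lazy; skip every other field entirely.
            FieldSelector selector = new FieldSelector() {
                public FieldSelectorResult accept(String fieldName) {
                    return "pageOffsets".equals(fieldName)
                            ? FieldSelectorResult.LAZY_LOAD
                            : FieldSelectorResult.NO_LOAD;
                }
            };
            Document doc = reader.document(docId, selector);
            // The field's bytes are only read from the index here:
            return doc.getFieldable("pageOffsets").stringValue();
        }
    }

Note that lazily loaded fields must be fetched via getFieldable() rather than getField().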

RE: Design questions

2008-01-11 Thread spring
Hi,

> You could even store all of the page offsets in your meta-data document
> in a special field if you wanted, then lazy-load that field rather than
> dynamically counting.

How can I lazy load a field?

> You'd have to be careful that your offsets corresponded to the data
> *after* it

Re: Design questions

2008-01-09 Thread Erick Erickson
You can do several things: Rather than index one doc per page, you could index a special token between pages. Say you index $ as the special token. So your index looks like this:

    last of page 1 $ first of page 2 ... last of page 2 $ first of page 3

and so on. Now, if you use
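The preview cuts off here; one way the scheme plausibly continues (an assumption about the truncated advice, sketched against Lucene 2.x APIs): count how many $ separators occur before a hit's term position, giving the zero-based page the hit falls on. The "text" field name and the caller-supplied hit position are assumed:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermPositions;

    public class PageOfHit {
        // Returns the zero-based page containing the given term position.
        public static int pageOf(IndexReader reader, int docId, int hitPosition)
                throws Exception {
            TermPositions tp = reader.termPositions(new Term("text", "$"));
            int page = 0;
            try {
                if (tp.skipTo(docId) && tp.doc() == docId) {
                    for (int i = 0; i < tp.freq(); i++) {
                        if (tp.nextPosition() < hitPosition) {
                            page++; // one more page boundary before the hit
                        }
                    }
                }
            } finally {
                tp.close();
            }
            return page;
        }
    }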