You need to watch the positionIncrementGap
(which, as I remember, gets added for each new field of the
same name you add to the document). Make it 0 rather than
whatever it is currently. You may have to create a new analyzer
by subclassing your favorite analyzer and overriding the
getPositionIncrementGap() method.
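A minimal sketch of that override, assuming Lucene 2.x and StandardAnalyzer
as the "favorite analyzer" (the class name NoGapAnalyzer is made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class NoGapAnalyzer extends StandardAnalyzer {
    // Return 0 so no extra position gap is inserted between
    // successive same-named fields added to one document.
    public int getPositionIncrementGap(String fieldName) {
        return 0;
    }
}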
> Document doc = new Document();
> for (int i = 0; i < pages.length; i++) {
>     doc.add(new Field("text", pages[i], Field.Store.NO, Field.Index.TOKENIZED));
>     doc.add(new Field("text", "$$", Field.Store.NO, Field.Index.UN_TOKENIZED));
> }
UN_TOKENIZED. Nice idea!
I will check this.
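For turning the markers back into page numbers at search time, something
along these lines should work -- a sketch, assuming Lucene 2.x, the
"text"/"$$" names from the code above, and that you already have the
position of a hit (e.g. from a SpanQuery):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public class PageLocator {
    // Returns the 1-based page a token position falls on, by counting
    // how many "$$" page markers precede that position in the document.
    public static int pageOfPosition(IndexReader reader, int docId, int hitPosition)
            throws IOException {
        TermPositions tp = reader.termPositions(new Term("text", "$$"));
        try {
            int page = 1;
            if (tp.skipTo(docId) && tp.doc() == docId) {
                int markers = tp.freq();
                for (int i = 0; i < markers; i++) {
                    if (tp.nextPosition() < hitPosition) {
                        page++;
                    } else {
                        break;
                    }
                }
            }
            return page;
        } finally {
            tp.close();
        }
    }
}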
> Why not just use ?
Because nearly every analyzer removes it (SimpleAnalyzer, German, Russian,
French, ...).
I just tested it with Luke in the search dialog.
Hi,
I have a similar situation. I also considered using $. But for the sake of
not running into (potential) problems with tokenizers, I just defined a
string in a config file which for sure is never going to occur in a document
and will never be searched for, e.g.
dfgjkjrkruigduhfkdgjrugr
Cheers
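Loading it could look like this -- just a sketch; the file name and
property key are invented:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class SeparatorConfig {
    // Reads the page-separator string from a properties file so it can be
    // changed without recompiling; the default mirrors the example above.
    static String loadSeparator() throws IOException {
        Properties config = new Properties();
        config.load(new FileInputStream("indexer.properties"));
        return config.getProperty("page.separator", "dfgjkjrkruigduhfkdgjrugr");
    }
}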
I haven't really been following this thread that closely, but...

: Why not just use ? Check to ensure that it makes
: it through whatever analyzer you choose though. For instance,
: LetterTokenizer will remove it...

1) I'm 99% sure you can do something like this...

Document doc = new Document();
for (int i = 0; i < pages.length; i++) {
    doc.add(new Field("text", pages[i], Field.Store.NO, Field.Index.TOKENIZED));
    doc.add(new Field("text", "$$", Field.Store.NO, Field.Index.UN_TOKENIZED));
}
Why not just use ? Check to ensure that it makes
it through whatever analyzer you choose though. For instance,
LetterTokenizer will remove it...
Erick
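That check can also be automated rather than done by hand in Luke -- a
sketch, assuming the Lucene 2.x TokenStream API (class and method names
are invented):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class SeparatorCheck {
    // True if the candidate separator comes out of the analyzer unchanged;
    // e.g. survives(new SimpleAnalyzer(), "$") is false, because
    // LetterTokenizer discards non-letter characters.
    static boolean survives(Analyzer analyzer, String candidate) throws IOException {
        TokenStream ts = analyzer.tokenStream("text", new StringReader(candidate));
        Token token = ts.next();  // null if the analyzer dropped the input entirely
        return token != null && candidate.equals(token.termText());
    }
}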
On Thu, Feb 14, 2008 at 4:41 PM, <[EMAIL PROTECTED]> wrote:
> Rather than index one doc per page, you could index a special
> token between pages. Say you index $ as the special
> token.
I have decided to use this version, but...
What token can I use? It must be a token which never gets removed by an
analyzer, or altered in a way that makes it no longer unique.
> Or, you could just do things twice. That is, send your text through
> a TokenStream, then call next() and count. Then send it all
> through doc.add().
Hm.
This means reading the content twice, no matter whether I use my own
analyzer or override/wrap the main analyzer.
Is there a hook anywhere?
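One candidate for such a hook -- not from this thread, just a sketch
assuming the Lucene 2.x next()-based TokenStream API -- is a counting
TokenFilter wired into the analyzer, so the content is analyzed only once
and the count is read after the document has been added:

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class CountingFilter extends TokenFilter {
    private int count = 0;

    public CountingFilter(TokenStream input) {
        super(input);
    }

    // Passes every token through untouched, counting as it goes.
    public Token next() throws IOException {
        Token token = input.next();
        if (token != null) {
            count++;
        }
        return token;
    }

    public int getCount() {
        return count;
    }
}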
> -----Original Message-----
> From: Erick Erickson [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, 24 January 2008 20:56
> To: java-user@lucene.apache.org
> Subject: Re: Design questions
>
> I think you'll have to implement your own Analyzer and count.
> That is, every
> -----Original Message-----
> From: Erick Erickson [mailto:[EMAIL PROTECTED]]
> Sent: Friday, 11 January 2008 16:16
> To: java-user@lucene.apache.org
> Subject: Re: Design questions
>
> But you could also vary this scheme by simply storing in your document
> the offsets for the beginning of each page
OK, thank you! I will try this out.
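Storing the offsets could be as simple as one extra stored-only field -- a
sketch; the field name and the comma-separated format are invented:

// assumes org.apache.lucene.document.Document / Field are imported
Document doc = new Document();
// stored for retrieval, not indexed, so it costs nothing at search time
doc.add(new Field("pageOffsets", "0,1432,2811,4077",
        Field.Store.YES, Field.Index.NO));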
See below
On Jan 11, 2008 9:36 AM, <[EMAIL PROTECTED]> wrote:
> How can I lazy load a field?
See
Hi,

> You could even store all of the page offsets in your meta-data document
> in a special field if you wanted, then lazy-load that field rather than
> dynamically counting.

How can I lazy load a field?

> You'd have to be careful that your offsets corresponded to the data *after* it
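Lazy loading is what FieldSelector is for (Lucene 2.1+) -- a sketch,
assuming an open IndexReader named reader, a hit docId, and the invented
"pageOffsets" field from earlier:

import java.util.Collections;
import java.util.Set;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.SetBasedFieldSelector;

// Load no fields eagerly; mark "pageOffsets" for lazy loading
// (any field in neither set is skipped entirely).
Set lazyFields = Collections.singleton("pageOffsets");
FieldSelector selector =
        new SetBasedFieldSelector(Collections.EMPTY_SET, lazyFields);
Document doc = reader.document(docId, selector);
// the field's bytes are only read from the index at this call:
String offsets = doc.getFieldable("pageOffsets").stringValue();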
You can do several things:

Rather than index one doc per page, you could index a special
token between pages. Say you index $ as the special
token. So your index looks like this:

last of page 1 $ first of page 2 ... last of page 2 $ first of page 3

and so on. Now, if you use
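A hedged illustration of how the marker interacts with phrase searches,
assuming Lucene 2.x and the "text" field: the marker occupies a token
position of its own, so the last word of one page and the first word of
the next are two positions apart, and an exact phrase can no longer match
across the boundary:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

PhraseQuery acrossBoundary = new PhraseQuery();
acrossBoundary.add(new Term("text", "1"));      // hypothetical last token of page 1
acrossBoundary.add(new Term("text", "first"));  // hypothetical first token of page 2
// With the default slop of 0 this matches nothing, because "$" sits
// between the two tokens; phrases within a single page still match.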
Hi,
I have to index (tokenized) documents which may have very many pages, up to
10,000.
I also have to know on which pages the search phrase occurs.
I have to update some stored index fields for my document.
The content is never changed.
Thus I think I have to add one Lucene document with the in