Yes, this is a repeat... I mailed this a few days ago and it never made
it to the list, so I reposted. Now it suddenly appears... weird!
--- java-user@lucene.apache.org wrote:
> On 21 Nov 2005, at 18:54, [EMAIL PROTECTED] wrote:
>
> > I'm using a StandardFilter and seeing some strange tokeni
I'm using a StandardFilter and seeing some strange tokenization.
Here's the input:
apache.org hosts lucene at apache.org.
Here are the tokens it outputs:
apache.org
hosts
lucene
at
apacheorg
Is this a bug that apache.org and apache.org. don't convert to the same token?
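As a stopgap for the mismatch described above, one option is to strip trailing periods from each token before it is indexed or searched. This is only a minimal sketch in plain Java: `TrailingDotSketch` and `stripTrailingDots` are hypothetical names, not part of Lucene, and this is not how StandardFilter itself behaves.

```java
final class TrailingDotSketch {
    // Hypothetical helper (not a Lucene API): drop any trailing periods so
    // that "apache.org." and "apache.org" end up as the same term.
    static String stripTrailingDots(String token) {
        int end = token.length();
        while (end > 0 && token.charAt(end - 1) == '.') {
            end--;
        }
        return token.substring(0, end);
    }

    public static void main(String[] args) {
        // Whitespace split stands in for a real tokenizer here.
        for (String t : "apache.org hosts lucene at apache.org.".split(" ")) {
            System.out.println(stripTrailingDots(t));
        }
    }
}
```

In a real analyzer the same logic would presumably live in a custom TokenFilter applied after the tokenizer, at both index and query time.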
Cool, I'll take a look at fixing this.
--- java-user@lucene.apache.org wrote:
> On 21 Nov 2005, at 19:39, [EMAIL PROTECTED] wrote:
> > This is the results for the StandardTokenizer:
> > input - output token - output type
> > 1. 1.2 - 1.2 -
> > 2. 1.2. - 1.2 -
> >
>
Sorry for the bad looking table. Retrying...
input string - output token (output type)
1. 1.2 - 1.2 ()
2. 1.2. - 1.2 ()
3. a.b - a.b ()
4. a.b. - a.b. ()
5. www.apache.org - www.apache.org ()
6. www.apache.org. - www.apache.org. ()
--- java-user@lucene.apache.org wrote:
This is the results for
These are the results for the StandardTokenizer:
input - output token - output type
1. 1.2 - 1.2 -
2. 1.2. - 1.2 -
3. a.b - a.b -
4. a.b. - a.b. -
5. www.apache.org - www.apache.org -
6. www.apache.org. - www.apache.org. -
Number 6 should still be
how well does it work? does it provide the ability to search shortly after
adding a document?
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
i want to use lucene to search shortly (within a second) after adding a
document.
closing a writer to ensure the new document is written and then opening an
index reader seems to be too slow on large indexes.
how do other people handle this?
(i know this can be solved with a database but i'd
Thanks. I'll try that...
--- java-user@lucene.apache.org wrote:
> Use HitCollector's collect method:
>
> http://lucene.apache.org/java/docs/api/org/apache/lucene/search/HitCollector.html#collect(int,%20float)
>
> Otis
>
>
> --- [EMAIL PROTECTED] wrote:
>
> > hi,
> >
> > i need to retrieve th
hi,
i need to retrieve the raw scores (3.6, 2.8, etc) for a hit and not
the normalized score (1.0, 0.8, etc). commenting out the normalizing code
in Hits.java does what i want. is there a better way to do this?
i'm wondering about adding a method to Similarity.java that looks like this:
boole
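To make the raw versus normalized distinction concrete, here is a minimal sketch in plain Java. It assumes the normalization divides every score by the top raw score, and only when that top score exceeds 1.0. That is one reading of Hits.java and should be checked against the version in use. `ScoreNormSketch` and `normalize` are hypothetical names, not Lucene APIs.

```java
final class ScoreNormSketch {
    // Assumed behaviour: scale by the maximum raw score only if it is > 1.0,
    // so the best hit becomes 1.0 and the others keep their relative size.
    static float[] normalize(float[] raw) {
        float max = 0.0f;
        for (float s : raw) {
            max = Math.max(max, s);
        }
        float[] out = new float[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = (max > 1.0f) ? raw[i] / max : raw[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Raw scores taken from the question above.
        for (float s : normalize(new float[] {3.6f, 2.8f})) {
            System.out.println(s);
        }
    }
}
```

Under that assumption, the scores passed to a HitCollector's collect(int, float) are the raw values, since the scaling would happen afterwards in Hits.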
Is it possible to use a RAMDirectory to load a 5 GB index into RAM on Linux?
I have access to a server with 6 GB of RAM and will try it next week but
I've heard that Java on Linux may only support up to 2 GB of RAM per process.
Anyone already tried this?
Thanks.
Are people seeing a significant speed improvement with Lucene when they upgrade
to JDK 1.5?
Hi Jian,
Thanks for the reply. The problem with that is it completely
ignores document length. A book that mentions "frog" 5 times in its 2,000
pages should be less relevant than a book that mentions "frog" 4 times in
its 4 pages.
I really want to lower the document length weight instead
of rem
Hi,
Short documents bubble to the top of the results because the field
length is short. Does anyone have a good strategy for working around this?
Will doing something like log(document length) flatten out my results while
still making them meaningful? I'm going to try some different approaches
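To get a feel for how much a log-based norm would flatten things, here is a small plain-Java comparison. It puts the 1/sqrt(numTerms) length norm (which, as far as I know, is what DefaultSimilarity uses) next to one possible log-based alternative, 1/(1 + ln(numTerms)). The log formula is only an assumption for illustration, and `LengthNormSketch`, `sqrtNorm` and `logNorm` are hypothetical names.

```java
final class LengthNormSketch {
    // Believed to match DefaultSimilarity: 1 / sqrt(number of terms in the field).
    // numTerms is expected to be at least 1.
    static double sqrtNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical flatter alternative: 1 / (1 + ln(number of terms)).
    static double logNorm(int numTerms) {
        return 1.0 / (1.0 + Math.log(numTerms));
    }

    public static void main(String[] args) {
        int shortDoc = 4;     // e.g. a four-term field
        int longDoc = 4000;   // e.g. a four-thousand-term field
        // How strongly each norm favours the short field over the long one.
        System.out.println("sqrt ratio: " + sqrtNorm(shortDoc) / sqrtNorm(longDoc));
        System.out.println("log ratio:  " + logNorm(shortDoc) / logNorm(longDoc));
    }
}
```

Both norms still rank the shorter field higher, but the log version narrows the gap considerably. In Lucene this would presumably go into an override of Similarity.lengthNorm(String, int), and because norms are written at index time, the index would likely need rebuilding for a change to take effect.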
Hi,
I'm comparing SpanNearQuery to PhraseQuery results and noticing about
an 8x difference on Linux. Is a SpanNearQuery doing 8x as much work?
I'm considering diving into the code if the results sound unusual to people.
But if it's really doing that much more work, I won't spend time optimiz
Hmmm... I'll look into that. I thought the MultiSearcher would still need
access to each index. Does the RemoteSearchable avoid that? Will it allow
me to delegate searching to multiple boxes and then collate the results
correctly?
Thanks for the tip about the RemoteSearchable.
--- java-us
Hi Daniel,
The problem is that if I tell Lucene about only one of the indexes
it has no way of knowing what the total document frequency is across the other
index servers.
Does that make sense? I think my collator will need to
calculate the idf somehow.
Thanks.
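One way the collator could compute a global idf is to have each index server report the term's document frequency and its own document count, then sum both before applying the formula. The sketch below is plain Java and uses idf = ln(numDocs / (docFreq + 1)) + 1, which I believe is DefaultSimilarity's formula. `GlobalIdfSketch` and its methods are hypothetical names, and the per-shard numbers are made up for illustration.

```java
final class GlobalIdfSketch {
    // idf as (I believe) DefaultSimilarity defines it.
    static double idf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Sum document frequencies and document counts over all shards,
    // then compute a single idf from the totals.
    static double globalIdf(long[] shardDocFreqs, long[] shardNumDocs) {
        long df = 0;
        long n = 0;
        for (long d : shardDocFreqs) {
            df += d;
        }
        for (long d : shardNumDocs) {
            n += d;
        }
        return idf(df, n);
    }

    public static void main(String[] args) {
        // Illustrative numbers: the term appears on shard 0 only.
        long[] docFreqs = {10, 0};
        long[] numDocs = {1000, 1000};
        System.out.println("shard 0 idf: " + idf(docFreqs[0], numDocs[0]));
        System.out.println("shard 1 idf: " + idf(docFreqs[1], numDocs[1]));
        System.out.println("global idf:  " + globalIdf(docFreqs, numDocs));
    }
}
```

The per-shard values differ from each other and from the global one, which is why scores computed independently on each server are not directly comparable. The inputs could presumably come from docFreq(Term) and maxDoc() on each Searchable.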
--- java-user@lucene.apa
Hi,
Due to the size of my index, I need to break it into several different
segments. I have a service that gets a query from the user and contacts each
index searcher service asynchronously and waits for the results. The results
are then collated and returned to the user.
The problem is tha