You can use the payload functionality (have a look at
BoostingTermQuery and Michael B.'s excellent ApacheCon talk at
http://people.apache.org/~buschmi/apachecon/). Another option is to put
the synonyms into a separate field and boost that less than the main
field.
-Grant
On Dec 27, 2007, at 4
Happy festivus everyone,
So I have my fancy new stemmed synonym based Lucene index. Let's say I have
the following synonym defined:
radiation -> radiotherapy (and the reverse)
The search results rank all results exactly the same. Is there a way to
boost the actual search term a little higher t
Thanks.
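A minimal sketch of the separate-field approach Grant suggests, against the
Lucene 2.2 API; the field names "contents" and "contents_syn" and the 0.3
boost value are illustrative assumptions, not from the thread:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class SynonymBoostQuery {
    // Prefer the literal term over its synonyms; "contents" holds the
    // original text, "contents_syn" the expanded synonyms (hypothetical names).
    public static Query build(String term) {
        BooleanQuery query = new BooleanQuery();
        TermQuery exact = new TermQuery(new Term("contents", term));
        TermQuery synonym = new TermQuery(new Term("contents_syn", term));
        synonym.setBoost(0.3f); // synonym hits score lower than exact hits
        query.add(exact, BooleanClause.Occur.SHOULD);
        query.add(synonym, BooleanClause.Occur.SHOULD);
        return query;
    }
}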
Date: Wed, 26 Dec 2007 13:07:03 -0500
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: Pagination ...
CC: java-user@lucene.apache.org
You might want to take a look at Solr (http://lucene.apache.org/solr/). You
could either use Solr directly, or see how they implement paging.
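If you do roll your own, a minimal paging sketch over Lucene 2.2's Hits; the
"title" field and the page/pageSize parameters are illustrative assumptions.
Hits fetches documents lazily, so only the requested slice is loaded:

import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class Pager {
    // Print one page of results; documents outside the page are never loaded.
    public static void printPage(Searcher searcher, Query query,
                                 int page, int pageSize) throws Exception {
        Hits hits = searcher.search(query);
        int start = page * pageSize;
        int end = Math.min(start + pageSize, hits.length());
        for (int i = start; i < end; i++) {
            Document doc = hits.doc(i); // fetched on demand
            System.out.println(hits.score(i) + "  " + doc.get("title"));
        }
    }
}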
You were right. I forgot I optimized the index at the end. Thanks for your
help.
Erick Erickson wrote:
>
> Are you optimizing your index? I suspect that if you are, that's
> the problem.
>
> Why does it matter? Is this just a curiosity question or is there
> some problem?
>
> Best
> Erick
>
Thanks for your reply, Erick. I am not optimizing my index. I am trying to
keep heap use to a minimum. The indexing service that I'm working on will be
part of a wizard in an RCP framework, and heap use is crucial for
performance. Do you know why it's doing that?
Erick Erickson wrote:
>
> Are you
Oh, now I get it. While I thoroughly approve of the strong
typing in generics, I continually trip over code written for the 1.4
code base not being accepted by the 1.5 compiler...
Best
Erick
On Dec 27, 2007 12:29 PM, Frank Schima <[EMAIL PROTECTED]> wrote:
>
> Hi Erick,
>
>
>
> Erick Erickson w
Are you optimizing your index? I suspect that if you are, that's
the problem.
Why does it matter? Is this just a curiosity question or is there
some problem?
Best
Erick
On Dec 27, 2007 12:18 PM, tgospodinov <[EMAIL PROTECTED]> wrote:
>
> Does anyone know why JVM heap use almost doubles at the v
Hi Erick,
Erick Erickson wrote:
>
> I don't think this has anything to do with Lucene, the problem
> seems to be that your compiler can't find the Java Stack
> class.
>
> You need to set your classpath to include wherever
> java.util is on your disk.
>
I agree it's a Java issue. I'm just u
Does anyone know why JVM heap use almost doubles at the very end when
indexing in memory?
around 9 megs @ 1:03 min into indexing - around 18 megs @ 1:05 min when
indexing is complete -> heap use jumps about 9 megs in 2 sec!?
I don't think this has anything to do with Lucene, the problem
seems to be that your compiler can't find the Java Stack
class.
You need to set your classpath to include wherever
java.util is on your disk.
Erick
On Dec 27, 2007 10:56 AM, Frank Schima <[EMAIL PROTECTED]> wrote:
>
> Hello all,
Hello all,
I'm trying to implement a synonym engine in Lucene 2.2 based on the code in
the Lucene In Action book. However, I'm getting compile errors:
My Synonym filter looks like this:
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.TokenFilter;
import org.ap
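For comparison, here is a sketch of such a filter updated for Java 5
generics and the Lucene 2.2 API, with the java.util.Stack import the
compile errors point at; the SynonymEngine interface is stubbed in here as
a minimal stand-in for the book's lookup interface so the snippet stands
alone:

import java.io.IOException;
import java.util.Stack;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Minimal stand-in for the book's SynonymEngine lookup interface.
interface SynonymEngine {
    String[] getSynonyms(String word) throws IOException;
}

public class SynonymFilter extends TokenFilter {
    private final Stack<Token> synonymStack = new Stack<Token>();
    private final SynonymEngine engine;

    public SynonymFilter(TokenStream in, SynonymEngine engine) {
        super(in);
        this.engine = engine;
    }

    public Token next() throws IOException {
        // Emit any queued synonyms before pulling the next real token.
        if (!synonymStack.isEmpty()) {
            return synonymStack.pop();
        }
        Token token = input.next();
        if (token == null) {
            return null;
        }
        String[] synonyms = engine.getSynonyms(token.termText());
        if (synonyms != null) {
            for (String syn : synonyms) {
                Token s = new Token(syn, token.startOffset(), token.endOffset());
                s.setPositionIncrement(0); // same position as the original token
                synonymStack.push(s);
            }
        }
        return token;
    }
}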
As long as you control both ends (i.e. what's indexed and what's searched)
then UN_TOKENIZED is fine. Note that case has to match, etc.
As an added benefit, you can sort by the field too...
Best
Erick
On Dec 27, 2007 10:31 AM, webspeak <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> Thank you for you
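A minimal sketch of that combination against the Lucene 2.2 API; the
customer value "ACME Corp" is an illustrative assumption. The value is
indexed as a single term, so the search value must match it exactly:

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class ExactMatchExample {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new KeywordAnalyzer(), true);
        Document doc = new Document();
        // UN_TOKENIZED: the value is indexed as one term, exactly as given.
        doc.add(new Field("CUSTOMER", "ACME Corp",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        // Case and spacing must match the indexed value exactly.
        TermQuery query = new TermQuery(new Term("CUSTOMER", "ACME Corp"));
        // Untokenized fields can also be sorted on.
        Hits hits = searcher.search(query, new Sort("CUSTOMER"));
        System.out.println(hits.length() + " hit(s)");
        searcher.close();
    }
}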
Hello,
Thank you for your reply :-)
The customer value will be chosen from a dropdown list. The value that is
selected must match the value in the CUSTOMER field.
I think I don't have to tokenize it... as it is an exact match.
Erick Erickson wrote:
>
> Well, it depends upon what you w
Well, it depends upon what you want to accomplish. By indexing
UN_TOKENIZED, the text is NOT broken up. So indexing
"some text" will not match if you search on "some". or "text" or
even "text some".
You really, really, really need to tell us what it is you want to
accomplish before anyone can sugg
Depends on whether you want fuzzy matches on Customer or not.
Assuming this value contains things like first and last name, I would
think you would want to tokenize so that you can search for those
separately. If it truly contains something that is a single token,
then this should be fine
Erick,
Thanks for your advice. I implemented a version count to sync the index and
search.
2007/12/27, Erick Erickson <[EMAIL PROTECTED]>:
>
> If you search the mail archives, you'll find many discussions
> of the fact that when you open an index reader, it takes a
> snapshot of the index and subseq
Hello,
I would like to search documents by "CUSTOMER".
So I search on the field "CUSTOMER" using a KeywordAnalyzer.
The CUSTOMER field is indexed with these parameters:
Field.Index.UN_TOKENIZED
Field.Store.YES
Is this the best practice?
If you search the mail archives, you'll find many discussions
of the fact that when you open an index reader, it takes a
snapshot of the index and subsequent modifications to the
index are not available until the searcher is closed and re-opened.
It is NOT a good idea to open a new reader every ti
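A sketch of one way to combine this with the version count mentioned
elsewhere in the thread, assuming Lucene 2.2 (which has no
IndexReader.reopen()): hold one shared reader and swap it only when the
on-disk index version has moved past the snapshot you hold.

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

public class ReaderManager {
    private final Directory dir;
    private IndexReader reader;

    public ReaderManager(Directory dir) throws IOException {
        this.dir = dir;
        this.reader = IndexReader.open(dir);
    }

    // Call before each search: swap in a fresh reader only when the
    // index version on disk differs from the snapshot we hold.
    public synchronized IndexReader getReader() throws IOException {
        if (IndexReader.getCurrentVersion(dir) != reader.getVersion()) {
            IndexReader old = reader;
            reader = IndexReader.open(dir);
            old.close(); // unsafe if another thread is still searching on it
        }
        return reader;
    }
}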
Hi all,
I encountered a strange problem. To improve performance, I open the
indexreader at start time and reuse it in later searches. I have another
process running to do online indexing. The search service and indexing
service are accessing the same index folder. But I found out the search
servi
Try printing all these after you close the writer:
- ((FSDirectory) dir).getFile().getAbsolutePath()
- dir.list().length (call it n)
- dir.list()[0], ..., dir.list()[n-1]
This should at least help you verify that an index was created and where.
Regards,
Doron
On Dec 27, 2007 12:26 PM, Liaqat Ali <[EMAIL PR
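Spelled out, those checks might look like this (a sketch against the
Lucene 2.2 API, where dir is the Directory the writer was opened on):

import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexDump {
    // Print where the index lives and which files it contains.
    static void dump(Directory dir) throws IOException {
        System.out.println(((FSDirectory) dir).getFile().getAbsolutePath());
        String[] files = dir.list();
        System.out.println(files.length + " file(s):");
        for (int i = 0; i < files.length; i++) {
            System.out.println("  " + files[i]);
        }
    }
}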
Doron Cohen wrote:
On Dec 27, 2007 11:49 AM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
I got your point. The program given does not give any error during
compilation and it runs well. But it does not create any
index. When the StandardAnalyzer() is called without the stopwords list
On Dec 27, 2007 11:49 AM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
> I got your point. The program given does not give any error during
> compilation and it runs well. But it does not create any
> index. When the StandardAnalyzer() is called without the stopwords list it
> works well, b
Doron Cohen wrote:
This is not a self-contained program - it is incomplete, and it depends
on files on *your* disk...
Still, can you show why you're saying it indexes stopwords?
Can you print here a few samples of IndexReader.terms().term()?
BR, Doron
On Dec 27, 2007 10:22 AM, Liaqat Ali <[EMAIL
This is not a self-contained program - it is incomplete, and it depends
on files on *your* disk...
Still, can you show why you're saying it indexes stopwords?
Can you print here a few samples of IndexReader.terms().term()?
BR, Doron
On Dec 27, 2007 10:22 AM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
Hi Liaqat,
Are you sure that the Urdu characters are being correctly interpreted
by the JVM even during the file I/O operation?
I would expect Unicode characters to be encoded as multi-byte
sequences, and so the string-matching operations would fail (if the
literals are different from the
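One quick way to check that, as a sketch (the filename is an illustrative
assumption): read the stopword file with an explicit UTF-8 decoder and dump
the code points, so any mis-decoded bytes become visible immediately.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class EncodingCheck {
    public static void main(String[] args) throws IOException {
        // "urdu-stopwords.txt" is a hypothetical filename.
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream("urdu-stopwords.txt"), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            // Dump each character's code point next to the raw line.
            for (int i = 0; i < line.length(); i++) {
                System.out.print(Integer.toHexString(line.charAt(i)) + " ");
            }
            System.out.println(" <- " + line);
        }
        in.close();
    }
}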
Doron Cohen wrote:
Hi Liaqat,
This part of the code seems correct and should work, so problem
must be elsewhere.
Can you post a short program that demonstrates the problem?
You can start with something like this:
Document doc = new Document();
doc.add(new Field("text",URDU_STOP_WOR
Hi Liaqat,
This part of the code seems correct and should work, so problem
must be elsewhere.
Can you post a short program that demonstrates the problem?
You can start with something like this:
Document doc = new Document();
doc.add(new Field("text",URDU_STOP_WORDS[0] +
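Completed into a runnable test along those lines, assuming Lucene 2.2; the
URDU_STOP_WORDS contents below are placeholders for the poster's real Urdu
array. If the analyzer is working, the stopword should be missing from the
term dump:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.RAMDirectory;

public class StopWordTest {
    // Placeholder only; substitute the real Urdu stopword array.
    static final String[] URDU_STOP_WORDS = { "stopword1", "stopword2" };

    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(URDU_STOP_WORDS), true);
        Document doc = new Document();
        doc.add(new Field("text", URDU_STOP_WORDS[0] + " regular words",
                Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        // Enumerate every indexed term; the stopword should not appear.
        IndexReader reader = IndexReader.open(dir);
        TermEnum terms = reader.terms();
        while (terms.next()) {
            System.out.println(terms.term());
        }
        reader.close();
    }
}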