Hi all,
our business system generates some data whose structure is like an email
message: one message has some attachments, so we can think of our data as
email messages. I need to index and search each message and its attachments, and
when displaying hits, I must display two kinds of links for every
I have got a lot of valuable information in this thread so far.
Thanks to all.
In my last mail I mentioned only two fields because the others' usage was
negligible and I thought they were not important. But now, after *Toke* explained
the formulae, I think sorting on those fields would also be consuming
Solr's timestamp representation (TrieDateField) is tuned for space and
speed. It has a compressed representation, and sorts with far less
space than Strings.
Also you get something called a date facet, which lets you bucketize
facet searches by time block.
On Tue, Apr 27, 2010 at 1:02 PM, Toke Es
Thanks again for your help!! :)
Regards,
-Clara Vania-
From: Uwe Schindler
To: java-user@lucene.apache.org
Sent: Wed, April 28, 2010 12:38:04 AM
Subject: RE: Range Score in Lucene
This has to do with combining multiple terms in a Boolean query. If you hav
I found the error of my ways. It was a typo. For linux2 I directed
setup.py to use
'linux2': '/usr/lib/jdk/java-6-sun-1.6.0.17', <--WRONG
rather than
'linux2': '/usr/lib/jvm/java-6-sun-1.6.0.17',
There was not a /usr/lib/jdk folder on my Ubuntu 8.04 box.
- Original Messag
Hi Hoss,
I didn't end up writing my own query (well I did, but all it does is rewrite
into another query). I found DisjunctionMaxQuery, which seemed a good fit
for what I was trying to do. Instead of TermQuery, I used ConstantScoreQuery
combined with TermsFilter to create queries that weren't depe
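For readers following along, here is a rough sketch of the arithmetic that DisjunctionMaxQuery is documented to use: the best sub-query score, plus a tie-breaker multiplied by the remaining sub-query scores. The class and method names (`DisMaxScore`, `combine`) are made up for illustration and are not Lucene API; the constant score of 1.0 per matching clause is likewise an assumption.

```java
// Illustrative arithmetic only; DisMaxScore and combine are hypothetical names,
// not part of Lucene.
final class DisMaxScore {

    // Returns max(subScores) + tieBreaker * (sum of the other sub-scores).
    static float combine(float[] subScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float s : subScores) {
            sum += s;
            if (s > max) {
                max = s;
            }
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Three matching clauses, each with an assumed constant score of 1.0.
        float[] constantScores = {1f, 1f, 1f};
        System.out.println(combine(constantScores, 0f));
        // A non-zero tie-breaker rewards the additional matching clause.
        System.out.println(combine(new float[] {2f, 1f}, 0.5f));
    }
}
```

With a tie-breaker of 0 and constant sub-scores, the combined score does not grow with the number of matching clauses.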
I'm trying to compile JCC, using python setup.py build
This is what I get:
~/pylucene-2.9.2-1/jcc$ python setup.py build
running build
running build_py
copying jcc/config.py -> build/lib.linux-x86_64-2.5/jcc
running build_ext
building 'jcc._jcc' extension
gcc -pthread -fno-s
Samarendra Pratap [samarz...@gmail.com] wrote:
> 1. Our default option is sort by score, however almost 8% of searches use
> sorting on a field (mmddHHMMSS). This field is indexed as string (not as
> NumericField or DateField).
Guessing that the timestamp is practically unique for each documen
Thanks for the explanation. The situation makes much more sense now.
Fortunately, I did wrap the result of Analyzer.tokenStream(). I had
contemplated adding it to the Analyzer as you described and warned not to.
- Original Message
From: Uwe Schindler
To: java-user@lucene.apache.or
A Reader can only be read one time, that’s the problem. Resetting a TokenStream
cannot reset the Reader (see the java.io.Reader API). To replay the same
tokens again, you must wrap the stream with a caching filter. This is also done in
the Highlighter's code.
The general contract of reset() is not to reset
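A minimal, library-free sketch of the caching idea described above: consume the Reader once, keep the tokens in memory, and replay them from the cache. `ReplayableTokens` is a hypothetical class written for illustration; it only mimics the concept and is not Lucene's TokenStream or caching filter API. Whitespace tokenization is an assumption made to keep the example short.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the caching-filter idea; not a Lucene class.
final class ReplayableTokens {
    private final List<String> cache = new ArrayList<>();

    // Reads the Reader exactly once, splitting on whitespace, and caches the tokens.
    static ReplayableTokens read(Reader reader) {
        ReplayableTokens result = new ReplayableTokens();
        StringBuilder current = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    result.flush(current);
                } else {
                    current.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        result.flush(current);
        return result;
    }

    // Stores the pending token, if any, and clears the buffer.
    private void flush(StringBuilder current) {
        if (current.length() > 0) {
            cache.add(current.toString());
            current.setLength(0);
        }
    }

    // Every call replays from the cache; the Reader is never touched again.
    List<String> tokens() {
        return new ArrayList<>(cache);
    }

    public static void main(String[] args) {
        ReplayableTokens replay = read(new StringReader("reset does not rewind"));
        System.out.println(replay.tokens()); // first pass
        System.out.println(replay.tokens()); // second pass, served from the cache
    }
}
```

The trade-off is memory: the whole token sequence is held in the cache so that it can be served more than once.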
Oops. Sorry. Replied to the wrong message.
- Original Message -
From: "Herbert Roitblat"
To:
Sent: Tuesday, April 27, 2010 12:01 PM
Subject: Re: HTMLStripReader, HTMLStripCharFilter
Great, I will look forward to it.
Thanks,
Herb
- Original Message -
From: "Justin"
To:
Sent:
Great, I will look forward to it.
Thanks,
Herb
- Original Message -
From: "Justin"
To:
Sent: Tuesday, April 27, 2010 11:47 AM
Subject: Re: HTMLStripReader, HTMLStripCharFilter
Thanks for the help. No more exception. Seems odd that I need to add a
filter to make reset apply to the
Thanks for the help. No more exception. Seems odd that I need to add a filter
to make reset apply to the stream's underlying reader.
- Original Message
From: Uwe Schindler
To: java-user@lucene.apache.org
Sent: Tue, April 27, 2010 12:00:31 AM
Subject: RE: HTMLStripReader, HTMLStripC
First off: if you haven't already, make sure you OMIT_NORMS when indexing
this field; that way you don't have to worry about docs with "lots" of
numbers scoring low purely because of the fieldNorm.
Second: I wouldn't bother with a custom query, I would stick with your
BooleanQuery approach, but
This has to do with combining multiple terms in a Boolean query. If you have
only one term and no boost factors involved, you will get 1. To repeat: the
score numbers are on an arbitrary scale and are only comparable within one query.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.t
Thanks a lot for the quick reply.
I want to find documents similar to one document (let's call it document A) in
my index. To do this I use the MoreLikeThis class to help create a query from
document A. I also included document A in my index, so I assumed that I would
have document A at the first
Thank you Koji.
Everything is now working as desired. You have been an invaluable
resource for helping to resolve this issue and I really appreciate the
time you spent reviewing this issue.
Best regards,
Steve
-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: M
Sorting by score down to the second will use a lot of memory. Can you
make it less granular? And I think that switching that field to a
NumericField will give you some savings - this has come up before but
I can't remember the details. I'm sure someone else will.
--
Ian.
On Tue, Apr 27, 2010
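To make the two suggestions above concrete (a numeric sort key instead of a String, and coarser granularity), here is a small sketch. It assumes the field really is a fixed-width, zero-padded 10-digit `mmddHHMMSS` string as described in the thread; `TimestampKey` and its methods are hypothetical helpers, not Lucene API, and how much memory this saves in practice depends on the Lucene version and field cache in use.

```java
// Hypothetical helper; assumes a fixed-width, zero-padded mmddHHMMSS string.
final class TimestampKey {

    // Parses the 10 digits into an int. The largest valid value, 1231235959,
    // fits in a signed 32-bit int, and numeric order matches string order
    // because the input is fixed-width and zero-padded.
    static int toSecondKey(String mmddHHMMSS) {
        if (mmddHHMMSS.length() != 10) {
            throw new IllegalArgumentException("expected 10 digits: " + mmddHHMMSS);
        }
        for (int i = 0; i < mmddHHMMSS.length(); i++) {
            if (!Character.isDigit(mmddHHMMSS.charAt(i))) {
                throw new IllegalArgumentException("expected 10 digits: " + mmddHHMMSS);
            }
        }
        return Integer.parseInt(mmddHHMMSS);
    }

    // Drops the two seconds digits, leaving mmddHHMM. Fewer distinct values
    // means fewer unique sort terms.
    static int toMinuteKey(String mmddHHMMSS) {
        return toSecondKey(mmddHHMMSS) / 100;
    }

    public static void main(String[] args) {
        System.out.println(toSecondKey("0427130200"));
        System.out.println(toMinuteKey("0427130259"));
    }
}
```

Documents that share a minute become ties under the coarser key, so this only fits if second-level ordering is not required.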
Samarendra,
In regard to point #2, the GC should indeed handle the clean-up and that
might explain the "idle" time with your original configuration during
major collections.
Have you checked that your machine is correctly identified as a server
and has optimized GC settings?
More info in the ex
Hi Ian. Thanks for the points
Here are my answers -
1. Our default option is sort by score, however almost 8% of searches use
sorting on a field (mmddHHMMSS). This field is indexed as string (not as
NumericField or DateField).
2. We are opening readers at the time of starting the application
There is no simple answer. However, your app does sound like it is using
rather a lot of memory for what you describe as simple searches.
Are you using Lucene sorting? That can use lots of memory. How are
you using/reusing searchers/readers? Having multiple ones open, or
failing to close old ones, w
Hi.
I am looking for some guidance on the right memory options for my Search
Server application. How much memory should a Lucene-based application be
given?
Until a few days ago I was running my search server on Java 1.4 with the memory
option "-Xmx3600m", which was running quite fine. After upgrading
The score is an arbitrary number > 0. It's not normalized to anything, it
should only be used to e.g. sort the results. You cannot even compare scores
between two searches. They should only be used to compare hits *within* one
result set (e.g. sort as done in top docs).
-
Uwe Schindler
H.-H
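A small sketch of the point above, that raw scores are only good for ordering hits within one result set. The `Hit` holder and the score values are invented for illustration and do not come from Lucene or from this thread; note that one score is deliberately greater than 1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: Hit is a made-up holder, not Lucene's ScoreDoc.
final class RankWithinQuery {

    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Orders the hits of ONE query by descending raw score. The absolute
    // values are never interpreted, only compared with each other.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(new Comparator<Hit>() {
            @Override
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score);
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 0.4f));
        hits.add(new Hit(2, 3.7f)); // a raw score above 1 is not an error
        hits.add(new Hit(3, 1.2f));
        for (Hit h : rank(hits)) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```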
Hi Clara,
Any particular reason why you'd need the score? Perhaps this would be of
help
http://lucene.apache.org/java/2_9_1/scoring.html
http://lucene.apache.org/java/2_3_2/scoring.pdf
Hope this explains whatever you were looking for.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The
Oooh -- I suspect you are hitting this issue:
https://issues.apache.org/jira/browse/LUCENE-2283
Your 3rd image ("fdt") jogged my memory on this one. Can you try
testing the trunk JAR from after that issue landed? (Or, apply that
patch against 3.0.x -- let me know if it does not apply cleanl
Hi all,
I am new to Lucene and I want to ask about the score range that Lucene uses,
because I got a score greater than 1.
I'm using lucene-3.0.1, with MoreLikeThis to do document similarity and the
ScoreDoc class to get the hits of my search.
Thanks,
-Clara Vania-