: Sunday gets ranked highly due to idf. How do I reduce this skewness
: due to the date-posted field? I saw a reference earlier to
: ConstantScoreRangeQuery on JIRA - is it the solution?
Yes. RangeQuery expands to a BooleanQuery containing all of the terms in
the range. The number of terms (and the frequency of each) then skews the
idf-based scores; ConstantScoreRangeQuery gives every matching document the
same score instead.
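A quick sketch of the swap, assuming Lucene 1.9 (the field name and date
bounds below are made up for illustration):

import org.apache.lucene.search.ConstantScoreRangeQuery;
import org.apache.lucene.search.Query;

public class DateRangeSketch {
    public static Query dateRange() {
        // Every matching document gets the same score, so a rare date term
        // (e.g. few Sunday postings) no longer gets an idf boost.
        return new ConstantScoreRangeQuery(
            "datePosted",   // hypothetical field holding yyyyMMdd strings
            "20060101",     // lower bound
            "20060131",     // upper bound
            true, true);    // both bounds inclusive
    }
}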
Hi,
I am running a search for something akin to a news site, where each
news document has a date, title, keywords/bylines, summary fields and
then the actual content. Using Lucene for this database of documents,
it seems that:
1. The relevancy score is skewed drastically by the actual number of
ne
Chris
Thanks. Appreciate your comment about using ConstantScoreQuery as well.
Amrit
On 2/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> : I am experimenting with using a custom filter with QueryParser and ran
> into
> : some unanticipated issues with using NOT terms. I narrowed down the
Hello,
IndexReader.delete receives a docNum. How do I find the docNum for a given
document? Will I always need to get this number (sometimes called id in the
javadocs) from Hits.id?
thanks
--
Paulo Silveira
http://www.paulo.com.br/
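For what it's worth, a sketch of both deletion routes, assuming Lucene 1.9
(paths, field names, and values are made up):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class DeleteSketch {
    public static void main(String[] args) throws Exception {
        String dir = "/path/to/index";

        // Route 1: find the docNum via a search, then delete by number.
        IndexSearcher searcher = new IndexSearcher(dir);
        Hits hits = searcher.search(new TermQuery(new Term("title", "obsolete")));
        int docNum = hits.id(0);        // assumes at least one hit
        searcher.close();

        IndexReader reader = IndexReader.open(dir);
        reader.deleteDocument(docNum);  // note: docNums can change after merges

        // Route 2: skip the search and delete by a unique key field.
        reader.deleteDocuments(new Term("uid", "42"));
        reader.close();
    }
}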
--
Pasha Bizhan wrote:
Hi,
From: Daniel Noll [mailto:[EMAIL PROTECTED]
I don't know how this will be for efficiency. If you did it
that way, you would have to re-open the index for every
single document you add, otherwise you might miss a duplicate
which was added recently.
You do not need
Chris Hostetter wrote:
: I think that overriding getFieldQuery would work, yeah... you're right.
: It's just a matter of comparing the efficiency of this:
:
: BooleanQuery of (TermQuery, FilteredQuery of (AllDocsQuery, Filter))
:
: to the efficiency of this:
:
: FilteredQuery of (TermQue
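For readers following along, a sketch of the two shapes being compared, with
MatchAllDocsQuery standing in for the "AllDocsQuery" above (the term and
filter are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.FilteredQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class TwoShapes {
    public static void build(Filter filter) {
        // Shape 1: BooleanQuery of (TermQuery, FilteredQuery of (MatchAllDocsQuery, Filter))
        BooleanQuery shape1 = new BooleanQuery();
        shape1.add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.MUST);
        shape1.add(new FilteredQuery(new MatchAllDocsQuery(), filter), BooleanClause.Occur.MUST);

        // Shape 2: FilteredQuery of (TermQuery, Filter)
        Query shape2 = new FilteredQuery(new TermQuery(new Term("body", "lucene")), filter);
    }
}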
I am currently implementing Lucene using multiple RMI servers as index
searchers.
Has anyone done this using EJBs? (Any tips?)
If so, are there any performance hits?
thanks in advance,
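In case it helps, a sketch of the stock RMI setup Lucene ships with (host,
port, and index path are placeholders):

import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.RemoteSearchable;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Searcher;

public class RmiSketch {
    public static void server() throws Exception {
        LocateRegistry.createRegistry(1099);
        RemoteSearchable remote = new RemoteSearchable(new IndexSearcher("/path/to/index"));
        Naming.rebind("//localhost/searchable", remote);
    }

    public static Searcher client() throws Exception {
        // Combine several remote searchers into one; add more entries as needed.
        Searchable remote = (Searchable) Naming.lookup("//localhost/searchable");
        return new MultiSearcher(new Searchable[] { remote });
    }
}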
On the other hand, if you want the cheapest option, why not give the Google
Search Appliance a chance?
: I am experimenting with using a custom filter with QueryParser and ran into
: some unanticipated issues with using NOT terms. I narrowed down the issue
...
: bquery = new BooleanQuery();
: bquery.add(new BooleanClause(fq, BooleanClause.Occur.MUST_NOT));
:
I am experimenting with using a custom filter with QueryParser and ran into
some unanticipated issues with using NOT terms. I narrowed down the issue
into the following test case. I am expecting a MUST_NOT BooleanClause within
a BooleanQuery to return a result set that is the complement of a MUST
clause.
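A purely prohibited BooleanQuery matches nothing in Lucene, which may explain
the empty result set. A sketch of the usual workaround, pairing the MUST_NOT
clause with a MatchAllDocsQuery (fq stands for your filtered query):

import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;

public class NotSketch {
    public static BooleanQuery complementOf(Query fq) {
        BooleanQuery bquery = new BooleanQuery();
        bquery.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST); // supply the full doc set
        bquery.add(fq, BooleanClause.Occur.MUST_NOT);                  // then subtract fq's matches
        return bquery;
    }
}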
It seems, from the javadoc, that the 10K default is enforced to avoid a
possible OutOfMemoryError. I wonder how safe or unsafe it is to set the value
to the maximum possible if we don't impose any limit on customers' document
sizes. Perhaps the best solution is to expose the value as configurable b
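For reference, a sketch of raising the limit, assuming the Lucene 1.9 setter
(the path and analyzer are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class MaxFieldLengthSketch {
    public static IndexWriter openWriter() throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
        writer.setMaxFieldLength(Integer.MAX_VALUE); // default is 10,000 terms per field
        return writer;
    }
}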
Thanks Hoss... You're absolutely right!
Kevin
On 2/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> : I need all the documents returned from the search and am manipulating
> the
> : results with a custom HitCollector, therefore I can't use filters.
>
> I don't understand this comment. The
: I need all the documents returned from the search and am manipulating the
: results with a custom HitCollector, therefore I can't use filters.
I don't understand this comment. There are certainly methods in the
Searchable interface that allow you to use both a Filter and a HitCollector
together
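A sketch of that combination (the collector body is a placeholder):

import java.io.IOException;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class CollectSketch {
    public static void collectAll(Searcher searcher, Query query, Filter filter)
            throws IOException {
        searcher.search(query, filter, new HitCollector() {
            public void collect(int doc, float score) {
                // every match that passes the filter arrives here
            }
        });
    }
}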
>
> One more thing: in case these queries are generated, you might
> consider building the corresponding (nested) BooleanQuery yourself
> instead of using the QueryParser.
>
> Regards,
> Paul Elschot
I'll give that a try. Thanks Paul.
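For anyone curious, a sketch of building such a nested BooleanQuery directly,
skipping the QueryParser (field names and terms are made up):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class NestedSketch {
    public static BooleanQuery build() {
        BooleanQuery categories = new BooleanQuery();
        categories.add(new TermQuery(new Term("cat", "news")), BooleanClause.Occur.SHOULD);
        categories.add(new TermQuery(new Term("cat", "sports")), BooleanClause.Occur.SHOULD);

        BooleanQuery full = new BooleanQuery();
        full.add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.MUST);
        full.add(categories, BooleanClause.Occur.MUST); // nested, counts as one clause
        return full;
    }
}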
On Thursday 09 February 2006 00:52, Kevin Dutcher wrote:
> Hey Everyone,
>
> I'm running into the "More than 32 required/prohibited clauses in query"
> exception when running a query. I thought I understood the problem but the
> following two scenarios confuse me.
>
> 1st - No Error
> 33 required
On Thursday 09 February 2006 15:25, Kevin Dutcher wrote:
> > I don't know a lot about the error you're encountering (or not
> > encountering as the case may be), but please, for the love of all that is
> > sane, use a Filter instead of putting all those categories in your Query.
> >
> > Your search perf
There is a HighFreqTerms class in contrib/misc that may be interesting to
you. I just modified it slightly locally last night to limit things to a
specific field, and will commit it later.
Otis
- Original Message
From: Dmitry Goldenberg <[EMAIL PROTECTED]>
To: java-user@lucene.apa
Hi,
Answers:
1) No date set yet
2) I've been happily using 1.9 in production - see http://www.simpy.com/
3) Yes, there have been some memory improvements - see CHANGES.txt file in
Subversion
Otis
Hello all,
I have a couple of questions for the community about the 1.9
Lucene version. As I
Daniel,
If you end up trying all 3 options here, please report your findings
(speed/memory). I'm about to rework some of the Lucene stuff behind Simpy.com,
and am looking at Filters used this way (+ sort by date or some int) more and
more.
Thanks,
Otis
- Original Message
From: Chris
Definitely batch your adds/updates/deletes, and reuse the IndexReader as you
described instead of opening a new one for every search. I _believe_ you can
keep the same IndexWriter for adds, as long as you don't overlap it with an
IndexReader that does deletes. If you have Lucene in Action, che
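A sketch of the batching pattern described above (the key field and the
collections are placeholders):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

public class BatchSketch {
    public static void update(Directory dir, Analyzer analyzer,
                              String[] keysToDelete, Document[] docsToAdd) throws Exception {
        // Phase 1: all deletes through one IndexReader.
        IndexReader reader = IndexReader.open(dir);
        for (int i = 0; i < keysToDelete.length; i++) {
            reader.deleteDocuments(new Term("uid", keysToDelete[i]));
        }
        reader.close(); // releases the write lock before the writer opens

        // Phase 2: all adds through one IndexWriter.
        IndexWriter writer = new IndexWriter(dir, analyzer, false);
        for (int i = 0; i < docsToAdd.length; i++) {
            writer.addDocument(docsToAdd[i]);
        }
        writer.close();
    }
}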
I'm not sure if it will work better than what you've got, but you can try the
code from section 7.5 in Lucene in Action:
http://www.lucenebook.com/search?query=word+document+microsoft
The code is free, even if you don't have the book.
Otis
- Original Message
From: [EMAIL PROTECTED]
To
Hi
I don't know much about Lucene's scoring, but my intuition tells me that
a boost of "2" tells Lucene to regard that field/document as
"double-important", while a boost of "0.5" tells Lucene to regard the
field/document as "half-important". Thus the boost is exponential, is
that right?
If not,
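For what it's worth, boosts are multiplicative factors on the score
contribution, not exponents. A sketch (the field name and values are made up):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class BoostSketch {
    public static Document build() {
        Document doc = new Document();
        Field title = new Field("title", "Lucene scoring",
                                Field.Store.YES, Field.Index.TOKENIZED);
        title.setBoost(2.0f); // multiplies this field's score contribution by 2
        doc.add(title);
        doc.setBoost(0.5f);   // document boost is multiplicative too
        return doc;
    }
}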
The TextMining.org website keeps getting hacked and I don't have the
time to upgrade postnuke to a more secure version. Also, because of
legal reasons I can't maintain the software. I am more than willing
to "hand-off" the project to lucene or someone else. It's an apache 2
license so anyone ca
Chris,
Awesome stuff. A few questions: is your Excel extractor somehow better than
POI's? And what do you see as the timeframe for adding WordPerfect support?
Are you considering supporting any other sources such as MS Project,
Framemaker, etc?
Thanx,
- Dmitry
Hello everybody.
I have a big index that will be stored in the FS. I have lots of
updates, insertions and deletions in the index, and I would like to
minimize the number of "phantom reads".
I've seen this link in the wiki:
http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex
So, what about my i
> I don't know a lot about the error you're encountering (or not encountering
> as the case may be), but please, for the love of all that is sane, use a
> Filter instead of putting all those categories in your Query.
>
> Your search performance and your scores will thank you.
I need all the documents
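For readers wondering what the Filter suggestion looks like in practice, a
sketch using QueryFilter (the category values are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryFilter;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TermQuery;

public class CategoryFilterSketch {
    public static Hits search(Searcher searcher, Query userQuery) throws Exception {
        BooleanQuery catQuery = new BooleanQuery();
        catQuery.add(new TermQuery(new Term("category", "12")), BooleanClause.Occur.SHOULD);
        catQuery.add(new TermQuery(new Term("category", "47")), BooleanClause.Occur.SHOULD);
        // ...one SHOULD clause per category, kept out of the scoring Query...
        Filter catFilter = new QueryFilter(catQuery);
        return searcher.search(userQuery, catFilter);
    }
}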
Nick Burch wrote:
You could try using org.apache.poi.hwpf.HWPFDocument, and getting the
range, then the paragraphs, and grabbing the text from each paragraph. If
there's interest, I could probably commit an extractor that does this to
poi.
Yes, that's exactly what I'm doing. Having this in POI wo
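For reference, a minimal sketch of that extraction loop against the POI HWPF
API (the file name is a placeholder):

import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

public class HwpfSketch {
    public static String extract(String file) throws Exception {
        HWPFDocument doc = new HWPFDocument(new FileInputStream(file));
        Range range = doc.getRange();
        StringBuffer text = new StringBuffer();
        for (int i = 0; i < range.numParagraphs(); i++) {
            text.append(range.getParagraph(i).text());
        }
        return text.toString();
    }
}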
On Thu, 9 Feb 2006, Christiaan Fluit wrote:
My experience is that the WordDocument class crashes on about 25% of the
documents, i.e. it throws some sort of Exception. I've tested POI
2.5.1-final as well as the current code in CVS, but both produce this
result. I even suspect the output to be 10
Hello all,
I'm replying to two threads at once as what I have to say relates to both.
My company recently started an open source project called Aperture
(http://sourceforge.net/projects/aperture), together with the German
DFKI institute. The project is still very much in alpha stage, but I do
Hello,
I use the POI API to parse MS Word files in order to index the content to
enable Lucene search.
For that I downloaded the latest jars from POI (including the scratchpad
one) and use the parser from lucenebook called POIWordDocHandler.
It works quite well, but for some files the parser does
Have you considered running the .NET version (dotLucene)? The converters for
Office and PDF are freely available, and there is a cheap commercial IFilter
available for WordPerfect files (and many others).
-Gwyn
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 09
Hi,
> From: Daniel Noll [mailto:[EMAIL PROTECTED]
> I don't know how this will be for efficiency. If you did it
> that way, you would have to re-open the index for every
> single document you add, otherwise you might miss a duplicate
> which was added recently.
You do not need to reopen in
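One way to test for a duplicate by unique key without running a full search;
a sketch (the "uid" field is hypothetical, and note that a reader only sees
documents committed before it was opened):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.store.Directory;

public class DuplicateCheckSketch {
    public static boolean exists(Directory dir, String key) throws Exception {
        IndexReader reader = IndexReader.open(dir);
        TermDocs td = reader.termDocs(new Term("uid", key));
        boolean found = td.next(); // true if some document already carries this key
        td.close();
        reader.close();
        return found;
    }
}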
This is a real gotcha with Lucene in its out-of-the-box
configuration. In the several applications I've built to index
documents I've always hit this and had to set the maxFieldLength to
its maximum possible value. Is there still an argument to be made to
keep the default at 10K or would
> for the love of all
> that is sane use a
> Filter instead of putting all those categories in
> your Query.
Try this one:
package org.apache.lucene.search;
import java.io.IOException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import org.apache.lucene.index.IndexReader;
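The preview cuts off after the imports; a minimal sketch of what a category
Filter along these lines typically looks like (the class name and structure
are guesses, not the poster's actual code):

package org.apache.lucene.search;

import java.io.IOException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class CategoryFilter extends Filter {
    private final ArrayList terms = new ArrayList();

    public void addTerm(Term term) {
        terms.add(term);
    }

    // Sets a bit for every document matching any of the added terms.
    public BitSet bits(IndexReader reader) throws IOException {
        BitSet result = new BitSet(reader.maxDoc());
        for (Iterator it = terms.iterator(); it.hasNext();) {
            TermDocs td = reader.termDocs((Term) it.next());
            while (td.next()) {
                result.set(td.doc());
            }
            td.close();
        }
        return result;
    }
}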