Most likely the string jakarta&apache is analyzed as a single word,
both at indexing time and at search time.
See also "AnalysisParalysis" on the Lucene Wiki.
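Whether the analyzer collapses jakarta&apache into one token or splits it in two is easy to check from QueryParser's output. A minimal sketch, assuming Lucene 2.x, StandardAnalyzer, and a hypothetical field name "content" (StandardAnalyzer drops the '&', so the term becomes two tokens and QueryParser builds the same PhraseQuery as the quoted form):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class AnalysisCheck {
    public static void main(String[] args) throws ParseException {
        QueryParser parser = new QueryParser("content", new StandardAnalyzer());
        Query quoted = parser.parse("\"jakarta apache\"");
        Query amp = parser.parse("jakarta&apache");
        // If both render as content:"jakarta apache", the '&' form is being
        // analyzed into the same two tokens and behaves like a phrase.
        System.out.println(quoted);
        System.out.println(amp);
    }
}
```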
"ruchi thakur" <[EMAIL PROTECTED]> wrote on 07/03/2007 20:39:27:
> Thanks Patrick. One more question. The info in link says to use the below
Erick and Mark, thank you very much; you really gave me good information. So
I decided to try HitCollector and see how it works. But as for storing
document IDs, I don't think that is a good idea, because the result may exceed
50,000, and I was just being optimistic when I quoted that number ;)
Anyway, I wil
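The approach described above (collecting only document IDs instead of materializing tens of thousands of Hits) could be sketched like this, assuming the Lucene 2.x HitCollector API and an existing `searcher` and `query`:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.search.HitCollector;

// Collects only document IDs; scores are ignored, so nothing heavier
// than an Integer per hit is kept in memory.
public class IdCollector extends HitCollector {
    public final List ids = new ArrayList();

    public void collect(int doc, float score) {
        ids.add(new Integer(doc));
    }
}

// Usage (assuming `searcher` is an IndexSearcher and `query` a Query):
//   IdCollector collector = new IdCollector();
//   searcher.search(query, collector);
//   System.out.println(collector.ids.size() + " matches");
```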
Thanks, Patrick. One more question. The info in the link says to use the below
query for a phrase:
"jakarta apache". It works fine.
But when I run jakarta&apache, it also has the same effect, i.e. like
a phrase. It works fine too. Though it is working, I am still a little
doubtful, as I could n
I understand the NPEs could all be due to reasonable changes; those are
changes that users of the Lucene API may need to pay attention to.
The "java.io.IOException: read past EOF" is pretty consistent in my
case. I have run it on two computers and got the same error. After
changing back to indexWriter.addIndexes(direc
Hi Chris, thanks for sharing this info (see below)
"Chris Lu" <[EMAIL PROTECTED]> wrote on 07/03/2007 18:32:22:
> I would like to share my experience for upgrading from Lucene 1.9 to
> Lucene 2.2, build 515893.
>
> I have been working on a product called DBSight. It has both a
> designing web UI
http://lucene.apache.org/java/docs/scoring.html
"ashwin kumar" <[EMAIL PROTECTED]> wrote on 07/03/2007 18:54:49:
> hi all when i search using lucene i am getting the path of the documents
in
> which the search string is found along with this
>
> i am also gettin a score . my question is
>
> what
Hi all, when I search using Lucene I get the path of the documents in
which the search string is found, and along with this
I also get a score. My questions are:
What is this score?
What is the use of the score?
How is the score computed for each document?
thanks
regards
ashwin
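The scoring document linked above describes the formula; to see exactly how a particular document's score was computed, Lucene can print a breakdown. A minimal sketch, assuming Lucene 2.x and existing `searcher`, `query`, and `docId` values:

```java
import java.io.IOException;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ScoreBreakdown {
    // Prints the tf/idf/boost/norm factors behind one document's score.
    static void print(IndexSearcher searcher, Query query, int docId)
            throws IOException {
        Explanation exp = searcher.explain(query, docId);
        System.out.println(exp.toString());
    }
}
```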
Also, QueryParser would generate that combination with:
*:* -naughty1 -naughty2
> Thanks! I was not aware of that class, for some reason.
>> http://lucene.apache.
>> org/java/docs/api/org/apache/lucene/search/MatchAllDocsQuery.html
>>
>> You can use that Query in front of a NOT query clause.
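Programmatically, the same "match everything except the naughty terms" query could be built like this (a sketch assuming Lucene 2.x and a hypothetical field name "content"):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ProfanityQuery {
    // Equivalent of: *:* -naughty1 -naughty2
    static Query build() {
        BooleanQuery q = new BooleanQuery();
        q.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("content", "naughty1")),
              BooleanClause.Occur.MUST_NOT);
        q.add(new TermQuery(new Term("content", "naughty2")),
              BooleanClause.Occur.MUST_NOT);
        return q;
    }
}
```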
I would like to share my experience for upgrading from Lucene 1.9 to
Lucene 2.2, build 515893.
I have been working on a product called DBSight. It has both a
designing web UI for configuring database crawls, and also the capability
to serve search requests like the later-emerged Solr. So DBSight can do
I thought about this, as I think the overall resources required would be less
than creating a filter. Ultimately I decided against it for a few reasons:
1) I'm working with an existing index of ~50 million documents; I don't want to
reindex the whole thing, or even just the documents that contai
See TermFreqVector, HitCollector, perhaps TopDocs, perhaps
TermEnum. Make sure you create your index such that frequencies
are stored (see the FAQ).
Erick
On 3/7/07, teramera <[EMAIL PROTECTED]> wrote:
So after I execute a search I end up with a 'Hits' object. The number of
Hits
is the order
On Wednesday 07 March 2007 18:12, Philipp Nanz wrote:
> Thanks for your answers. Your input is really appreciated :-)
>
> @Paul Elschot:
> Thanks for the hint. I guess I could use coord() to penalize missing
> terms like this:
>
> Query: a b c d
> Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
On Wednesday 07 March 2007 16:07, Greg Gershman wrote:
> I'm attempting to create a profanity filter. I thought to use a QueryFilter
created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I
have run
into is that, as a pure negative query is not supported (a query for (-term
So after I execute a search I end up with a 'Hits' object. The number of Hits
is on the order of a million.
What I want to do from these Hits is extract term frequencies for a few
known fields. I don't have a global list of terms for any of the fields, but
want to generate the term frequency based
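Erick's TermFreqVector suggestion could look roughly like this, assuming Lucene 2.x, an open IndexReader, and a field that was indexed with term vectors enabled (Field.TermVector.YES); the field name "body" is hypothetical:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class TermFreqDump {
    // Prints per-document term frequencies for one field.
    static void dump(IndexReader reader, int docId) throws IOException {
        TermFreqVector tfv = reader.getTermFreqVector(docId, "body");
        if (tfv == null) {
            return; // field was not indexed with term vectors
        }
        String[] terms = tfv.getTerms();
        int[] freqs = tfv.getTermFrequencies();
        for (int i = 0; i < terms.length; i++) {
            System.out.println(terms[i] + " -> " + freqs[i]);
        }
    }
}
```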
Not sure if this helpful given your proposed solution, but could you
do something on the indexing side, such as:
1. Remove the profanity from the token stream, much like a
stopword. This would also mean stripping it from the display text
2. If your TokenFilter comes across a profanity, some
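Option 1 above (stripping profanity from the token stream, like a stopword) might be sketched as follows, assuming the Lucene 2.x TokenStream API; the ProfanityFilter name and the banned-word set are hypothetical:

```java
import java.io.IOException;
import java.util.Set;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Drops banned tokens from the stream, much like StopFilter drops stopwords.
public class ProfanityFilter extends TokenFilter {
    private final Set banned;

    public ProfanityFilter(TokenStream in, Set banned) {
        super(in);
        this.banned = banned;
    }

    public Token next() throws IOException {
        for (Token t = input.next(); t != null; t = input.next()) {
            if (!banned.contains(t.termText())) {
                return t;
            }
        }
        return null; // end of stream
    }
}
```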
Hi,
Please suggest what should be the query string for a phrase search.
Did you take a look at:
http://lucene.apache.org/java/docs/queryparsersyntax.html ?
Patrick
: Query: a b c d
: Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
: Doc B: a b c => sloppyFreq(0) * coord(3, 4) = 0.75
:
: Doc A would score higher. I guess that might be a valid solution.
: There is a drawback though, i.e. sloppyFreq(1) * coord(4, 4) = 0.5
: So a perfect match with one insertion
One point: if you use stemming, or some other modification of the terms before
indexing, you'll need to make sure the terms you create to match against are
also stemmed.
Greg
- Original Message
From: Greg Gershman <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, Mar
Hello,
Please suggest what should be the query string for a phrase search.
Thanks and Regards,
Ruchi
Thanks for your answers. Your input is really appreciated :-)
@Paul Elschot:
Thanks for the hint. I guess I could use coord() to penalize missing
terms like this:
Query: a b c d
Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
Doc B: a b c => sloppyFreq(0) * coord(3, 4) = 0.75
Doc would score
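The arithmetic above can be checked with a tiny stand-alone sketch, assuming (as DefaultSimilarity does by default) that coord() is simply overlap / maxOverlap:

```java
public class CoordDemo {
    // DefaultSimilarity-style coordination factor: the fraction of
    // query terms that matched in the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        System.out.println(coord(4, 4)); // Doc A: all four of "a b c d" match
        System.out.println(coord(3, 4)); // Doc B: only "a b c" match
    }
}
```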
Hi,
On Tue, 2007-03-06 at 15:34 -0500, Andy Liu wrote:
> Is there a working solution out there that would let me use ParallelReader
> to search over a large, immutable index and a smaller, auxiliary index that
> is updated frequently? Currently, from my understanding, the ParallelReader
> fails
From my understanding, MultiSearcher is used to combine two indexes that
have the same fields but different documents. ParallelReader is used to
combine two indexes that have same documents but different fields. I'm
trying to do the latter. Is my understanding correct? For example, what
I'm t
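That understanding matches the API. A sketch of the same-documents/different-fields case, assuming Lucene 2.x and two hypothetical Directory instances (bigIndexDir holding the immutable fields, auxIndexDir the frequently updated ones; both indexes must contain the same documents in the same order):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.ParallelReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

public class CombinedSearch {
    // Searches one logical index whose fields come from two physical ones.
    static IndexSearcher open(Directory bigIndexDir, Directory auxIndexDir)
            throws IOException {
        ParallelReader pr = new ParallelReader();
        pr.add(IndexReader.open(bigIndexDir)); // large, immutable fields
        pr.add(IndexReader.open(auxIndexDir)); // small, frequently updated fields
        return new IndexSearcher(pr);
    }
}
```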
Ah. Sorry. Last post was a ProfanitySelector rather than ProfanityFilter! -
this fixes it anyway
naughty1 naughty2 xxx
- Original Message
From: mark
Sounds like the sort of filter that could be usefully cached.
You can do all this in Java code or the XML query parser (in contrib) might be
a quick and simple way to externalize the profanity settings in a stylesheet
which is actually used at query time e.g.
<
Thanks! I was not aware of that class, for some reason.
I tried creating a NegativeQueryFilter, it works just fine. Can you think of
any reason why one approach would be better than the other? If there's
interest, I'm happy to post the NegativeQueryFilter.
Greg
- Original Message
Fr
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/MatchAllDocsQuery.html
You can use that Query in front of a NOT query clause.
Greg Gershman wrote:
I'm attempting to create a profanity filter. I thought to use a QueryFilter
created with a Query of (-$#!+ AND [EMAIL PROTECTED] A
You should not be returning 50 thousand documents. Since you are
implementing paging, you should only return enough to cover your page
size. If a user is viewing page 1 with documents 1-10, you send back
information for 10 of the docs. On page 2, 10-20, you send back
information for 10 of the d
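The page-window arithmetic described above is simple enough to sketch stand-alone (pages numbered from 1, page size 10 in the example):

```java
public class PageWindow {
    // Returns {startInclusive, endExclusive} hit indices for one page.
    static int[] window(int page, int pageSize, int totalHits) {
        int start = (page - 1) * pageSize;
        int end = Math.min(start + pageSize, totalHits);
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        int[] w = window(2, 10, 50000);
        System.out.println(w[0] + ".." + w[1]); // second page of results
    }
}
```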
I'm attempting to create a profanity filter. I thought to use a QueryFilter
created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I
have run into is that, as a pure negative query is not supported (a query for
(-term) DOES NOT return the inverse of a query for (term)), I b
You may not be able to store all the documents, but what about
just storing the document IDs in a list?
And remember that a Hits object re-queries the index every 100
documents or so when you iterate through it, so if you're really
using a Hits object, you're re-executing the query anyway.
You m
Yes, I am very concerned about this, because we have a big project with many
users and I am responsible for it. The thing that preoccupies my mind is
application performance, because there are more than 500 thousand records
(documents).
A single search may return about 50 thousand documents and i
To address your hits question: I wouldn't keep hits around, but would
re-search instead. It is often more of a headache than a time savings to
keep around all of the Hits objects and to have to manage them. I made
my own Hits object that does no caching because of this. Pagination is
often best
You only want a single IndexSearcher shared by EVERY user searching an index.
IndexAccessor will manage this for you. The only reason you might have
more than one IndexSearcher is if you have more than one index to search
or some additional MultiSearchers. You always want ONE IndexSearcher,
ONE Index