Using QueryParser to parse *tex* seems to create a PrefixQuery rather than a
WildcardQuery: the trailing * makes it a prefix term, even though the leading *
should make it a wildcard.
As a result, this does not match, for example, "context". I've swapped the order
of WILDTERM and PREFIXTERM in QueryParser.jj but
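For what it's worth, the difference is easy to see outside Lucene. The sketch below is plain Java (not Lucene code) that mimics wildcard-term semantics with a regex, showing why "*tex*" must be handled as a wildcard: prefix semantics ("tex*") cannot match "context".

```java
public class WildcardSemantics {
    // Mimic Lucene wildcard terms: '*' = any run of chars, '?' = one char.
    static boolean matches(String term, String pattern) {
        String regex = pattern.replace(".", "\\.")
                              .replace("*", ".*")
                              .replace("?", ".");
        return term.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("context", "*tex*")); // true: wildcard semantics
        System.out.println(matches("context", "tex*"));  // false: prefix semantics
        System.out.println(matches("texture", "tex*"));  // true: real prefix match
    }
}
```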
I also would like to know whether searching in the index file eats a lot of
memory... I always run out of memory when searching, i.e. it gives a Java heap
space exception (although I have put -Xmx768 in the VM arguments)... Is
there any way to solve it?
-
TV di
On 2/21/07, dmitri <[EMAIL PROTECTED]> wrote:
What is the point to calculate score if the result set is going to be sorted
by some field?
Is it ok to replace several terms query (a OR b OR c) with MatchAllQuery and
RangeFilters (from a to a, from b to b, from c to c) if sorting is needed?
Won't
Hi,
I have an existing index with a size of 20.6 GB... I haven't done any
optimization on this index yet. Now I have a 100 GB HDD, but apparently when I
run a program to optimize (which simply calls writer.optimize() on this
index file), it gives the error that there is not enough space on
What is the point to calculate score if the result set is going to be sorted
by some field?
Is it ok to replace several terms query (a OR b OR c) with MatchAllQuery and
RangeFilters (from a to a, from b to b, from c to c) if sorting is needed?
Won't it be faster?
-
dmitri
--
: So I don't see why using a SpanNear that respects order and a large
: IncrementGap won't solve your problem.. Although it would return "odd"
i think the use case he's worried about is that he needs to be able to
find matches just on the "start" of a person's name, ie...
Email#1 To:
I really think you need to stop obsessing on SpanFirst . I suspect that
this is leading you down an unrewarding path.
So I don't see why using a SpanNear that respects order and a large
IncrementGap won't solve your problem.. Although it would return "odd"
matches. Let's say you indexed "firs
Ahh, now it falls into place.
Thanks
Antony
Chris Hostetter wrote:
it's not called Analyzer.getPositionAfterGap .. it's
Analyzer.getPositionIncrementGap .. it's the Position Increment used when
there is a Gap -- so returning 0 means that no extra increment is used, and
multiple values are treate
: So, if you can add 1000, shouldn't setting 0 each time cause it to start at 0
: each time? The default Analyzer.getPositionIncrementGap always returns 0.
it's not called Analyzer.getPositionAfterGap .. it's
Analyzer.getPositionIncrementGap .. it's the Position Increment used when
there is a Ga
Hi Erick,
What this does is allow you to put gaps between successive sets of terms
indexed in the same field. For instance...
doc.add(new Field("field", "some stuff", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("field", "bunch hooey", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("field", "what is this", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
In this case, there would be the following positions,
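A rough illustration of how those positions work out (plain Java, not Lucene internals; this just mimics what Analyzer.getPositionIncrementGap contributes between successive values of the same field, assuming whitespace tokenization):

```java
import java.util.ArrayList;
import java.util.List;

public class PositionSketch {
    // Assign token positions across multiple values of one field,
    // inserting `gap` extra increments between successive values.
    static List<String> positions(String[] values, int gap) {
        List<String> out = new ArrayList<String>();
        int pos = -1;
        for (int v = 0; v < values.length; v++) {
            if (v > 0) pos += gap;   // the position increment gap
            for (String tok : values[v].split("\\s+")) {
                pos += 1;            // normal increment of 1 per token
                out.add(tok + "@" + pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] vals = {"some stuff", "bunch hooey", "what is this"};
        // Default gap of 0: positions run consecutively across values:
        System.out.println(positions(vals, 0));
        // [some@0, stuff@1, bunch@2, hooey@3, what@4, is@5, this@6]
        // Gap of 1000: each value starts far from the previous one:
        System.out.println(positions(vals, 1000));
        // [some@0, stuff@1, bunch@1002, hooey@1003, what@2004, is@2005, this@2006]
    }
}
```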
I think you can get MUCH better efficiency by using TermEnum/TermDocs. But I
think you need to index (UN_TOKENIZED) your primary key (although now I'm
not sure). But I'd be surprised if TermEnum worked with un-indexed data.
Still, it'd be worth trying, but I've always assumed that TermEnums only
wor
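A sketch of that TermDocs-style lookup against the Lucene 2.x API (the field name "pk" and the method shape are my assumptions; it requires the key to be indexed UN_TOKENIZED so each key is a single term):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class PrimaryKeyLookup {
    // Find the Lucene document number for a given primary key, or -1.
    // Assumes the "pk" field was indexed UN_TOKENIZED.
    static int docForKey(IndexReader reader, String key) throws IOException {
        TermDocs termDocs = reader.termDocs(new Term("pk", key));
        try {
            return termDocs.next() ? termDocs.doc() : -1;
        } finally {
            termDocs.close();
        }
    }
}
```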
: so I thought that sounded good, but there does not seem to be a way to set it
: and most of the Analyzers just seem to use the base Analyzer method which
: returns 0, so I'm now confused as to what this actually does in practice.
by default all the analyzers return 0, but you can subclass any a
: Could someone enlighten me a bit about the subject? When do I want to
: use a MultiSearcher rather than a searcher running of a MultiReader?
: There seems to be a bunch of limitations in the MultiSearcher, and it
: is these that made me curious.
as i understand it the limitations of the MultiSe
: A question about efficiency and the internal workings of the Hits class.
: When we make a call to IndexSearcher's search method thus:
:
: Hits hits = searcher.search(query);
:
: Do we actually, physically get back all the results of the query even if
: there are 20 million results or for efficien
I have an index where I'm storing the primary key of my database record as
an unindexed field. Nightly I want to update my search index with any
database changes / additions.
I don't really see an efficient way to update these records besides doing
something like this which I'm worried with thr
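For what it's worth, the usual pattern (a sketch, under the assumption that the key field is indexed UN_TOKENIZED rather than merely stored, which the post says it currently is not) is to delete each changed document by its key term and re-add the fresh one:

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class NightlyUpdate {
    // Replace the index entry for one changed database row.
    // Assumes a "pk" field indexed UN_TOKENIZED.
    static void update(String indexDir, String pk, Document fresh) throws IOException {
        IndexReader reader = IndexReader.open(indexDir);
        reader.deleteDocuments(new Term("pk", pk)); // delete stale copy by key term
        reader.close();
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        writer.addDocument(fresh);                  // re-add the updated document
        writer.close();
    }
}
```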
When adding documents to an index has anyone seen either
java.lang.ClassCastException: org.apache.lucene.analysis.Token cannot be cast to
org.apache.lucene.index.Posting
at
org.apache.lucene.index.DocumentWriter.sortPostingTable(DocumentWriter.java:238)
at org.apache.lucene.index.DocumentW
Excellent! I'll be getting this first thing in the morning...
For a guy who's "really busy at his day job" you sure turned this around
quickly!
Erick
On 2/21/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Hi all,
I'm happy to announce that a new version of Luke - the Lucene Index
Toolbox -
See below..
On 2/21/07, Antony Bowesman <[EMAIL PROTECTED]> wrote:
Hi Erick,
> I'm not sure you can, since all the interfaces I use alter the increment
> between successive terms, but I'll be the first to admit that there are
> many
> nooks and crannies that I don't know about... But I suspect
optimizing away the expensive cases is your best bet if you can do it ...
another option is to use a custom HitCollector which keeps track of how
long it's been running and throws a subclass of RuntimeException which you
explicitly catch and deal with as appropriate if the query has been taking
t
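A sketch of such a collector against the Lucene 2.x HitCollector API (the exception class, the check interval, and the time budget are all made up for illustration):

```java
import org.apache.lucene.search.HitCollector;

public class TimeLimitedCollector extends HitCollector {
    // Thrown when the query exceeds its budget; catch this around
    // searcher.search(query, collector) and report a partial result.
    public static class TimeExceededException extends RuntimeException {}

    private final long deadline;
    private int count = 0;

    public TimeLimitedCollector(long budgetMillis) {
        this.deadline = System.currentTimeMillis() + budgetMillis;
    }

    public void collect(int doc, float score) {
        // Check the clock only every 1000 hits to keep overhead low.
        if (++count % 1000 == 0 && System.currentTimeMillis() > deadline) {
            throw new TimeExceededException();
        }
        // ... record doc/score as needed ...
    }
}
```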
: Well, here's my current thoughts on achieving this. Instead of putting
: a 1000 space gap between elements of the "all" field could I not use a
: character that isn't used in the data such as ~ and then somehow (don't
: know how) use that to search all fields?
you could certainly introduce an a
Hi all,
I'm happy to announce that a new version of Luke - the Lucene Index
Toolbox - is now available. As usual, you can get it from:
http://www.getopt.org/luke
Highlights of this release:
* support for Lucene 2.1.0 release and earlier
* pagination of search results
* support for many
Hi Erick,
I'm not sure you can, since all the interfaces I use alter the increment
between successive terms, but I'll be the first to admit that there are
many
nooks and crannies that I don't know about... But I suspect that a negative
increment is not supported intentionally
I read your
Hi all,
A question about efficiency and the internal workings of the Hits class.
When we make a call to IndexSearcher's search method thus:
Hits hits = searcher.search(query);
Do we actually, physically get back all the results of the query even if
there are 20 million results or for efficiency
Nothing jumps out at me
Erick
On 2/21/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
Sorry I didn't make myself clear at all. Remember you said that it is
possible to do this:
> Sure. Convert your simple queries into span queries (which are also
> relatively simple). Then, when you index
I'll add that in a web application, using Hits to page through
results is perfectly acceptable. Going to these other APIs is a bit
more complicated and often unnecessary. Don't prematurely optimize! :)
Erik
On Feb 21, 2007, at 8:07 AM, Erick Erickson wrote:
See TopDocs, HitC
Hi
I've overcome this problem without HitCollector. I built an interface just
like java.sql.ResultSet; its implementation class accepts a Hits as a
parameter and provides next(), previous(), etc. methods to navigate between
records.
In my opinion this is a good solution.
Hope this helps you
On 2/21/0
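A minimal sketch of that ResultSet-style wrapper (the class and method names are my guesses, built on the Lucene 2.x Hits API):

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;

// Cursor over search results, loosely modeled on java.sql.ResultSet.
public class HitsCursor {
    private final Hits hits;
    private int row = -1; // positioned before the first record

    public HitsCursor(Hits hits) { this.hits = hits; }

    public boolean next() {
        if (row + 1 >= hits.length()) return false;
        row++;
        return true;
    }

    public boolean previous() {
        if (row <= 0) return false;
        row--;
        return true;
    }

    public Document get() throws IOException {
        return hits.doc(row); // Hits loads documents lazily, in batches
    }
}
```

Note that Hits re-executes the query as you advance far past the first ~100 results, so for deep navigation the TopDocs APIs are cheaper.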
have a look at LuceneQueryOptimizer.java in nutch
- Original Message
From: Tim Johnson <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 21 February, 2007 3:34:36 PM
Subject: Stop long running queries
I'm having issues with some queries taking in excess of 500 secs t
I'm having issues with some queries taking in excess of 500 secs to run to
completion. The system being used consists of ~100 million docs split up
across ~600 indexes. The indexes are of various sizes, from 15MB to 8GB, and
all searches done in the system require an exact count of matching hits.
Th
I might be missing something, because TopDocs seems to only be about
finding the relevancy of documents, and HitCollector doesn't seem to be
relevant either.
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: 21 February 2007 13:08
To: java-user@lucene.apache.org
Subje
On Feb 21, 2007, at 2:35 AM, Chris Hostetter wrote:
the other situation i brought up in that thread from way back was
something Solr doesn't currently have a good solution for: one
field value for display, but other values (in the same field) for
searching ... likewise indexing one field value
Sorry I didn't make myself clear at all. Remember you said that it is
possible to do this:
> Sure. Convert your simple queries into span queries (which are also
> relatively simple). Then, when you index everything in the "all"
> field, subclass your analyzer to return a large PositionIncrement
Thanks Karl and Daniel
I am already disposing of the Searchers I am using. And regarding
IndexWriter.setTermIndexInterval(), I need the indexing to be as fast
as possible; it is the searches where I don't need any speed and prefer to
keep the memory low.
javier
On 2/14/07, Daniel Naber <[EMAIL PROT
On Feb 20, 2007, at 11:35 PM, Chris Hostetter wrote:
the biggest difference is that the field infos aren't globals, so as
segments merge and old segments get deleted, old data (and field info)
vanishes into the ether ... i take advantage of that a lot when
planning
upgrades ... many types of fi
I'm not sure you can, since all the interfaces I use alter the increment
between successive terms, but I'll be the first to admit that there are many
nooks and crannies that I don't know about... But I suspect that a negative
increment is not supported intentionally
But I really doubt you wan
See TopDocs, HitCollector, etc. Don't iterate through a Hits objects to get
docs beyond, say, 100 since it's designed to efficiently return the first
100 documents but re-executes the queries each 100 or so times you advance
to the next document.
Erick
On 2/21/07, Kainth, Sachin <[EMAIL PROTECTE
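A sketch of the TopDocs route against the Lucene 2.x API (variable names are mine; Searcher.search(Query, Filter, int) fetches only the best n hits in one pass):

```java
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class TopDocsExample {
    // Fetch only the top n hits in one pass, instead of walking a
    // Hits object past the range it handles efficiently.
    static ScoreDoc[] topHits(IndexSearcher searcher, Query query, int n)
            throws IOException {
        TopDocs topDocs = searcher.search(query, null, n);
        return topDocs.scoreDocs; // doc ids + scores for the best n matches
    }
}
```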
I don't see what you're getting at. There are only two forms of a query
term
field:value
value
And the second is really the first with the default field you specified in
the parser implied. So just think of all terms you specify in a query as
field:term.
Having some "special character" in th
Hi
We have used MultiSearcher when we use more than one folder. So far
we have not had many issues with MultiSearcher. The index at times
becomes slow when you include more folders to search.
We have the full index in one folder and the incremental index in another
folder so that
Could someone enlighten me a bit about the subject? When do I want to
use a MultiSearcher rather than a searcher running of a MultiReader?
There seems to be a bunch of limitations in the MultiSearcher, and it
is these that made me curious.
--
karl
--
Hi,
I have a field to which I add several bits of information, e.g.
doc.add(new Field("x", "first bit", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("x", "second part", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("x", "third section", Field.Store.YES, Field.Index.TOKENIZED));
I am using SpanFirstQuery to search them with something like:
while...
SpanTermQuery stquery = new SpanT
Hello,
I was wondering if Lucene provides any mechanism which helps in
pagination. In other words is there a way to return the first 10 of 500
results and then the next 10 and so on.
Cheers
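A sketch of one way to return results 10 at a time with the Lucene 2.x Hits API (the class and method names are made up; Hits fetches documents lazily, so only the requested page is loaded):

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;

public class Pager {
    // Return hits [page*pageSize, page*pageSize + pageSize) as documents.
    static Document[] page(Hits hits, int page, int pageSize) throws IOException {
        int start = page * pageSize;
        int end = Math.min(start + pageSize, hits.length());
        if (start >= end) return new Document[0];
        Document[] out = new Document[end - start];
        for (int i = start; i < end; i++) {
            out[i - start] = hits.doc(i); // fetched lazily by Hits
        }
        return out;
    }
}
```

For deep paging (far beyond the first ~100 results), the TopDocs/HitCollector APIs mentioned elsewhere in this thread are more efficient.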
Well, here's my current thoughts on achieving this. Instead of putting
a 1000 space gap between elements of the "all" field could I not use a
character that isn't used in the data such as ~ and then somehow (don't
know how) use that to search all fields?
-Original Message-
From: Chris Hos