Hi Hrishi,
The only way you'll know is to try it with some subset of your data - some
queries can be very expensive, some are really easy. It'll depend on your
document size, the vocabulary (total number and distribution of terms), and
kinds of queries, as well as of course your hardware. I wo
Thanks Jake.
I have around 75 TB of data to be indexed. So even though I do the sharding,
individual index file sizes might still be pretty high. That's why I wanted
to find out whether there is any limit as such, and obviously whether such
huge index files can be searched at all.
From your
Dear Friend,
I have encountered some performance problems recently with Lucene 2.9
search. I use a single IndexSearcher in the whole system. It seems
perfect when there are fewer than 10 threads searching concurrently.
But if there are more than 100 threads doing concurrent searches, the
average resp
On Thu, Oct 22, 2009 at 10:29 PM, Hrishikesh Agashe <
hrishikesh_aga...@persistent.co.in> wrote:
> Can I create an index file with very large size, like 1 TB or so? Is there
> any limit on how large index file one can create? Also, will I be able to
> search on this 1 TB index file at all?
>
Leav
I am running Ubuntu 9.04 on 64 bit machine with NAS of 100 TB capacity. JVM is
running with 2.5 GB Xmx.
Can I create an index file with very large size, like 1 TB or so? Is there any
limit on how large index file one can create? Also, will I be able to search on
this 1 TB index file at all?
Hi Michael:
I understand exactly what you mean.
I have done some experiments with the multiQ approach by carrying over
the bottom to the next segment (which would need to extend the
ScoreDocComparator API to support the same type of "convert"; the difference
here is that it is optional, sup
Yes - in many cases the other wins outweigh the queue transition cost;
in some cases they do not.
But we are talking degradation as you add more segments, not pure speed.
Degradation is worse now in the sort case.
John Wang wrote:
> With many other coding that happened in 2.9, e.g. the PQ api e
With the many other coding changes that happened in 2.9, e.g. the PQ API etc., sorting
is actually faster than in 2.4.
-John
On Thu, Oct 22, 2009 at 5:07 AM, Mark Miller wrote:
> Bill Au wrote:
> > Since Lucene 2.9 has per segment searching/caching, does query
> performance
> > degrade less than before (2.9) a
All previous suggestions are very good.
It's usually just the database. Lucene itself is fast enough.
Years ago, when I used a Pentium III, the indexing speed mattered.
But after upgrading the CPU to a Xeon etc., the indexing bottleneck is on
the database side.
Basically use the simplest SQL as
But with Lucene 2.9 you would want to use StringHelper.intern right?
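Something like this, perhaps (just a sketch; in 2.9 the field names coming back
from the index are interned through Lucene's own StringHelper, so interning your
string with the same class keeps the == comparison against term.field() safe):

import org.apache.lucene.util.StringHelper;

String fieldname = StringHelper.intern("BookTitle");
// now fieldname == term.field() holds for terms read from the index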
adviner wrote:
> Thank you
>
> Uwe Schindler wrote:
>> Use this one:
>>
>> String fieldname="BookTitle";
>> fieldname = fieldname.intern(); // because of this we need no
>> String.equals()
>>
>> TermEnum
If you look into the testcase I provided with my QueryParser example, you
will see that the negative numbers have a problem in newTermQuery.
"-" is a control character in QueryParser, which means to do a "NOT" on this
term. Because of this the syntax of the query is wrong. To hit the negative
num
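(A possible workaround, sketched here and not taken from the testcase above:
build the numeric query directly instead of going through QueryParser, so the
leading "-" is never seen as the NOT operator. The field name "price" is made
up, and this assumes the field was indexed with NumericField using the default
precision step.)

import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

int value = -42; // example value only
Query q = NumericRangeQuery.newIntRange("price", value, value, true, true);
// min == max gives an exact match on the negative number, with no query parsing involved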
Hi, I have a problem getting the NumericField support to work in the query parser.
My environment is like this:
Windows XP with
C:\work\> java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) Client VM (build 11.0-b15, mixed mode, sharing)
I am using
Thank you

Uwe Schindler wrote:
> Use this one:
>
> String fieldname="BookTitle";
> fieldname = fieldname.intern(); // because of this we need no
> String.equals()
>
> TermEnum te = IndexReader.terms(new Term(fieldname, ""));
> do {
> Term term = te.term();
Use this one:

String fieldname = "BookTitle";
fieldname = fieldname.intern(); // because of this we need no String.equals()

TermEnum te = reader.terms(new Term(fieldname, "")); // reader is your open IndexReader
do {
  Term term = te.term();
  if (term == null || term.field() != fieldname) break;
  System.out.println(term.text());
} while (te.next());
Never mind, I figured it out. I did this:

while ((term = termEnum.Term()) != null)
{
    if (!term.Field().Equals("BookTitle"))
        break;
    map = new SearchResultMap();
    map.Title = term.Text();
On 22 Oct 2009, at 20:00, Chris Hostetter wrote:

: I'm thinking a decorator with deletions on top of the original reader, merged
: with the clone reader using a MultiReader. But this would still require a new

you don't really mean a clone do you? ... you should just need a very
small index c
How do you know if you're on your last term? I tried it and it does work, but
it continues. How do you know to check if it's the last entry?
Thanks
Erick Erickson wrote:
>
> Try something like
> TermEnum te = IndexReader.terms(new Term("BookTitle", ""));
> do {
> Term term = te.term();
> if
Try something like

TermEnum te = reader.terms(new Term("BookTitle", "")); // reader is your open IndexReader
do {
  Term term = te.term();
  if (term == null || !term.field().equals("BookTitle")) break;
  System.out.println(term.text());
} while (te.next());

Note that next() will merrily continue beyond the last term for
the field "BookTitle", which is why the field check inside the loop is needed.
: I'm thinking a decorator with deletions on top of the original reader, merged
: with the clone reader using a MultiReader. But this would still require a new
you don't really mean a clone do you? ... you should just need a very
small index containing the new versions of the docs, in a MultiReader
: Subject: Maximum index file size
: References:
: In-Reply-To:
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message; instead, start a fresh email. Even if you change the
I have a field called BookTitle. I want to loop through all the entries
without doing a search. I just want to get the list of BookTitles that are
in this field.
I tried IndexReader, but MaxDocs() doesn't work because it returns everything,
and I have other fields in there which are a lot bigger.
This is basically what LuSql does. The time improvements ("8h to 30 min")
are similar, usually about an order of magnitude.
Oh, the comments suggesting most of the interaction is with the
database? The answer is: it depends.
With large Lucene documents: Lucene is the limiting factor (worsen
Profile your application first and find out where the bottlenecks really
are during indexing.
For me it was clearly the database calls that took most of the time, due to a
very complex SQL query.
I applied the Producer-Consumer pattern and put a blocking queue in between. I
have a threadpo
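The message is cut off above, but the producer/consumer arrangement it describes
might look roughly like this (a sketch only: buildDocument() is a stand-in for
your own row-to-Document mapping, and the IndexWriter is assumed to be opened
elsewhere):

import java.sql.ResultSet;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class QueueIndexer {
    private static final Document POISON = new Document(); // end-of-stream marker

    public static void index(ResultSet rs, final IndexWriter writer, int threads) throws Exception {
        final BlockingQueue<Document> queue = new ArrayBlockingQueue<Document>(1000);
        ExecutorService consumers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            consumers.submit(new Runnable() {
                public void run() {
                    try {
                        for (Document d = queue.take(); d != POISON; d = queue.take()) {
                            writer.addDocument(d); // IndexWriter is safe to share across threads
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        while (rs.next()) {                 // producer: the (slow) database side
            queue.put(buildDocument(rs));   // blocks if the consumers fall behind
        }
        for (int i = 0; i < threads; i++) {
            queue.put(POISON);              // one marker per consumer thread
        }
        consumers.shutdown();
        consumers.awaitTermination(1, TimeUnit.HOURS);
    }

    private static Document buildDocument(ResultSet rs) throws Exception {
        Document doc = new Document();      // hypothetical mapping; adapt to your own schema
        // doc.add(new Field(...));
        return doc;
    }
}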
Hi Paul:
Most of the time indexing big tables is spent on the full table
scan and network data transfer.
Please take a quick look at my OOW08 presentation about Oracle
Lucene integration:
http://docs.google.com/present/view?id=ddgw7sjp_156gf9hczxv
especially slides 13 and 14 wh
Glen Newton wrote:
You might want to consider using LuSql, which is a high performance,
multithreaded, well documented tool designed specifically for moving
data from a JDBC database into Lucene (you didn't say if it was a
JDBC-accessible db...)
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswik
Besides the other suggestions, I'd really, really, really put
some instrumentation in the code and see where you're spending your time. For
a fast hint, put a cumulative timer around your indexing part only. This will
indicate whether the time is consumed in querying your database or in indexing.
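For what it's worth, the cumulative-timer idea might look something like this
rough sketch (the "title" column name and the surrounding JDBC/IndexWriter setup
are invented; drop the method into whatever class drives your indexing):

import java.sql.ResultSet;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

static void indexWithTimer(ResultSet rs, IndexWriter writer) throws Exception {
    long dbNanos = 0, luceneNanos = 0;
    long t = System.nanoTime();
    while (rs.next()) {                                  // database side: next() + getString()
        String title = rs.getString("title");            // "title" is a made-up column name
        dbNanos += System.nanoTime() - t;

        Document doc = new Document();
        doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
        t = System.nanoTime();
        writer.addDocument(doc);                         // Lucene side
        luceneNanos += System.nanoTime() - t;

        t = System.nanoTime();
    }
    System.out.println("db: " + (dbNanos / 1000000) + " ms, lucene: "
            + (luceneNanos / 1000000) + " ms");
}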
See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed.
That includes some info on merge and buffer factors, and recommends
multiple threads. When I've done this sort of thing in the past it
has tended to be the database that is the problem, but maybe your
database is faster than mine.
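If Lucene does turn out to be the slow part, the knobs that page talks about
look roughly like this in 2.9 (the values and the index path are only examples
to experiment with, not recommendations):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

Directory dir = FSDirectory.open(new File("/path/to/index"));
IndexWriter writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_29), true,
        IndexWriter.MaxFieldLength.UNLIMITED);
writer.setRAMBufferSizeMB(64.0);      // flush by RAM usage rather than by document count
writer.setMergeFactor(10);            // how many segments accumulate before a merge
writer.setUseCompoundFile(false);     // a bit faster to write, at the cost of more open files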
You might want to consider using LuSql, which is a high performance,
multithreaded, well documented tool designed specifically for moving
data from a JDBC database into Lucene (you didn't say if it was a
JDBC-accessible db...)
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
Di
I'm building a Lucene index from a database, creating about 1 million
documents; unsurprisingly this takes quite a long time.
I do this by sending a query to the db over a range of ids (10,000 records),
add these results to Lucene,
then get the next 10,000, and so on.
When completed indexing I the
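For anyone following along, the loop described above might look roughly like
this (the table, column and variable names are invented; conn is an open JDBC
Connection and writer an already-opened IndexWriter):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

static void indexInBatches(Connection conn, IndexWriter writer, long maxId) throws Exception {
    final int batch = 10000;
    PreparedStatement ps = conn.prepareStatement(
            "SELECT id, title FROM books WHERE id >= ? AND id < ?"); // made-up schema
    for (long start = 0; start <= maxId; start += batch) {
        ps.setLong(1, start);
        ps.setLong(2, start + batch);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            Document doc = new Document();
            doc.add(new Field("title", rs.getString("title"),
                    Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);
        }
        rs.close();
    }
    ps.close();
    writer.close(); // or writer.optimize() first, if a single segment is really needed
}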
Check this out with the .net port folks, but in the Java world, when you open
an IndexReader (which I presume you do after optimizing),
the first few queries fill various caches etc. and do run slowly. One
solution is to fire a few warmup queries at the newly opened reader
before letting your main a
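In the Java world the warm-up step might look roughly like this (a sketch: the
field name, analyzer and query strings are placeholders for whatever your
application actually uses):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

static IndexSearcher openAndWarm(Directory dir, Analyzer analyzer) throws Exception {
    IndexReader reader = IndexReader.open(dir, true);   // read-only reader
    IndexSearcher searcher = new IndexSearcher(reader);
    QueryParser parser = new QueryParser(Version.LUCENE_29, "contents", analyzer);
    String[] typicalQueries = { "some common term", "another common term" }; // placeholders
    for (String q : typicalQueries) {
        searcher.search(parser.parse(q), 10);  // fills FieldCache, norms, OS cache, etc.
    }
    return searcher; // only now swap it in for the old one, then close the old reader
}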
Or you can use MappingCharFilter if you are using Lucene 2.9.
You can convert "c++" into "cplusplus" prior to running the Tokenizer.
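Something along these lines, I think (a sketch only; the analyzer below uses
whitespace tokenization plus lower-casing, which may or may not match what you
use for the rest of the field):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharReader;
import org.apache.lucene.analysis.CharStream;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.MappingCharFilter;
import org.apache.lucene.analysis.NormalizeCharMap;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class CPlusPlusAnalyzer extends Analyzer {
    private final NormalizeCharMap map = new NormalizeCharMap();

    public CPlusPlusAnalyzer() {
        map.add("c++", "cplusplus"); // the rewrite happens before the tokenizer sees the text
        map.add("C++", "cplusplus"); // the mapping is case-sensitive
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
        CharStream mapped = new MappingCharFilter(map, CharReader.get(reader));
        return new LowerCaseFilter(new WhitespaceTokenizer(mapped));
    }
}

Use the same analyzer at index time and at query time so that both sides see
"cplusplus".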
Koji
--
http://www.rondhuit.com/en/
Ian Lea wrote:
You need to make sure that these terms are getting indexed, by using
an analyzer that won't drop them and using
Hi
I am looking at handling special characters in the query, as using certain
characters causes an exception. I looked at QueryParser.escape(..) to handle
this. It works to a certain extent; for example, using '!' doesn't cause an
exception. However, when I use a wildcard, the wildcard is ignored.
Bill Au wrote:
> Since Lucene 2.9 has per segment searching/caching, does query performance
> degrade less than before (2.9) as more segments are added to the index?
> Bill
>
>
I think non-sorting cases are actually faster now over multiple segments
- though you will still see performance degrad
Please post your question to: lucene-net-user[AT]incubator.apache.org for
Lucene.Net related topics. See http://incubator.apache.org/lucene.net/ for
subscription info.
-- George
-----Original Message-----
From: ShibbyUK [mailto:lewis_...@hotmail.com]
Sent: Thursday, October 22, 2009 7:17 AM
To:
Hi,
I am running Ubuntu 9.04 on 64 bit machine with NAS of 100 TB capacity. JVM is
running with 2.5 GB Xmx.
Can I create an index file with very large size, like 1 TB or so? Is there any
limit on how large index file one can create? Also, will I be able to search on
this 1 TB index file at all
Hi,
We're having some odd performance problems. Recently, searching our index has
become slow *after* performing an optimize. This is counterintuitive, as
usually the optimize has the opposite effect!
We're using lucene.net 2.3.2 and have an index of 250,000 documents and
about 500 queries per
Can you provide more details? Which version of Lucene, Java, OS are
you using? Is there a small test case?
Hideously, it looks like your path was supposed to be
c:\Indexes\_z3_1.del, but somehow the \ was lost.
Mike
On Wed, Oct 21, 2009 at 9:50 PM, mitu2009 wrote:
>
> Hi,
>
> Why do I
You need to make sure that these terms are getting indexed, by using
an analyzer that won't drop them and using Luke to check. Then, if
you are using QueryParser, you'll need to escape the special
characters e.g. c\+\+. See
http://lucene.apache.org/java/2_9_0/queryparsersyntax.html#Escaping%20Spe
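A small sketch of the escaping side (assuming the field really was indexed with
an analyzer, such as WhitespaceAnalyzer, that keeps "c++" intact; the "contents"
field name is a placeholder and error handling is omitted):

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

QueryParser qp = new QueryParser(Version.LUCENE_29, "contents", new WhitespaceAnalyzer());
String userInput = "c++";
Query q = qp.parse(QueryParser.escape(userInput)); // escape() turns it into c\+\+

Note that QueryParser.escape() also escapes * and ?, which is why a wildcard
stops working if you escape the whole input string; if you want a trailing
wildcard, escape only the user-typed part and append the * yourself.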
Bill,
per-segment search does not replace index optimisation, nor does it
prevent the performance degradation if your number of segments is
increasing. Depending on how your index changes, it can give you a
performance improvement when reopening the index, and it will certainly
prevent one or another GC