Hi all,
I'm loath to stick this in a Jira issue yet, until I've run it past you.
I've been looking at it for a while so I'd like to make sure I haven't
confused myself beyond belief and it IS actually a problem.
It seems to me that there's a possible bug in FieldSortedHitQueue,
specifically
Hi Grant and Otis,
Thanks for the feedback, I appreciate it. You've given some good ideas.
Sounds like a really interesting system! I am curious, are your users
fluent in multiple languages or are you using some type of translation
component?
The former. We're talking about construction pro
: If I interrupt my IndexWriter with a kill signal, most of the time I
: will be left with a lock file AND corrupted index files (the searcher
: will throw some IllegalStateExceptions after the lock file is
: deleted).
if you are trying to deal with the possibility that your indexing process
migh
: ChainedFilter: [views:[0.4-0.6] level:[1-} ]
:
: I am concerned about not being able to see the logical operator in the
: print string. Should I be able to see the operator?
I've never looked at it closely, but a quick glance at the source
indicates that the toString does not make any attempt t
Chris,
I would really like only these extra files, but I have the same problem here.
If I interrupt my IndexWriter with a kill signal, most of the time I
will be left with a lock file AND corrupted index files (the searcher
will throw some IllegalStateExceptions after the lock file is
deleted).
P
: And I read the following issue again:
:
: ConstantScoreRangeQuery - fixes "too many clauses" exception
: http://issues.apache.org/jira/browse/LUCENE-383
:
: But still, I cannot understand very well why ConstantScoreQuery comes out.
: Is it to implement ConstantScoreRangeQuery? Or is it used
: I'm relatively new to Lucene and I've been trying to index a large
: number of html files. If my operation is interrupted the index
: appears to be corrupted. I can no longer open it for searching with
: IndexSearcher (and no amount of toying with Luke's options seems to
: help if I try to bro
On 3/16/06, Koji Sekiguchi <[EMAIL PROTECTED]> wrote:
> But still, I cannot understand very well why ConstantScoreQuery comes out.
> Is it to implement ConstantScoreRangeQuery? Or is it used for something
> by itself?
ConstantScoreQuery can wrap any Filter and gives a constant score for
every
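A minimal sketch of the idea, assuming the Lucene 1.9 API (class and constructor names as I understand that release; not verbatim from this thread):

```java
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeFilter;

public class ConstantScoreExample {
    // Builds a date-range query that scores every matching document
    // identically. The wrapped filter is evaluated as a bit set rather
    // than expanded into one BooleanQuery clause per term in the range,
    // so the "too many clauses" exception cannot occur.
    public static Query dateRange(String lower, String upper) {
        return new ConstantScoreQuery(
            new RangeFilter("date", lower, upper, true, true));
    }
}
```

The trade-off is the one named in the class: all matches get the same constant score, instead of being ranked by how well each expanded term matched.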
Hello,
Somebody asked me if I knew any good Lucene people who'd be interested in some
work that involves a good amount of Lucene...
Here is some info.
The company is in New York City. Full-time or contractors. Ideally local, but
remote work with good candidates may be ok, too.
Work involves L
Hello,
With Doug's help on the recent thread "Re: TooManyClauses exception in Lucene
(1.4)",
I could understand why ConstantScoreRangeQuery was added in Lucene 1.9.
I appreciate that.
And I read the following issue again:
ConstantScoreRangeQuery - fixes "too many clauses" exception
http://issues.apach
All the index files will be in a single file.
Has anyone written a module for lucene that provides an alternative IO
method? Instead of FSDirectory, it reads out of a stream?
-Original Message-
From: hu andy [mailto:[EMAIL PROTECTED]
Sent: Friday, March 17, 2006 9:24 AM
To: java-user@lucene.apa
Do you mean you pack the index files into the file *.luc? If that is the case,
Lucene can't read it.
If you put the index files and *.luc together under some directory, that's OK.
Lucene knows how to find these files
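If the goal is to ship an index inside an application rather than as loose files, one commonly suggested workaround (a sketch assuming the Lucene 1.9 API; a streaming Directory is harder because Lucene needs random access, which a plain InputStream cannot provide) is to copy the unpacked on-disk index into a RAMDirectory:

```java
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class RamIndexExample {
    // Copies an ordinary on-disk index into memory and searches it there.
    // The index must first exist as regular Lucene files in indexDir.
    public static IndexSearcher open(String indexDir) throws IOException {
        RAMDirectory ram =
            new RAMDirectory(FSDirectory.getDirectory(indexDir, false));
        return new IndexSearcher(ram);
    }
}
```

The obvious cost is that the whole index is held in heap memory, so this only suits small indexes.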
2006/3/14, Aditya Liviandi <[EMAIL PROTECTED]>:
>
> Hi all,
>
>
>
> If I want to embed
On 3/16/06, Nick Atkins <[EMAIL PROTECTED]> wrote:
> Hi Yonik, I'm not actually using any IndexReaders, just IndexWriters
> and IndexSearchers.
An IndexSearcher contains an IndexReader.
> I only get an IndexReader when I'm doing deletes
> but that isn't the case in this test.
Opening an IndexR
Hi Yonik, I'm not actually using any IndexReaders, just IndexWriters
and IndexSearchers. I only get an IndexReader when I'm doing deletes
but that isn't the case in this test. I definitely optimize() and
close() each IndexWriter when it's done writing its documents (about 200).
Anyway, I the pr
On 3/16/06, Nick Atkins <[EMAIL PROTECTED]> wrote:
> Yes, indexing only right now, although I can issue the odd search to
> test it's being built properly.
Ahh, as Otis suggests, it's probably IndexReader(s) that are
exhausting the file descriptors.
Are you explicitly closing the old IndexReade
Yes, indexing only right now, although I can issue the odd search to
test it's being built properly.
My test (indexing 4+ messages in a user's mailbox) causes the
BatchUpdater thread to write everything to the index approx every 15-17
seconds. The logs say:
[EMAIL PROTECTED] bin]# tail -f ../lo
This happens when you are doing indexing only!? Wow, I've never seen that.
Try posting your code in a form of a unit test.
Otis
- Original Message
From: Nick Atkins <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, March 16, 2006 6:28:52 PM
Subject: Re: Lucene and To
Hi Doug,
I have experimented with a mergeFactor of 5 or 10 (default) but it
didn't help matters once I reached the ulimit. I understand how the
mergeFactor affects Lucene's performance.
I am actually not doing any searches with IndexReader right now, just
indexing. Yes, I do store and reuse the
Thanks very much for your reply, I appreciate you taking the time.
Erick
Are you changing the default mergeFactor or other settings? If so, how?
Large mergeFactors are generally a bad idea: they don't make things
faster in the long run and they chew up file handles.
Are all searches reusing a single IndexReader? They should. This is
the other most common reason
Thanks Hannes, on my Fedora machine the maximum I can do is ulimit -n
1048576 which is 1M files. This should be enough for most sane cases
but it makes me uneasy. I assume the "deleted" file entries reported by
lsof will be cleared up eventually?
I can't believe this is really the only option a
Nick
it is a guess, but the only difference between my approach and yours
is that I am optimizing as soon as I open the writer, and you are
optimizing after the last (100th) document is written.
At the same time I am using:
writer.setUseCompoundFile(true);
writer.
Hi Nick,
use 'ulimit' on your *nix system to check if it's set to unlimited.
check:
http://wwwcgi.rdg.ac.uk:8081/cgi-bin/cgiwrap/wsi14/poplog/man/2/ulimit
You don't have to set it to unlimited, maybe increasing the number will
help.
later
Hannes
Nick Atkins wrote:
Thanks Otis, I tried th
Thanks Otis, I tried that but I still get the same problem at the ulimit
-n point. I assume you meant I should call
IndexWriter.setUseCompoundFile(true). According to the docs compound
structure is the default anyway.
Any further thoughts? Anything I can tweak in the OS (Linux), Java
(1.5.0) or
On 16 Mar 2006, at 11.47, Erik Hatcher wrote:
This can be done with some work to implement a SpanFuzzyQuery
(similar to the SpanRegexQuery in contrib/regex currently) and
using SpanNearQuery instead of a PhraseQuery.
Thanks, I'll check it out.
Performance is at risk doing such a query as all
I'm relatively new to Lucene and I've been trying to index a large
number of html files. If my operation is interrupted the index
appears to be corrupted. I can no longer open it for searching with
IndexSearcher (and no amount of toying with Luke's options seems to
help if I try to browse
Hi,
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
>
> The primary advantage of a RangeQuery is that the ranking
> incorporates the degree of match of each term in the range,
> which may be useful for wildcard-like searches but is useless
> for date-like searches.
Also, RangeQuery allows to
Thanks Paulo,
I actually do something very similar. I have a queue of all pending
updates and a Thread that manages the queue. When the queue reaches about
100 entries or is 30 seconds old (whichever comes first) I process it, which
results in all the index writes. I also always optimize() and close()
Erick Erickson wrote:
Could you point me to any explanation of *why* range queries expand this
way?
It's just what they do. They were contributed a long time ago, before
things like RangeFilter or ConstantScoreRangeQuery were written. The
latter are relatively recent additions to Lucene and
Nick!
I also had the same problem. Now in my SearchEngine class, when I
write a document to the index, I check whether the number of documents mod
100 is 0; if it is, I call optimize().
optimize() merges the index segments into fewer files, so the
number of open files is also reduced.
Take a look:
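The code that followed was cut off in the archive; a minimal sketch of the approach described above (optimize after every 100 added documents; Lucene 1.9 IndexWriter API assumed, class and field names hypothetical) might look like:

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class BatchingIndexer {
    private int added = 0; // documents written since the last optimize()

    // Adds a document and periodically merges segments so the number of
    // index files (and therefore open file handles) stays bounded.
    public void add(IndexWriter writer, Document doc) throws IOException {
        writer.addDocument(doc);
        if (++added % 100 == 0) {
            writer.optimize(); // merge all segments into one
        }
    }
}
```

Note the cost: each optimize() rewrites the whole index, so optimizing this often trades indexing throughput for a low file-handle count.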
The easiest first step to try is to go from multi-file index structure to the
compound one.
Otis
- Original Message
From: Nick Atkins <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, March 16, 2006 3:00:59 PM
Subject: Lucene and Tomcat, too many open files
Hi,
What'
Hi,
What's the best way to manage the number of open files used by Lucene
when it's running under Tomcat? I have an indexing application running
as a web app and I index a huge number of mail messages (upwards of
4 in some cases). Lucene's merging routine always craps out
eventually with the
When I read LIA, I was struck by this issue, and it seemed...er...like an
easy mistake to make. Given that my impression of Lucene is that it's
extraordinarily well designed, I assume that there must be a good reason for
expanding range queries this way.
Could you point me to any explanation of *w
Thank you for the reply; my Java runtime environment did not work, that's why.
It is fixed now.
Miki
Original Message Follows
From: Erik Hatcher <[EMAIL PROTECTED]>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: IndexFiles
Date: Thu, 16 Mar 2006 09:33:37
I wrote a patch for this and the difference is unbelievable,
the memory footprint has been cut almost in half and it seems like
performance is basically the same if not better!!!
if anyone is interested let me know
Best approach here is to open up a Jira issue, then submit the patch
On 3/16/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I had no idea that rangequery worked by enumerating every
> possible value, that's terrifying.
You could use either a RangeFilter or a ConstantScoreRangeQuery
-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Ser
Ouch! Yes, we're indexing with seconds, that's almost certainly the
problem. :( I had no idea that rangequery worked by enumerating every
possible value, that's terrifying.
We have a requirement to index data going back for about 20 years,
though, and although daily resolution would be fine, this
Tim,
This is possibly a lot of days:
date:[2005-03-16 TO 2006-03-16]
And if your 'date' field is more granular than 'a day', then this is a lot more
hours/minutes/seconds/milliseconds.
Your range query is expanded to all unique values in the range. This is
probably in the FAQ, but if not, lo
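To make the expansion concrete, here is some back-of-the-envelope arithmetic in plain Java (no Lucene needed): at day resolution a 20-year range expands to a few thousand terms, but at second resolution it is hundreds of millions, while BooleanQuery's default maxClauseCount is only 1024.

```java
public class RangeTermCount {
    // Approximate number of distinct date terms a RangeQuery must
    // enumerate over `years` years, ignoring leap days.
    static long atDayResolution(int years) {
        return years * 365L;
    }

    static long atSecondResolution(int years) {
        return atDayResolution(years) * 86400L; // 86,400 seconds per day
    }

    public static void main(String[] args) {
        System.out.println(atDayResolution(20));    // 7300 terms
        System.out.println(atSecondResolution(20)); // 630720000 terms
    }
}
```

Even the day-resolution count (7,300) already blows past the 1024-clause default, which is why RangeFilter or ConstantScoreRangeQuery is the usual answer.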
Because I will delete the indexed documents periodically, the index files
must be deleted after that. If I just want to delete some documents added
before some past day from the index, how should I do it?
Thank you in advance
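One way to do the age-based deletion asked about above (a sketch assuming the Lucene 1.9 API, with dates indexed as sortable string terms such as "20060301"; the field name "date" is hypothetical) is to walk the term enumeration below the cutoff and delete per term:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class DeleteOldDocs {
    // Deletes every document whose "date" term sorts before `cutoff`
    // (dates must be indexed as lexicographically sortable strings).
    public static void deleteBefore(IndexReader reader, String cutoff)
            throws IOException {
        TermEnum terms = reader.terms(new Term("date", ""));
        try {
            do {
                Term t = terms.term();
                if (t == null || !"date".equals(t.field())) break;
                if (t.text().compareTo(cutoff) >= 0) break;
                reader.deleteDocuments(t); // marks matching docs deleted
            } while (terms.next());
        } finally {
            terms.close();
        }
    }
}
```

Deletions are only marked at first; the disk space is reclaimed when segments are next merged, e.g. by optimize().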
Great!
Can you please share your changes? The best way to do this is to:
1. Check Lucene's trunk out from subversion, with:
svn co http://svn.apache.org/repos/asf/lucene/java lucene-trunk
2. Make your changes. Use 'svn add' for new files, like unit tests.
Please try to conform to the S
I wrote a patch for this and the difference is unbelievable,
the memory footprint has been cut almost in half and it seems like
performance is basically the same if not better!!!
if anyone is interested let me know
Doug Cutting <[EMAIL PROTECTED]> wrote:
RAMDirectory is indeed curre
Hi,
I am using ChainedFilter to combine various filters. No matter which
logical operator I try to apply to all filters, when I try to print the
chained filters using the toString() method, I see
ChainedFilter: [views:[0.4-0.6] level:[1-} ]
I am concerned about not being able to see the logical
Hi,
We're using QueryParser to generate our queries (not ideal, and we're
planning on rewriting it, but at the moment we don't have the resources
to do so).
We have a default field "text" which contains all of our text fields,
and a "date" field which is just a string field in the format -MM-
Does anyone have a lead on "business" stop words? Things like "inc", "llc",
"md", etc.
I'd rather not reinvent this wheel. :-)
cheers,
jeff
Hi,
I have implemented the DistanceComparatorSource example
from Lucene In Action (my Bible) and it works great.
We are now in the situation where we have nearly a
million documents in our index and the performance of
this implementation has degraded.
I have downloaded and am trying to understand t
On Mar 16, 2006, at 5:10 AM, Waleed Tayea wrote:
I'm using the QueryParser to parse and return a query of a search
string
of a single word. But the analyzer I use emits additional morphological
tokens from that single word. How can I prevent the QueryParser from
treating the search query as a P
The registry setting is probably irrelevant. What does "java -
version" report?
Erik
On Mar 16, 2006, at 6:07 AM, miki sun wrote:
Hi
I am trying to use Lucene1.9.1 to index files on my computer.
According to the FAQ of the website:
- What Java version is required to run Lucene?
Luc
Hi Paul,
Sounds like a really interesting system! I am curious, are your users
fluent in multiple languages or are you using some type of translation
component?
Some comments below and a few thoughts here.
How are you querying? Are users entering mixed language queries too?
Do you have a
Terenzio Treccani wrote:
You're both right, this doesn't sound like Lucene at all...
But the problem of such SQL tables is their size: speaking about
millions of customers and thousands of news items, the many-to-many
(CustArt) table would end up containing BILLIONS of lines. A bit
too big
Hi
I am trying to use Lucene1.9.1 to index files on my computer.
According to the FAQ of the website:
- What Java version is required to run Lucene?
Lucene 1.4 will run with JDK 1.3 and up but requires at least JDK 1.4 to
compile. Lucene >= 1.9 requires Java 1.4.
But I got the following error
On Mar 16, 2006, at 2:40 AM, karl wettin wrote:
Is it possible to make a phrase query fuzzy?
What do you mean by a fuzzy phrase query? As in each term in the
phrase is treated as a FuzzyQuery essentially such that "kool kat"
matches "cool cat"?
This can be done with some work to impleme
You're both right, this doesn't sound like Lucene at all...
But the problem of such SQL tables is their size: speaking about
millions of customers and thousands of news items, the many-to-many
(CustArt) table would end up containing BILLIONS of lines. A bit
too big even for an Oracle table, I
On 16 Mar 2006, at 08.40, karl wettin wrote:
Is it possible to make a phrase query fuzzy?
It could be a quick and not so dirty replacement for hidden markov
models and thus produce great results for spell checking and other
natural language classifications.
Perhaps it is easier to make a Spa
Dear All.
I'm using the QueryParser to parse and return a query of a search string
of a single word. But the analyzer I use emits additional morphological
tokens from that single word. How can I prevent the QueryParser from
treating the search query as a PhraseQuery with the terms of that
single wo
It was published by Norbert Fuhr in the IR Summer School Proceedings. I
found it via Google by using the small extension ext:pdf :-) that time...
http://www.is.informatik.uni-duisburg.de/bib/pdf/ir/Fuhr:00a.pdf
In return, you can also do me a favour and email me (personally, if you like,
since thi
On 16 Mar 2006, at 08.53, Supheakmungkol SARIN wrote:
Dear Luceners,
I wonder if there is any pre-defined option to read stop words from
a file?
/** Builds an analyzer with the stop words from the given file.
* @see WordlistLoader#getWordSet(File)
*/
public StopAnalyzer(File stopwordsFile) throws IOException
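A usage sketch for the constructor quoted above (Lucene 1.9 API assumed; the stop-word file path is hypothetical, one word per line):

```java
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.StopAnalyzer;

public class StopWordsFromFile {
    // Builds an analyzer whose stop words are loaded from a plain text
    // file with one word per line, rather than the built-in English set.
    public static StopAnalyzer load(String path) throws IOException {
        return new StopAnalyzer(new File(path));
    }
}
```

This is also a reasonable home for the "business" stop words ("inc", "llc", etc.) asked about elsewhere in this digest.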