That helped! Thanks!
I just added some .close() calls to a few places where I kept file
handles open and it worked quite nicely. Good lesson, make sure you
all clean up after yourselves!
Thanks,
Michael
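The cleanup Michael describes can be sketched in plain Java. This is a generic stdlib illustration (his actual indexing code isn't shown, so the class and file names here are made up): a handle opened in a method is released in a finally block, so the OS file descriptor is returned even when the read throws. Leaked descriptors are what eventually produce "Too many open files" on Linux.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CloseHandles {
    // Reads the first byte of a file, guaranteeing the file handle is
    // released even if the read throws.
    static int firstByte(File f) throws IOException {
        FileInputStream in = new FileInputStream(f);
        try {
            return in.read();
        } finally {
            in.close(); // always runs, so the descriptor goes back to the OS
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("handles", ".txt");
        FileOutputStream out = new FileOutputStream(tmp);
        try {
            out.write('A');
        } finally {
            out.close();
        }
        System.out.println(firstByte(tmp)); // prints 65 ('A')
        tmp.delete();
    }
}
```

The same try/finally pattern applies to IndexReader and IndexSearcher instances, which each hold open files against the index directory.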
On Feb 14, 2007, at 8:04 PM, Steven Parkes wrote:
See the wiki:
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-48921635adf2c968f7936dc07d51dfb40d638b82
-Original Message-
From: Michael Prichard [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 14, 2007 5:02 PM
To: java-user@lucene.apache.org
Subject: Too many open files?!
I am getting this exception:
Exception in thread "main" java.io.FileNotFoundException: /index/_gna.f13 (Too many open files)
This is happening on a SLES10 (64-bit) box when trying to index 18k items.
I can run it on a much less powerful SLES9 box without any issues.
Any ideas?!
Thanks,
Michael
---
Here is the comment:
/*
* Note that this method signature avoids having a user call new
* o.a.l.d.Field(...) which would be much too expensive due to the
* String.intern() usage of that class.
*
* More often than not, String.intern() leads to serious performance
* degra
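The trade-off that comment warns about can be seen with plain JDK strings. This is only an illustration of intern() semantics, not Lucene's actual Field code: interning buys the fast == reference comparison, but every intern() call pays for a lookup in the JVM's string table.

```java
public class InternDemo {
    public static void main(String[] args) {
        // Two equal strings built at runtime are distinct objects...
        String a = new String("contents");
        String b = new String("contents");
        System.out.println(a == b);                   // false
        // ...but intern() maps both to one canonical instance, so field
        // names can later be compared with the fast == reference check.
        System.out.println(a.intern() == b.intern()); // true
        // The cost: each intern() call does a lookup in the JVM's interned
        // string table, which is the overhead the comment above refers to.
    }
}
```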
14 feb 2007 kl. 20.49 skrev Mark Miller:
There is some code in contrib with comments claiming this interning
is actually slower. I think it was the MemoryIndex? Has this ever
been discussed?
There is of course a cost of RAM and CPU involved with flyweighting
instances. In order to win th
There is some code in contrib with comments claiming this interning is
actually slower. I think it was the MemoryIndex? Has this ever been
discussed?
- Mark
Otis Gospodnetic wrote:
I'm not looking at the code now, but I believe this is because those Strings
are interned, and I believe they a
On Wednesday 14 February 2007 17:12, jm wrote:
> So my question, is it possible to disable some of the caching lucene
> does so the memory consumption will be smaller (I am a bit concerned
> on the memory usage side)? Or the memory savings would not pay off?
You could set IndexWriter.setTermIndex
OK, final note. I wish I knew what kind of drugs I was on when I first
thought that the sizes were so much smaller. Because they weren't. I got to
thinking that "gee, it's kind of weird that if you don't specify anything
for TermVector when creating a field, you get all this advanced stuff. If it
Cool.
Thanks!
BTW, I have another issue here.
The array of floats for the float cache is not initialised, which means
it will return '0.0' (not initialised) as the value both for documents
that have '0' as the value and for documents that do not have the
field.
In my actual sys
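The ambiguity described above mirrors plain Java array semantics. A minimal stdlib illustration (not the FieldCacheImpl code itself):

```java
public class DefaultFloatDemo {
    public static void main(String[] args) {
        // A freshly allocated float[] is zero-filled by the JVM, so a slot
        // holding a real 0.0f is indistinguishable from one never written.
        float[] cache = new float[3];
        cache[0] = 0.0f; // document 0 really has the value 0
        cache[1] = 2.5f; // document 1 has the value 2.5
                         // document 2 has no value for the field at all
        System.out.println(cache[0] == cache[2]); // true: can't tell them apart
    }
}
```

Distinguishing "value is zero" from "field absent" would require tracking which slots were actually set, e.g. with a parallel bit set.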
I'm not looking at the code now, but I believe this is because those Strings
are interned, and I believe they are interned precisely so that this (faster)
comparison can be done.
Otis
Simpy -- http://www.simpy.com/
Hi guys,
I have been diving into the FieldCacheImpl code.
I have seen something in the current version:
Revision 488908 - Modified Wed Dec 20 03:47:09 2006 UTC (8 weeks ago) by yonik
File length: 13425 byte(s)
that I wonder if it's not totally right, or if
14 feb 2007 kl. 17.12 skrev jm:
So my question, is it possible to disable some of the caching lucene
does so the memory consumption will be smaller (I am a bit concerned
on the memory usage side)? Or the memory savings would not pay off?
You could try to create a new Searcher for each query,
Hi,
That last thread about caching reminded me of something. My need is
actually the opposite...
I use lucene to search in hundreds/thousands of indexes. Doing a
lucene query on a set of the indexes is only one of the steps involved
in my 'queries', and some of the other steps take longer than l
Hi,
I have been diving into the code and I don't see why the class
FieldCacheImpl is not extensible. It is not defined as a public class,
though I would like to be able to subclass it to change one small thing.
Why is it defined like that?
Thanks
On 2/14/07, Mark Miller <[EMAIL PROTECTED]> wrote:
Not to get off topic, but I was curious Yonik, what does solr do if many
updates come in at a time opening and closing a writer each
update...does the first update kick off a warm operation, then before
that warm is done the second update kicks
It's always embarrassing when the correct unit test takes, say, 3 minutes to
write and I've engaged in all this angst that I could have dispelled all by
myself (although it is nice to have confirmation from folks in the know).
The answer is that omitting term vectors has no influence on the behav
My apologies to Erik...and Erick...I am horrible with names.
If I am reading Grant's email correctly, he also said you don't need to
store the Term Vectors...just that if you did store them, you can use
them with the highlighter so that you do not need to reanalyze the
text...why exactly this
Not to get off topic, but I was curious Yonik, what does solr do if many
updates come in at a time opening and closing a writer each
update...does the first update kick off a warm operation, then before
that warm is done the second update kicks off a warm operation, and
then before that warm i
Thanks for that addition, it may well be important to me (as well as
pointing up a weakness in my unit tests. Honest, I've been thinking about
explicitly testing this. Really. I'll get around to it real soon now.
Truly). We store multiple entries in the same field, think of it as
storing a lis
On 2/14/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
I have an index with 5.2 million records (each record containing 3
fields) and it sometimes takes about a minute and a half for results to
come back.
Due to sort fields (and other factors), the first query can be slow.
Solr has built-in supp
As Erick said, Term positions are kept regardless of whether you store
term vectors. The positional information is needed for phrase queries,
span queries, etc. You certainly don't lose the ability to use phrase
queries if you do not store term vectors. If you check out the Posting
class in Doc
As Erik stated, you don't need term vectors to do spans, but I
thought I would add a bit on the difference between positions and
offsets.
Positions are what is stored in Lucene internally (see
Token.getPositionIncrement() and TermPositions) and are usually just
consecutive integers (altho
Well,
I have an index with 5.2 million records (each record containing 3
fields) and it sometimes takes about a minute and a half for results to
come back. I have noticed however, that when I run the same query the
second time the result comes back faster. I just thought that this was
a bit too
Erik Hatcher sez no.
Erick
On 2/14/07, karl wettin <[EMAIL PROTECTED]> wrote:
14 feb 2007 kl. 15.03 skrev Erick Erickson:
> My reasoning was that I do need position information since I need
> to do Span
> queries, but character information (WITH_OFFSETS) isn't necessary
> here/now.
> So I t
14 feb 2007 kl. 15.03 skrev Erick Erickson:
My reasoning was that I do need position information since I need
to do Span
queries, but character information (WITH_OFFSETS) isn't necessary
here/now.
So I thought I'd make a small test to see if this was worth
pursuing. If
omitting offsets ha
14 feb 2007 kl. 14.57 skrev Kainth, Sachin:
I have read that Lucene performs caching of search results so that if
you perform the same search in succession the second result should be
returned faster. What I wanted to ask is whether this caching is any
good or whether it's a good idea to add s
You've made me a happy man.
Thanks again.
[EMAIL PROTECTED] .
On 2/14/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
On Feb 14, 2007, at 9:03 AM, Erick Erickson wrote:
> My reasoning was that I do need position information since I need
> to do Span
> queries, but character information (WITH_OF
This is really an unanswerable question, since, to steal a phrase, "It
depends" ...
Do you have any reason to believe that the current performance is inadequate
for your application? Caching is notoriously difficult to get right, so I
wouldn't go there unless there is a *demonstrated* need. By dem
On Feb 14, 2007, at 9:03 AM, Erick Erickson wrote:
My reasoning was that I do need position information since I need
to do Span
queries, but character information (WITH_OFFSETS) isn't necessary
here/now.
1> Am I going off a cliff here? I suppose this is really answered by
2> what is the d
Hi,
I have created a Query that works for numerical max-min ranges, that may
work for any Field specified.
I have done that by extending Query, and creating own Weight and Scorer
subclasses as well.
So it works ... but I have problems when setting min or max boundary to 0:
In this case, those ent
I'm indexing books, with a significant amount of overhead in each document
and a LOT of OCR data. I'm indexing over 20,000 books and the index size is
8G. So I decided to play around with not storing some of the termvector
information and I'm shocked at how much smaller the index is. By storing al
The usual source of this problem is HTML forms. If you want to get UTF-8
back from a form, you have to send \the form itself/ to the browser in
UTF-8.
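The mismatch behind that advice can be demonstrated with the stdlib alone (a sketch of the encoding problem, not of any particular web framework): the same text yields different byte sequences under different charsets, so if the browser submits form data in Latin-1 but the server decodes it as UTF-8 (or vice versa), the text is mangled.

```java
import java.io.UnsupportedEncodingException;

public class CharsetDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "caf\u00e9"; // "café"
        // UTF-8 encodes the 'é' as two bytes (0xC3 0xA9)...
        byte[] utf8 = s.getBytes("UTF-8");
        // ...while ISO-8859-1 (Latin-1) uses a single byte (0xE9).
        byte[] latin1 = s.getBytes("ISO-8859-1");
        System.out.println(utf8.length);   // 5
        System.out.println(latin1.length); // 4
    }
}
```

Serving the page with a Content-Type header declaring charset=UTF-8 is what makes the browser submit the form's bytes back in UTF-8.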
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 14, 2007 3:50 AM
To: java-user@lucene.apache.
Hi all,
I have read that Lucene performs caching of search results so that if
you perform the same search in succession the second result should be
returned faster. What I wanted to ask is whether this caching is any
good or whether it's a good idea to add some sort of caching layer on
top of Luc
Hi,
I think it makes sense that it returns different records: because you are
using BooleanClause.Occur.SHOULD for each field, the term "open" may match
in any of those fields, but when you specify the field name in your query
you limit the search to that one field.
as stated in Lucene java
You can use the WordNet implementation as a model:
http://www.tropo.com/techno/java/lucene/wordnet.html
jose
- Original Message -
From: Saroja Kanta Maharana <[EMAIL PROTECTED]>
Date: Wednesday, February 14, 2007 10:24 am
Subject: Help me in Thesaurus implementation using lucene
To: java-user@lu
Hi All,
I'm a new user of Lucene, and I would like to use it to create a
Thesaurus.
Do you have any ideas on how to do this? Thanks!
*Regards *
*Saroj*
Can you come online now?
On 2/14/07, ashwin kumar <[EMAIL PROTECTED]> wrote:
Hi, thanks for your kind reply.
I am just trying to index some text files using lucene-2.0.0.
If you can share any sample programs for text file indexing in
lucene-2.0.0, it will be a lot of help for me to unders
Internally Lucene deals with pure Java Strings; when writing those strings
to and reading those strings back from disk, Lucene always uses the stock
Java "modified UTF-8" format, regardless of what your file.encoding
system property may be.
typically when people have encoding problems in their
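The "modified UTF-8" mentioned above is the format java.io.DataOutputStream.writeUTF produces. A small stdlib check (illustrative only, not Lucene's I/O code) shows its best-known quirk: U+0000 is written as the two bytes 0xC0 0x80 instead of a single zero byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ModifiedUtf8Demo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF("A\u0000B"); // modified UTF-8, as used by DataOutput
        out.close();
        byte[] b = bytes.toByteArray();
        // Layout: 2-byte length prefix, then 'A', then U+0000 encoded as
        // 0xC0 0x80 (standard UTF-8 would emit a single 0x00), then 'B'.
        System.out.println(b.length);              // 6
        System.out.println((b[3] & 0xFF) == 0xC0); // true
        System.out.println((b[4] & 0xFF) == 0x80); // true
    }
}
```

The two-byte null means serialized strings never contain an embedded zero byte, which matters to C-style code scanning for terminators.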
Hi,
Hope this helps.
Regards,
Wooi Meng
---
Hi ho peoples.
We have an application that is internationalized, and stores data
from many languages (each project has its own index, mostly aligned
with a single language, maybe 2).
Anyway, I've noticed during some thread dumps diagnosing some
performance issues, that there appears to b