Since the heap dump is too big to attach, I have taken a few screenshots of it
in Java VisualVM. In the first image you can see that, at the point when our
memory became very tight, most of it was held in bytes.
In the second image I examine one of those instances and n
Hello All,
I am kind of new to Lucene and am having a problem filtering search results.
Background:
My indexed documents have multiple bills, and each bill has multiple
versions.
Each version of the same bill has a different bill Version Id but the same
bill Id. In the most likely case, the text in d
I ran your code. Since I don't have the queries file (Docs/documento.txt), I
set this line instead:
String termos = "\"Lucene in Action\"";
When I set it to \"Lucene\", both documents are found. When I set it to
\"Lucene in Action\", only the first document is found. That seems correct to me.
Can you
Are these fixes in the 2.9.x branch? We are using 2.9.x and can't move to 3.x
just yet. If so, where specifically do I pick them up from?
-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Monday, April 12, 2010 10:20 PM
To: java-user@lucene.apache.org
Subject: Re: IndexW
On Tue, Apr 13, 2010 at 11:55 AM, Burton-West, Tom wrote:
> At some point maybe the File Formats Document could be updated to make it
> clear that the tii has an entry similar to the IndexInterval'th tis entry but
> instead of holding frq/prx deltas it holds absolute pointers. Is it worth
> e
Thanks Mike,
At some point maybe the File Formats Document could be updated to make it clear
that the tii has an entry similar to the IndexInterval'th tis entry but instead
of holding frq/prx deltas it holds absolute pointers. Is it worth entering a
JIRA issue? I would be happy to update the
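To illustrate why an index entry wants absolute pointers rather than deltas, here is a minimal, hypothetical sketch in plain Java. It is not Lucene's file format or code: it assumes a simplified dictionary where each entry stores only one pointer, encoded as a delta from the previous entry. Recovering any entry's pointer then means summing deltas from a known starting point; if every interval'th entry in a sparse index carries the absolute pointer, a lookup can start there and add at most interval - 1 deltas instead of decoding from the start of the file. The class and method names are made up for illustration.

```java
// Hypothetical, simplified model (not Lucene's actual tis/tii layout):
// dictionary entries store pointer deltas; the sparse index stores absolute pointers.
class DeltaIndexSketch {

    // Encode absolute pointers as deltas from the previous entry (first delta is from 0).
    static long[] toDeltas(long[] absolute) {
        long[] deltas = new long[absolute.length];
        long prev = 0;
        for (int i = 0; i < absolute.length; i++) {
            deltas[i] = absolute[i] - prev;
            prev = absolute[i];
        }
        return deltas;
    }

    // Build the sparse index: the absolute pointer of every interval'th entry.
    static long[] buildIndex(long[] deltas, int interval) {
        long[] index = new long[(deltas.length + interval - 1) / interval];
        long running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i]; // absolute pointer of entry i
            if (i % interval == 0) {
                index[i / interval] = running;
            }
        }
        return index;
    }

    // Recover entry i's absolute pointer: start from the nearest index entry
    // at or before i, then add at most interval - 1 deltas.
    static long pointerAt(long[] deltas, long[] index, int interval, int i) {
        int slot = i / interval;
        long ptr = index[slot];
        for (int j = slot * interval + 1; j <= i; j++) {
            ptr += deltas[j];
        }
        return ptr;
    }

    public static void main(String[] args) {
        long[] absolute = {0, 7, 15, 40, 41, 90, 130};
        long[] deltas = toDeltas(absolute);
        long[] index = buildIndex(deltas, 3);
        System.out.println(pointerAt(deltas, index, 3, 5)); // prints 90
    }
}
```

Without the absolute pointers in the index, the same lookup would have to sum every delta from entry 0.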
Hi Shai,
On 4/13/2010 1:41 AM, Shai Erera wrote:
Hi
WhitespaceAnalyzer definitely has a Version-dependent ctor. What
Lucene version do you use?
You can use LUCENE_CURRENT, but be aware that if a certain Analyzer's
behavior has changed in a way that affects your app, you'll need to
reindex your
Hi Uwe,
On 4/13/2010 2:23 AM, Uwe Schindler wrote:
As of Lucene 3.0, WhitespaceAnalyzer does not yet have a Version ctor. It will
come in 3.1, when Lucene is changed to conform to Unicode 4.0 (3.0 and earlier
use Unicode 3.0, the version supported by Java 1.4).
QueryParser needs the Version ctor for the handling of s
On Apr 12, 2010, at 1:31 PM, Ramon De Paula Marques wrote:
> Hi guys,
>
> I'm trying to use the highlighter to improve search on my website, but when
> the search returns HTML and PDF documents that were indexed with a reader, it
> throws an exception saying that the field is not stored.
>
> I don't know w
Can you whittle down your example even more?
E.g., don't read the term vectors for the first hit; just open a single
reader and do the TermQuery search over and over?
BTW, what does this line in PyLucene do?
tfvP = lucene.TermFreqVector.cast_(tfv)
You never hit exceptions in this code, right?
On Mon, Apr 12, 2010 at 9:50 AM, Herbert L Roitblat wrote:
> Thank you, Michael. Your suggestions are helpful. I inherited all of the
> code that uses PyLucene and don't consider myself an expert on it, so I very
> much appreciate your suggestions.
>
> It does not seem to be the case that these e
This would be a very good thing to try, given that you have some huge
documents that, indexed alone, use far more than your RAM buffer.
Mike
On Tue, Apr 13, 2010 at 12:19 AM, Lance Norskog wrote:
> There are some bugs where the writer's data structures retain data after
> it is flushed. They are co
The infoStream generally looks healthy. You seem to have a contained
set of unique field names.
The one thing that's interesting is... your docs are quite large. If
you grep for "flush: segment=" in your infoStream, you can see how many
docs "fit" in 16 MB before flushing, and it's lowish (as high as
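If grep isn't handy, the same filtering takes a few lines of plain Java. This is a minimal sketch assuming the infoStream output was captured to a text file whose path is passed as the first argument (the file is hypothetical). It matches only the "flush: segment=" marker quoted above and assumes nothing else about the log line format, so reading the doc count off each matching line is left to the reader.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java equivalent of: grep "flush: segment=" <captured infoStream file>
class FlushGrep {
    // The marker quoted in the message; nothing else about the log format is assumed.
    static final String MARKER = "flush: segment=";

    // Keeps only the lines that report a segment flush.
    static List<String> flushLines(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains(MARKER))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a text file holding the captured infoStream output
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> flushes = flushLines(lines);
        flushes.forEach(System.out::println);
        System.out.println(flushes.size() + " flush lines");
    }
}
```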
Hi Tom,
Fear not: we only scan up to 128 terms to find the specific term.
First, the terms dict index (tii) is fully loaded into RAM, and then a
binary search is done on this in-RAM index to find the nearest index term
just before the term you want. Then, we seek to that spot in the
main terms dict
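The two-step lookup described above can be sketched in plain Java. This is a hypothetical, simplified model, not Lucene's implementation: an in-memory array holding every 128th term stands in for the fully loaded tii, and a sorted String array stands in for the main terms dict (tis), so "seeking" is just computing an array offset. `Arrays.binarySearch` finds the nearest index term at or before the target, and the scan from that spot touches at most 128 entries.

```java
import java.util.Arrays;

// Hypothetical, simplified two-level terms lookup (not Lucene's actual code).
class TermLookupSketch {
    // Every INTERVAL'th term goes into the in-RAM index, which bounds the scan.
    static final int INTERVAL = 128;

    final String[] allTerms;   // full sorted terms dict, stand-in for the tis file
    final String[] indexTerms; // every INTERVAL'th term, stand-in for the in-RAM tii
    int lastScanned;           // entries compared by the most recent find()

    TermLookupSketch(String[] sortedTerms) {
        allTerms = sortedTerms;
        int n = (sortedTerms.length + INTERVAL - 1) / INTERVAL;
        indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * INTERVAL];
        }
    }

    // Returns the term's position in the dict, or -1 if it is absent.
    int find(String term) {
        lastScanned = 0;
        // Step 1: binary search the in-RAM index for the last index term <= target.
        int pos = Arrays.binarySearch(indexTerms, term);
        int slot = pos >= 0 ? pos : -pos - 2; // -pos - 1 is the insertion point
        if (slot < 0) {
            return -1; // target sorts before every term in the dict
        }
        // Step 2: "seek" to that spot in the main dict and scan at most INTERVAL entries.
        int start = slot * INTERVAL;
        int end = Math.min(start + INTERVAL, allTerms.length);
        for (int i = start; i < end; i++) {
            lastScanned++;
            int cmp = allTerms[i].compareTo(term);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break; // passed the spot where the term would have been
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = new String[1000];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = String.format("term%05d", i); // zero-padded, so already sorted
        }
        TermLookupSketch dict = new TermLookupSketch(terms);
        int found = dict.find("term00500");
        System.out.println(found + " after scanning " + dict.lastScanned + " entries");
    }
}
```

With these 1,000 zero-padded terms, looking up term00500 lands on index term term00384 and prints "500 after scanning 117 entries"; no lookup ever compares more than 128 dictionary entries.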