As of Lucene 3.0, WhitespaceAnalyzer does not yet have a Version ctor. It will
come in 3.1, when Lucene is changed to be Unicode 4.0 conformant (3.0 and
before are Unicode 3.0, which is what Java 1.4 supports).
QueryParser needs the Version ctor for its handling of stop words. As
WhitespaceAnalyzer does not use StopFilter…
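For illustration, a hedged sketch of the ctor difference under discussion; it
compiles against 3.1 (where both ctors exist), while in 3.0 only the no-arg
one is available:

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.util.Version;

    public class WhitespaceCtors {
        public static void main(String[] args) {
            // Lucene 3.0 and earlier: no Version parameter is available.
            WhitespaceAnalyzer pre31 = new WhitespaceAnalyzer();
            // Lucene 3.1: the Version-aware ctor added alongside the
            // Unicode 4.0 conformance work.
            WhitespaceAnalyzer post31 = new WhitespaceAnalyzer(Version.LUCENE_31);
        }
    }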
Hi
WhitespaceAnalyzer definitely has a Version-dependent ctor. What
Lucene version do you use?
You can use LUCENE_CURRENT, but be aware that if a certain Analyzer's
behavior has changed in a way that affects your app, you'll need to
reindex your data. Usually an Analyzer (or any other Version-aware…
There are some bugs where the writer's data structures retain data after
it is flushed. The fixes were committed within maybe the past week. If you
can pull the trunk and try it with your use case, that would be great.
On Mon, Apr 12, 2010 at 8:54 AM, Woolf, Ross wrote:
> I was on vacation last week so just…
Hi all,
Please let me know if this should be posted instead to the Lucene java-dev list.
We have very large tis files (about 36 GB). I have not been too concerned as I
assumed that due to the indexing of the tis file by the tii file, only a small
portion of the file needed to be read. However…
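One knob that bears on how much of the terms index lives in RAM is the
terms-index divisor; a hedged sketch (the path and divisor value are
illustrative, not from the thread):

    import java.io.File;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class OpenWithDivisor {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("/path/to/index"));
            // Divisor 4: load only every 4th .tii entry into RAM;
            // term lookups get slightly slower, heap usage drops.
            // (null = default deletion policy, true = read-only reader)
            IndexReader reader = IndexReader.open(dir, null, true, 4);
            System.out.println("maxDoc=" + reader.maxDoc());
            reader.close();
        }
    }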
We are in the process of removing the deprecated API from our code to
move to the new version. One of the deprecations is that QueryParser now
expects a Version parameter in its constructor. I have also read somewhere
that we should pass the same Version to the analyzer when indexing as well
as when searching…
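A minimal sketch of that pattern against the 3.0-era API (the field name
"contents" and the query text are illustrative, not from the thread):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class VersionedParsing {
        public static void main(String[] args) throws Exception {
            // Pin one Version constant and reuse it at index and search
            // time; LUCENE_CURRENT works too, but its behavior can shift
            // under an upgrade, forcing a reindex.
            Version matchVersion = Version.LUCENE_30;
            StandardAnalyzer analyzer = new StandardAnalyzer(matchVersion);
            QueryParser parser = new QueryParser(matchVersion, "contents", analyzer);
            Query q = parser.parse("lucene AND version");
            System.out.println(q);
        }
    }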
Thanks David.
I think that I neglected to say that I am using pyLucene 2.4.0.
Your suggestion is almost what we're doing:
indexReader.getTermFreqVector(ID, fieldName)
self.hits = list(self.lSearcher.search(self.query))
if self.hits:
    self.hit = lucene.Hit.cast_(self.hi…
Hi,
you are walking from indexReader.terms(), then indexReader.termDocs(Term t)
for each term, and then matching your docID on the TermDocs enum? So you walk
the whole index?
You need a forward index and Lucene is inverted, but IMHO you have 2
solutions with Lucene (sadly, they both require re-indexing…
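For reference, a sketch of the whole-index walk described above (classic
2.x/3.x TermEnum/TermDocs API; the method name is illustrative):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.index.TermEnum;

    public class WalkTerms {
        // Collect one document's terms by scanning every term's postings;
        // correct, but O(total terms) per document.
        static void termsForDoc(IndexReader reader, int docID) throws Exception {
            TermEnum terms = reader.terms();
            while (terms.next()) {
                Term t = terms.term();
                TermDocs docs = reader.termDocs(t);
                while (docs.next()) {
                    if (docs.doc() == docID) {
                        System.out.println(t.field() + ":" + t.text()
                                + " freq=" + docs.freq());
                    }
                }
                docs.close();
            }
            terms.close();
        }
    }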
Hi, folks.
I appreciate the help people have been offering.
Here is my problem. My immediate need is to get the tokens for a document from
the Lucene index. I have a list of documents that I walk, one at a time.
Right now, I am getting the tokens and their frequencies, and the problem is
that…
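If the fields were indexed with term vectors, a sketch like this (3.x API;
class and method names are illustrative) reads one document's tokens and
frequencies directly:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermFreqVector;

    public class DocTokens {
        // Print each token and its frequency for one document.
        static void printTokens(IndexReader reader, int docID, String field)
                throws Exception {
            TermFreqVector tfv = reader.getTermFreqVector(docID, field);
            if (tfv == null) {
                return; // field was not indexed with term vectors
            }
            String[] terms = tfv.getTerms();
            int[] freqs = tfv.getTermFrequencies();
            for (int i = 0; i < terms.length; i++) {
                System.out.println(terms[i] + " -> " + freqs[i]);
            }
        }
    }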
And the main objective: when I pass the phrase "Lucene in Action", it should
find and remove that phrase from the index, so that when I then pass the 2nd
term ("Lucene"), it no longer finds that phrase, as "Lucene in Action" has
already been found.
2010/4/12 Railan Xisto
> Ok. There is a piece of code attached…
Hi guys,
I'm trying to use the Highlighter for better search on my website, but when
the search returns HTML and PDF documents that were indexed with a Reader, it
causes an exception saying the field is not stored.
I don't know where to attack now; must I try to index the documents with
stored fields? How to do…
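For what it's worth: fields added from a Reader are never stored, which is
what triggers that exception. A sketch of indexing with a stored field
instead (3.x API; the field name is illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class StoredForHighlighting {
        // Index the text and also store it, so the Highlighter can fetch
        // the original text back at search time.
        static Document makeDoc(String text) {
            Document doc = new Document();
            doc.add(new Field("contents", text,
                    Field.Store.YES, Field.Index.ANALYZED));
            return doc;
        }
    }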
Update:
reusing the reader and searcher made almost no difference. It still eats up
the heap.
----- Original Message -----
From: "Herbert L Roitblat"
To:
Sent: Monday, April 12, 2010 6:50 AM
Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded
Thank you Michael. Your suggestions…
Thank you, Karl!
--- On Fri, 4/9/10, Karl Wettin wrote:
> From: Karl Wettin
> Subject: Re: Lucene Partition Size
> To: java-user@lucene.apache.org
> Date: Friday, April 9, 2010, 9:39 AM
> It's hard for me to say why this is
> slow.
>
> Here are a few more questions whose answers might provide
>
Thank you Michael. Your suggestions are helpful. I inherited all of
the code that uses pyLucene and don't consider myself an expert on it,
so I very much appreciate your suggestions.
It does not seem to be the case that these elements represent the index
of the collection. TermInfo and Term…
I see the payload in the token now.
--
View this message in context:
http://n3.nabble.com/How-to-calculate-payloads-in-queries-too-tp712743p713413.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
After a closer look, I forgot to mention a major clue: it's also the first
time we use NRT.
I thought IW.getReader() would return a pooled NRT reader, but in fact it
always returns a new IR. This should explain the "Too many open files"
exception. After each addDocument(doc) I prepare a reader with
IW.getReader()…
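A sketch of the close-the-stale-reader discipline that pattern needs
(2.9/3.0 API; the method name is illustrative):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;

    public class NrtRefresh {
        // Swap in the latest NRT reader and close the stale one, so file
        // handles on deleted segments actually get released.
        static IndexReader refresh(IndexWriter writer, IndexReader current)
                throws Exception {
            IndexReader latest = writer.getReader(); // near-real-time view
            if (current != null && latest != current) {
                current.close();
            }
            return latest;
        }
    }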
Ok. There is a piece of code attached. As I already said, I want it to find
only the 1st sentence when I pass the term "Lucene in Action".
2010/4/10 Shai Erera
> Hi. I'm not sure I understand what you searched for. When you search
> for "Lucene in action", do you search it with the quotes…
Hi,
I found a bug in my application: there was no commit at all anywhere in the
indexing chain.
I noticed, thanks to this bug, that Lucene keeps a file-system reference to
deleted index files. So after indexing many files I hit a "Too many open
files" error.
I use a 32-bit 1.6.16 JVM on a 64-bit Linux system.
D…
The large count of TermInfo & Term is completely normal -- this is
Lucene's term index, which is entirely RAM resident.
In 3.1, with flexible indexing, the RAM efficiency of the terms index
should be much improved.
While opening a new reader/searcher for every query is horribly
inefficient, it sh…
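The usual fix for that inefficiency is to reuse one searcher and reopen only
when the index has changed; a sketch (3.0 API; the method name is
illustrative):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    public class SearcherReuse {
        // Reuse one searcher across queries; reopen only if needed.
        static IndexSearcher maybeReopen(IndexSearcher searcher) throws Exception {
            IndexReader current = searcher.getIndexReader();
            IndexReader reopened = current.reopen(); // same instance if unchanged
            if (reopened != current) {
                current.close(); // ref-counted; shared segments stay alive
                return new IndexSearcher(reopened);
            }
            return searcher;
        }
    }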