Mike, thanks for that URL, I saw a similar issue being discussed on
stackoverflow.
I am doing an external Ant build and trying to debug through Eclipse. For
some reason Eclipse is failing to import the Ant build file as a project, so
I use a debug configuration and build externally.
I now have the
OMG, it's SO OBVIOUS! For the normal search (sector:IT AND group:group)
the problem was indeed that IT is "it", stopword. Thanks, I was so not
seeing it!
But what about the BooleanQuery? It should work fine too now...
//
// Test BooleanQuery
//
BooleanQuery que
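The diagnosis above ("IT" lowercases to the stopword "it") can be reproduced without Lucene at all: lowercase the term, then drop it if it appears in an English stopword list. A minimal stand-in for what a lowercasing stop-filter analyzer does; the stopword set below is a small hypothetical subset of an English stopword list, which includes "it":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordDemo {
    // Small stand-in for a default English stopword list, which
    // includes "it" -- the reason "sector:IT" matches nothing.
    static final Set<String> STOPWORDS =
            Set.of("a", "an", "and", "it", "of", "the", "to");

    // Mimics lowercase + stop filtering on a single field value.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            String term = word.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOPWORDS.contains(term)) {
                tokens.add(term);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "IT" lowercases to the stopword "it" and disappears entirely,
        // so the query "sector:IT" ends up with no terms to match.
        System.out.println(analyze("IT"));          // []
        System.out.println(analyze("Engineering")); // [engineering]
    }
}
```

The BooleanQuery built by hand will hit the same wall if its terms go through the same analyzer before lookup.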
Hi Michel,
I don't have time to look in too much detail right now, but I'll bet ya $5
it's because
your query is for "sector:IT" - 'IT' lowercases to 'it' which is in the
default stopword
list, and if you're not careful about how you query with this, you'll end up
with TermQuery
instances which
Hi !
I spent all night trying to get a simple BooleanQuery working and I really
can't figure out what my problem is. See this very simple program:
public class test {
@SuppressWarnings("deprecation")
public static void main(String[] args) throws ParseException,
CorruptIndexException, Lo
With the recent release of Apache Lucene 2.9, Lucid Imagination has put
together an in-depth technical white paper on the range of performance
improvements and new features (per segment indexing, trierange numeric
analysis, and more), along with recommendations for upgrading your
Lucene application
Hmm, only a few affected terms, and all involving this particular
"literals:cfid196$" term, with optional suffixes. Really strange.
One thing that's odd is the exact term "literals:cfid196$" is printed
twice, which should never happen (every unique term should be stored
only once, in the terms dict).
And
Just to be safe, I ran with the official jar file from one of the mirrors
and reproduced the problem.
The debug session is not showing any characters = '\u' (checking this in
Tokenizer).
The output from the modified CheckIndex follows. There are only a few terms
with the inconsistency. They are
That's exactly what oal.util.UnicodeUtils does when converting UTF-8 to
UTF-16 (which is Java's internal encoding).
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemcc
On Wed, Oct 28, 2009 at 10:58 AM, Peter Keegan wrote:
> The only change I made to the source code was the patch for PayloadNearQuery
> (LUCENE-1986).
That patch certainly shouldn't lead to this.
> It's possible that our content contains U+. I will run in debugger and
> see.
OK may as well c
That's exactly the result I saw, FWIW.
On Wed, Oct 28, 2009 at 11:25 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Right, I would expect Lucene would silently truncate the term at the
> U+, and not lead to this odd exception.
>
> Mike
>
> On Wed, Oct 28, 2009 at 11:23 AM, Robert M
Right, I would expect Lucene would silently truncate the term at the
U+, and not lead to this odd exception.
Mike
On Wed, Oct 28, 2009 at 11:23 AM, Robert Muir wrote:
> i might be wrong about this, but recently I intentionally tried to create
> index with terms with U+ to see if it would
i might be wrong about this, but recently I intentionally tried to create
index with terms with U+ to see if it would cause a problem :)
the U+ seemed to be discarded completely (maybe at UTF-8 encode time)...
then again I was using RAMDirectory.
On Wed, Oct 28, 2009 at 10:58 AM, Peter Ke
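The codepoint under discussion is elided in this archive copy; assuming it was U+0000 (NUL, commonly used as a sentinel), here is what Java's two UTF-8 flavors actually do with it: standard UTF-8 passes the zero byte through, while the "modified UTF-8" used by DataOutputStream.writeUTF escapes it as a two-byte sequence so a raw zero byte never appears in the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NulEncodingDemo {
    public static void main(String[] args) throws IOException {
        // Standard UTF-8 keeps U+0000 as a single 0x00 byte:
        byte[] utf8 = "a\u0000b".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8)); // [97, 0, 98]

        // Java's "modified UTF-8" (DataOutputStream.writeUTF) instead
        // encodes U+0000 as the two-byte form 0xC0 0x80, after a
        // two-byte length prefix (length here is 4 encoded bytes):
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeUTF("a\u0000b");
        System.out.println(Arrays.toString(bos.toByteArray()));
        // [0, 4, 97, -64, -128, 98]  (0xC0 = -64, 0x80 = -128 as signed bytes)
    }
}
```

Which of the two an indexing path uses determines whether a NUL survives, is escaped, or gets dropped along the way.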
The only change I made to the source code was the patch for PayloadNearQuery
(LUCENE-1986).
It's possible that our content contains U+. I will run in debugger and
see.
The data is 'sensitive', so I may not be able to provide a bad segment,
unfortunately.
Peter
On Wed, Oct 28, 2009 at 10:43 AM
OK... when you exported the sources & built yourself, you didn't make
any changes, right?
It's really odd how many of the errors are due to the term
"literals:cfid196$", or some variation (one time with "on" appended,
another time with "microsoft"). Do you know what documents typically
contain th
> >Also, what does Lucene version "2.9 exported - 2009-10-27 15:31:52" mean?
> This appears to be something added by the ant build, since I built Lucene
> from the source code.
This is because it was built from a source artifact with no SVN revision
information. At this place, normally the svn rev
My last post got truncated - probably exceeded max msg size. Let me know if
you want to see more of the IndexWriter log.
Peter
I suppose this could be summarised as:
"how do I set the score of each document result to be the score of the
field that best matches the search terms"?
-----Original Message-----
From: Joel Halbert
Reply-To: java-user@lucene.apache.org
To: Lucene Users
Subject: similarity function
Da
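For what it's worth, Lucene has a query aimed at exactly this summary: DisjunctionMaxQuery scores a document by the maximum of its per-field subquery scores instead of their sum. The scoring idea alone, with made-up field scores and no Lucene dependency:

```java
import java.util.Map;

public class BestFieldScore {
    // Sum-of-fields scoring (what a plain BooleanQuery over fields
    // does): matches in many fields pile the score up.
    static double sumScore(Map<String, Double> fieldScores) {
        return fieldScores.values().stream()
                .mapToDouble(Double::doubleValue).sum();
    }

    // Max-of-fields scoring (the DisjunctionMaxQuery idea with
    // tieBreaker = 0): the document's score is the score of its
    // best-matching field only.
    static double maxScore(Map<String, Double> fieldScores) {
        return fieldScores.values().stream()
                .mapToDouble(Double::doubleValue).max().orElse(0.0);
    }

    public static void main(String[] args) {
        Map<String, Double> scores = Map.of("fieldA", 0.5, "fieldB", 0.75);
        System.out.println(sumScore(scores)); // 1.25
        System.out.println(maxScore(scores)); // 0.75
    }
}
```

DisjunctionMaxQuery also takes a tieBreakerMultiplier to blend in the non-best fields; 0 gives the pure "best field wins" behavior asked for here.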
Hi,
Given a query with multiple terms, e.g. fish oil, and searching across
multiple fields e.g.
query= fieldA:fish fieldA:oil fieldB:fish fieldB:oil etc...
I don't want to give any more weight to documents that match the same
word multiple times (either in the same, or different fields). I am
Hmmm, what do you mean by "multiple indexing"? Using more than one thread?
More than one processor? Searching across more than one index? Each of these
has a different answer...
Best
Erick
On Wed, Oct 28, 2009 at 1:55 AM, DHIVYA M wrote:
> Can anyone tell me what is multiple indexing and how doe
There is no such thing in Lucene as a "unique" doc.
They might be unique from your application's point of view (have some ID
that is unique).
From Lucene's point of view it's perfectly fine to have duplicate documents.
So the "deleted" documents in the combined index are coming from your second index.
E
Can you not suppress the AIOOBE (just in case you're hitting that)?
Also, you are failing to close the old reader after opening a new one.
This shouldn't cause the issue you're seeing, but, will lead
eventually to OOME or file descriptor exhaustion.
Can you verify you are in fact reopening the r
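The reopen-then-close pattern Mike is pointing at looks roughly like the following; in Lucene 2.9, IndexReader.reopen() returns the same instance when the index is unchanged, and only when a new reader comes back should the old one be closed. Sketched with a stand-in Reader class (not the Lucene API) so it runs on its own:

```java
public class ReopenDemo {
    // Stand-in for an IndexReader: reopen() returns 'this' if nothing
    // changed, or a fresh instance if the index has new commits.
    static class Reader {
        boolean closed = false;
        final int version;
        Reader(int version) { this.version = version; }
        Reader reopen(int latestVersion) {
            return latestVersion == version ? this : new Reader(latestVersion);
        }
        void close() { closed = true; }
    }

    // The idiom: reopen, then close the OLD reader only when a new one
    // was actually returned. Forgetting the close() leaks a little more
    // memory and a few more file descriptors on every refresh.
    static Reader refresh(Reader current, int latestVersion) {
        Reader fresh = current.reopen(latestVersion);
        if (fresh != current) {
            current.close(); // old reader is no longer needed
        }
        return fresh;
    }

    public static void main(String[] args) {
        Reader r1 = new Reader(1);
        Reader r2 = refresh(r1, 1); // unchanged: same instance, still open
        Reader r3 = refresh(r2, 2); // changed: new instance, r2 closed
        System.out.println(r1 == r2);  // true
        System.out.println(r2.closed); // true
        System.out.println(r3.closed); // false
    }
}
```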
I am doing some tests with optimize and adding segments, and I am wondering if
someone knows if what I am doing can give document inconsistency.
I have 2 folders with one index each. One has a non-optimized index1 with 1
million docs and a mergeFactor=10. The other one, index2, has the same index
op
Okay sir. Let me then try out with lucene 2.4.0 demos.
--- On Wed, 10/28/09, Anshum wrote:
From: Anshum
Subject: Re: how to extract text from the result document in lucene search
To: java-user@lucene.apache.org
Date: Wednesday, October 28, 2009, 11:20 AM
I wouldn't have a reference to a vers
I wouldn't have a reference to a version that old. You could ask the
community, and in case someone has an archived version he/she
could share it with you.
It would require some time and modifications to the oldest available version
for the highlighter.
--
Anshum Gupta
Naukri Labs!
Hi Anshum,
> Is it that your engine keeps an IndexSearcher[Reader] open all through
> this while?
The answer is yes. I have tried to keep a singleton instance of
IndexSearcher open across web requests.
Regarding your advice, I have tried to re-open the IndexReader that is
associated with that I
Yes, that's great, sir.
Thanks a lot.
Currently I am working with Lucene 1.4.3; how do I include that highlighter
class? Can you please let me know the procedure for using it?
Thanks in advance
Dhivya
--- On Wed, 10/28/09, Anshum wrote:
From: Anshum
Subject: Re: how to extract text from the result doc
I guess it should be available from 1.9 onwards, and patchable with a
few changes even for 1.4.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw
On Wed, Oct 28, 2009 at 4:2
Yes, I found it, sir.
May I know from which version it is available?
--- On Wed, 10/28/09, Anshum wrote:
From: Anshum
Subject: Re: how to extract text from the result document in lucene search
To: java-user@lucene.apache.org
Date: Wednesday, October 28, 2009, 10:51 AM
Yes Dhivya, there's a highli
Yes Dhivya, there's a highlighter in the contrib for 2.4 as well.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw
On Wed, Oct 28, 2009 at 4:03 PM, DHIVYA M wrote:
> Thats exa
Hi Dinh,
Is it that your engine keeps an IndexSearcher[Reader] open all through this
while? For the deleted document to actually reflect in the search (service),
you'd need to reload the index searcher with the latest version.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expr
Hi all,
I have a very simple method to delete a document that is indexed before
/**
 * @param id
 */
public void deleteById(String id) throws IOException {
    IndexWriter writer = IndexWriterFactory.factory();
    try {
        writer.deleteDocuments(new Term(Configu
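A side note on the method above: deleting through an IndexWriter does not, by itself, make the document disappear from an IndexSearcher that is already open; the searcher keeps seeing the point-in-time snapshot it was opened with until it is reopened. A runnable stand-in for that behavior (not the Lucene API; the set-based "index" is purely illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class DeleteVisibilityDemo {
    // Stand-in index state (illustrative, not Lucene).
    static final Set<String> index = new HashSet<>();

    // A "searcher" works on the snapshot taken when it was opened,
    // just as a Lucene IndexSearcher keeps seeing the point in time
    // at which its underlying reader was opened.
    static Set<String> openSearcher() {
        return new HashSet<>(index);
    }

    // Writer-side delete (think writer.deleteDocuments(term)).
    static void deleteById(String id) {
        index.remove(id);
    }

    public static void main(String[] args) {
        index.add("doc-1");
        Set<String> searcher = openSearcher();

        deleteById("doc-1");
        // The old searcher still finds the document...
        System.out.println(searcher.contains("doc-1")); // true
        // ...until it is reopened against the current index state.
        searcher = openSearcher();
        System.out.println(searcher.contains("doc-1")); // false
    }
}
```

This is the same point Anshum makes in his reply: after the delete, the index searcher has to be reloaded for the change to show up in search.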
That's exactly matching my need, sir.
Thanks a lot.
But is this highlighter in Lucene 2.4.0?
--- On Wed, 10/28/09, Benjamin Heilbrunn wrote:
From: Benjamin Heilbrunn
Subject: Re: how to extract text from the result document in lucene search
To: java-user@lucene.apache.org
Date: Wednesday, October
Hello Dhivya,
I'm not familiar with the Lucene demos.
But for Highlighting take a look at
http://lucene.apache.org/java/2_9_0/api/contrib-highlighter/index.html
Best regards
Benjamin
Hi
I am a beginner with Lucene. I succeeded in running the demo files of Lucene
and understood the concept.
When we execute the SearchFiles.java file in the demo folder, I am getting the
names of the documents containing the given query string. Is it possible to
display some portions of the text
Are you using an IDE (Eclipse)? This may help:
http://forums.java.net/jive/thread.jspa?messageID=363989
Or maybe try building from the command line instead ("ant compile-demo")?
Mike
On Tue, Oct 27, 2009 at 8:34 PM, s rajan wrote:
> hi, I am playing with lucene 2.9.0 source build, ant 1.7.
Robert Muir wrote:
Will, I think this parsing of documents into different fields is separate
from and unrelated to Lucene's analysis (tokenization)...
the analysis comes into play once you have a field, and you want to break the
text into indexable units (words, or the entire field as a token, like your url
The unit tests do test multi-segment indexes (though we could always
use deeper testing, here), but, don't test big-ish indexes, like this,
very well.
Are you also using JDK 1.6.0_16 when running CheckIndex? If you run
CheckIndex on the same index several times in a row, does it report
precisely