Re: SearchFiles demo fails with exception while IndexFiles works

2009-10-28 Thread s rajan
Mike, thanks for that URL; I saw a similar issue being discussed on Stack Overflow. I am doing an external Ant build and trying to debug through Eclipse. For some reason Eclipse is failing to import the Ant build file as a project, so I use a debug configuration and build externally. I now have the

Re: Lucene 2.9.0 / BooleanQuery problem

2009-10-28 Thread Michel Nadeau
OMG, it's SO OBVIOUS! For the normal search (sector:IT AND group:group) the problem was indeed that IT becomes "it", a stopword. Thanks, I was so not seeing it! But what about the BooleanQuery? It should work fine too now... // // Test BooleanQuery // BooleanQuery que

Re: Lucene 2.9.0 / BooleanQuery problem

2009-10-28 Thread Jake Mannix
Hi Michel, I don't have time to look in too much detail right now, but I'll bet ya $5 it's because your query is for "sector:IT" - 'IT' lowercases to 'it' which is in the default stopword list, and if you're not careful about how you query with this, you'll end up with TermQuery instances which
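
To illustrate the point about stop words, here is a minimal plain-Java sketch (not Lucene code) of what lowercasing plus stop-word removal does to the query text. The class name `StopwordSketch`, the `analyze` helper, and the abbreviated stop list are assumptions made for illustration; Lucene's actual analyzers and default stop list are more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordSketch {
    // Hypothetical, abbreviated stop list for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "it", "of", "the", "to"));

    // Split on whitespace, lowercase each token, then drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "IT" lowercases to "it", which is on the stop list, so nothing survives.
        System.out.println(analyze("IT"));            // prints []
        System.out.println(analyze("IT department")); // prints [department]
    }
}
```

A TermQuery built by hand from the raw text "IT" skips analysis entirely, so it would look for a term that the analyzer may never have written into the index.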

Lucene 2.9.0 / BooleanQuery problem

2009-10-28 Thread Michel Nadeau
Hi! I spent all night trying to get a simple BooleanQuery working and I really can't figure out what my problem is. See this very simple program: public class test { @SuppressWarnings("deprecation") public static void main(String[] args) throws ParseException, CorruptIndexException, Lo

[ANN] New Technical White Paper on Apache Lucene 2.9 from Lucid Imagination

2009-10-28 Thread Mark Miller
With the recent release of Apache Lucene 2.9, Lucid Imagination has put together an in-depth technical white paper on the range of performance improvements and new features (per segment indexing, trierange numeric analysis, and more), along with recommendations for upgrading your Lucene application

Re: IO exception during merge/optimize

2009-10-28 Thread Michael McCandless
Hmm, only a few affected terms, and all this particular "literals:cfid196$" term, with optional suffixes. Really strange. One thing that's odd is the exact term "literals:cfid196$" is printed twice, which should never happen (every unique term should be stored only once, in the terms dict). And

Re: IO exception during merge/optimize

2009-10-28 Thread Peter Keegan
Just to be safe, I ran with the official jar file from one of the mirrors and reproduced the problem. The debug session is not showing any characters = '\u' (checking this in Tokenizer). The output from the modified CheckIndex follows. There are only a few terms with the inconsistency. They are

RE: IO exception during merge/optimize

2009-10-28 Thread Uwe Schindler
That's exactly what oal.util.UnicodeUtils does when converting UTF-8 to UTF-16 (which is Java's internal encoding). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemcc

Re: IO exception during merge/optimize

2009-10-28 Thread Michael McCandless
On Wed, Oct 28, 2009 at 10:58 AM, Peter Keegan wrote: > The only change I made to the source code was the patch for PayloadNearQuery > (LUCENE-1986). That patch certainly shouldn't lead to this. > It's possible that our content contains U+. I will run in debugger and > see. OK may as well c

Re: IO exception during merge/optimize

2009-10-28 Thread Robert Muir
That's exactly the result I saw, FWIW. On Wed, Oct 28, 2009 at 11:25 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Right, I would expect Lucene would silently truncate the term at the > U+, and not lead to this odd exception. > > Mike > > On Wed, Oct 28, 2009 at 11:23 AM, Robert M

Re: IO exception during merge/optimize

2009-10-28 Thread Michael McCandless
Right, I would expect Lucene would silently truncate the term at the U+, and not lead to this odd exception. Mike On Wed, Oct 28, 2009 at 11:23 AM, Robert Muir wrote: > i might be wrong about this, but recently I intentionally tried to create > index with terms with U+ to see if it would

Re: IO exception during merge/optimize

2009-10-28 Thread Robert Muir
I might be wrong about this, but recently I intentionally tried to create an index with terms with U+ to see if it would cause a problem :) The U+ seemed to be discarded completely (maybe at UTF-8 encode time)... then again, I was using RAMDirectory. On Wed, Oct 28, 2009 at 10:58 AM, Peter Ke

Re: IO exception during merge/optimize

2009-10-28 Thread Peter Keegan
The only change I made to the source code was the patch for PayloadNearQuery (LUCENE-1986). It's possible that our content contains U+. I will run in debugger and see. The data is 'sensitive', so I may not be able to provide a bad segment, unfortunately. Peter On Wed, Oct 28, 2009 at 10:43 AM

Re: IO exception during merge/optimize

2009-10-28 Thread Michael McCandless
OK... when you exported the sources & built yourself, you didn't make any changes, right? It's really odd how many of the errors are due to the term "literals:cfid196$", or some variation (one time with "on" appended, another time with "microsoft"). Do you know what documents typically contain th

RE: IO exception during merge/optimize

2009-10-28 Thread Uwe Schindler
> >Also, what does Lucene version "2.9 exported - 2009-10-27 15:31:52" mean? > This appears to be something added by the ant build, since I built Lucene > from the source code. This is because it was built from a source artifact with no SVN revision information. At this place, normally the svn rev

Re: IO exception during merge/optimize

2009-10-28 Thread Peter Keegan
My last post got truncated - probably exceeded max msg size. Let me know if you want to see more of the IndexWriter log. Peter

Re: similarity function

2009-10-28 Thread Joel Halbert
I suppose this could be summarised as: "how do I set the score of each document result to be the score of the field that best matches the search terms"? -Original Message- From: Joel Halbert Reply-To: java-user@lucene.apache.org To: Lucene Users Subject: similarity function Da
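
One way to picture the request is to compare summing the per-field scores with taking only the best one. The sketch below is plain Java with made-up field scores, not Lucene's scoring code; `BestFieldScoreSketch` and both helper methods are hypothetical names. In Lucene itself, DisjunctionMaxQuery is the class aimed at this kind of best-field scoring, though whether it fits this exact case is not confirmed in the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BestFieldScoreSketch {
    // Additive combination: every matching field adds to the document score.
    static double sumScore(Map<String, Double> fieldScores) {
        double total = 0.0;
        for (double score : fieldScores.values()) {
            total += score;
        }
        return total;
    }

    // Best-field combination: the document score is the highest single field score.
    static double bestFieldScore(Map<String, Double> fieldScores) {
        double best = 0.0;
        for (double score : fieldScores.values()) {
            best = Math.max(best, score);
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up per-field scores for one document matching "fish oil".
        Map<String, Double> fieldScores = new LinkedHashMap<>();
        fieldScores.put("fieldA", 0.5);
        fieldScores.put("fieldB", 0.25);

        System.out.println(sumScore(fieldScores));       // prints 0.75
        System.out.println(bestFieldScore(fieldScores)); // prints 0.5
    }
}
```

With the best-field rule, a second field that also matches does not push the document above one that matches equally well in a single field.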

similarity function

2009-10-28 Thread Joel Halbert
Hi, Given a query with multiple terms, e.g. fish oil, and searching across multiple fields e.g. query= fieldA:fish fieldA:oil fieldB:fish fieldB:oil etc... I don't want to give any more weight to documents that match the same word multiple times (either in the same, or different fields). I am

Re: What is multiple indexing and how does it work in Lucene [Java]

2009-10-28 Thread Erick Erickson
Hmmm, what do you mean by "multiple indexing"? Using more than one thread? more than one processor? Searching across more than one index? Each of these has a different answer... Best Erick On Wed, Oct 28, 2009 at 1:55 AM, DHIVYA M wrote: > Can anyone tell me what is multiple indexing and how doe

Re: Adding segments to an optimized index

2009-10-28 Thread Danil ŢORIN
There is no such thing in Lucene as a "unique" doc. They might be unique from your application's point of view (have some ID that is unique). From Lucene's point of view it's perfectly fine to have duplicate documents. So the "deleted" documents in combined index are coming from your second index. E

Re: deleteDocuments() does not work

2009-10-28 Thread Michael McCandless
Can you not suppress the AIOOBE (just in case you're hitting that)? Also, you are failing to close the old reader after opening a new one. This shouldn't cause the issue you're seeing, but, will lead eventually to OOME or file descriptor exhaustion. Can you verify you are in fact reopening the r

Adding segments to an optimized index

2009-10-28 Thread Marc Sturlese
I am doing some tests with optimize and adding segments, and I am wondering if someone knows whether what I am doing can cause document inconsistency. I have 2 folders with one index each. One has a non-optimized index1 with 1 million docs and a mergeFactor=10. The other one, index2 has the same index op

Re: how to extract text from the result document in lucene search

2009-10-28 Thread DHIVYA M
Okay sir. Let me then try out with lucene 2.4.0 demos. --- On Wed, 10/28/09, Anshum wrote: From: Anshum Subject: Re: how to extract text from the result document in lucene search To: java-user@lucene.apache.org Date: Wednesday, October 28, 2009, 11:20 AM I wouldn't have a reference to a vers

Re: how to extract text from the result document in lucene search

2009-10-28 Thread Anshum
I wouldn't have a reference to a version that old. You could ask the community, and in case someone has an archived version, he/she could share it with you. It would require some time and modifications to the oldest available version for the highlighter. -- Anshum Gupta Naukri Labs!

Re: deleteDocuments() does not work

2009-10-28 Thread Dinh
Hi Anshum, > Is it that your engine keeps an IndexSearcher[Reader] open all through this while? The answer is yes. I have tried to keep a singleton instance of IndexSearcher open across web requests. Following your advice, I have tried to re-open the IndexReader that is associated with that I

Re: how to extract text from the result document in lucene search

2009-10-28 Thread DHIVYA M
Yes, that's great, sir. Thanks a lot. Currently I am working with Lucene 1.4.3; how do I include that highlight class? Can you please let me know the procedure for using it? Thanks in advance Dhivya --- On Wed, 10/28/09, Anshum wrote: From: Anshum Subject: Re: how to extract text from the result doc

Re: how to extract text from the result document in lucene search

2009-10-28 Thread Anshum
I guess it should be available starting 1.9 onwards and patch-able with a few changes for even 1.4. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Wed, Oct 28, 2009 at 4:2

Re: how to extract text from the result document in lucene search

2009-10-28 Thread DHIVYA M
Yes, I found it, sir. May I know from which version it is available? --- On Wed, 10/28/09, Anshum wrote: From: Anshum Subject: Re: how to extract text from the result document in lucene search To: java-user@lucene.apache.org Date: Wednesday, October 28, 2009, 10:51 AM Yes Dhivya, there's a highli

Re: how to extract text from the result document in lucene search

2009-10-28 Thread Anshum
Yes Dhivya, there's a highlighter in the contrib for 2.4 as well. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Wed, Oct 28, 2009 at 4:03 PM, DHIVYA M wrote: > Thats exa

Re: deleteDocuments() does not work

2009-10-28 Thread Anshum
Hi Dinh, Is it that your engine keeps an IndexSearcher[Reader] open all through this while? For the deleted document to actually reflect in the search (service), you'd need to reload the index searcher with the latest version. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expr

deleteDocuments() does not work

2009-10-28 Thread Dinh
Hi all, I have a very simple method to delete a document that is indexed before /** * @param id */ public void deleteById(String id) throws IOException { IndexWriter writer = IndexWriterFactory.factory(); try { writer.deleteDocuments(new Term(Configu

Re: how to extract text from the result document in lucene search

2009-10-28 Thread DHIVYA M
That exactly matches my need, sir. Thanks a lot. But is this highlighter in Lucene 2.4.0? --- On Wed, 10/28/09, Benjamin Heilbrunn wrote: From: Benjamin Heilbrunn Subject: Re: how to extract text from the result document in lucene search To: java-user@lucene.apache.org Date: Wednesday, October

Re: how to extract text from the result document in lucene search

2009-10-28 Thread Benjamin Heilbrunn
Hello Dhivya, I'm not familiar with the Lucene demos. But for highlighting, take a look at http://lucene.apache.org/java/2_9_0/api/contrib-highlighter/index.html Best regards Benjamin

how to extract text from the result document in lucene search

2009-10-28 Thread DHIVYA M
Hi, I am a beginner in using Lucene. I succeeded in running the demo files of Lucene and found the concept. When we execute the SearchFiles.java file in the demo folder, I get the names of the documents containing the given query string. Is it possible to display some portions of the text

Re: SearchFiles demo fails with exception while IndexFiles works

2009-10-28 Thread Michael McCandless
Are you using an IDE (Eclipse)? This may help?: http://forums.java.net/jive/thread.jspa?messageID=363989 Or maybe try building from the command line instead ("ant compile-demo")? Mike On Tue, Oct 27, 2009 at 8:34 PM, s rajan wrote: > hi, I am playing with lucene 2.9.0 source build, ant 1.7.

Re: Split single string into several fields?

2009-10-28 Thread Andrzej Bialecki
Robert Muir wrote: Will, I think this parsing of documents into different fields is separate and unrelated from Lucene's analysis (tokenization)... the analysis comes into play once you have a field, and you want to break the text into indexable units (words, or entire field as token like your url

Re: IO exception during merge/optimize

2009-10-28 Thread Michael McCandless
The unit tests do test multi-segment indexes (though we could always use deeper testing, here), but, don't test big-ish indexes, like this, very well. Are you also using JDK 1.6.0_16 when running CheckIndex? If you run CheckIndex on the same index several times in a row, does it report precisely