Re: CompressingStoredFieldsFormat doesn't show improvement

2013-01-29 Thread arun k
Adrien, I created a 370 MB index of 1 million docs, each with 40 fields of 40 chars, and profiled it. Indexing, and in particular addDocument and ConcurrentMergeScheduler, takes twice as long in 4.1 as in 3.0.2. Looks like CompressingStoredFieldsFormat is of little

Re: FacetRequest include residue

2013-01-29 Thread Shai Erera
Hi Nicola, There might be a way to do what you want, with some coding on your part. If you're interested in counting the top-10 of the "Brand" facet, but also return the count of "Brand/X", even if it's not in the top-10, then what you should do is write code similar to this: FacetArrays facetArr
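The idea described above (report the top-10 values of a facet, plus the count of a specific value such as "Brand/X" even when it falls outside the top-10) can be sketched independently of Lucene. The following is a minimal illustration in plain Java, not the facet module's API and not the code from the message: the class name, the method topNPlusSelected, and the in-memory count map are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: NOT the Lucene facet API, only the selection logic.
class FacetResidueSketch {

    // Returns the n highest-count labels, followed by any label in
    // 'selected' that did not make the top n (with its real count).
    static Map<String, Integer> topNPlusSelected(Map<String, Integer> counts,
                                                 int n,
                                                 Set<String> selected) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken by label for a stable order.
        sorted.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });

        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            Map.Entry<String, Integer> e = sorted.get(i);
            // Keep the entry if it is in the top n, or if it was explicitly requested.
            if (i < n || selected.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

With n = 10 and selected = {"Brand/X"}, the result holds the ten most frequent brands and, after them, "Brand/X" with its own count if it was not already among the ten.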

Re: Pulsing40PostingsFormat in lucene 4.1

2013-01-29 Thread Sean Bridges
Thanks, we will try the class path trickery. How do we avoid similar situations in the future? Is Pulsing41PostingsFormat going to be maintained in future versions of Lucene? What are the safe PostingsFormats/Codecs to use? Every PostingsFormat/Codec is marked @deprecated or @experimental. Sean On Tue

Re: Pulsing40PostingsFormat in lucene 4.1

2013-01-29 Thread Michael McCandless
Another option would be, using 4.0, use addIndexes(IndexReader[]) into a new index, to convert your entire index into the supported (back compat) codec (Lucene40). Don't use addIndexes(Directory[]) as this just copies files. Then you can read that resulting index with 4.1. Mike McCandless http:

Re: Pulsing40PostingsFormat in lucene 4.1

2013-01-29 Thread Michael McCandless
Woops, sorry: PulsingPostingsFormat was already moved from core to codecs as of 4.0 (not in 4.1 like I said before). And ... yes, you need Pulsing40PostingsFormat on your classpath to read your 4.0 indices with 4.1. I think you need to excise the sources and get them on the classpath? But this w

Re: Pulsing40PostingsFormat in lucene 4.1

2013-01-29 Thread Sean Bridges
Do I need the Pulsing40PostingsFormat class to read my indexes though? Pulsing40PostingsFormat isn't shipped with lucene 4.1. I have index files with names like _0_Pulsing40_0.frq. When I try to open my index I get, java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs

Re: Pulsing40PostingsFormat in lucene 4.1

2013-01-29 Thread Michael McCandless
Pulsing41PostingsFormat was just moved out of core to the "codecs" module. Still, the worst case (had it been deleted) would be to revive the code from the past release and put it in your classpath, so old indices could be read. Mike McCandless http://blog.mikemccandless.com On Tue, Jan 29, 201

Re: Questions about FuzzyQuery in Lucene 4.x

2013-01-29 Thread Jack Krupansky
I'm sorry, but for anybody to help you here, you really need to be able to provide a concise test case, like 10-20 lines of code, completely self-contained. If you think you need a million documents to repro what you claimed was a simple scenario, then you leave me very, very confused - and una

Re: Questions about FuzzyQuery in Lucene 4.x

2013-01-29 Thread George Kelvin
Hi Jack, The problematic query is "scar"+"wads". There are several (more than 10) documents in the data with the content "star wars", so I think that query should be able to find all these documents. I was trying to provide a minimal test case, but I couldn't reduce the size of data showing the

Migration to Lucene 4.1

2013-01-29 Thread Paul Sitowitz
Hello, I have the following production code, which works with Lucene 3.0: this.luceneWriter = new IndexWriter( directory, analyzer, true, MaxFieldLength.UNLIMITED ); this.fs.delete( this.finalOutput, true ); this.luceneWriter.setUseCompoundFile( true ); this.luceneWriter.setMer

Re: Questions about FuzzyQuery in Lucene 4.x

2013-01-29 Thread Jack Krupansky
I also noticed that you have "MUST" for your full string of fuzzy terms - that means every one of them must appear in an indexed document for it to match. Is it possible that even one term was not in the same indexed document? Try to provide a complete example that shows the input data and

Re: FacetRequest include residue

2013-01-29 Thread Shai Erera
Hi Nicola, How does the interface allow the user to select a facet value not from the top-10? How does the interface know which other facet values are there? Does it query the taxonomy somehow? One thing you can do is to set numResults to Integer.MAX_VALUE and numToLabel to 10. That way your Fac

FacetRequest include residue

2013-01-29 Thread Nicola Buso
Hi, I have a FacetRequest with numResults set to 10. How can I specify additional facet values to add to the FacetResult? Let me explain the use case: - the user views 10 facet results - the interface permits the user to choose a facet value not from the top-10 results - the user executes the qu

Re: Questions about FuzzyQuery in Lucene 4.x

2013-01-29 Thread George Kelvin
Hi Jack, ed is set to 1 here and I have lowercased all the data and queries. Regarding the indexed data factor you mentioned, can you elaborate more? Thanks! George On Tue, Jan 29, 2013 at 9:10 AM, Jack Krupansky wrote: > That depends on the value of "ed", and the indexed data. > > Another f

Re: Faceted search in OR

2013-01-29 Thread Nicola Buso
Hi Michael, I'm looking into implementing a solution. On Fri, 2013-01-25 at 16:23 -0500, Michael McCandless wrote: > On Fri, Jan 25, 2013 at 3:48 PM, Nicola Buso wrote: > > > if you have experiences in this use case can you share solutions? What > > is reusable from Lucene 4.x implementation? >

Re: Questions about FuzzyQuery in Lucene 4.x

2013-01-29 Thread Jack Krupansky
That depends on the value of "ed", and the indexed data. Another factor to take into consideration is that a case change ("Star" vs. "star") also counts as an edit. -- Jack Krupansky -Original Message- From: George Kelvin Sent: Tuesday, January 29, 2013 11:49 AM To: java-user@lucene
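The point about edits can be checked with a small stand-alone calculation. The sketch below computes plain Levenshtein distance (insertions, deletions, substitutions) in Java; it is an illustration only, not Lucene's FuzzyQuery implementation, and the class and method names are hypothetical. Lucene's own matching may differ in details, for example in how it treats transpositions.

```java
// Hypothetical sketch: plain Levenshtein distance, NOT Lucene's FuzzyQuery code.
class EditDistanceSketch {

    // Classic dynamic-programming edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of 'a' to each prefix of 'b'.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap rows so 'prev' always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("scar", "star"));
        System.out.println(levenshtein("wads", "wars"));
        System.out.println(levenshtein("Star", "star"));
        System.out.println(levenshtein("Scar", "star"));
    }
}
```

Under this measure "scar" to "star" and "wads" to "wars" are each one edit, and "Star" to "star" is also one edit, so a capitalised "Scar" is two edits away from "star" and would fall outside a maximum of 1 unless both the data and the query are lowercased first.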

Re: Large Index Query Help!

2013-01-29 Thread Chris Hostetter
: Subject: Large Index Query Help! : References: <1359429227142-4036943.p...@n3.nabble.com> https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh emai

Re: CompressingStoredFieldsFormat doesn't show improvement

2013-01-29 Thread Adrien Grand
Arun, Lucene uses a very light compression algorithm so I'm a little surprised it can make indexing 2x slower. Could you run indexing under a profiler to make sure it really is what makes indexing slower? Thanks! -- Adrien - T

Re: CompressingStoredFieldsFormat doesn't show improvement

2013-01-29 Thread Ian Lea
Fair point, but with such small numbers per index any variation between versions is likely to just be noise. It is also certainly possible that on your index the compressing format may not help. An aside: have you considered merging the thousands of small indexes into one, with some field to iden

Re: CompressingStoredFieldsFormat doesn't show improvement

2013-01-29 Thread arun k
Hi Ian, you're right that with one index of, say, 15 MB there is no problem, but I have thousands of such indexes. So the time adds up with the number of such indexes that are open simultaneously and indexed in parallel. Arun On Tue, Jan 29, 2013 at 7:09 PM, Ian Lea wrote: > I make that about

Re: update index for user defined types

2013-01-29 Thread Ian Lea
Well, lucene is a java API so if you can make java calls from an Oracle stored procedure, maybe you can make it work. Sounds a terrible idea to me, but then using Oracle is, in my opinion, a terrible idea, as are stored procedures. How about using the stored procedure to set some "reindexthisreco

Re: CompressingStoredFieldsFormat doesn't show improvement

2013-01-29 Thread Ian Lea
I make that about 15Mb of data - trivial. What happens if you make each field 400 chars and index a million or two? If you really have that few docs, what are you worrying about? A doubling of indexing time from 3.0.2 to 4.1 is surprising, but for 40k docs are we talking about it taking 2 second

Re: update index for user defined types

2013-01-29 Thread solruser13
Hi, we are using an Oracle stored procedure which accepts a UDT (Java object) and returns the results as a UDT (Java object). We want to use this SP to update the index. Does Lucene allow updating the index in such a scenario? We plan to use Solr (Solr uses the Lucene library) for building index and searc

RE: Re: Large Index Query Help!

2013-01-29 Thread Uwe Schindler
Hi, > 4 JVM Flag : -Xms512m -Xmx1576m > 5 Other app don't occupy too much memory As said by Ian, read this blog post and you will understand that Lucene is not eating your memory. The "RES" column in TOP shows the actual memory usage (resident memory): PID USER PR NI VIRT RES SHR S

Re: Re: Large Index Query Help!

2013-01-29 Thread dizh
OK, here is my setup. 1 OS : Redhat5 2 JVM 64bit jdk1.7 3 Lucene4.0 4 JVM Flag : -Xms512m -Xmx1576m 5 Other apps don't occupy too much memory My code is as follows: public synchronized IndexSearcher openSearcher() { if (QueryUtil.indexExist(SearchEnv.searchEnv.indexDir)

Re: update index for user defined types

2013-01-29 Thread Ian Lea
Please try and phrase your question in terms of lucene. Oracle? What's that? User defined type? What's that? IndexWriter has various updateDocuments() methods. I usually give all docs in my indexes a unique id, supplied by me (primary key in database terminology) and use the method that "Update

Re: Large Index Query Help!

2013-01-29 Thread Ian Lea
Lucene won't load the whole index into memory. See http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html What version of lucene? How are you opening index readers? How are you searching? How much memory are you giving the jvm? What else in your app is using all the memory?