RE: Lucene index limit

2011-03-24 Thread Uwe Schindler
Are you sure you did not forget to commit your changes? Maybe that's the reason you see only 32768 documents. There is no such low limit: the number of documents is limited by Integer.MAX_VALUE, and the limit on the number of terms is much higher... - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://ww

Lucene index limit

2011-03-24 Thread Pulkit Singhal
Is there some sort of default limit imposed on Lucene indexes? I am trying to index 50k or 60k documents, but when I use Luke to go inside the index and check the total # of entries indexed, it shows that there are only 32768 entries. It seems like some sort of limit ... what should I look at to adjus

Re: Searching partial names using Lucene

2011-03-24 Thread Sujit Pal
I don't know if there is already an analyzer available for this, but you could use GATE or UIMA for Named Entity Extraction against names and expand the query to include the extra names that are used synonymously. You could do this outside Lucene or inline using a custom Lucene tokenizer that embed

Re: Distributing a Lucene application?

2011-03-24 Thread Chris Lu
It's great that the requirement is loose... But I suppose users would ask for more later. Well, I worked on DBSight, which covers more than just search. It also includes scheduling indexing, reindexing, and even rendering. In your case, you just need to specify a SQL and have index up and runnin

Searching partial names using Lucene

2011-03-24 Thread Deepak Konidena
Hi, I would like to build a search system where a search for "Dan" would also search for "Daniel", and a search for "Will" would also search for "William". Any ideas on how to go about implementing that? I can think of writing a custom Analyzer that would map these partial tokens to their full firstname or lastnam
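
One way to approach the question above is to expand the query term outside Lucene before parsing it. The following is only a minimal sketch in plain Java, not the poster's or any replier's actual solution: the class name NicknameExpander, the two-entry lookup table, and the field name "name" used in the usage note are all hypothetical. The generated string relies on the classic query parser's field grouping syntax, field:(a OR b).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class NicknameExpander {
    // Hypothetical lookup table; a real system would load a far larger list.
    private static final Map<String, List<String>> FULL_NAMES = new HashMap<>();
    static {
        FULL_NAMES.put("dan", Arrays.asList("daniel"));
        FULL_NAMES.put("will", Arrays.asList("william"));
    }

    // Returns the lower-cased term followed by any full names it maps to.
    static Set<String> expand(String term) {
        String key = term.trim().toLowerCase(Locale.ROOT);
        Set<String> expanded = new LinkedHashSet<>();
        expanded.add(key);
        List<String> fullNames = FULL_NAMES.get(key);
        if (fullNames != null) {
            expanded.addAll(fullNames);
        }
        return expanded;
    }

    // Builds a query string of the form field:(a OR b) for a query parser.
    static String toQueryString(String field, String term) {
        return field + ":(" + String.join(" OR ", expand(term)) + ")";
    }
}
```

Under these assumptions, NicknameExpander.toQueryString("name", "Dan") yields a string that could be handed to a query parser, and terms without an entry in the table pass through unchanged.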

SmartChineseAnalyzer for Traditional Chinese?

2011-03-24 Thread Woolf, Ross
As I look at the api for SmartChineseAnalyzer it indicates it is for Simplified Chinese. Has anyone attempted modifying it for Traditional Chinese? Or does anyone know of any other "smart" analyzer that is geared towards traditional Chinese? Thanks, Ross

Using Fuzzysearch with MultiFieldQueryParser

2011-03-24 Thread Deepak Konidena
Hi, I am using MultiFieldQueryParser with a custom analyzer for parsing search text. Now, when I say MultiFieldQueryParser qp = new MultiFieldQueryParser(Version, new String[] {"field1", "field2", "field3"}, customAnalyzer); qp.setDefaultOperator(QueryParser.AND_OPERATOR); Query query = qp.p
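
For background on the fuzzy search mentioned in this thread: Lucene's FuzzyQuery scores candidate terms by Levenshtein edit distance. The sketch below is a plain Java illustration of that distance only, assuming nothing about the poster's setup; it is not Lucene's own implementation, and the class name EditDistance is hypothetical.

```java
final class EditDistance {
    // Classic dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap rows so prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

A term one edit away from the query term, such as "lucine" for "lucene", gets distance 1, which is the kind of near match a fuzzy query is meant to catch.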

Re: TermDoc to TermDocsEnum

2011-03-24 Thread Michael McCandless
Simplest solution is to wrap your findFeatures.reader in a SlowMultiReaderWrapper (as the exception suggests). More performant solution is to change your code to visit the sequential sub-readers of findFeatures.reader, directly. But if performance isn't important here, just do the simple solution

Re: Should I use MultiSearcher?

2011-03-24 Thread Ian Lea
https://issues.apache.org/jira/browse/LUCENE-2756. -- Ian. On Thu, Mar 24, 2011 at 2:13 PM, Devon H. O'Dell wrote: > 2011/3/24 Uwe Schindler : >> Don't use MultiSearcher. Instead create a MultiReader around the separate >> IndexReaders for each index and pass that MultiReader to a conventional

Re: Should I use MultiSearcher?

2011-03-24 Thread Ian Lea
Care to define huge? There often isn't a "best" solution but in this case I think I'd vote for the index-per-year approach. Btw, with recent versions of Lucene you don't need to call optimize() very often, if at all, although you might want to run it at the beginning of each year against the previ

Re: Should I use MultiSearcher?

2011-03-24 Thread Devon H. O'Dell
2011/3/24 Uwe Schindler : > Don't use MultiSearcher. Instead create a MultiReader around the separate > IndexReaders for each index and pass that MultiReader to a conventional > IndexSearcher as IndexReader. MultiSearcher is very buggy. Could you elaborate on this point at all, Uwe? I'm using Para

RE: Should I use MultiSearcher?

2011-03-24 Thread Uwe Schindler
Don't use MultiSearcher. Instead create a MultiReader around the separate IndexReaders for each index and pass that MultiReader to a conventional IndexSearcher as IndexReader. MultiSearcher is very buggy. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u

Should I use MultiSearcher?

2011-03-24 Thread sol myr
Hi, I need to search a catalog. Most users search *this* year's catalog, but on rare occasions they may ask for old products (from previous years). I'm trying to choose between 2 options: 1) Keep one huge index for all years (where documents have a "year" field, so I can filter out the current ye

Re: Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.

2011-03-24 Thread fr . jurain
Hi David, thanks for your advice; I'll keep it in mind. Best regards, François Jurain. > Message of 22/03/11 at 17:40 > From: "David Causse" > To: java-user@lucene.apache.org > Cc: > Subject: Re: Wanted: a directory of quick-and-(not too)dirty analyzers for > multi-language RDF. >

BerlinBuzzwords 2011 Early Bird Ticket Period ends on April 7th.

2011-03-24 Thread Simon Willnauer
Hey folks, just a short notice for those who haven't noticed: we have only a limited number of Early-Bird tickets left, and the Early-Bird period ends on April 7th. If you want to get one of the 30 remaining tickets, go and get one now here: http://berlinbuzzwords.de/content/tickets While we are