Problem with special characters

2008-12-02 Thread Ravichandra
Hi, I am having problems with searching for special characters like %, *, +, - etc. I am using the standard analyzer. I create queries using the query constructor. I read in the forums and tried the QueryParser.escape and parse methods, but the problem is if I have two fields as "ABC+S" and "ABC-S" and

Indexing Names in Lucene -- Thomas = Tom, etc

2008-12-02 Thread Khawaja Shams
Hello, I am indexing documents with a field that contains the first and last name of people. It is working wonderfully with a slight issue: if Thomas is indexed for a document, I would like searches for Tom to match that document. I am sure this is a common problem that many of you must have addre

Re: # of fields, performance

2008-12-02 Thread Mark Miller
There is not much impact as long as you turn off Norms for the majority of them. - Mark On Dec 2, 2008, at 8:47 AM, Darren Govoni <[EMAIL PROTECTED]> wrote: Hi, I saw this question asked before without a clear answer. Pardons if I missed it in the archive elsewhere. Is there a serious deg

Re: Query time document group boosting

2008-12-02 Thread Chris Hostetter
: > "foo AND (
: > groupboost_A:dummy^10 OR
: > groupboost_B:dummy OR
: > groupboost_C:dummy^0.1 OR
: > ...
: > groupboost_Z:dummy
: > )"
: >
: > With that query, it seems that only documents matching foo will result
: > in a hit and be scored?
:
: Someone else than me needs to answer this.

I would want to know more about the Lucene implementation in C++

2008-12-02 Thread Ariel
Hi everybody: I have seen that the Lucene project for C++ has been abandoned. Could you tell me if there is another similar implementation of Java Lucene in C++?

Re: Merging indexes & multicore/multithreading

2008-12-02 Thread Michael McCandless
With ConcurrentMergeScheduler, adding all indexes at once to a single IndexWriter will use multiple threads to do the merging, assuming you have enough total segments that need merging (> 2 X mergeFactor will use 2 threads; > 3 X mergeFactor will use 3, etc.; CMS defaults to max 3 merge t

RE: Merging indexes & multicore/multithreading

2008-12-02 Thread Sudarsan, Sithu D.
Our experience is that if the number of cores equals the number of active threads, then it performs optimally using a single JVM, both on Windows XP and CentOS 5.2, with Lucene 2.3.2. Sincerely, Sithu D Sudarsan [EMAIL PROTECTED] [EMAIL PROTECTED] -----Original Message----- From: Glen Newton [mailto:[EMAIL

Re: StandardAnalyzer vs KeywordAnalyzer in Luke

2008-12-02 Thread Andrzej Bialecki
elguillelmo wrote: Kai_testing Middleton wrote: The Nutch analyzer is NutchDocumentAnalyzer. Does anyone know how to add this to the Luke classpath? I tried this kind of thing but it didn't work. I'm trying to work out the same thing, to no avail. Would anybody be able to detail how to add

Merging indexes & multicore/multithreading

2008-12-02 Thread Glen Newton
Let's say I have 8 indexes on a 4-core system and I want to merge them (inside a single VM instance). Is it better to do a single merge of all 8, or to merge in pairs in parallel threads until there is only a single index left? I guess the question involves how multi-threaded merging is and if it

Re: StandardAnalyzer vs KeywordAnalyzer in Luke

2008-12-02 Thread elguillelmo
Kai_testing Middleton wrote: > > The Nutch analyzer is NutchDocumentAnalyzer. Does anyone know how to add > this to the Luke classpath? I tried this kind of thing but it didn't work. > I'm trying to work out the same thing, to no avail. Would anybody be able to detail how to add Nutch's Analy

Re: PhraseQuery and non-letter characters

2008-12-02 Thread Ng Vinny
Hi Ian, thanks for the suggestion. I was able to write the custom analyzer to return non-letters as tokens, as well as to keep the numeric characters instead of skipping them. This is probably not the best solution, but at least I can have a demo without bugs :-) To save time for others who may ha

Re: serialVersionUID issue between 2.3 and 2.4

2008-12-02 Thread Jason Rutherglen
I prefer Externalizable as well, as it makes serialization faster. Perhaps also for Query and its subclasses to start? I had code to do this for Analyzer as well, which could be useful, perhaps in a different patch though. On Tue, Dec 2, 2008 at 2:22 AM, Michael McCandless < [EMAIL PROTECTED]> wrote

# of fields, performance

2008-12-02 Thread Darren Govoni
Hi, I saw this question asked before without a clear answer. Pardons if I missed it in the archive elsewhere. Is there a serious degradation of performance when using a high number of fields per document? Like hundreds? Is the impact more on the write than the read? What are the performance charact

Re: serialVersionUID issue between 2.3 and 2.4

2008-12-02 Thread Michael McCandless
Karl Wettin wrote: As for statically setting a serialVersionUID in the class, one could instead set it to a final value and implement Externalizable in order to keep binary compatibility between class versions that contain more changes than what the Serializable reflection support code

Re: serialVersionUID issue between 2.3 and 2.4

2008-12-02 Thread Michael McCandless
Jason Rutherglen wrote: Hi Mike, Can you build and release a 2.4 jar using the 2.3 build environment? I don't think that's the right approach here. We should, instead, directly fix Term.java to be serializable, if that's the goal. (Or, as a workaround, Karl's suggestion that you build y

RE: Pdf in Lucene?

2008-12-02 Thread tiziano bernardi
This is the exception: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.lucene.document.Document.add(Lorg/apache/lucene/document/Field;)V at org.pdfbox.searchengine.lucene.LucenePDFDocument.addUnindexedField(LucenePDFDocument.java:224) at org.pdfbox.searchengine.lucene.Lucene

Re: Boosting fields are searching or indexing time?

2008-12-02 Thread Marc Sturlese
Thanks, I clearly understood it. Grant Ingersoll-6 wrote: > > Possibly, but probably not. Index time boosting is generally done to > say one field is more important than another field, or one document is > more important than another document, whereas query time boosting > generally says