Re: I need 'cat???' to match 'cat' again!

2007-06-06 Thread Tim Smith
Hi! Doug's suggestion would make me perfectly happy if that patch were available now(!).. Thanks, Tim --- Steven Rowe <[EMAIL PROTECTED]> wrote: > Hi Tim, > > Tim Smith wrote: > > How can I restore the behavior of the old > > WildcardQuery under 2.1? > > I badly need 'cat???' to match 'cat' ag

Re: I need 'cat???' to match 'cat' again!

2007-06-06 Thread Tim Smith
Hi! Isn't RegexQuery slower than '???' at the end of a word? Thanks, Tim --- Daniel Noll <[EMAIL PROTECTED]> wrote: > On Wednesday 06 June 2007 17:48:10 Tim Smith wrote: > > Hi! > > > > How can I restore the behavior of the old > > WildcardQuery under 2.1? > > I badly need 'cat???' to match 'ca

Re: I need 'cat???' to match 'cat' again!

2007-06-06 Thread Tim Smith
Hi! The situation is the following: in my native language, stemming is not available for Lucene AFAIK, and there are plenty of word forms that need to be stemmed. People usually search with '*' at the end of the word for that reason, but because of the nature of our language, in most

Re: How can I search over all documents NOT in a certain subset?

2007-06-06 Thread Antony Bowesman
Steven Rowe wrote: Conceptually (caveat: untested), you could: 1. Extend Filter[1] (call it DejaVuFilter) to hold a BitSet per IndexReader. The BitSet would hold one bit per doc[2], each initialized to true. 2. Unset a DejaVuFilter instance's bit for each of your top N docs by walking the TopD
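The first two steps quoted above (start with every bit set, then clear the bits of the docs already returned) can be sketched with a plain java.util.BitSet. This is only an illustration: SeenDocsMask and its unseen method are hypothetical names, not Lucene API, and the sketch assumes the doc ids of the earlier top N hits have already been collected into an int array. As far as I know, a Lucene 2.1 Filter returns such a BitSet from bits(IndexReader), so a custom filter could hand this back per reader.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Lucene: builds the "not yet seen" mask
// that a custom Filter could return for one IndexReader.
final class SeenDocsMask {
    private SeenDocsMask() {}

    /**
     * @param maxDoc     number of document slots in the reader (assumed >= 0)
     * @param seenDocIds doc ids of the earlier top-N results to exclude
     * @return a BitSet with one bit per doc, true for every doc not seen yet
     */
    static BitSet unseen(int maxDoc, int[] seenDocIds) {
        BitSet bits = new BitSet(maxDoc);
        bits.set(0, maxDoc);              // step 1: every doc starts as allowed
        for (int docId : seenDocIds) {
            if (docId >= 0 && docId < maxDoc) {
                bits.clear(docId);        // step 2: unset docs already returned
            }
        }
        return bits;
    }
}
```

Doc ids are only stable for a given IndexReader, which is presumably why the suggestion keeps one BitSet per reader.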

Need Lucene Compression help -- can pay nominal fee

2007-06-06 Thread lucenebuyer
All, I need to store all the attributes of the document i index as part of the index. And I need to get the size of the files as close to 20% of the original size as possible. If anyone can help with this I can pay a nominal fee. Please contact me if anyone can help. BG. -- View this message in

Re: IndexWriter.Optimize() is too slow and IOException! How Can I do?

2007-06-06 Thread James liu
You should split your index. A large index hurts both query speed and optimize speed when you index. 2007/6/7, 童小军 <[EMAIL PROTECTED]>: My index file has grown to 8 GB. When I optimize() the index, the program is too slow and sometimes throws an IOException. It also uses too much memory. xiaojun tong 010-64489518-613 [EMAIL PROTEC

IndexWriter.Optimize() is too slow and IOException! How Can I do?

2007-06-06 Thread 童小军
My index file has grown to 8 GB. When I optimize() the index, the program is too slow and sometimes throws an IOException. It also uses too much memory. xiaojun tong 010-64489518-613 [EMAIL PROTECTED] www.feedsky.com

RE: How can I search over all documents NOT in a certain subset?

2007-06-06 Thread Hilton Campbell
Steve, Thanks for the great reply! That worked like a charm. I really appreciate it. Thanks, Hilton Campbell -Original Message- From: Steven Rowe [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 05, 2007 2:50 PM To: java-user@lucene.apache.org Subject: Re: How can I search over all docum

Re: issues with optimizer

2007-06-06 Thread Chris Hostetter
: actually it's more of a "general" question rather than a compass specific : one. Here's my complete process: : I have incoming data being indexed every hour. The data varies from 100 to : 1 documents. I'm also having the index optimized via Compass (using its : Adaptive or Aggressive optim

Re: I need 'cat???' to match 'cat' again!

2007-06-06 Thread Daniel Noll
On Wednesday 06 June 2007 17:48:10 Tim Smith wrote: > Hi! > > How can I restore the behavior of the old > WildcardQuery under 2.1? > I badly need 'cat???' to match 'cat' again just like > in the older versions. > > I could modify my instance of lucene by removing those > "new" lines, but I don't wan

Re: Indexing PDF document

2007-06-06 Thread jim shirreffs
OK, thanks. I found the FontBox jar on the net, but now I see the jars are included with PDFBox. I expected them to be in /lib, not /external; my bad. Thanks again, jim s - Original Message - From: "Ben Litchfield" <[EMAIL PROTECTED]> To: Sent: Wednesday, June 06, 2007 6:14 PM Subject: Re: Index

Re: Indexing PDF document

2007-06-06 Thread Ben Litchfield
You need to include both the Bouncy Castle jars and the FontBox jar. Both are included with the PDFBox distribution. Ben Quoting jim shirreffs <[EMAIL PROTECTED]>: Thanks I rebuilt PDFbox and got past that problem but now I am getting Exception in thread "main" java.lang.NoClassDefFoundEr

Re: Indexing PDF document

2007-06-06 Thread jim shirreffs
Thanks, I rebuilt PDFBox and got past that problem, but now I am getting Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider. It seems my test PDF file is provider locked, so I tried a Lucene PDF file and got java.lang.NoClassDefFoundError

Re: Indexing PDF document

2007-06-06 Thread Chris Hostetter
: Exception in thread "main" java.lang.NoSuchMethodError: : org.apache.lucene.document.Document.add(Lo : rg/apache/lucene/document/Field;)V : Very strange since the exception is NoSuchMethod Document.add(Field) I *believe* the problem is that there is actually no such method (Document.add(Field

Indexing PDF document

2007-06-06 Thread jim shirreffs
Well, I got nowhere trying to index OpenOffice documents, so I thought I'd try indexing PDF documents. PDFBox seemed like a good bet: it claims to offer Lucene support and is on the Lucene recommended list. But after numerous failed attempts I decided to try the IndexFiles.java that comes with PDF

Re: issues with optimizer

2007-06-06 Thread moraleslos
Hoss, actually it's more of a "general" question rather than a compass specific one. Here's my complete process: I have incoming data being indexed every hour. The data varies from 100 to 1 documents. I'm also having the index optimized via Compass (using its Adaptive or Aggressive optimize

Re: Case Insensitive but not Tokenized

2007-06-06 Thread Chris Hostetter
: For now, I am storing both the exact phrase (as is, for retrieval) and : the string lower-cased (to search against) with no analyzers in the : index. When I search, I lower-case my query string and search against : my lower-cased index, I give the matching exact phrase back to the user. : This

Re: issues with optimizer

2007-06-06 Thread Chris Hostetter
I have no idea what it is you are asking ... it seems to be very Compass specific ... perhaps you should be consulting a Compass user group? : I'm running into the "WARN | Compass Scheduled Optimizer | AdaptiveOptimizer : | ne.optimizer.AdaptiveOptimizer 104 | Failed to obtain lock on sub-index

Case Insensitive but not Tokenized

2007-06-06 Thread Anna Putnam
I am trying to implement a prefix query search where I want the searching to be case insensitive but not tokenized (I want to preserve exact phrases). For now, I am storing both the exact phrase (as is, for retrieval) and the string lower-cased (to search against) with no analyzers in the ind
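The approach described in this message (keep the phrase as typed for display, keep a lower-cased copy to search against, and lower-case the query before matching) can be illustrated outside Lucene with a sorted map. This is a sketch of the idea only, not of how a Lucene index stores terms: PhraseIndex, add and prefixSearch are hypothetical names, and Locale.ROOT is my choice for locale-independent lower-casing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the two stored values:
// key = lower-cased phrase (searched), value = original phrases (returned).
final class PhraseIndex {
    private final TreeMap<String, List<String>> byLowerCase = new TreeMap<>();

    /** Stores the phrase untokenized, indexed under its lower-cased form. */
    void add(String phrase) {
        String key = phrase.toLowerCase(Locale.ROOT);
        byLowerCase.computeIfAbsent(key, k -> new ArrayList<>()).add(phrase);
    }

    /** Returns the original phrases whose lower-cased form starts with the prefix. */
    List<String> prefixSearch(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        // Keys sharing a prefix are contiguous in sorted order, starting at p.
        for (Map.Entry<String, List<String>> e : byLowerCase.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }
}
```

Because the whole phrase is one key, a prefix such as "new y" matches "New York City" but not a phrase that merely contains "York" somewhere in the middle, which is the untokenized behavior asked for.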

issues with optimizer

2007-06-06 Thread moraleslos
I'm running into the "WARN | Compass Scheduled Optimizer | AdaptiveOptimizer | ne.optimizer.AdaptiveOptimizer 104 | Failed to obtain lock on sub-index [book], will do it next time." messages more frequently, most likely due to the lucene index getting enormous. The adaptive optimizer is scheduled

Re: I need 'cat???' to match 'cat' again!

2007-06-06 Thread Steven Rowe
Hi Tim, Tim Smith wrote: > How can I restore the behavior of the old > WildcardQuery under 2.1? > I badly need 'cat???' to match 'cat' again just like > in the older versions. The behavior you want was last sighted in Java Lucene four releases ago (v1.4.3). See Doug Cutting's response to a simil

Re: I need 'cat???' to match 'cat' again!

2007-06-06 Thread Erick Erickson
Well, having your application depend upon incorrect behavior is...er...fraught. It looks like what you really want is custom behavior for multiple question marks, perhaps only with multiple question marks at the end of your query? If this is the case, I'd think about substituting splat (*) in th
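A different workaround from the one suggested here, sketched under an assumption the thread does not spell out: that the old behavior let each trailing '?' match zero or one character. Under that reading, 'cat???' is equivalent to the alternatives 'cat', 'cat?', 'cat??' and 'cat???', which can be generated before any query is built. TrailingWildcardExpander and expand are hypothetical names, not Lucene API, and escaped question marks are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processing helper: turns a pattern with trailing '?'
// into the list of patterns it would have matched if each trailing '?'
// were optional.
final class TrailingWildcardExpander {
    private TrailingWildcardExpander() {}

    static List<String> expand(String pattern) {
        int end = pattern.length();
        int stemEnd = end;
        // Walk back over the run of '?' at the end of the pattern.
        while (stemEnd > 0 && pattern.charAt(stemEnd - 1) == '?') {
            stemEnd--;
        }
        List<String> variants = new ArrayList<>();
        for (int len = stemEnd; len <= end; len++) {
            if (len > 0) {                 // never emit an empty pattern
                variants.add(pattern.substring(0, len));
            }
        }
        return variants;
    }
}
```

Each variant could then become its own wildcard query, combined as optional clauses of a boolean query. Unlike replacing the question marks with '*', this keeps the upper bound on how many extra characters may follow the stem.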

I need 'cat???' to match 'cat' again!

2007-06-06 Thread Tim Smith
Hi! How can I restore the behavior of the old WildcardQuery under 2.1? I badly need 'cat???' to match 'cat' again just like in the older versions. I could modify my instance of Lucene by removing those "new" lines, but I don't want to maintain a custom Lucene package. Please help! Tim Source