Re: MySimilarity with Lucene 1.2 ?

2005-08-18 Thread jian chen
Hi, I hacked Lucene 1.2 a little while ago and I am trying to use my own similarity algorithm. If you are interested in the changes I have made to Lucene 1.2, you can email me back at chenjian1227 at gmail.com. Cheers, Jian On 8/18/05, Karl Koch <[EMAIL PROTECTED]> wrote: > Hello Lucen

Re: Case-sensitive search

2005-08-18 Thread Erik Hatcher
On Aug 18, 2005, at 6:22 PM, [EMAIL PROTECTED] wrote: On Thu, 2005-08-18 at 17:16, [EMAIL PROTECTED] wrote: Thanks again! The analyzer is working now. But seems like actually the QueryParser I am using is probably converting the queries to lowercase first. Is there any way to stop that? He

Re: Case-sensitive search

2005-08-18 Thread tareque
> On Thu, 2005-08-18 at 17:16, [EMAIL PROTECTED] wrote: >> Thanks again! The analyzer is working now. But seems like actually the >> QueryParser I am using is probably converting the queries to lowercase >> first. Is there any way to stop that? Here is the line of code where I >> am >> parsing: >>

Re: Case-sensitive search

2005-08-18 Thread Luke Francl
On Thu, 2005-08-18 at 17:16, [EMAIL PROTECTED] wrote: > Thanks again! The analyzer is working now. But seems like actually the > QueryParser I am using is probably converting the queries to lowercase > first. Is there any way to stop that? Here is the line of code where I am > parsing: > > Query q

Re: Case-sensitive search

2005-08-18 Thread tareque
Thanks again! The analyzer is working now. But it looks like the QueryParser I am using is converting the queries to lowercase first. Is there any way to stop that? Here is the line of code where I am parsing: Query query = QueryParser.parse(line, "contents", analyzer); As for anal

Re: Token Filter question

2005-08-18 Thread Erik Hatcher
On Aug 18, 2005, at 3:51 PM, Dan Armbrust wrote: I am implementing a filter that will remove certain characters from the tokens (things like '(', etc.), but the chars to be removed will be customizable. This is what I have come up with, but it doesn't seem very efficient. Is there a bette

Re: Case-sensitive search

2005-08-18 Thread Erik Hatcher
On Aug 18, 2005, at 4:16 PM, [EMAIL PROTECTED] wrote: Thanks! I have used StopAnalyzer to index. Does it lower-case before indexing? I don't touch the query string before sending it for searching, so the query string is not lower-cased. Pretty much all built-in Lucene analyzers lower-case:

Re: Case-sensitive search

2005-08-18 Thread tareque
OK, it seems it does use a LowerCaseFilter. Is there any analyzer that does the same thing as StopAnalyzer, except for lowercasing? StopAnalyzer best fits my purpose. > Thanks! I have used StopAnalyzer to index. Does it lower-case before > indexing? I don't touch the query string bef
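
A plain-Java sketch of what is being asked for here: tokenize on letters and drop stop words, but leave the case of every token alone. As far as I know, StopAnalyzer in Lucene 1.4 is a LowerCaseTokenizer wrapped in a StopFilter, so the Lucene equivalent would swap in a LetterTokenizer; the class below does not use Lucene at all, and its name and short stop list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical StopAnalyzer-like chain without lowercasing (plain Java, not Lucene). */
class CaseKeepingStopAnalyzer {
    // Small illustrative stop list; Lucene ships its own English list.
    static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("a", "an", "and", "the", "of", "to", "in"));

    /** Splits on non-letters, drops stop words, keeps the original case. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            // A sentinel space at i == length flushes the last token.
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                String token = current.toString();
                // Stop words are matched case-insensitively, but the token
                // itself is emitted unchanged.
                if (!STOP_WORDS.contains(token.toLowerCase())) {
                    tokens.add(token);
                }
                current.setLength(0);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Lucene index and the JVM"));
    }
}
```

Stop words are compared in lowercase here so that a capitalized "The" is still dropped; that is a choice of this sketch, since an exact-match stop list would let it through.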

Re: Case-sensitive search

2005-08-18 Thread tareque
Thanks! I have used StopAnalyzer to index. Does it lower-case before indexing? I don't touch the query string before sending it for searching, so the query string is not lower-cased. > The search really is case sensitive, it's just that all input is > usually lower-cased, so it feels like it's case i

Re: Case-sensitive search

2005-08-18 Thread Erik Hatcher
On Aug 18, 2005, at 3:50 PM, [EMAIL PROTECTED] wrote: Is there any way to do a case-sensitive search? All Lucene searches are case-sensitive, actually. But most often a lowercasing analyzer is used. So the trick is to change the analysis process to not lowercase. It gets more fun when y

Re: Case-sensitive search

2005-08-18 Thread Otis Gospodnetic
The search really is case sensitive, it's just that all input is usually lower-cased, so it feels like it's case insensitive. In other words, don't lower-case your input before indexing, and don't lower-case your queries (i.e. pick an Analyzer that doesn't lower-case). Otis --- [EMAIL PROTECTED
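
Otis's point can be shown with a toy inverted index in plain Java (not Lucene): term lookup is exact string equality, so the only thing that makes search feel case-insensitive is lowercasing in the analyzer, applied both when documents are added and when the query is analyzed. ToyIndex and its two analyzer methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Toy inverted index: terms are looked up by exact string equality. */
class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Function<String, List<String>> analyzer;

    ToyIndex(Function<String, List<String>> analyzer) {
        this.analyzer = analyzer;
    }

    void add(int docId, String text) {
        for (String term : analyzer.apply(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Runs the query through the same analyzer, then ORs the matching postings. */
    Set<Integer> search(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyzer.apply(query)) {
            hits.addAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }

    /** Splits on whitespace and keeps the original case. */
    static List<String> caseKeeping(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Same split, but lowercases first, like most stock analyzers. */
    static List<String> lowercasing(String text) {
        return caseKeeping(text.toLowerCase());
    }
}
```

If documents are added with lowercasing but queries go through caseKeeping (or the reverse), a query like "Lucene" finds nothing, which is why the same analysis should be used at index and query time.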

Token Filter question

2005-08-18 Thread Dan Armbrust
I am implementing a filter that will remove certain characters from the tokens (things like '(', etc.), but the chars to be removed will be customizable. This is what I have come up with, but it doesn't seem very efficient. Is there a better way? Should I be adjusting the token endOffset when
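
Dan's code is cut off in this archive, so here is a plain-Java sketch of the per-token work only, not a Lucene TokenFilter: the configurable characters go into a java.util.BitSet once, each check is a constant-time bit lookup, and a token with nothing to strip is returned unchanged without allocating. The class name CharStripper is hypothetical; wiring it into a filter is left out because the token API differs between Lucene versions.

```java
import java.util.BitSet;

/** Hypothetical helper: removes a configurable set of characters from token text. */
class CharStripper {
    private final BitSet remove = new BitSet();

    CharStripper(String charsToRemove) {
        for (int i = 0; i < charsToRemove.length(); i++) {
            remove.set(charsToRemove.charAt(i));
        }
    }

    /** Returns the token text without the configured characters. */
    String strip(String term) {
        // Fast path: find the first character to remove, if any.
        int first = -1;
        for (int i = 0; i < term.length(); i++) {
            if (remove.get(term.charAt(i))) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return term; // nothing to strip, no allocation
        }
        StringBuilder sb = new StringBuilder(term.length());
        sb.append(term, 0, first);
        for (int i = first + 1; i < term.length(); i++) {
            char c = term.charAt(i);
            if (!remove.get(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

On the endOffset question: token offsets are normally positions in the original input text (highlighters rely on that), so a filter that only rewrites the term text would usually leave them alone. It is also worth deciding what to do with a token that strips down to an empty string; skipping it is one option.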

Case-sensitive search

2005-08-18 Thread tareque
Is there any way to do a case-sensitive search? Thanks Tareque ControlDOCS

Re: OutOfMemoryError on addIndexes()

2005-08-18 Thread Doug Cutting
Tony Schwartz wrote: What about the TermInfosReader class? It appears to read the entire term set for the segment into 3 arrays. Am I seeing double on this one? p.s. I am looking at the current sources. see TermInfosReader.ensureIndexIsRead(); The index only has 1/128 of the terms, by def
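
Doug's point is that the term dictionary is read sparsely: by default only every 128th term is held in memory, and a lookup scans forward from the nearest indexed term. A plain-Java sketch of that idea follows, assuming an in-memory String[] stands in for the on-disk dictionary; the class name is hypothetical and this is not TermInfosReader's actual code.

```java
import java.util.Arrays;

/** Hypothetical sparse term index over a sorted term list. */
class SparseTermIndex {
    private final String[] allTerms;   // stands in for the on-disk term dictionary
    private final String[] indexTerms; // every interval-th term, held in memory
    private final int interval;

    SparseTermIndex(String[] sortedTerms, int interval) {
        this.allTerms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval;
        indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * interval];
        }
    }

    int indexSize() {
        return indexTerms.length;
    }

    /** Returns the position of term in the full dictionary, or -1. */
    int find(String term) {
        int pos = Arrays.binarySearch(indexTerms, term);
        if (pos >= 0) {
            return pos * interval; // exact hit on an indexed term
        }
        int block = -pos - 2; // last indexed term smaller than the target
        if (block < 0) {
            return -1; // target sorts before every term
        }
        int start = block * interval;
        int end = Math.min(start + interval, allTerms.length);
        for (int i = start; i < end; i++) { // scan at most one block
            int cmp = allTerms[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) break;
        }
        return -1;
    }
}
```

Memory holds roughly numTerms / interval entries, at the cost of scanning up to interval - 1 extra terms per lookup.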

java.io.IOException: term out of order

2005-08-18 Thread Dan Quaroni
Hello, all. I'm trying to optimize an index, and I get this exception... A copy of this index made a couple of weeks ago optimized correctly, and I don't THINK there have been any changes made to this index since then (but there may have been). I also couldn't find anything about this in the

Lucene 1.3 on Java 1.2 ?

2005-08-18 Thread Karl Koch
Should Lucene 1.3, in theory, run on Java 1.2? I have tried and got JIT errors when searching an index on the hard disk: --- output from Eclipse Java IDE--- A nonfatal internal JIT (3.10.107(x)) error 'chgTarg: Conditional' has occurred in : 'org/apach

MySimilarity with Lucene 1.2 ?

2005-08-18 Thread Karl Koch
Hello Lucene experts, as you might have seen in my previous postings, I am limited to Lucene 1.2 or earlier (due to hardware limitations I can only use Java 1.1 or 1.2). I would like to write my own Similarity implementation, which, I think, would allow me to plug other algorithms into Lucene wh
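
One way to picture a pluggable similarity is an interface with tf and idf hooks. The sketch below is plain Java and is not Lucene 1.2's API; TermScorer, ClassicScorer, LogTfScorer and ScoreDemo are hypothetical names. ClassicScorer uses the tf = sqrt(freq) and idf = ln(numDocs / (docFreq + 1)) + 1 formulas of DefaultSimilarity in later Lucene releases, and LogTfScorer swaps in logarithmic tf as one example of an "other algorithm".

```java
/** Hypothetical pluggable scoring hooks. */
interface TermScorer {
    double tf(int freq);
    double idf(int docFreq, int numDocs);
}

/** DefaultSimilarity-style formulas: sqrt tf and log-scaled idf. */
class ClassicScorer implements TermScorer {
    public double tf(int freq) {
        return Math.sqrt(freq);
    }

    public double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }
}

/** Alternative: logarithmic tf dampening, same idf. */
class LogTfScorer extends ClassicScorer {
    @Override
    public double tf(int freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }
}

class ScoreDemo {
    /** Sums tf * idf over the query terms for one document. */
    static double score(TermScorer s, int[] freqs, int[] docFreqs, int numDocs) {
        double total = 0.0;
        for (int i = 0; i < freqs.length; i++) {
            total += s.tf(freqs[i]) * s.idf(docFreqs[i], numDocs);
        }
        return total;
    }
}
```

The length normalization and coord factors of Lucene's full scoring formula are left out to keep the sketch short.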

Re: Indexing document instances and retrieving instance attributes

2005-08-18 Thread Doug Cutting
Chris D wrote: Well in my case field order is important, but the order of the individual fields isn't. So I can speed up getFields to roughly O(1) by implementing Document as follows. Have you actually found getFields to be a performance bottleneck in your application? I'd be surprised if it
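
Chris's idea can be sketched in plain Java as a map from field name to that field's values. This is a hypothetical MapDocument class, not Lucene's Document and not Chris's actual code: a LinkedHashMap keeps field names in first-insertion order and each list keeps its values in order, but the interleaving of different fields is lost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical document keyed by field name for O(1) lookup of one field. */
class MapDocument {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** All values of one field, in the order they were added. */
    List<String> getValues(String name) {
        List<String> values = fields.get(name);
        return values == null
            ? Collections.<String>emptyList()
            : Collections.unmodifiableList(values);
    }

    /** Field names in first-insertion order. */
    Set<String> fieldNames() {
        return Collections.unmodifiableSet(fields.keySet());
    }
}
```

As Doug asks, it is worth profiling before changing this; with only a handful of fields per document, a linear scan may well not be the bottleneck.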

Re: OutOfMemory error when searching

2005-08-18 Thread Doug Cutting
Fredrik wrote: Opening the index with Luke, I can see the following: Number of fields: 17 Number of documents: 1165726 Number of terms: 6721726 The size of the index is approx. 5.3 GB. Lucene version is 1.4.3. The index contains Norwegian terms, but lots of inline HTML, etc is probably increasin

Re: OutOfMemoryError on addIndexes()

2005-08-18 Thread Doug Cutting
Tony Schwartz wrote: I think you're jumping into the conversation too late. What you have said here does not address the problem at hand. That is, in TermInfosReader, all terms in the segment get loaded into three very large arrays. That's not true. Only 1/128th of the terms are loaded by

Re: Books about Lucene?

2005-08-18 Thread Karl Koch
Hello Erik, I find "Lucene in Action" an extremely well-written and easily accessible book, and I must say: well done (including, of course, everybody who contributed to the book). Naturally, the book focuses strongly on the latest version of Lucene. I currently, and you may have realised that on all my

Re: OutOfMemoryError on addIndexes()

2005-08-18 Thread Paul Elschot
On Thursday 18 August 2005 14:32, Tony Schwartz wrote: > Is this a viable solution? > Doesn't this make sorting and filtering much more complex and much more > expensive as well? Sorting would have to be done on more than one field. I would expect that to be possible. As for filtering: would you

OutOfMemory error when searching

2005-08-18 Thread Fredrik
We have an index with approximately 1.2 million documents. Web site users search this index, but we get sporadic out-of-memory errors, as Lucene tries to allocate over 500 MB of memory. Opening the index with Luke, I can see the following: Number of fields: 17 Number of documents: 1165726 Number o

RE: DEFAULT_OPERATOR_AND

2005-08-18 Thread Andrew Boyd
What about trying something like: BooleanQuery booQuery = new BooleanQuery(); Query titleQuery = null; QueryParser.Operator operator = contentParser.getDefaultOperator(); if(QueryParser.Operator.AND == operator){ //logger.debug("Content Ope

RE: OutOfMemoryError on addIndexes()

2005-08-18 Thread Tony Schwartz
I think you're jumping into the conversation too late. What you have said here does not address the problem at hand. That is, in TermInfosReader, all terms in the segment get loaded into three very large arrays. If your index is massive and has many fields indexed (dates for example), you nee

RE: OutOfMemoryError on addIndexes()

2005-08-18 Thread Mordo, Aviran (EXP N-NANNATEK)
You can still have the complete date as a separate field, and sort or filter by it, just don't use this field in your query. Aviran http://www.aviransplace.com -Original Message- From: Tony Schwartz [mailto:[EMAIL PROTECTED] Sent: Thursday, August 18, 2005 8:32 AM To: java-user@lucene.ap
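
The term-count side of this suggestion can be quantified with a plain-Java count. Assuming one timestamp per minute over a year, indexed either as a single yyyyMMddHHmm term or split into five coarse fields (the field names and format are hypothetical, not Lucene's DateField encoding), the split version produces far fewer unique terms. The complete date would still go into its own field for sorting and filtering, as Aviran says.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

class DateTermCount {
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Unique terms when each minute of the year is indexed as one yyyyMMddHHmm term. */
    static int fullResolutionTerms(int year) {
        Set<String> terms = new HashSet<>();
        for (LocalDateTime t = LocalDateTime.of(year, 1, 1, 0, 0);
             t.getYear() == year; t = t.plusMinutes(1)) {
            terms.add(t.format(MINUTE));
        }
        return terms.size();
    }

    /** Unique field:term pairs when the same minutes are split into five coarse fields. */
    static int splitFieldTerms(int year) {
        Set<String> terms = new HashSet<>();
        for (LocalDateTime t = LocalDateTime.of(year, 1, 1, 0, 0);
             t.getYear() == year; t = t.plusMinutes(1)) {
            terms.add("year:" + t.getYear());
            terms.add("month:" + t.getMonthValue());
            terms.add("day:" + t.getDayOfMonth());
            terms.add("hour:" + t.getHour());
            terms.add("minute:" + t.getMinute());
        }
        return terms.size();
    }

    public static void main(String[] args) {
        System.out.println(fullResolutionTerms(2005) + " vs " + splitFieldTerms(2005));
    }
}
```

For 2005 this prints "525600 vs 128": 365 * 1440 minute terms against 1 + 12 + 31 + 24 + 60 split terms. The cost, as Tony points out, is that range queries over the split fields are more awkward to express.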

RE: w.fnm (System can not find file.)

2005-08-18 Thread Mordo, Aviran (EXP N-NANNATEK)
Try decreasing the merge factor, and I would also check the maximum number of open files the OS allows. HTH Aviran http://www.aviransplace.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, August 18, 2005 7:34 AM To: java-user@lucene.apa

RE: [ANN] Lucene "Did You Mean" article on java.net

2005-08-18 Thread Mordo, Aviran (EXP N-NANNATEK)
Thanks, Very nice article :) Aviran http://www.aviransplace.com -Original Message- From: Joseph B. Ottinger [mailto:[EMAIL PROTECTED] Sent: Thursday, August 18, 2005 7:22 AM To: java-user@lucene.apache.org Subject: Re: [ANN] Lucene "Did You Mean" article on java.net TSS referred to it,

Re: OutOfMemoryError on addIndexes()

2005-08-18 Thread Tony Schwartz
Is this a viable solution? Doesn't this make sorting and filtering much more complex and much more expensive as well? Tony Schwartz [EMAIL PROTECTED] > On Wednesday 17 August 2005 22:49, Paul Elschot wrote: >> > the index could potentially be huge. >> > >> > So if this is indeed the case, it is

w.fnm (System can not find file.)

2005-08-18 Thread dozean
Hi, I have a problem with indexing. I index documents in one directory, which contains many subdirectories with more documents, and so on: over 1000 directories! And I have one index directory. So I get the exception: Exception in thread "main" java.io.Fil

Re: [ANN] Lucene "Did You Mean" article on java.net

2005-08-18 Thread Joseph B. Ottinger
TSS referred to it, too. :) On Thu, 18 Aug 2005, Tom White wrote: In case subscribers to this list missed it, my article on how to add a "did you mean" facility to Lucene searches was published last week: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html. Regards, Tom

Re: DEFAULT_OPERATOR_AND

2005-08-18 Thread Erik Hatcher
On Aug 18, 2005, at 1:48 AM, Karthik N S wrote: Does this mean MultiFieldQueryParser will always have to use 'DEFAULT_OPERATOR_OR' instead of 'DEFAULT_OPERATOR_AND' operations? Yup, that's what I said :) Is there any alternative way of handling this process (other than API 'replaceAll(" ", " A
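
Instead of the replaceAll trick, one option is to build the query string so that each word is required but may match in any of the fields, and then hand it to the ordinary QueryParser with its default OR operator. The sketch below assumes the words contain no QueryParser special characters; AllTermsAnyField is a hypothetical name.

```java
import java.util.List;

/** Hypothetical builder: every word required (+), each word may match in any field. */
class AllTermsAnyField {
    static String build(String userInput, List<String> fields) {
        StringBuilder q = new StringBuilder();
        for (String word : userInput.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // blank input yields one empty piece
            if (q.length() > 0) q.append(' ');
            q.append("+(");
            for (int i = 0; i < fields.size(); i++) {
                if (i > 0) q.append(' ');
                q.append(fields.get(i)).append(':').append(word);
            }
            q.append(')');
        }
        return q.toString();
    }
}
```

For the input "apache lucene" over title and contents this yields +(title:apache contents:apache) +(title:lucene contents:lucene): inside each group the clauses are optional, so at least one field must match, while the leading + makes every word mandatory.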

Re: [ANN] Lucene "Did You Mean" article on java.net

2005-08-18 Thread Ronnie Kolehmainen
Nice article! Those interested in the different "did you mean" techniques can also look at my simple implementation, which uses the first approach mentioned in the article (minimum edit distance) along with document frequency. This implementation can easily be applied over an existing index. ht
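
The approach described here (minimum edit distance plus document frequency) can be sketched in plain Java. The class and method names are hypothetical, the Map stands in for the index's terms and their docFreq values, and the combination rule (closest term wins, higher document frequency breaks ties) is an assumption about how the two signals are used, not Ronnie's actual code.

```java
import java.util.Map;

class DidYouMean {
    /** Classic Levenshtein distance: insert, delete and substitute each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes the previous row
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Picks the term closest to the input, breaking ties by higher document
     * frequency; returns null if no term is within maxDistance.
     */
    static String suggest(String input, Map<String, Integer> docFreqs, int maxDistance) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            int d = editDistance(input, e.getKey());
            int f = e.getValue();
            if (d <= maxDistance && (d < bestDist || (d == bestDist && f > bestFreq))) {
                best = e.getKey();
                bestDist = d;
                bestFreq = f;
            }
        }
        return best;
    }
}
```

If the input word itself is in the term list it comes back with distance 0, so a caller would typically only show a suggestion when it differs from what the user typed.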

[ANN] Lucene "Did You Mean" article on java.net

2005-08-18 Thread Tom White
In case subscribers to this list missed it, my article on how to add a "did you mean" facility to Lucene searches was published last week: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html. Regards, Tom