Hi,
I hacked Lucene 1.2 a little while ago and I am trying to use my own
similarity algorithm. If you are interested in the changes I have made
to Lucene 1.2, you can email me back at chenjian1227 at gmail.com
Cheers,
Jian
On 8/18/05, Karl Koch <[EMAIL PROTECTED]> wrote:
> Hello Lucene experts,
On Aug 18, 2005, at 6:22 PM, [EMAIL PROTECTED] wrote:
On Thu, 2005-08-18 at 17:16, [EMAIL PROTECTED] wrote:
Thanks again! The analyzer is working now. But it seems the
QueryParser I am using is probably converting the queries to lowercase
first. Is there any way to stop that? Here is the line of code where I
am parsing:
> On Thu, 2005-08-18 at 17:16, [EMAIL PROTECTED] wrote:
>> Thanks again! The analyzer is working now. But it seems the
>> QueryParser I am using is probably converting the queries to lowercase
>> first. Is there any way to stop that? Here is the line of code where
>> I am parsing:
>>
On Thu, 2005-08-18 at 17:16, [EMAIL PROTECTED] wrote:
> Thanks again! The analyzer is working now. But it seems the
> QueryParser I am using is probably converting the queries to lowercase
> first. Is there any way to stop that? Here is the line of code where I am
> parsing:
>
> Query q
Thanks again! The analyzer is working now. But it seems the
QueryParser I am using is probably converting the queries to lowercase
first. Is there any way to stop that? Here is the line of code where I am
parsing:
Query query = QueryParser.parse(line, "contents", analyzer);
As for anal
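A minimal sketch of the fix that comes up later in this digest, assuming a Lucene 1.4-era API: QueryParser lowercases plain terms only because the analyzer it is handed does, so passing a non-lowercasing analyzer such as WhitespaceAnalyzer preserves case:

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

// WhitespaceAnalyzer performs no lowercasing, so the query keeps its case.
Query query = QueryParser.parse(line, "contents", new WhitespaceAnalyzer());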
On Aug 18, 2005, at 3:51 PM, Dan Armbrust wrote:
I am implementing a filter that will remove certain characters from
the tokens - things like '(', etc. - but the chars to be removed will
be customizable.
This is what I have come up with - but it doesn't seem very
efficient. Is there a better way?
On Aug 18, 2005, at 4:16 PM, [EMAIL PROTECTED] wrote:
Thanks! I have used StopAnalyzer to index. Does it lower-case before
indexing? I don't touch the query string before sending it for
searching, so the query string is not lower-cased.
Pretty much all built-in Lucene analyzers lower-case:
OK, so it does use a LowerCaseFilter. Is there any analyzer that does the
same thing as StopAnalyzer, except for lowering the case? Because
StopAnalyzer best fits my purpose.
> Thanks! I have used StopAnalyzer to index. Does it lower-case before
> indexing? I don't touch the query string bef
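A minimal sketch of such an analyzer, assuming the Lucene 1.4 API (the class name is made up): StopAnalyzer is essentially a LowerCaseTokenizer feeding a StopFilter, so swapping in a LetterTokenizer keeps the stop-word removal but drops the lowercasing:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LetterTokenizer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;

// Hypothetical analyzer: StopAnalyzer without the lowercasing step.
public class CaseSensitiveStopAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // LetterTokenizer splits on the same boundaries as
        // LowerCaseTokenizer, but preserves case.
        return new StopFilter(new LetterTokenizer(reader),
                              StopAnalyzer.ENGLISH_STOP_WORDS);
    }
}

Note that stop-word matching then becomes case-sensitive too ("The" would no longer be removed), so you may need a stop list covering the capitalizations you care about.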
Thanks! I have used StopAnalyzer to index. Does it lower-case before
indexing? I don't touch the query string before sending it for searching,
so the query string is not lower-cased.
> The search really is case-sensitive; it's just that all input is
> usually lower-cased, so it feels like it's case i
On Aug 18, 2005, at 3:50 PM, [EMAIL PROTECTED] wrote:
Is there any way to do a case-sensitive search?
All Lucene searches are case-sensitive, actually.
But most often a lowercasing analyzer is used. So the trick is to
change the analysis process to not lowercase. It gets more fun when
y
The search really is case-sensitive; it's just that all input is
usually lower-cased, so it feels like it's case-insensitive. In other
words, don't lower-case your input before indexing, and don't
lower-case your queries (i.e. pick an Analyzer that doesn't
lower-case).
Otis
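Otis's advice in code form, a sketch against the Lucene 1.4 API (path and field names are illustrative): use the same non-lowercasing analyzer on both the indexing and the query-parsing side.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

Analyzer analyzer = new WhitespaceAnalyzer(); // no lowercasing anywhere

IndexWriter writer = new IndexWriter("/tmp/demo-index", analyzer, true);
Document doc = new Document();
doc.add(Field.Text("contents", "Lucene is Case Sensitive"));
writer.addDocument(doc);
writer.close();

IndexSearcher searcher = new IndexSearcher("/tmp/demo-index");
Query q = QueryParser.parse("Sensitive", "contents", analyzer); // matches
// QueryParser.parse("sensitive", "contents", analyzer) would match nothing.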
I am implementing a filter that will remove certain characters from the
tokens - things like '(', etc. - but the chars to be removed will be
customizable.
This is what I have come up with - but it doesn't seem very efficient.
Is there a better way?
Should I be adjusting the token endOffset when
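One way it might look, sketched against the Lucene 1.4 TokenStream API (class and variable names are mine): strip the configured characters from each token's text, and drop tokens that become empty. On the offsets question: since the (shorter) token still comes from the same span of source text, the usual choice is to leave startOffset and endOffset alone.

import java.io.IOException;
import java.util.Set;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Hypothetical filter: removes a configurable set of characters from tokens.
public class CharStripFilter extends TokenFilter {
    private final Set charsToRemove; // Set of Character objects

    public CharStripFilter(TokenStream input, Set charsToRemove) {
        super(input);
        this.charsToRemove = charsToRemove;
    }

    public Token next() throws IOException {
        for (Token token = input.next(); token != null; token = input.next()) {
            String text = token.termText();
            StringBuffer sb = new StringBuffer(text.length());
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (!charsToRemove.contains(new Character(c))) {
                    sb.append(c);
                }
            }
            if (sb.length() > 0) {
                // Keep the original offsets: the token still maps to the
                // same span of the source text.
                return new Token(sb.toString(),
                                 token.startOffset(), token.endOffset());
            }
            // Token became empty after stripping; skip it and try the next.
        }
        return null;
    }
}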
Is there any way to do a case-sensitive search?
Thanks
Tareque
ControlDOCS
Tony Schwartz wrote:
What about the TermInfosReader class? It appears to read the entire term
set for the segment into 3 arrays. Am I seeing double on this one?
p.s. I am looking at the current sources.
see TermInfosReader.ensureIndexIsRead();
The index only has 1/128 of the terms, by default
Hello, all.
I'm trying to optimize an index, and I get this exception... A copy of this
index made a couple weeks ago optimized correctly, and I don't THINK there
have been any changes made to this index since then (but there may have been).
I also couldn't find anything about this in the
Does Lucene 1.3 theoretically run on Java 1.2? I have tried and got JIT
errors when trying to search an index on the hard disk:
--- output from Eclipse Java IDE---
A nonfatal internal JIT (3.10.107(x)) error 'chgTarg: Conditional' has
occurred in :
'org/apach
Hello Lucene experts,
as you might have seen in my previous postings, I am restricted to Lucene
1.2 at the latest (due to hardware limitations I can only use Java 1.1 or
1.2).
I would like to do my own Similarity implementation which, I think, would
allow me to insert other algorithms in Lucene wh
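A note for later readers: in Lucene 1.2 the scoring code has to be patched directly (as Jian describes at the top of this digest), but from Lucene 1.4 on, Similarity is pluggable. A rough sketch of what that looks like there; the tf() formula below is a placeholder, not a recommendation:

import org.apache.lucene.search.DefaultSimilarity;

public class MySimilarity extends DefaultSimilarity {
    // Example override: binary term frequency instead of the default sqrt(freq).
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

// Install it on both the indexing and the searching side:
//   writer.setSimilarity(new MySimilarity());
//   searcher.setSimilarity(new MySimilarity());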
Chris D wrote:
Well in my case field order is important, but the order of the
individual fields isn't. So I can speed up getFields to roughly O(1)
by implementing Document as follows.
Have you actually found getFields to be a performance bottleneck in your
application? I'd be surprised if it
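For concreteness, a sketch of the idea Chris is describing (not his actual code): keep fields in insertion order for iteration, but also index them by name in a map, so a by-name lookup avoids the linear scan:

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical field table: preserves insertion order and offers O(1)
// lookup of all fields sharing a name.
public class FieldTable {
    private final List fieldsInOrder = new ArrayList();
    private final Map fieldsByName = new HashMap();

    public void add(String name, Object field) {
        fieldsInOrder.add(field);
        List sameName = (List) fieldsByName.get(name);
        if (sameName == null) {
            sameName = new ArrayList();
            fieldsByName.put(name, sameName);
        }
        sameName.add(field);
    }

    public List getFields(String name) { // roughly O(1)
        List sameName = (List) fieldsByName.get(name);
        return sameName == null ? Collections.EMPTY_LIST : sameName;
    }
}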
Fredrik wrote:
Opening the index with Luke, I can see the following:
Number of fields: 17
Number of documents: 1165726
Number of terms: 6721726
The size of the index is approx 5.3 GB.
Lucene version is 1.4.3.
The index contains Norwegian terms, but lots of inline HTML, etc.
is probably increasin
Tony Schwartz wrote:
I think you're jumping into the conversation too late. What you have said
here does not address the problem at hand. That is, in TermInfosReader,
all terms in the segment get loaded into three very large arrays.
That's not true. Only 1/128th of the terms are loaded by
Hello Erik,
I find "Lucene in Action" an extemely well written and easy accessable book
and I must say: Well done (including of course everybody who participated to
the book).
Naturally the book is very strong on the latest version on Lucene. I
currently, and you may have realised that on all my
On Thursday 18 August 2005 14:32, Tony Schwartz wrote:
> Is this a viable solution?
> Doesn't this make sorting and filtering much more complex and much more
> expensive as well?
Sorting would have to be done on more than one field.
I would expect that to be possible.
As for filtering: would you
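Multi-field sorting is directly supported by the Sort API in Lucene 1.4; a sketch, assuming the date has been split into year/month/day fields as discussed:

import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

// Sort by year, then month, then day - equivalent to sorting on the
// original single date field.
Sort sort = new Sort(new SortField[] {
    new SortField("year", SortField.INT),
    new SortField("month", SortField.INT),
    new SortField("day", SortField.INT)
});
Hits hits = searcher.search(query, sort);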
We have an index with approximately 1.2 million documents. Web site
users search this index, but we get sporadic out-of-memory errors
as Lucene tries to allocate over 500 MB of memory.
Opening the index with Luke, I can see the following:
Number of fields: 17
Number of documents: 1165726
Number o
What about trying something like:
BooleanQuery booQuery = new BooleanQuery();
Query titleQuery = null;
QueryParser.Operator operator = contentParser.getDefaultOperator();
if (QueryParser.Operator.AND == operator) {
    //logger.debug("Content Ope
I think you're jumping into the conversation too late. What you have said
here does not address the problem at hand. That is, in TermInfosReader, all
terms in the segment get loaded into three very large arrays. If your index
is massive and has many fields indexed (dates for example), you nee
You can still have the complete date as a separate field, and sort or
filter by it; just don't use this field in your query.
Aviran
http://www.aviransplace.com
-Original Message-
From: Tony Schwartz [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 18, 2005 8:32 AM
To: java-user@lucene.ap
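In code, the suggestion might look like this (Lucene 1.4 API, field names illustrative): index the split fields for querying, plus one untokenized complete-date field used only for sorting and filtering:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
doc.add(Field.Keyword("date", "20050818")); // complete date: sort/filter only
doc.add(Field.Keyword("year", "2005"));     // split fields: use these in queries
doc.add(Field.Keyword("month", "08"));
doc.add(Field.Keyword("day", "18"));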
Try to decrease the merge factor, and I would also check the maximum
number of files allowed to be open in the OS.
HTH
Aviran
http://www.aviransplace.com
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 18, 2005 7:34 AM
To: java-user@lucene.apa
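A sketch of the first suggestion, assuming Lucene 1.4 where mergeFactor is a public field on IndexWriter (path and value are illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer =
    new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
writer.mergeFactor = 5; // default is 10; fewer segments means fewer open files
// ... addDocument() calls ...
writer.optimize();      // merges everything down to a single segment
writer.close();

On the OS side, ulimit -n shows the per-process open-file limit on most Unix-like systems.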
Thanks, very nice article :)
Aviran
http://www.aviransplace.com
-Original Message-
From: Joseph B. Ottinger [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 18, 2005 7:22 AM
To: java-user@lucene.apache.org
Subject: Re: [ANN] Lucene "Did You Mean" article on java.net
TSS referred to it,
Is this a viable solution?
Doesn't this make sorting and filtering much more complex and much more
expensive as well?
Tony Schwartz
[EMAIL PROTECTED]
> On Wednesday 17 August 2005 22:49, Paul Elschot wrote:
>> > the index could potentially be huge.
>> >
>> > So if this is indeed the case, it is
Hi,
I have a problem with the indexing.
It concerns the following... I index documents in one directory.
In this directory there are many other directories with documents, etc.
Over 1000 directories! And I have one index directory.
So I get the exception: Exception in thread "main"
java.io.Fil
TSS referred to it, too. :)
On Thu, 18 Aug 2005, Tom White wrote:
In case subscribers to this list missed it, my article on how to add a
"did you mean" facility to Lucene searches was published last week:
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html.
Regards,
Tom
On Aug 18, 2005, at 1:48 AM, Karthik N S wrote:
Does this mean MultiFieldQueryParser will always have to use
'DEFAULT_OPERATOR_OR' instead of 'DEFAULT_OPERATOR_AND'
operations?
Yup, that's what I said :)
Is there any alternative for handling this process (other than the API
'replaceAll(" ", " A
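One alternative, sketched against the 1.4 QueryParser API (variable names are mine): parse the user input once per field with AND as the default operator, then OR the per-field queries together yourself instead of using MultiFieldQueryParser:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;

String[] fields = { "title", "contents" };
BooleanQuery combined = new BooleanQuery();
for (int i = 0; i < fields.length; i++) {
    QueryParser parser = new QueryParser(fields[i], analyzer);
    parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND); // AND within a field
    // required=false, prohibited=false: a SHOULD clause, i.e. OR across fields
    combined.add(parser.parse(userInput), false, false);
}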
Nice article!
Those interested in the different "did you mean" techniques can also look
at my simple implementation using the first approach mentioned in the
article, minimum edit distance, along with document frequency.
This implementation can easily be applied over an existing index.
ht
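For reference, the "minimum edit distance" mentioned is the classic Levenshtein dynamic program; a minimal standalone sketch (candidate terms would then be ranked by this distance together with document frequency):

// Levenshtein distance: the number of single-character insertions,
// deletions and substitutions needed to turn a into b.
static int editDistance(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;
    for (int i = 1; i <= a.length(); i++) {
        for (int j = 1; j <= b.length(); j++) {
            int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
            d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                        d[i][j - 1] + 1),  // insertion
                               d[i - 1][j - 1] + cost);    // substitution
        }
    }
    return d[a.length()][b.length()];
}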
In case subscribers to this list missed it, my article on how to add a
"did you mean" facility to Lucene searches was published last week:
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html.
Regards,
Tom