Re: Lucene Analyzer that can handle C++ vs C#

2009-12-11 Thread 王巍巍
MappingCharFilter may bring you some ideas. - Original Message - From: maxSchlein Sent: Saturday, December 12, 2009 1:09 To: java-user@lucene.apache.org Subject: Lucene Analyzer that can handle C++ vs C# Can someone please point me in the right direction. We are creating an application that needs to be able to search

Re: Lucene Analyzer that can handle C++ vs C#

2009-12-11 Thread Chris Lu
What we did in DBSight is to provide a reserved list of words for every Lucene Analyzer. This way you can handle special terms like C++ and C#. Common analyzers are usually not suitable for these special words. -- Chris Lu - Instant Scalable Full-Text Search
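A minimal sketch of one way such a reserved list could be wired in (not from this thread; it assumes the Lucene 3.0 API, a tokenizer such as WhitespaceTokenizer that still emits the '+' and '#' characters, and a hypothetical ReservedWordFilter name):

import java.io.IOException;
import java.util.Set;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

// Hypothetical filter: tokens on the reserved list pass through untouched,
// all other tokens have trailing '+' / '#' characters stripped.
public final class ReservedWordFilter extends TokenFilter {
  private final Set<String> reserved;
  private final TermAttribute termAtt = addAttribute(TermAttribute.class);

  public ReservedWordFilter(TokenStream input, Set<String> reserved) {
    super(input);
    this.reserved = reserved;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    String term = termAtt.term();
    if (!reserved.contains(term)) {
      termAtt.setTermBuffer(term.replaceAll("[+#]+$", ""));
    }
    return true;
  }
}

If the chain lowercases before this filter, the reserved set would hold lowercase entries such as "c++" and "c#", and the same analyzer must be used at query time.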

RE: Lucene 3.0.0 writer with a Lucene 2.3.1 index

2009-12-11 Thread Uwe Schindler
The index *should* grow after merging/optimizing, but it will only do this if the fields you had compressed were not bigger than without compression. One of the tests showed: a string field with 80 ASCII chars needed about 250 bytes compressed, which is 3 times (as chars are UTF-8 encoded) the unc

Re: Lucene 3.0.0 writer with a Lucene 2.3.1 index

2009-12-11 Thread Anshum
Hi Tom, Pt 3: To my knowledge, it wouldn't be a 'mixture' of 2 index types. Rather, as soon as you optimize (or do an IndexWriter operation on the current index), it would expand the index to a non-compressed format. I read somewhere in the release notes that on doing so, a growth in the inde

Re: Lucene Analyzer that can handle C++ vs C#

2009-12-11 Thread AHMET ARSLAN
> Can someone please point me in the right direction. > > We are creating an application that needs to be able to > search on C++ and get > back docs that have C++ in it. The StandardAnalyzer > does not seem to index > the "+", so a search for "C++" will bring back docs that > contain C++, C, >

Lucene 3.0.0 writer with a Lucene 2.3.1 index

2009-12-11 Thread Rob Staveley (Tom)
I'm upgrading from 2.3.1 to 3.0.0. I have 3.0.0 index readers ready to go into production and writers in the process of upgrading to 3.0.0. I think I understand the implications of http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats for the upgrade, but I'd love it if someone coul

Re: IndexingChain and TermHash

2009-12-11 Thread Renaud Delbru
Hi Michael, I am reporting my experience with the codec interface. I have successfully implemented my own encoding, which is a kind of simplified tree-based encoding (similar to what you can find in XML IR). You can find more information about my project (siren) at [1]. The basic idea is to

Lucene Analyzer that can handle C++ vs C#

2009-12-11 Thread maxSchlein
Can someone please point me in the right direction. We are creating an application that needs to be able to search on C++ and get back docs that have C++ in it. The StandardAnalyzer does not seem to index the "+", so a search for "C++" will bring back docs that contain C++, C, C#, etc. The
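A quick way to see what actually gets indexed is to print the tokens the analyzer produces (a sketch, not from this thread; field name and sample text are made up, Lucene 3.0 API assumed):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class ShowTokens {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    TokenStream ts = analyzer.tokenStream("body", new StringReader("C++ and C# code"));
    TermAttribute term = ts.addAttribute(TermAttribute.class);
    while (ts.incrementToken()) {
      // "C++" and "C#" both come out as just "c": the '+' and '#' are dropped.
      System.out.println(term.term());
    }
    ts.close();
  }
}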

Re: Returns nothing when sorting

2009-12-11 Thread Ian Lea
Hi. Sounds very odd. I suggest you break it down into the smallest self-contained program/test case that demonstrates the problem. If that doesn't help you find the problem, post it here. -- Ian. On Fri, Dec 11, 2009 at 8:10 AM, Michel Nadeau wrote: > By the way the same search + filter com

Re: heap memory issues when sorting by a string field

2009-12-11 Thread Michael McCandless
How long does Lucene take to build the ords for the top-level reader? You should be able to just time FieldCache.getStringIndex(topLevelReader). I think your 8.5 seconds for the first Lucene search was with the StringIndex computed per segment? Mike On Fri, Dec 11, 2009 at 8:30 AM, Toke Eskildsen
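For reference, a rough way to time that call (a sketch, not from the thread; the index path and sort field name are placeholders, Lucene 3.0 API assumed):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.FSDirectory;

public class TimeStringIndex {
  public static void main(String[] args) throws Exception {
    IndexReader topLevelReader = IndexReader.open(FSDirectory.open(new File("/path/to/index")), true);
    long start = System.currentTimeMillis();
    // Building the StringIndex (per-doc ords plus the term lookup array) for the whole
    // reader is the work a sorted search pays for on its first run.
    FieldCache.StringIndex index = FieldCache.DEFAULT.getStringIndex(topLevelReader, "sortfield");
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("getStringIndex: " + elapsed + " ms, " + index.lookup.length + " lookup entries");
    topLevelReader.close();
  }
}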

Re: Recover special terms from StandardTokenizer

2009-12-11 Thread Weiwei Wang
Thanks, Koji On Fri, Dec 11, 2009 at 7:59 PM, Koji Sekiguchi wrote: > MappingCharFilter can be used to convert c++ to cplusplus. > > Koji > > -- > http://www.rondhuit.com/en/ > > > > Anshum wrote: > >> How about getting the original token stream and then converting c++ to >> cplusplus or anyothe

Re: heap memory issues when sorting by a string field

2009-12-11 Thread Toke Eskildsen
I've spent the last day working on a multipass order builder, where the order is defined by a Collator and stored in an int array. Compromising a bit on the "minimal memory at all costs" approach resulted in a fair boost in speed, but it's still very slow for the first sorted search, compared to Luc
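As a toy illustration of the underlying idea (not Toke's code; the terms and locale are invented), unique terms are ranked once by a Collator so that per-document sort data can be plain ints:

import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

public class CollatorOrds {
  public static void main(String[] args) {
    // Pretend these are the unique terms of the sort field, in term-dictionary order.
    final String[] terms = {"Aalborg", "zebra", "Æblerød", "ødegaard"};
    final Collator collator = Collator.getInstance(new Locale("da", "DK"));

    // Sort term indexes by collation order, then invert to get ords[termIndex] = rank.
    Integer[] byCollation = new Integer[terms.length];
    for (int i = 0; i < terms.length; i++) byCollation[i] = i;
    Arrays.sort(byCollation, new Comparator<Integer>() {
      public int compare(Integer a, Integer b) {
        return collator.compare(terms[a], terms[b]);
      }
    });
    int[] ords = new int[terms.length];
    for (int rank = 0; rank < byCollation.length; rank++) {
      ords[byCollation[rank]] = rank;
    }
    // Documents only need to store/compare these ints, not the strings themselves.
    System.out.println(Arrays.toString(ords));
  }
}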

Re: Recover special terms from StandardTokenizer

2009-12-11 Thread Koji Sekiguchi
MappingCharFilter can be used to convert c++ to cplusplus. Koji -- http://www.rondhuit.com/en/ Anshum wrote: How about getting the original token stream and then converting c++ to cplusplus or any other such transform. Or perhaps you might look at using/extending (in the non-Java sense) some ot
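An analyzer wired that way might look roughly like this (a sketch under the Lucene 3.0 API, not Koji's code; note that MappingCharFilter matching is case-sensitive, hence the duplicate entries):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharReader;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.MappingCharFilter;
import org.apache.lucene.analysis.NormalizeCharMap;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class CodeTermAnalyzer extends Analyzer {
  private static final NormalizeCharMap MAP = new NormalizeCharMap();
  static {
    MAP.add("C++", "cplusplus");
    MAP.add("c++", "cplusplus");
    MAP.add("C#", "csharp");
    MAP.add("c#", "csharp");
  }

  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    // The char filter rewrites "C++" / "C#" before the tokenizer runs,
    // so the '+' and '#' never get a chance to be discarded.
    Reader mapped = new MappingCharFilter(MAP, CharReader.get(reader));
    return new LowerCaseFilter(new WhitespaceTokenizer(mapped));
  }
}

The same analyzer has to be used at query time as well, so a query for "C++" is rewritten to "cplusplus" before it hits the index.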

Re: Recover special terms from StandardTokenizer

2009-12-11 Thread Anshum
How about getting the original token stream and then converting c++ to cplusplus or any other such transform. Or perhaps you might look at using/extending (in the non-Java sense) some other tokenizer! -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everyb

Re: Returns nothing when sorting

2009-12-11 Thread Michel Nadeau
By the way, the same search + filter combination but with a sort on another field (string) works. It seems only the float sort isn't working. The float sort works correctly in other conditions though. I'm very puzzled! - Mike aka...@gmail.com On Fri, Dec 11, 2009 at 2:52 AM, Michel Nadeau
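For context, the kind of call being described is roughly the following (a sketch, not the poster's code; field name and result count are placeholders, Lucene 3.0 API assumed):

import java.io.IOException;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

public class FloatSortExample {
  // Sort by a float field, descending. The values are read via the FieldCache,
  // so the field must contain plain, parseable float values.
  static TopDocs searchSortedByPrice(IndexSearcher searcher, Query query, Filter filter)
      throws IOException {
    Sort sort = new Sort(new SortField("price", SortField.FLOAT, true));
    return searcher.search(query, filter, 50, sort);
  }
}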

RE: Index file compatibility and a migration plan to lucene 3

2009-12-11 Thread Uwe Schindler
Exactly. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Nigel [mailto:nigelspl...@gmail.com] > Sent: Friday, December 11, 2009 2:56 AM > To: java-user@lucene.apache.org > Subject: Re: Index file compatib