MappingCharFilter may give you some ideas.
- Original Message -
From: maxSchlein
Sent: Saturday, December 12, 2009 1:09
To: java-user@lucene.apache.org
Subject: Lucene Analyzer that can handle C++ vs C#
Can someone please point me in the right direction.
We are creating an application that needs to be able to searc
What we did in DBSight is provide a reserved list of words for every
Lucene Analyzer.
This way you can handle terms with special characters, like C++ and C#.
The common analyzers are usually not suitable for these special words.
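
A minimal sketch of the reserved-word idea, in plain Java rather than as a real Lucene Analyzer. It is not DBSight's code: the class name, the RESERVED set and the punctuation rules are all assumptions made for illustration. Whitespace-separated chunks that match the reserved list are kept whole; everything else is split on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: plain Java, not a Lucene Analyzer and not DBSight's code.
class ReservedWordTokenizer {
    // Hypothetical reserved list; a real application would supply its own.
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList("c++", "c#"));

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on whitespace first so "C++" is still one candidate chunk.
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Drop trailing sentence punctuation, e.g. "c++," becomes "c++".
            String candidate = raw.replaceAll("[.,;:!?]+$", "");
            if (RESERVED.contains(candidate)) {
                // Reserved words are emitted unchanged, symbols included.
                tokens.add(candidate);
            } else {
                // Simplification: everything else is split on non-ASCII-alphanumerics.
                for (String part : candidate.split("[^a-z0-9]+")) {
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I know C++, C# and C."));
    }
}
```

With a scheme like this, the query text would need to go through the same tokenizer as the indexed text, so that a search for "C++" produces the same reserved token that was indexed.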
--
Chris Lu
-
Instant Scalable Full-Text Search
The index *should* grow after merging/optimizing, but it will only do so
if the fields you had compressed were not bigger than without compression.
One of the tests showed: a string field with 80 ASCII chars needed
about 250 bytes when compressed, which is 3 times (as chars are UTF-8 encoded)
the unc
Hi Tom,
Pt 3: As far as I know, it wouldn't be a 'mixture' of 2 index types.
Rather, as soon as you optimize (or do an IndexWriter operation on the
current index), it would expand the index to an uncompressed format. I read
somewhere in the release notes that on doing so, a growth in the inde
> Can someone please point me in the right direction.
>
> We are creating an application that needs to be able to
> search on C++ and get
> back docs that have C++ in them. The StandardAnalyzer
> does not seem to index
> the "+", so a search for "C++" will bring back docs that
> contain C++, C,
>
I'm upgrading from 2.3.1 to 3.0.0. I have 3.0.0 index readers ready to go
into production and writers in the process of upgrading to 3.0.0.
I think I understand the implications of
http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats for
the upgrade, but I'd love it if someone coul
Hi Michael,
I am reporting my experience with the codec interface. I have
successfully implemented my own encoding, which is a kind of simplified
tree-based encoding (similar to what you can find in XML IR). You can
find more information about my project (siren) on [1]. The basic idea is
to
Can someone please point me in the right direction.
We are creating an application that needs to be able to search on C++ and get
back docs that have C++ in them. The StandardAnalyzer does not seem to index
the "+", so a search for "C++" will bring back docs that contain C++, C,
C#, etc. The
Hi
Sounds very odd. I suggest you break it down into the smallest
self-contained program/test case that demonstrates the problem. If
that doesn't help you find the problem, post it here.
--
Ian.
On Fri, Dec 11, 2009 at 8:10 AM, Michel Nadeau wrote:
> By the way the same search + filter com
How long does Lucene take to build the ords for the toplevel reader?
You should be able to just time FieldCache.getStringIndex(topLevelReader).
I think your 8.5 seconds for first Lucene search was with the
StringIndex computed per segment?
Mike
On Fri, Dec 11, 2009 at 8:30 AM, Toke Eskildsen
Thanks, Koji
On Fri, Dec 11, 2009 at 7:59 PM, Koji Sekiguchi wrote:
> MappingCharFilter can be used to convert c++ to cplusplus.
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>
>
> Anshum wrote:
>
>> How about getting the original token stream and then converting c++ to
>> cplusplus or anyothe
I've spent the last day working on a multipass order builder, where the
order is defined by a Collator and stored in an int-array. Compromising
a bit on the "minimal memory at all cost"-approach resulted in a fair
boost in speed, but it's still very slow for the first sorted search,
compared to Luc
MappingCharFilter can be used to convert c++ to cplusplus.
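
The idea of mapping characters before tokenizing can be sketched in plain Java as follows. This is not Lucene's MappingCharFilter API, which is configured differently; the class name, the MAPPINGS table and the "c#" to "csharp" entry are assumptions added for illustration. Only the c++ to cplusplus mapping comes from this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: mimics "map characters first, tokenize second" without Lucene.
class CharMappingSketch {
    // Hypothetical mapping table; "c#" -> "csharp" is an assumed extra entry.
    static final Map<String, String> MAPPINGS = new LinkedHashMap<>();
    static {
        MAPPINGS.put("c++", "cplusplus");
        MAPPINGS.put("c#", "csharp");
    }

    // Rewrites the raw text before any tokenization happens.
    static String applyMappings(String text) {
        String out = text.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : MAPPINGS.entrySet()) {
            // Naive substring replacement: "abc++" would also be rewritten.
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    // A simple tokenizer that, like many analyzers, drops symbols such as '+' and '#'.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : applyMappings(text).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("C++ and C# but not C"));
    }
}
```

Because the rewrite happens before the symbols are stripped, "C++", "C#" and "C" end up as three distinct tokens. The same mapping has to be applied to query text as well, otherwise a search for "C++" would not match the indexed "cplusplus".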
Koji
--
http://www.rondhuit.com/en/
Anshum wrote:
How about getting the original token stream and then converting c++ to
cplusplus or any other such transform. Or perhaps you might look at
using/extending (in the non-Java sense) some ot
How about getting the original token stream and then converting c++ to
cplusplus or any other such transform. Or perhaps you might look at
using/extending (in the non-Java sense) some other tokenizer!
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everyb
By the way the same search + filter combination but with a sort on another
field (string) works. It seems only the float sort isn't working. The float
sort is working correctly in other conditions though.
I'm very puzzled!
- Mike
aka...@gmail.com
On Fri, Dec 11, 2009 at 2:52 AM, Michel Nadeau
Exactly.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Nigel [mailto:nigelspl...@gmail.com]
> Sent: Friday, December 11, 2009 2:56 AM
> To: java-user@lucene.apache.org
> Subject: Re: Index file compatib