Re: surrogate pairs

2010-03-11 Thread Simon Willnauer
Hi Yuta, are you looking for a specific analyzer like CJKAnalyzer, or for token streams such as LowerCaseFilter etc.? A fair number of the token filters have already been converted to handle surrogate pairs correctly. If you need help figuring out how to use stuff from trunk I'm happy to
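As an illustration of what "handling surrogate pairs correctly" means for a filter such as lowercasing, here is a JDK-only sketch that contrasts a char-by-char loop with one that works on code points. It is not Lucene's LowerCaseFilter, and the class and method names are hypothetical. The test character is U+10400 DESERET CAPITAL LETTER LONG I, stored in Java as the pair \uD801\uDC00, whose lowercase form is U+10428.

```java
/** Hypothetical helper contrasting char-based and code-point-based lowercasing. */
final class CodePointLowerCase {
    private CodePointLowerCase() {}

    /** Naive version: each UTF-16 char is lowercased on its own, so surrogate halves stay unchanged. */
    static String lowerCaseByChar(String s) {
        char[] out = s.toCharArray();
        for (int i = 0; i < out.length; i++) {
            out[i] = Character.toLowerCase(out[i]);
        }
        return new String(out);
    }

    /** Surrogate-aware version: decodes each full code point before lowercasing it. */
    static String lowerCaseByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);              // reads both halves of a pair when present
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);           // advance by 1 or 2 chars
        }
        return sb.toString();
    }
}
```

lowerCaseByChar leaves the pair untouched, because a lone surrogate has no case mapping, while lowerCaseByCodePoint returns \uD801\uDC28. Plain BMP text such as "ABC" gives "abc" either way.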

RE: File descriptor leak in ParallelReader.reopen()

2010-03-11 Thread Alexey Lef
I believe there is in fact a difference between 2.4.1 and 3.0.1 as far as leaking file descriptors is concerned. When sharing an IndexSearcher among multiple threads, instead of closing an old IndexSearcher/IndexReader, we used to follow the good old advice to "just drop it on the floor and let
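The message is cut off, but the usual alternative to dropping an old reader on the floor is to close it deterministically once no thread is still searching it. The following is a stdlib-only sketch of that idea using reference counting. RefCountedReader and ReaderHolder are hypothetical names, the Closeable stands in for a reader's open index files, and none of this is Lucene code.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper: releases the underlying resource when the last reference is dropped. */
final class RefCountedReader {
    private final Closeable resource;                            // stands in for open index files
    private final AtomicInteger refCount = new AtomicInteger(1); // the holder's own reference
    private volatile boolean closed;

    RefCountedReader(Closeable resource) {
        this.resource = resource;
    }

    void incRef() {
        refCount.incrementAndGet();
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true;
            try {
                resource.close();                                // handles freed here, not by GC
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    boolean isClosed() {
        return closed;
    }
}

/** Hypothetical holder shared by search threads. */
final class ReaderHolder {
    private RefCountedReader current;

    ReaderHolder(RefCountedReader initial) {
        this.current = initial;
    }

    /** Called by a search thread before it uses the reader. */
    synchronized RefCountedReader acquire() {
        current.incRef();
        return current;
    }

    /** Called by the search thread when it is done with the reader. */
    void release(RefCountedReader reader) {
        reader.decRef();
    }

    /** Installs a reopened reader; the old one closes once its last search releases it. */
    synchronized void swap(RefCountedReader reopened) {
        RefCountedReader old = current;
        current = reopened;
        old.decRef();                                            // drop the holder's reference
    }
}
```

The design choice here is that swapping never closes a reader out from under a running search: the old reader stays open until the last acquire() is matched by a release().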

Re: surrogate pairs

2010-03-11 Thread Yuta Kawadai
Thank you. For now I use my own Analyzer, which is based on "MeCab" (an open-source Japanese morphological analyzer). I'm trying to modify it to support surrogate pairs. And I'm looking forward to the next release! Yuta 2010/3/11 Robert Muir : > On Wed, Mar 10, 2010 at 6:52 PM, Yuta Kawadai wrote: >> Hi >> >> Can Lu
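MeCab itself does dictionary-based morphological analysis, which this sketch does not attempt. It only illustrates the part of such a modification that is specific to surrogate pairs: walking the input by code point, so that a character such as U+20BB7 is tested as one letter. A char-by-char check would see two surrogates, for which Character.isLetterOrDigit(char) returns false, and would split the word. CodePointTokenizer and its letter-or-digit rule are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer: splits text into runs of letters/digits, walking by code point. */
final class CodePointTokenizer {
    private CodePointTokenizer() {}

    /** Token membership is decided per code point, never per UTF-16 char. */
    static boolean isTokenChar(int codePoint) {
        return Character.isLetterOrDigit(codePoint);
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1;                          // char offset where the current token began
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isTokenChar(cp)) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                tokens.add(text.substring(start, i));
                start = -1;
            }
            i += Character.charCount(cp);        // skip both halves of a surrogate pair
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }
}
```

Because substring boundaries are always taken at code point starts, a token can never end between the two halves of a pair.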

Re: surrogate pairs

2010-03-11 Thread Yuta Kawadai
Sorry for not giving enough detail. I'm trying to handle text that contains "surrogate pairs" in Lucene, so I want to confirm whether Lucene (the core, Analyzer, TokenFilter and so on) can handle terms that contain surrogate pairs. Thanks, Yuta 2010/3/11 Erick Erickson : > Please describe the

Re: DisjunctionMaxQuery with tie breaker=1 same as MultiFieldQueryParser?

2010-03-11 Thread Chris Hostetter
: If I want to search, let's say, "ipod" in three different fields (device, : sound, technology) : would it be the same to use a DisjunctionMaxQuery with tie breaker = 1 as : using a MultiFieldQueryParser with an OR to build the boolean queries? strictly speaking even with tie breaker of 1, a Di
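The answer is truncated, so the following only illustrates one concrete way the two can differ; it is not necessarily the point the reply goes on to make. It assumes the documented DisjunctionMaxQuery rule (highest sub-score plus tieBreaker times the remaining sub-scores) and, for the boolean OR, a coord factor of matching clauses over total clauses as in DefaultSimilarity. Query normalization and boosts are ignored, and ScoreCombination is a hypothetical class working on made-up per-field scores.

```java
/** Illustrative arithmetic only: how the two query types combine per-field sub-scores. */
final class ScoreCombination {
    private ScoreCombination() {}

    /** DisjunctionMaxQuery-style: max sub-score plus tieBreaker times the other sub-scores. */
    static float disMax(float[] subScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float s : subScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tieBreaker * (sum - max);
    }

    /** BooleanQuery OR-style: sum of matching sub-scores times coord (matching / total clauses). */
    static float booleanOr(float[] subScores, int totalClauses) {
        float sum = 0f;
        int matched = 0;
        for (float s : subScores) {
            if (s > 0f) {                   // a score of 0 means the field did not match
                sum += s;
                matched++;
            }
        }
        return sum * matched / (float) totalClauses;
    }
}
```

With per-field scores of 2, 1 and 0, disMax(scores, 1f) gives 3.0 while booleanOr(scores, 3) gives 2.0, because only two of the three clauses match and the coord factor scales the sum by 2/3.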

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Michael McCandless
On Thu, Mar 11, 2010 at 4:10 PM, Peter Keegan wrote: > I want the TFC to do all the cool things it does like custom sorting, saving > the field values, max score, etc. I suppose the custom Collector could > explicitly delegate all TFC's methods, but this doesn't seem right. Right, that's what I w

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Yonik Seeley
On Thu, Mar 11, 2010 at 4:10 PM, Peter Keegan wrote: > I want the TFC to do all the cool things it does like custom sorting, saving > the field values, max score, etc. I suppose the custom Collector could > explicitly delegate all TFC's methods, but this doesn't seem right. No need to delegate th

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
I want the TFC to do all the cool things it does like custom sorting, saving the field values, max score, etc. I suppose the custom Collector could explicitly delegate all TFC's methods, but this doesn't seem right. Peter On Thu, Mar 11, 2010 at 3:40 PM, Peter Keegan wrote: > Yes, but none of th

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
Yes, but none of the other TFC methods would get called because none of the TFC classes can be extended. Or am I missing something? On Thu, Mar 11, 2010 at 3:37 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > In your collector, create the TFC and save it as tfc. > > Then in each of C

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Michael McCandless
In your collector, create the TFC and save it as tfc. Then in each of the Collector methods that you implement, do your own stuff (setting the bit) but then also call tfc.XXX (e.g. tfc.collect). That should work? Mike On Thu, Mar 11, 2010 at 2:57 PM, Peter Keegan wrote: > Yes. Could you give me a h
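A minimal sketch of the delegation described above, using composition instead of subclassing. SegmentCollector is a hypothetical two-method stand-in for Lucene's Collector so that the example compiles without Lucene; the real class also has setScorer and acceptsDocsOutOfOrder, which a wrapper would forward in the same way.

```java
import java.util.BitSet;

/** Hypothetical stand-in for Lucene's Collector, reduced to the two calls this sketch needs. */
interface SegmentCollector {
    void setNextSegment(int docBase);   // plays the role of setNextReader(reader, docBase)
    void collect(int doc);              // doc is relative to the current segment
}

/** Records every hit in a BitSet, then forwards each call to the wrapped collector. */
final class BitSetDelegatingCollector implements SegmentCollector {
    private final SegmentCollector delegate;  // the TopFieldCollector in the real code
    private final BitSet hits = new BitSet();
    private int docBase;

    BitSetDelegatingCollector(SegmentCollector delegate) {
        this.delegate = delegate;
    }

    @Override
    public void setNextSegment(int docBase) {
        this.docBase = docBase;
        delegate.setNextSegment(docBase);
    }

    @Override
    public void collect(int doc) {
        hits.set(docBase + doc);   // own work: remember the index-wide doc ID
        delegate.collect(doc);     // then let the wrapped collector do its usual work
    }

    BitSet hits() {
        return hits;
    }

    SegmentCollector delegate() {
        return delegate;
    }
}
```

After the search, keep using the reference to the wrapped collector for its own results (in the real code, the TopFieldCollector's topDocs()) alongside hits().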

Re: Can 2.3 read indexes created by 2.4?

2010-03-11 Thread Michael McCandless
Urgh, I failed to update the opening in fileformats.html (describing what's changed on each version). We also had a change in 3.0, from removing compressed fields. I'll fix... But: 2.3 can't read indexes created with 2.4 (and in general older Lucene releases very likely will not be able to read

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
Yes. Could you give me a hint on how to delegate? On Thu, Mar 11, 2010 at 2:50 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Can you make your own collector and then just delegate internally to TFC? > > Mike > > On Thu, Mar 11, 2010 at 2:30 PM, Peter Keegan > wrote: > > Is it poss

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Michael McCandless
Can you make your own collector and then just delegate internally to TFC? Mike On Thu, Mar 11, 2010 at 2:30 PM, Peter Keegan wrote: > Is it possible to issue a single search that combines a TopFieldCollector > (MultiComparatorScoringMaxScoreCollector) with a custom Collector? The > custom Collec

Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
Is it possible to issue a single search that combines a TopFieldCollector (MultiComparatorScoringMaxScoreCollector) with a custom Collector? The custom Collector just collects the doc IDs into a BitSet (or DocIdSet). The collect() methods of the various TopFieldCollectors cannot be overridden. Tha

Can 2.3 read indexes created by 2.4?

2010-03-11 Thread Nathanael D. Jones
Lucene 2.4 introduced a change not documented on the File Formats page *LUCENE-510: The index now stores strings as true UTF-8 bytes (previously it was Java's modified UTF-8). If any text, either stored fields or a token, has illegal UTF-16 surrogate characters, these characters are now silently r
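To see why this is a format change for text outside the Basic Multilingual Plane, compare the byte counts for a single supplementary character. The sketch uses only the JDK: DataOutputStream.writeUTF writes Java's modified UTF-8, which encodes each surrogate as its own 3-byte sequence, while String.getBytes with UTF_8 writes standard UTF-8. It does not use Lucene's own IndexOutput code, and Utf8Sizes is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper comparing standard UTF-8 with Java's modified UTF-8. */
final class Utf8Sizes {
    private Utf8Sizes() {}

    /** Byte length in standard UTF-8. */
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Byte length in modified UTF-8, i.e. what writeUTF emits minus its 2-byte length prefix. */
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size() - 2;
    }
}
```

For U+20BB7 (\uD842\uDFB7) standard UTF-8 uses 4 bytes while modified UTF-8 uses 6, and U+0000 takes 1 byte versus 2. ASCII text is identical in both.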

google's index layout, lucene on hbase(?)

2010-03-11 Thread Thomas Koch
Hi, has any information leaked about Google's index layout? How do they process my query so fast over such a vast number of documents? Somehow related: ( or http://tin

Re: Call for presentations - Berlin Buzzwords - Summer 2010

2010-03-11 Thread Isabel Drost
On 11.03.2010 Isabel Drost wrote: > Call for Presentations Berlin Buzzwords http://berlinbuzzwords.de > Berlin Buzzwords 2010 - Search, Store, Scale >7/8 June 2010 > > > This is to announce the Berlin Buzzwords 2010. The first co

Call for presentations - Berlin Buzzwords - Summer 2010

2010-03-11 Thread Isabel Drost
Call for Presentations Berlin Buzzwords http://buzzwordsberlin.de Berlin Buzzwords 2010 - Search, Store, Scale 7/8 June 2010 This is to announce the Berlin Buzzwords 2010. The first conference on scalable and open search, data process

DisjunctionMaxQuery with tie breaker=1 same as MultiFieldQueryParser?

2010-03-11 Thread Marc Sturlese
Hey there, If I want to search, let's say, "ipod" in three different fields (device, sound, technology), would it be the same to use a DisjunctionMaxQuery with tie breaker = 1 as using a MultiFieldQueryParser with an OR to build the boolean queries? As far as I understood from the API documentation
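For reference, the boolean alternative in the question corresponds to one OR clause per field. The helper below is hypothetical and just builds that query string in Lucene's query-parser syntax; it is not MultiFieldQueryParser itself, whose actual output also depends on the analyzer and the default operator.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: spells out one term as an OR across several fields. */
final class MultiFieldExpansion {
    private MultiFieldExpansion() {}

    static String orAcrossFields(String term, List<String> fields) {
        StringJoiner query = new StringJoiner(" OR ", "(", ")");
        for (String field : fields) {
            query.add(field + ":" + term);   // one clause per field
        }
        return query.toString();
    }
}
```

For the example in the question, orAcrossFields("ipod", Arrays.asList("device", "sound", "technology")) yields (device:ipod OR sound:ipod OR technology:ipod).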

Re: search on documents which DO NOT have field defined

2010-03-11 Thread Anshum
Hi, How about indexing a dummy token for empty docs? That way you can pick up all docs that are actually null/empty by querying for the dummy token. Make sure that the dummy token is never part of any actual document (token stream). Hopefully this works! -- Anshum Gupta Naukri Labs! http://
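A toy, stdlib-only sketch of the dummy-token idea: an in-memory inverted index (not Lucene) for a single field that indexes a sentinel token whenever the value is missing or blank, so "documents without this field" becomes an ordinary term lookup. The sentinel string __empty__ and the class name are assumptions; in a real index the token must never be produced for actual content.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index for one field, illustrating the dummy-token trick. */
final class SentinelTokenIndex {
    /** Hypothetical dummy token; it must never occur in real field content. */
    static final String EMPTY_MARKER = "__empty__";

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes the field's tokens, or the dummy token when the value is missing or blank. */
    void addDocument(int docId, String fieldValue) {
        if (fieldValue == null || fieldValue.trim().isEmpty()) {
            post(EMPTY_MARKER, docId);
            return;
        }
        for (String token : fieldValue.trim().split("\\s+")) {
            post(token.toLowerCase(Locale.ROOT), docId);
        }
    }

    private void post(String token, int docId) {
        postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
    }

    /** Returns the IDs of documents whose field contains the token. */
    Set<Integer> search(String token) {
        return postings.getOrDefault(token, Collections.emptySet());
    }

    /** "Field not defined" becomes a lookup of the dummy token. */
    Set<Integer> docsWithoutValue() {
        return search(EMPTY_MARKER);
    }
}
```

After indexing doc 1 as "Red shoes", doc 2 as "" and doc 3 with no value, docsWithoutValue() returns {2, 3}.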

Re: surrogate pairs

2010-03-11 Thread Simon Willnauer
On Thu, Mar 11, 2010 at 2:28 AM, Robert Muir wrote: > On Wed, Mar 10, 2010 at 6:52 PM, Yuta Kawadai wrote: >> Hi >> >> Can Lucene use surrogate pairs (and its term positions or length) ? >> >> Thanks, >> Yuta > > Yes, just make sure you use an Analyzer that supports them... > unfortunately most o
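On the "term positions or length" part of the question: a surrogate pair is one character but two Java chars, so lengths and offsets measured in chars differ from counts of characters. This JDK-only sketch (hypothetical class name) shows the difference for 𠮷野家, whose first character is the supplementary ideograph U+20BB7.

```java
/** Hypothetical helper showing char-based versus code-point-based measurements. */
final class CodePointOffsets {
    private CodePointOffsets() {}

    /** UTF-16 length: what String.length() and char offsets measure. */
    static int charLength(String s) {
        return s.length();
    }

    /** Number of Unicode characters, counting each surrogate pair once. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Converts a code-point index into the char offset where that code point starts. */
    static int charOffsetOf(String s, int codePointIndex) {
        return s.offsetByCodePoints(0, codePointIndex);
    }
}
```

For "\uD842\uDFB7\u91CE\u5BB6" (𠮷野家), charLength returns 4, codePointLength returns 3, and the second character starts at char offset 2.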