Re: Searching an untokenized field using SnowballAnalyzer

2006-08-22 Thread Lorenzo Di Gaetano
need to be tokenized. You solved my problem!!! Thank you all very much! Regards, Lorenzo - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Searching an untokenized field using SnowballAnalyzer

2006-08-21 Thread Lorenzo Di Gaetano
ltiple field values at once. Please help me! Thank you in advance. Lorenzo

Re: Reuters

2006-04-21 Thread Lorenzo Viscanti
think you should consider converting the files to XML. Otherwise post a sketch of your indexing code to get some help. Lorenzo On 4/21/06, Malcolm Clark <[EMAIL PROTECTED]> wrote: > > Hi all, > I didn't know whether to add this to the thread asking about TREC indexing > or st

Indexing with SnowballAnalyzer and multiple languages in a single index

2006-04-20 Thread Lorenzo Di Gaetano
age (in different directories)? Thank you in advance. Lorenzo

Re: Lucene + LSI

2005-11-30 Thread Lorenzo Viscanti
on a specific Similarity implementation?), I have to tell you that it wouldn't be so easy to integrate LSI into Lucene's APIs. Lorenzo On 12/1/05, Chandana <[EMAIL PROTECTED]> wrote: > > Have any one implemented LSI in Lucene? > Kindly let me know how hard/easy it is. > > thanks > chandana > >

Re: Search clustering question

2005-11-24 Thread Lorenzo Viscanti
Clustering is a computationally intensive task. Carrot2 is an excellent framework that clusters documents and even labels them, and it takes up to two seconds to cluster 100 search result snippets. If you are going to cluster entire documents you'll have to wait longer. Lorenzo On 11/

Re: Regarding Lucene and LSI

2005-10-07 Thread Lorenzo Viscanti
ocuments. But how to choose the initial subset? Maybe just searching the index and then using the first n documents retrieved. Any idea? Lorenzo On 10/7/05, Paul Libbrecht <[EMAIL PROTECTED]> wrote: > > > I've met other persons with such needs and we would also be interested.

StandardTokenizer

2005-09-27 Thread Lorenzo Viscanti
ammar. Can someone help me with the right syntax? Thanks, Lorenzo

Re: Lucene search clusters

2005-06-08 Thread Lorenzo
, I'm just trying to offer a simpler (much simpler!) clustering opportunity for Lucene users. Hope I can get some good advice from you! ciao, Lorenzo On 6/8/05, Dawid Weiss <[EMAIL PROTECTED]> wrote: > > > You should state your requirements clearly: > > 1. What d

Re: Lucene search clusters

2005-06-08 Thread Lorenzo
ome tests. bye Lorenzo

Re: Lucene search clusters

2005-06-08 Thread Lorenzo
Daniel, could you explain to me why you are using EM clustering? Is there a field or use case where that technique works best? I don't have any EM experience and would like to know something about it (just studying some papers...) Thanks, Lorenzo

Re: Lucene search clusters

2005-06-07 Thread Lorenzo
k, based on a TF-IDF analysis (not thinking of k-means or EM 'til now). The most interesting problem is creating the architecture for such a system, being general purpose but also very efficient. Thanks, Lorenzo On 6/8/05, Daniel Stephan <[EMAIL PROTECTED]> wrote: > > I am curr

Re: Lucene search clusters

2005-06-07 Thread Lorenzo
SoC if we find some other people interested in it. Lorenzo On 6/7/05, Lorenzo <[EMAIL PROTECTED]> wrote: > > I'm writing this message trying to find some people interested in creating > a 'general purpose' lucene search results' clustering extension. > I w

Lucene search clusters

2005-06-07 Thread Lorenzo
ring implementation. I know that maybe each project needs a different implementation, but that would be a useful basis for everyone to develop his own project. Is anyone interested in it? Lorenzo

Re: Difference between minMergeDocs and mergeFactor

2005-05-09 Thread Lorenzo Viscanti
; in addition to that, mergeFactor also sets the maximum number of segments that will be present in the index. When the number of segments equals mergeFactor, Lucene tries to merge all the segments into a new one. Ciao, Lorenzo On 5/8/05, Barbara Krausz <[EMAIL PROTECTED]> wrote:
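
The merge behaviour described in the message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Lucene code: the function name `add_documents` is hypothetical, it assumes minMergeDocs is the number of documents buffered before a new segment is written, and it assumes merges are triggered when mergeFactor segments of the same size have accumulated. Real Lucene versions may differ in the details.

```python
def add_documents(num_docs, min_merge_docs, merge_factor):
    """Toy model (not Lucene's code): return the segment sizes, oldest first,
    after adding num_docs documents one at a time.

    Assumptions, for illustration only:
      * min_merge_docs documents are buffered before a segment is written;
      * when merge_factor segments of equal size are at the end of the list,
        they are merged into a single larger segment (merges can cascade).
    """
    segments = []   # sizes of on-disk segments, in documents
    buffered = 0    # documents still held in memory

    for _ in range(num_docs):
        buffered += 1
        if buffered == min_merge_docs:
            # Flush the in-memory buffer as a new small segment.
            segments.append(buffered)
            buffered = 0
            # Merge while the newest merge_factor segments have the same size.
            while (len(segments) >= merge_factor
                   and len(set(segments[-merge_factor:])) == 1):
                merged = sum(segments[-merge_factor:])
                del segments[-merge_factor:]
                segments.append(merged)

    if buffered:
        # Leftover documents are written out as a final, smaller segment.
        segments.append(buffered)
    return segments


if __name__ == "__main__":
    print(add_documents(250, min_merge_docs=10, merge_factor=10))
    print(add_documents(90, min_merge_docs=10, merge_factor=3))
```

In this model a larger mergeFactor means fewer merges while indexing but more segments left in the index, and a larger minMergeDocs means more documents held in memory before each flush.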