Start/end offsets in analyzers

2007-03-27 Thread Antony Bowesman
I'm fiddling with custom analyzers to analyze email addresses to store the full email address and the component parts. It's based on Solr's analyzer framework, so I have a StandardTokenizerFactory followed by an EmailFilterFactory. It produces Analyzing "<[EMAIL PROTECTED]>" 1: [EMAIL PROTEC

RE: Reverse search

2007-03-27 Thread Melanie Langlois
Thanks, makes sense. Just another question about the memoryIndex. In your example you said I can do memoryIndex.getReader().terms(); but in fact there is no public access to the reader from memory index... If this is not possible, I will list the doc's terms while I'm indexing. Mélanie -O

Re: Reverse search

2007-03-27 Thread markharw00d
>>I just want to make sure there is no API either No, but your code looks like it should do the job. That code can be improved by something like [pseudo code]: query.extractTerms(terms); if(query instanceof PhraseQuery) { //find and index rarest term only using an existing index

RE: Reverse search

2007-03-27 Thread Melanie Langlois
Mark, when I extract the terms from my query, can I not add them directly? Do I have to do something like: Set<Term> terms=new HashSet<Term>(); query.extractTerms(terms); Document doc=new Document(); for(Term term:terms){ doc.add(new Field(term.field(),term.text(),Field.Store.NO,Field.Index.TOKENIZED)); }

Re: IndexWriter.deleteDocuments(term) deletes everything

2007-03-27 Thread Roger Keays
Roger Keays wrote: Hi there, I'm trying to delete a single document by using its uuid field: uuid = new Term("uuid", item.getUuid().toString()); writer.deleteDocuments(uuid); writer.close(); However, it appears that this operation is deleting *every* document, whether the uuid mat

Re: IndexWriter.deleteDocuments(term) deletes everything

2007-03-27 Thread Doron Cohen
Hi Roger, The method usage seems correct to me. Are you saying that search with TermQuery(Term("uuid","76")) returns only one of many existing documents, but deleteDocuments(Term("uuid","76")) deletes all docs? (also docs not returned by the search for this term?) Could you send here a small progr

IndexWriter.deleteDocuments(term) deletes everything

2007-03-27 Thread Roger Keays
Hi there, I'm trying to delete a single document by using its uuid field: uuid = new Term("uuid", item.getUuid().toString()); writer.deleteDocuments(uuid); writer.close(); However, it appears that this operation is deleting *every* document, whether the uuid matches or not. The uui

Re: why Apache doesnt create a nice forum like the others???

2007-03-27 Thread Erick Erickson
I haven't had to do anything. All the replies I do just magically get to the correct list. Not helpful, I know, but I'm lazy... Erick On 3/27/07, Lukas Vlcek <[EMAIL PROTECTED]> wrote: Eric, How do you manage Reply-to: field in your gmail? I always have to change Reply-to field in Setting (

Re: why Apache doesnt create a nice forum like the others???

2007-03-27 Thread Lukas Vlcek
Eric, How do you manage the Reply-to: field in your gmail? I always have to change the Reply-to field in Settings (which requires more than three clicks!) and since this is a manual (and tedious) process it can introduce mistakes (mis-addressed replies). The problem is that I am signed up to more mail-l

Re: Contextual text-link ads

2007-03-27 Thread Doron Cohen
Assuming you don't mean UI design - how about a small auxiliary sponsor index containing sponsor data - doc per sponsor, sponsor text and sponsor url as stored fields, sponsor doc statically boosted by sponsor's $importance$, and highlighting of user query words in the excerpt from suggested sponso

Contextual text-link ads

2007-03-27 Thread Peter W.
Howdy, Does anyone have any design considerations for implementing a contextual text-link advertising system using Lucene? The emphasis would be strictly on monetizing search results with light, non-intrusive behavior (query terms match sponsored results). Thanks, Peter W. --

Re: Synonyms and Aliases query

2007-03-27 Thread Erick Erickson
See below... On 3/27/07, daveburns <[EMAIL PROTECTED]> wrote: Hi, afraid I'm a newbie at Lucene but read Otis/Erik's book and was hoping someone can answer a quick question on the AliasAnalyzer (Chap 4). I want to build a search for names (Companies/surname, firstname etc) but need to match th

Re: Custom Analyzer Help please

2007-03-27 Thread Grant Ingersoll
Hi Tim, From the StandardAnalyzer code, the TokenStream looks like: /** Constructs a {@link StandardTokenizer} filtered by a {@link StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. */ public TokenStream tokenStream(String fi

Re: PorterStemFilter

2007-03-27 Thread Yonik Seeley
On 3/27/07, sandeep chawla <[EMAIL PROTECTED]> wrote: Well, in any case, is there an implementation of the Porter2 stemming algorithm in Java? I don't want to make a snowballfilter based on the snowball English Stemmer. You mean you don't want to use the snowball lucene-contrib package? Why not? -Y

Re: Synonyms and Aliases query

2007-03-27 Thread daveburns
Thanks for the quick reply. I'm using the synonym engine from LIA for both parsing queries and building the index. Do you have the code for a synonym engine that would work for all matches? I'm using ver 2.1 of lucene core. Thanks again, Dave -- View this message in context: http://www.nabbl

RE: How can I use SortComparator in my case?

2007-03-27 Thread Ramana Jelda
Actually, I don't much like my proposed way of implementation. I want to play with the score to implement logic similar to what I mentioned in my solution. But how? I would really appreciate any suggestions. :) Jelda > -Original Message- > From: Ramana Jelda [mailto:[EMAIL PROTECTED] > Sent: T

Re: Synonyms and Aliases query

2007-03-27 Thread sandeep chawla
In a synonym engine, suppose the synonyms of word x are syn(x). If y is in syn(x), it does not always hold that x is in syn(y) (you might not get any synonyms for y; it depends on the data of the synonym engine). So your synonym engine might be providing aliases of bob as robert, rob, bobby...
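The asymmetry described above can be sketched with a plain lookup table. This is only an illustration under stated assumptions: the class `SynonymEngineSketch` and its methods `getSynonyms` and `matchesEitherWay` are hypothetical names, not the engine from the book and not a Lucene API. The only data used is the bob/robert/rob/bobby example from the message.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical one-directional synonym table (an assumption for illustration,
// not the synonym engine discussed in the thread).
class SynonymEngineSketch {
    private final Map<String, List<String>> table = new HashMap<String, List<String>>();

    SynonymEngineSketch() {
        // Only "bob" has an entry; nothing maps back from "robert".
        table.put("bob", Arrays.asList("robert", "rob", "bobby"));
    }

    // Returns the synonyms listed for the word, or an empty list if there are none.
    List<String> getSynonyms(String word) {
        List<String> found = table.get(word);
        return found != null ? found : Collections.<String>emptyList();
    }

    // Checks both directions, so the match no longer depends on which word has the entry.
    boolean matchesEitherWay(String a, String b) {
        return a.equals(b) || getSynonyms(a).contains(b) || getSynonyms(b).contains(a);
    }

    public static void main(String[] args) {
        SynonymEngineSketch engine = new SynonymEngineSketch();
        System.out.println(engine.getSynonyms("bob"));     // synonyms listed for bob
        System.out.println(engine.getSynonyms("robert"));  // empty: no reverse entry
        System.out.println(engine.matchesEitherWay("robert", "bob"));
    }
}
```

With a table like this, a lookup for robert finds nothing unless the data also lists the reverse mapping, or the lookup checks both directions as `matchesEitherWay` does.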

Re: why Apache doesnt create a nice forum like the others???

2007-03-27 Thread Erick Erickson
Gmail has been good to me for this list... Erick On 3/27/07, karl wettin <[EMAIL PROTECTED]> wrote: 27 mar 2007 kl. 08.28 skrev Mohammad Norouzi: > Karl, > Maybe I am out of date! > do you mean with Nabble I can access this mailing list? Yes. -- karl > > On 3/27/07, karl wettin <[EMAIL P

Synonyms and Aliases query

2007-03-27 Thread daveburns
Hi, afraid I'm a newbie at Lucene but read Otis/Erik's book and was hoping someone can answer a quick question on the AliasAnalyzer (Chap 4). I want to build a search for names (Companies/surname, firstname etc) but need to match things like Robert = bob, bobby, rob etc (or margaret = peggy etc).

Re: how to search over another search

2007-03-27 Thread Mohammad Norouzi
Sorry, I can't comprehend. Why should we use two separate indexes? Can't we merge them into one index file? On 3/27/07, Steven Rowe <[EMAIL PROTECTED]> wrote: Mohammad Norouzi wrote: > Steven, > what does this mean: > "Each index added must have the same number of documents, but > typically each contains

Re: how to search over another search

2007-03-27 Thread Steven Rowe
Mohammad Norouzi wrote: > Steven, > what does this mean: > "Each index added must have the same number of documents, but > typically each contains different fields. Each document contains the > union of the fields of all documents with the same document number. > When searching, matches for a query ter

Re: PorterStemFilter

2007-03-27 Thread sandeep chawla
Well, in any case, is there an implementation of the Porter2 stemming algorithm in Java? I don't want to make a snowballfilter based on the snowball English Stemmer. On 27/03/07, thomas arni <[EMAIL PROTECTED]> wrote: Write your own analyzer, which calls the appropriate Filter in the method "tokenStre

Re: PorterStemFilter

2007-03-27 Thread thomas arni
Write your own analyzer, which calls the appropriate Filter in the method "tokenStream". In the method "tokenStream" you can define how the input should be analyzed and parsed. Your analyzer must extend the abstract class Analyzer. The easiest way is to create a new class (Analyzer), which

PorterStemFilter

2007-03-27 Thread sandeep.chawla
Hi, Lucene provides a PorterStemFilter which uses PorterStemmer. Is there any way I can use a PorterStemFilter (by extending it or something) which uses the Porter2 stemming algorithm, not the original Porter algorithm? I know this is possible using the snowball filter, but for some reason I d

RE: How can I use SortComparator in my case?

2007-03-27 Thread Ramana Jelda
Thanks for all your help. Here is the best solution I can see, which I am planning to implement. Suppose 20 unique customers && 90,000 results found && to be returned offset results 0-20. I can think of only the following solution. //Hope the pseudo code is self-explanatory.. Public

Re: how to search over another search

2007-03-27 Thread Mohammad Norouzi
Steven, what does this mean: "Each index added must have the same number of documents, but typically each contains different fields. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added th