Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread bhecht
ion. > > You're right that asking for a non-phrase search *will* work > though. > > Best > Erick > > On 5/21/07, bhecht <[EMAIL PROTECTED]> wrote: >> >> >> I will never have "mainstrasse" in my lucene index, since strasse is

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread bhecht
e in the lucene index I have "schöne main strasse". Daniel Naber-5 wrote: > > On Monday 21 May 2007 22:53, bhecht wrote: > >> If someone searches for mainstrasse, my tools will split it again to >> main and strasse, and then lucene will be able to find it. > > &

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread bhecht
el Naber-5 wrote: > > On Monday 21 May 2007 22:05, bhecht wrote: > >> Is there any point for me to start creating custom analyzers with filter >> for stop words, synonyms, and implementing my own "sub string" filter, >> for separating tokens into "sub words"

stop words, synonyms... what's in it for me?

2007-05-21 Thread bhecht
Hi there, I started using Lucene not long ago, with plans to replace the current SQL queries in my application with it. As I wasn't aware of Lucene before, I had implemented some tools (filters) similar to those Lucene includes. For example I have implemented a "stop word" tool. In my case I have
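A home-grown stop-word tool of the kind described above can be sketched in plain Java. The class name `StopWordTool` and the word list are illustrative assumptions, not Lucene's own `StopFilter`/`StopAnalyzer`:

```java
import java.util.*;

public class StopWordTool {
    // Hypothetical stop-word list for illustration; real lists are language-specific.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of", "and"));

    /** Lowercases, splits on whitespace, and drops any token found in the stop list. */
    public static List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

In Lucene itself the same effect is what `StopFilter` provides inside an analyzer chain, which is why rolling your own is usually unnecessary.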

Implement a tokenizer

2007-05-21 Thread bhecht
Hi there, I was interested in changing the StandardTokenizer so it will not remove the "+" (plus) sign from my stream. Looking in the code and documentation, it reads: "If this tokenizer does not suit your application, please consider copying this source code directory to your project and maint
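The idea of a tokenizer that treats '+' as part of a token (rather than a delimiter) can be sketched in plain Java. This is a simplified character-class loop, not StandardTokenizer's actual grammar-based implementation; the class name is a placeholder:

```java
import java.util.*;

public class PlusAwareTokenizer {
    /** Emits runs of letters, digits, and '+' as tokens;
        every other character acts as a delimiter. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c) || c == '+') {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }
}
```

A real Lucene solution, as the quoted documentation suggests, is to copy and adapt the tokenizer's grammar so the '+' rule is changed at the source.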

Re: How to Update the Index once it is created

2007-05-21 Thread bhecht
If you are using Oracle and Lucene, check out http://www.hibernate.org/410.html ("Hibernate Search"); this will automatically update your Lucene index on any change to your database table. Erick Erickson wrote: > > You have to delete the old document and add it as a new one. > > See IndexModifier c

What is the best way to split substring words

2007-05-19 Thread bhecht
Hi there, I want to be able to split tokens by giving a list of substring words. So I can give a list of subwords like "strasse", "gasse", and the token "mainstrasse" or "maingasse" will be split into two tokens, "main" and "strasse" (or "gasse"). Thanks -- View this message in context: http://www.nabble.com/
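The splitting described above can be sketched as a plain-Java suffix check. The class name `CompoundSplitter` is an assumption for illustration; in Lucene this logic would live inside a custom TokenFilter:

```java
import java.util.*;

public class CompoundSplitter {
    /** If the token ends with one of the known subwords (and has a non-empty stem),
        returns the stem and the subword as two tokens; otherwise returns the token as-is. */
    public static List<String> split(String token, List<String> subwords) {
        for (String sub : subwords) {
            if (token.endsWith(sub) && token.length() > sub.length()) {
                return Arrays.asList(token.substring(0, token.length() - sub.length()), sub);
            }
        }
        return Collections.singletonList(token);
    }
}
```

For example, `split("mainstrasse", subwords)` yields `["main", "strasse"]` with the subword list above, while a token with no matching suffix passes through unchanged.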

Re: FuzzyLikeThisQuery what does maxNumTerms mean

2007-05-09 Thread bhecht
Thanks again. I wasn't aware of the problematic part with updating posts. Sorry for that, and thanks for the answer. Good day. -- View this message in context: http://www.nabble.com/FuzzyLikeThisQuery-what-does-maxNumTerms-mean-tf3716547.html#a10407540 Sent from the Lucene - Java Users mailing l

Re: FuzzyLikeThisQuery what does maxNumTerms mean

2007-05-09 Thread bhecht
Thanks Mark, I have updated my previous post, I guess before you had a chance to read it. Can you please re-read my post? Thanks -- View this message in context: http://www.nabble.com/FuzzyLikeThisQuery-what-does-maxNumTerms-mean-tf3716547.html#a10402974 Sent from the Lucene - Java Users

Re: FuzzyLikeThisQuery what does maxNumTerms mean

2007-05-09 Thread bhecht
Thanks Mark for the detailed explanation. So one more question if I may: How is the list shortened to include terms only? In your example you had 2 stop words which of course are not included in the token stream. But what happens if you get more than maxNumTerms terms, how are the maxNumTerms

FuzzyLikeThisQuery what does maxNumTerms mean

2007-05-09 Thread bhecht
Hello all, I am new to Lucene and want to use the FuzzyLikeThisQuery. I have read the documentation for this class, and read the following for what maxNumTerms means: "maxNumTerms - The total number of terms clauses that will appear once rewritten as a BooleanQuery". In addition, in the classes d
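As a rough illustration of what a maxNumTerms-style cap does, the plain-Java sketch below keeps only the N highest-scoring term candidates. The class name and the toy scores are assumptions for illustration, not FuzzyLikeThisQuery's actual internals:

```java
import java.util.*;

public class TopTermsSelector {
    /** Sorts candidate terms by descending score and keeps at most maxNumTerms of them,
        mirroring how a fuzzy query caps the clauses of its rewritten BooleanQuery. */
    public static List<String> topTerms(Map<String, Double> scored, int maxNumTerms) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scored.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(maxNumTerms, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

The point of the cap is to bound the size (and cost) of the expanded BooleanQuery when a fuzzy term matches many index terms.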

Re: Multi language indexing

2007-05-08 Thread bhecht
Hi Doron, Thank you very much for your time and for the detailed explanations. This is exactly what I meant and I am happy to see I understood correctly. I am now using the Snowball analyzer, which seems to work very well. Thanks again and good day, Barak Hecht. -- View this message in context: htt

Re: Multi language indexing

2007-05-07 Thread bhecht
Sorry, I didn't understand that I need to use the PerFieldAnalyzerWrapper for this task, and tried to index the document twice. Sorry for the previous post, and thanks for the great help. But since you asked, I will be happy to explain what my goal is, and maybe see if I'm approaching this correctly

Re: Multi language indexing

2007-05-07 Thread bhecht
OK, thanks, I think I got it. Just to see if I understood correctly: When I do the search on both stemmed and unstemmed fields, I will do the following: 1) If I know the country of the requested search - I will use the stemmed analyzer, and then the unstemmed field

Re: Multi language indexing

2007-05-07 Thread bhecht
OK, thanks for the reply. The last option seems to be the right one for me: using a stemmed and an unstemmed field. I assume that by "unstemmed" you mean indexing the field using the UN_TOKENIZED parameter. Now my problem starts when trying to implement this with "Hibernate Search", which al
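The stemmed-plus-unstemmed idea can be sketched without Lucene: keep both the raw and the stemmed form of a term and let a query match either one. The `DualFieldMatcher` class and its crude suffix stemmer are assumptions for illustration; a real index would use a Snowball stemmer and two separate Lucene fields:

```java
public class DualFieldMatcher {
    /** Toy suffix stripper standing in for a real stemmer such as Snowball. */
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "s"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    /** Matches if the query hits either the raw (unstemmed) or the stemmed form,
        which is what searching both fields achieves in the real setup. */
    public static boolean matches(String indexedWord, String query) {
        return indexedWord.equals(query) || stem(indexedWord).equals(stem(query));
    }
}
```

Searching both fields lets exact matches on the raw form score alongside looser matches on the stemmed form.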

Re: Multi language indexing

2007-05-07 Thread bhecht
I know indexing and searching need to use the same analyzer. My question regarding "the way to go", was if it is a good solution to index a content of a table, using more than 1 analyzer, determining the analyzer by the country value of each record. Couldn't find a post that describes exactly my

Multi language indexing

2007-05-07 Thread bhecht
Hello all, I need to index a table containing company details (name, address, city ... country). Each record contains data written in the language appropriate to the record's country. I was thinking of indexing each record using an analyzer chosen according to the record's country value. Then when searchi
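The per-country analyzer choice described above can be sketched as a simple lookup table. The `AnalyzerRegistry` name and the toy lowercasing "analyzers" are illustrative assumptions; real code would map country codes to Lucene `Analyzer` instances (e.g. a German analyzer vs. a standard one) and pass the chosen analyzer when indexing each record:

```java
import java.util.*;
import java.util.function.Function;

public class AnalyzerRegistry {
    // Toy stand-ins for analyzers: each just lowercases with a country-appropriate locale.
    private static final Map<String, Function<String, String>> BY_COUNTRY = new HashMap<>();
    static {
        BY_COUNTRY.put("DE", s -> s.toLowerCase(Locale.GERMAN));
        BY_COUNTRY.put("US", s -> s.toLowerCase(Locale.ENGLISH));
    }

    /** Picks the "analyzer" for a record's country, falling back to a default. */
    public static Function<String, String> forCountry(String country) {
        return BY_COUNTRY.getOrDefault(country, s -> s.toLowerCase(Locale.ROOT));
    }
}
```

The same lookup must be repeated at query time, since indexing and searching have to agree on the analyzer for each record's language.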