Re: Multi language indexing

2007-05-07 Thread Doron Cohen
bhecht <[EMAIL PROTECTED]> wrote on 07/05/2007 10:26:27:
> I have implemented my own analyzer for each country. So as I see it, when I index these records, I want to provide Lucene with a specific analyzer per record I'm indexing.
> When a user performs a query in my JSF form, I will us

Re: Keyphrase Extraction

2007-05-07 Thread Mark Miller
The only commercial options that I have seen do not have a web presence (that I know of or can find) and I don't recall the company names (only peripherally involved). Here is a web page where a guy does a nice writeup on a few options: http://dsanalytics.com/dsblog/the-start-of-the-art-in-key

Re: Porter 2 Stemming algorithim in java

2007-05-07 Thread Mark Miller
http://snowball.tartarus.org/ That is the Snowball page. There exists a Snowball version of the Porter2 stemming algorithm. If you hunt around the download page you will find it.
- Mark
sandeep chawla wrote:
> Hi All. Is there an implementation of the Porter2 stemming algorithm in Java? I don't w

Re: Questions regarding Lucene query syntax

2007-05-07 Thread Doron Cohen
> Is there a way to require a portion of a query only if there are values for > > > that field in the document? > > > e.g. If I know that I only want to match movies made between 1973 and > > > 1975, > > > I would like to be able to say in my query that if the document has a > > > year, > > > it mu

Re: Possible bug in SpanNearQuery

2007-05-07 Thread Moti Nisenson
Sure thing. I actually haven't taken a sufficiently close look at NearSpansOrdered (I was concentrating more on NearSpansUnordered, which has got next to no documentation). - Moti On 5/7/07, Paul Elschot <[EMAIL PROTECTED]> wrote: Moti, I have not yet looked into all the details of your comme

Re: Possible bug in SpanNearQuery

2007-05-07 Thread Paul Elschot
Moti, I have not yet looked into all the details of your comments, but I remember I had some trouble in trying to define the precise semantics of NearSpansOrdered. I'll have another look at being more precise for the overlaps. NearSpansUnordered is a specialisation of the previous NearSpans for t

Re: Language detection library

2007-05-07 Thread Bob Carpenter
> Does anyone know of a good language detection library that can detect what language a document (text) is in?
Language detection is easy. It's just a simple text classification problem. One way you can do this is using Lucene itself. Create a so-called pseudo-document for each language consisting
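To illustrate the "simple text classification" idea, here is a minimal sketch in plain Python rather than Lucene. It assumes each language's pseudo-document is reduced to a character-trigram profile and that an input is assigned to the language whose profile is most similar by cosine similarity; the preview above is cut off before it says what the pseudo-documents contain, so both choices are assumptions. The sample sentences and the names `trigrams`, `cosine`, and `detect_language` are made up for illustration.

```python
from collections import Counter
import math


def trigrams(text):
    """Count character trigrams of the lowercased, space-padded text."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Tiny, made-up "pseudo-documents"; a real profile would need far more text.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is on the table",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze ist auf dem tisch",
    "sv": "den snabba bruna räven hoppar över den lata hunden och katten är på bordet",
}
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def detect_language(text):
    """Return the language code whose profile is most similar to the text."""
    query = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))
```

With profiles this small the result is only reliable for text close to the samples; it is meant to show the shape of the approach, not its accuracy.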

Re: Multi language indexing

2007-05-07 Thread bhecht
Sorry, I didn't understand I need to use the PerFieldAnalyzerWrapper for this task, and tried to index the document twice. Sorry for the previous post. Thanks for the great help. But if you already asked, I will be happy to explain what my goal is, and maybe see if I'm approaching this correctly

Re: Multi language indexing

2007-05-07 Thread karl wettin
On 7 May 2007 at 15:45, bhecht wrote:
> OK, thanks, I think I got it. Just to see if I understood correctly: When I do the search on both stemmed and unstemmed fields, I will do the following: 1) If I know the country of the requested search - I will use the stemmed analyzer, and then the

Re: Multi language indexing

2007-05-07 Thread bhecht
OK, thanks, I think I got it. Just to see if I understood correctly: When I do the search on both stemmed and unstemmed fields, I will do the following: 1) If I know the country of the requested search - I will use the stemmed analyzer, and then the unstemmed field

Re: Multi language indexing

2007-05-07 Thread karl wettin
On 7 May 2007 at 13:27, bhecht wrote:
> The last option seems to be the right one for me, using a stemmed and unstemmed field. I assume when you mean "unstemmed", you mean indexing the field using the UN_TOKENIZED parameter.
No, I mean TOKENIZED, but not using a stemmer analyzer.
-- karl
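The distinction karl draws (both fields tokenized, only one of them stemmed) can be sketched in plain Python. This is not Lucene code: `toy_stem` is a hypothetical stand-in for a real stemmer, and the field names `name` and `name_stemmed` are assumptions made for the example.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips 'ing' or a final 's'."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze_unstemmed(text):
    """Tokenized, but not stemmed."""
    return tokenize(text)


def analyze_stemmed(text):
    """Tokenized and stemmed."""
    return [toy_stem(token) for token in tokenize(text)]


def build_document(name):
    """Index the same value twice: once unstemmed, once stemmed."""
    return {
        "name": analyze_unstemmed(name),
        "name_stemmed": analyze_stemmed(name),
    }


def matches(doc, field, query, analyzer):
    """True if every analyzed query term occurs in the given field."""
    return all(term in doc[field] for term in analyzer(query))
```

For a document built from "Acme Shipping Services", the query "service" matches the stemmed field but not the unstemmed one, while "ACME services" still matches the unstemmed field because that field is tokenized and lowercased. An untokenized field would only match the whole original string.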

Scope-based crawling and indexing

2007-05-07 Thread Vikas
Hi All: Can I make Nutch crawl and create separate indices based on scope, where scope is determined from the query string? For example, let's assume that I have a URL like: http://localhost/admin/orchindex/crawl.asp?lCrpID=0&lPrjID=609&lStrtID=3605&l then, lCrpId=0 is one scope lCorpi

Re: Multi language indexing

2007-05-07 Thread bhecht
OK, thanks for the reply. The last option seems to be the right one for me, using a stemmed and unstemmed field. I assume when you mean "unstemmed", you mean indexing the field using the UN_TOKENIZED parameter. Now my problem starts, when trying to implement this with "Hibernate Search", which al

Re: Multi language indexing

2007-05-07 Thread karl wettin
On 7 May 2007 at 12:16, bhecht wrote:
> My question regarding "the way to go", was if it is a good solution to index a content of a table, using more than 1 analyzer, determining the analyzer by the country value of each record.
I'm not sure what you mean, but I'll try. Do you ask if it makes

Re: Multi language indexing

2007-05-07 Thread bhecht
I know indexing and searching need to use the same analyzer. My question regarding "the way to go", was if it is a good solution to index a content of a table, using more than 1 analyzer, determining the analyzer by the country value of each record. Couldn't find a post that describes exactly my

Re: Multi language indexing

2007-05-07 Thread karl wettin
On 7 May 2007 at 10:02, bhecht wrote:
> This means I index and search using the same analyzer. I was interested to know if this is the way to go?
That would be the way to go (unless you are really sure what you're doing).
-- karl --

Porter 2 Stemming algorithim in java

2007-05-07 Thread sandeep chawla
Hi All. Is there an implementation of the Porter2 stemming algorithm in Java? I don't want to make a SnowballFilter based on the Snowball English stemmer. Thanks Sandeep -- SANDEEP CHAWLA House No- 23 10th main BTM 1st

Multi language indexing

2007-05-07 Thread bhecht
Hello all, I need to index a table containing company details (name, address, city ... country). Each record contains data written in the language appropriate to the record's country. I was thinking of indexing each record using an analyzer according to the record's country value. Then when searchi
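A rough sketch of the idea in this post, in plain Python rather than Lucene: pick the analyzer from the record's country at index time and pick the same one at search time. Everything here is assumed for illustration, including the `analyze_de` umlaut folding, the `ANALYZERS` table, and the toy in-memory index keyed by country and token.

```python
import re

# Hypothetical folding rules for the made-up German analyzer below.
GERMAN_FOLDING = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def analyze_default(text):
    """Fallback analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def analyze_de(text):
    """Hypothetical German analyzer: lowercase, then fold umlauts and ß."""
    return re.findall(r"\w+", text.lower().translate(GERMAN_FOLDING))


# Country code -> analyzer; countries not listed use the default analyzer.
ANALYZERS = {"DE": analyze_de}


def analyzer_for(country):
    return ANALYZERS.get(country, analyze_default)


def index_records(records):
    """Build a toy inverted index keyed by (country, token)."""
    index = {}
    for record_id, record in enumerate(records):
        analyze = analyzer_for(record["country"])
        for token in analyze(record["name"]):
            index.setdefault((record["country"], token), set()).add(record_id)
    return index


def search(index, country, query):
    """Analyze the query with the same analyzer used for that country's records."""
    analyze = analyzer_for(country)
    hits = [index.get((country, token), set()) for token in analyze(query)]
    return set.intersection(*hits) if hits else set()
```

With records such as "Müller Spedition" (DE) and "Miller Freight" (US), a German search for "MUELLER" or "müller" finds the first record, because both the indexed name and the query pass through the same folding.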

Re: Possible bug in SpanNearQuery

2007-05-07 Thread Moti Nisenson
Paul, The comment should be moved up into SpanNearQuery itself (as opposed to the comments in the package private implementation classes). Still though, that comment is inaccurate (regarding overlap - only "exact" overlap is handled). Here are some additional tests for SpanNearQuery. They all fai