Re: Does Lucene support partition-by-keyword indexing?

2008-03-01 Thread Uwe Goetzke
Hi, I do not yet fully understand what you want to achieve. You want to split the index by keyword to reduce the time needed to distribute indexes? And you want to distribute queries to the nodes based on the same split mechanism? You have several nodes with different kinds of documents. Y

Re: Does Lucene support partition-by-keyword indexing?

2008-03-01 Thread 仇寅
Hi, I agree with your point that it is easier to partition the index by document. But the partition-by-keyword approach scales much better than the partition-by-document approach. Each query involves communicating with a constant number of nodes, while partition-by-doc requires spreading the q

Re: Does Lucene support partition-by-keyword indexing?

2008-03-01 Thread Mathieu Lecarme
The easiest way is to split the index by Document. In Lucene, an index contains Documents and an inverted index of Terms. If you want to put Terms in different places, each Document will be duplicated in every index, with only a part of its Terms. How will you manage node failure in your network? They were so

Re: Lucene for Sentiment Analysis

2008-03-01 Thread Aaron Schon
Thanks to those who responded. I was wondering if taking a bag of words approach might work. For example chunking the sentences to be analyzed and running a Lucene query against an index storing sentiment polarity. Has anyone had success with this approach? I do not need a super accurate system,

searching for "Nothing"

2008-03-01 Thread Ghinwa Choueiter
Hi, I am trying to do a search as follows (this is a very simplified example): I want to search for: (1) the little boy or (2) one little boy or (3) little boy Can I write the query as: "the OR one OR "" " AND "little" AND "boy" note that what I mean by "" is "Nothing". thank you, -Ghinwa PS
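One way to read the question above is that the first word is an optional slot, so the search is an OR over three phrase queries. The following is only a sketch of that expansion in plain Python; `expand_optional_phrases` is a hypothetical helper, not part of Lucene, and `None` stands in for "Nothing". The string it builds uses quoted phrases joined by OR, which the classic Lucene query parser accepts; whether a word like "the" survives analysis depends on the analyzer in use.

```python
from itertools import product


def expand_optional_phrases(slots):
    """Expand a list of slots into an OR of quoted phrases.

    Each slot is a list of alternatives; None means "nothing in this slot".
    (Hypothetical helper for illustration only.)
    """
    phrases = []
    for combo in product(*slots):
        # Drop the empty alternative so the remaining words stay adjacent.
        words = [w for w in combo if w is not None]
        if words:
            phrases.append('"' + " ".join(words) + '"')
    return " OR ".join(phrases)


query = expand_optional_phrases([["the", "one", None], ["little"], ["boy"]])
print(query)  # "the little boy" OR "one little boy" OR "little boy"
```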

Re: Corrupted Indexes Under Lucene 2.3 (and 2.3.1)

2008-03-01 Thread Tyler V
Thanks for the reply Yonik. Our workflow is as follows: We build a very large document and put the document on a queue to be added to our "complete" index. This queue is serviced by a separate thread, which actually adds the document to the "complete" index. Once the document has been placed on

RE: Rebuilding Document from index?

2008-03-01 Thread Itamar Syn-Hershko
This is exactly where Hebrew is different from all Latin languages. I did think about the approach you mentioned, of having 2 fields - one is stemmed and the other is not - but even with it the search will be performed on the non-stemmed field by default. The stemmed field will only be searched up

Does Lucene support partition-by-keyword indexing?

2008-03-01 Thread Yin Qiu
Hi, I'm planning to implement a search infrastructure on a P2P overlay. To achieve this, I want to first distribute the indices to various nodes connected by this overlay. My approach is to partition the indices by keyword, that is, one node takes care of certain keywords (or terms). When a simple
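The partition-by-keyword idea described above can be illustrated with a toy model: each term's posting list lives on exactly one node, so a query only contacts the nodes that own its terms. This is a sketch under stated assumptions, not the poster's design and not a Lucene feature: the CRC32-based term-to-node mapping, `NUM_NODES`, and all function names are hypothetical, and the "nodes" are just in-memory dictionaries.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical overlay size


def node_for_term(term, num_nodes=NUM_NODES):
    """Map a term to the node that owns its posting list (stable hash)."""
    return zlib.crc32(term.lower().encode("utf-8")) % num_nodes


def build_term_partitioned_index(docs, num_nodes=NUM_NODES):
    """Return one {term: set(doc_ids)} posting map per node."""
    nodes = [defaultdict(set) for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        for term in text.lower().split():
            nodes[node_for_term(term, num_nodes)][term].add(doc_id)
    return nodes


def search_all_terms(nodes, query):
    """AND query: fetch each term's postings from its owner and intersect."""
    terms = query.lower().split()
    contacted = {node_for_term(t, len(nodes)) for t in terms}
    postings = [nodes[node_for_term(t, len(nodes))].get(t, set()) for t in terms]
    hits = set.intersection(*postings) if postings else set()
    return hits, contacted


docs = {
    1: "lucene index partition",
    2: "lucene query",
    3: "partition by document",
}
nodes = build_term_partitioned_index(docs)
hits, contacted = search_all_terms(nodes, "lucene partition")
print(hits, len(contacted))
```

In this model the number of nodes contacted is bounded by the number of query terms, whereas a document-partitioned index would have to ask every node. The cost is that whole posting lists travel between nodes for the intersection, and a node failure makes its terms unsearchable unless they are replicated.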

Re: Lucene for Sentiment Analysis

2008-03-01 Thread Vivek Balaraman
We've been working on Sentiment Analysis. We use GATE and WordNet for the lexical / semantic analysis and JFreeChart for the visualization. The domain is reviews on retail banking and in general our accuracy is around 75% and recall around 25%. We tried out LingPipe as well which also gave good

Re: SOC: Lulu, a Lua implementation of Lucene

2008-03-01 Thread Michael McCandless
Marvin Humphrey wrote: I haven't been feeding back my KinoSearch commits into the Lucy repository, true. Not much has changed status-wise since this: (Link to mail-archives.apache.org). I miss Dave :( but work continues apace. I just figured I wouldn't bring anyt

Re: SOC: Lulu, a Lua implementation of Lucene

2008-03-01 Thread Michael McCandless
I am looking forward to a pure Lua version of Lucene. Lua looks like a neat and amazingly compact programming language. I especially like Lua's "table" container type which is both a list and a hash/dictionary. One issue to watch out for is LUCENE-510, which has been an issue with other

Re: feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-01 Thread Michael McCandless
That is a nice result -- thanks for reporting this Uwe! Mike On Mar 1, 2008, at 3:45 AM, Uwe Goetzke wrote: This week I switched the lucene library version on one customer system. The indexing time went down from 46m32s to 16m20s for the complete task including optimisation. Great Job!

Re: Corrupted Indexes Under Lucene 2.3 (and 2.3.1)

2008-03-01 Thread Michael McCandless
Note that there are actually two concurrency issues to guard against here: * Document itself cannot be changed (fields added or removed) from multiple threads without external synchronization. * Document cannot be changed from one thread while another thread is calling writer.addDoc

feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-01 Thread Uwe Goetzke
This week I switched the lucene library version on one customer system. The indexing time went down from 46m32s to 16m20s for the complete task including optimisation. Great Job! We index product catalogs from several suppliers, in this case around 56,000 product groups and 360,000 products inclu

Re: SOC: Lulu, a Lua implementation of Lucene

2008-03-01 Thread Petite Abeille
Hi Marvin, On Mar 1, 2008, at 2:33 AM, Marvin Humphrey wrote: How fast is Lua's method dispatch, compared to Java's? Fast enough. http://luajit.org/luajit_performance.html That has a huge impact on performance, since *everything* is a method in Lucene -- down to writeByte(). The plan i