Re: Overriding Lucene's term weights computation

2010-06-23 Thread Naama Kraus
ok, thanks Yuval. I'll take a look. Could you (or anyone) please elaborate why payloads "seem like a worse fit" ? TX, Naama On Wed, Jun 23, 2010 at 11:00 PM, Yuval Feinstein wrote: > Naama, Maybe you could use the new flexible indexing mechanism. > Some information is in this lecture: > > http:/

Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Otis Gospodnetic
Coincidentally, just after I replied to this thread I received an email from one of our customers. In that email was a quote from one of the commercial search vendors. My jaw didn't drop because I've seen similar numbers from other commercial search vendors before, but I won't mention the

RE: Help with Numeric Range

2010-06-23 Thread Uwe Schindler
Are you sure that the term enum returns the terms in the correct order? For all types of RangeQueries, the term enumeration has to be correctly sorted as specified in the docs; if it is not, the enumeration may be incomplete. It’s a good thing to turn on assertions for the lucene package, a
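
As a rough illustration of the ordering requirement mentioned above, the following sketch checks whether a sequence of terms is strictly ascending under a given comparator. `TermOrderCheck` and `firstOutOfOrder` are hypothetical names, not part of Lucene, and the comparator that matches a particular index's term order is an assumption the caller must supply. Assertions for the Lucene package can be switched on with the standard JVM flag `-ea:org.apache.lucene...`.

```java
import java.util.Comparator;
import java.util.Iterator;

/** Hypothetical helper (not a Lucene class) for sanity-checking term order. */
class TermOrderCheck {

    /**
     * Returns the position of the first term that is not strictly greater
     * than its predecessor, or -1 if the whole sequence is strictly ascending.
     * The comparator is an assumption: pass the order the index is expected to use.
     */
    static <T> int firstOutOfOrder(Iterator<T> terms, Comparator<? super T> order) {
        T previous = null;
        boolean first = true;
        int position = 0;
        while (terms.hasNext()) {
            T current = terms.next();
            // Equal or descending neighbours both count as a violation.
            if (!first && order.compare(previous, current) >= 0) {
                return position;
            }
            previous = current;
            first = false;
            position++;
        }
        return -1;
    }
}
```

Running a check like this over a custom term enumeration in a unit test may catch ordering problems before a range query silently returns incomplete results.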

Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Otis Gospodnetic
I won't comment on Attivio, as I think I might have signed some NDA with them. But they do claim to combine full-text search with DB-like joins. Can't MarkLogic do that, too? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.co

Re: Problems with homebrew ParallelWriter

2010-06-23 Thread Shai Erera
How do you add documents to the index? Is it synchronized (such that basically only one thread can add documents at a time)? The same goes for removing documents as well. Also, did you encounter any exceptions during the run - if say an addDoc fails on one of the slices, then you need to revert th
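
The two points raised above (one thread adding at a time, and reverting when an add fails on one slice) can be sketched roughly as follows. This is a minimal illustration using plain in-memory lists, not Lucene's API: `Slice` and `SyncedParallelWriter` are hypothetical names, and the simulated failure exists only to exercise the rollback path.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one index slice; not a Lucene class. */
class Slice {
    private final List<String> docs = new ArrayList<>();
    private final int failOn; // simulate a failure on the add at this position (-1 = never)

    Slice(int failOn) { this.failOn = failOn; }

    void add(String doc) {
        if (docs.size() == failOn) {
            throw new IllegalStateException("simulated add failure");
        }
        docs.add(doc);
    }

    void removeLast() { docs.remove(docs.size() - 1); }

    int size() { return docs.size(); }
}

/** Hypothetical writer that keeps every slice at the same document count. */
class SyncedParallelWriter {
    private final List<Slice> slices;

    SyncedParallelWriter(List<Slice> slices) { this.slices = slices; }

    /** Adds one part per slice; synchronized, and all-or-nothing across slices. */
    synchronized boolean addDocument(List<String> parts) {
        if (parts.size() != slices.size()) {
            throw new IllegalArgumentException("need exactly one part per slice");
        }
        int done = 0; // number of slices that have accepted this document
        try {
            for (; done < slices.size(); done++) {
                slices.get(done).add(parts.get(done));
            }
            return true;
        } catch (RuntimeException e) {
            // Revert the slices that already accepted the document,
            // so that all slices end up with the same count again.
            for (int i = 0; i < done; i++) {
                slices.get(i).removeLast();
            }
            return false;
        }
    }
}
```

With real index writers the revert step would need an actual delete or rollback on each slice, and removals would need the same locking; the sketch only shows the shape of the invariant.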

RE: Help with Numeric Range

2010-06-23 Thread Todd Nine
Hi Uwe, Thank you for your help, it is greatly appreciated. Unfortunately, my tests all fail except for RangeInclusive. I've changed the step to be 6 as per your recommendation. I had it at max to eliminate step precision as the cause of the test failure. Essentially, all keys in Cassandra a

Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Erick Erickson
Otis's comments reminded me of one of the astonishing things I've seen in the Lucene/SOLR ecosystem; I've seen issues reported, commented on, fixed, and patches made available *for free* in a matter of hours. Of course, you have to be willing to use a patched version, but it sure beats waiting six

Problems with homebrew ParallelWriter

2010-06-23 Thread Justin
Hi all, We've been waiting for LUCENE-1879 and LUCENE-2425 and have written our own ParallelWriter class in the meantime. Apparently our indexes are falling out of sync (I suspect my colleague is seeing error messages come from ParallelReader stating that the number of documents must be the sam

Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread jm
yes, in my case the competition is one of the list... On Wed, Jun 23, 2010 at 11:41 PM, Otis Gospodnetic wrote: > Off the top of my head: > > FAST > Endeca > Coveo > Attivio > Vivisimo > Google Search Appliance > (tell me when to stop) > Dieselpoint > IBM OmniFind > Exalead > Autonomy > dtSearch

RE: arguments in favour of lucene over commercial competition

2010-06-23 Thread Itamar Syn-Hershko
Otis, I'm 99% sure Attivio is just a wrapper around Lucene... And I personally wouldn't count full text search solutions such as Oracle's. Itamar. > -Original Message- > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] > Sent: Thursday, June 24, 2010 12:42 AM > To: java-user@

Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Otis Gospodnetic
Off the top of my head: FAST Endeca Coveo Attivio Vivisimo Google Search Appliance (tell me when to stop) Dieselpoint IBM OmniFind Exalead Autonomy dtSearch ISYS Oracle ... ... Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com

Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Hans Merkl
Just curious. What commercial alternatives are out there? On Wed, Jun 23, 2010 at 04:01, jm wrote: > Hi, > > I am trying to compile some arguments in favour of lucene as > management is deciding whether to standardize on lucene or a competing > commercial product (we have a couple of products, one

Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread jm
thanks guys, those links are cool. I welcome any other positive thing anyone can add. Especially references of products/sites moving to lucene/solr javier On Wed, Jun 23, 2010 at 10:49 PM, Otis Gospodnetic wrote: > Lucene/Solr choice typically means: > > * lower cost of ownership (think about var

Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Otis Gospodnetic
Lucene/Solr choice typically means: * lower cost of ownership (think about various crazy licensing models some of the commercial search vendors have: per doc, per server, per query, per year) * faster implementation (just think about the duration of the sales/negotiation phase for commerci

RE: Overriding Lucene's term weights computation

2010-06-23 Thread Yuval Feinstein
Naama, Maybe you could use the new flexible indexing mechanism. Some information is in this lecture: http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf Alternatively, you may use payloads, but they seem like a worse fit. Good Luck, Yuval _

RE: URL Tokenization

2010-06-23 Thread Steven A Rowe
Hi Sudha, There is such a tokenizer, named NewStandardTokenizer, in the most recent patch on the following JIRA issue: https://issues.apache.org/jira/browse/LUCENE-2167 It keeps (HTTP(S), FTP, and FILE) URLs together as single tokens, and e-mails too, in accordance with the relevant IETF R

URL Tokenization

2010-06-23 Thread Sudha Verma
Hi, I am new to lucene and I am using Lucene 3.0.2. I am using Lucene to parse text which may contain URLs. I noticed the StandardTokenizer keeps the email addresses in one token, but not the URLs. I also looked at Solr wiki pages, and even though the wiki page for solr.StandardTokenizerFactory s

Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Erick Erickson
One thing to consider is that you have access to the source, so worst-case you won't be cut off at the knees by the commercial vendor. Case in point: Fast was acquired by Microsoft, who have since dropped all future Unix development. Hope all Fast users really like running their apps on Windows se

Re: Stop words filter

2010-06-23 Thread Erick Erickson
On the chance that this is an XY problem (http://people.apache.org/~hossman/#xyproblem), why can't you use StopFilter and PorterStemFilter in your filter chain rather than try to do this yourself? Best Erick On Tue, Jun 22, 2010 at 10:49 PM, Vinicius Carvalho < viniciusccarva...@gmail.com> wrote:

Overriding Lucene's term weights computation

2010-06-23 Thread Naama Kraus
Hi, Is there a way for an application to index a document along with its "term weighted vector" (Lucene's TermFreqVector). I.e., override the term frequencies computed by Lucene, with an application's computed term weights (non frequency based) ? I don't think I want to use Scorer#score() for appl

arguments in favour of lucene over commercial competition

2010-06-23 Thread jm
Hi, I am trying to compile some arguments in favour of lucene as management is deciding whether to standardize on lucene or a competing commercial product (we have a couple of products, one using lucene, another using commercial product, imagine what am i using). I searched the lists but could not f