Re: Bet you didn't know Lucene can...

2011-10-31 Thread Andrzej Bialecki
On 31/10/2011 21:42, Petite Abeille wrote: On Oct 31, 2011, at 9:32 PM, Andrzej Bialecki wrote: similarity-preserving hash function was calculated on each sentence, and the hash was added as a field. The property of the hash was that similar documents (sentences) would produce a similar hash

Re: Bet you didn't know Lucene can...

2011-10-31 Thread Petite Abeille
On Oct 31, 2011, at 9:32 PM, Andrzej Bialecki wrote: > similarity-preserving hash function was calculated on each sentence, and the > hash was added as a field. The property of the hash was that similar > documents (sentences) would produce a similar hash, with only some bit-level > perturbati
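The snippet above is cut off and does not name the hash function that was used. As a rough illustration of the property it describes (similar sentences yield hashes that differ in only a few bits), here is a minimal sketch that assumes a simhash-style fingerprint. The function names, the whitespace tokenizer, the MD5 token hash and the 64-bit width are all assumptions made for this example, not details from the thread.

```python
import hashlib

BITS = 64  # fingerprint width; an assumption for this sketch


def simhash(text: str) -> int:
    """Return a 64-bit simhash-style fingerprint of whitespace-separated tokens."""
    counts = [0] * BITS
    for token in text.lower().split():
        # Derive a stable 64-bit value per token from the first 8 bytes of MD5.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

In an index, such a fingerprint could be stored as an extra field per sentence, and near-duplicates would then be candidates whose fingerprints lie within a small Hamming distance of each other. How the original system queried for those perturbed hashes is not recoverable from the truncated message.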

Re: Bet you didn't know Lucene can...

2011-10-31 Thread Andrzej Bialecki
On 22/10/2011 11:11, Grant Ingersoll wrote: Hi All, I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." (http://na11.apachecon.com/talks/18396). It's based on my observation, that over the years, a number of us in the community have done some

Re: Bet you didn't know Lucene can...

2011-10-26 Thread Dawid Weiss
m also using public domain Wikipedia data so can release the code and data > somewhere if that's of interest. > > Cheers > Mark > > > > - Original Message - > From: Dawid Weiss > To: java-user@lucene.apache.org > Cc: > Sent: Tuesday, 25 October 2011,

Re: Bet you didn't know Lucene can...

2011-10-26 Thread mark harwood
pache.org Cc: Sent: Tuesday, 25 October 2011, 23:17 Subject: Re: Bet you didn't know Lucene can... > Lucene started out at an avg 3ms but subsequent runs took it down > dramatically due to OS file caching. The all-in-memory hashset implementation > clearly did not demonstrate th

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Dawid Weiss
> Lucene started out at an avg 3ms but subsequent runs took it down > dramatically due to OS file caching. The all-in-memory hashset implementation > clearly did not demonstrate the same speed ups between runs. I don't say the benchmark was wrong or anything, but this is surprising. I mean, the

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Mark Harwood
ashSet tests are loaded entirely from file (hence the long start-up >>> time) and are not a scalable solution because of RAM costs. >>> MySQL requires an inter-process call as it was not embedded but even using >>> a remoted Lucene call I get significantly better performance (avg 0.5ms

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Dawid Weiss
uires an inter-process call as it was not embedded but even using >> a remoted Lucene call I get significantly better performance (avg 0.5ms >> lookup vs MySQL 10ms) >> >> >> Cheers >> Mark >> >> >> >> - Original Message - >> From: Gran

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Grant Ingersoll
l as it was not embedded but even using a > remoted Lucene call I get significantly better performance (avg 0.5ms lookup > vs MySQL 10ms) > > > Cheers > Mark > > > > ----- Original Message ----- > From: Grant Ingersoll > To: java-user@lucene.apache.org > C

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Erik Hatcher
here: http://www.juxtasoftware.org/ Erik On Oct 22, 2011, at 05:11 , Grant Ingersoll wrote: > Hi All, > > I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." > (http://na11.apachecon.com/talks/18396). It's based on my observation

Re: Bet you didn't know Lucene can...

2011-10-25 Thread mark harwood
cene call I get significantly better performance (avg 0.5ms lookup vs MySQL 10ms) Cheers Mark - Original Message - From: Grant Ingersoll To: java-user@lucene.apache.org Cc: Sent: Saturday, 22 October 2011, 10:11 Subject: Bet you didn't know Lucene can... Hi All, I'm giv

Re: Bet you didn't know Lucene can...

2011-10-23 Thread Dawid Weiss
Hi Grant, In Carrot2 (and Carrot Search's commercial products) we're not using Lucene as an indexing/search service directly, but we are re-using a lot of internal infrastructure (like analyzers, ported snowball stemmers and other segmentation stuff). We also plan on using the new language identi

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Shashi Kant
Using Lucene as a recommendation engine. On Sat, Oct 22, 2011 at 6:33 PM, Grant Ingersoll wrote: > > On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote: > >> Hi Grant, >> >> Not sure if this qualifies as a "bet you didn't know", but one could use >> Lucene term vectors to construct document vectors for

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote: > Hi Grant, > > Not sure if this qualifies as a "bet you didn't know", but one could use > Lucene term vectors to construct document vectors for similarity, > clustering and classification tasks. I found this out recently (although > I am probably no
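To make the term-vector idea in the message above concrete: a document vector can be as simple as a map from term to frequency, and two such vectors can be compared with cosine similarity. The sketch below is plain Python and does not use Lucene; the whitespace tokenizer and both function names are assumptions for illustration. In Lucene, the term frequencies would come from the stored term vectors of a field instead.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Build a term -> frequency map from whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

The same pairwise score can feed clustering or nearest-neighbour classification. Weighting the raw frequencies (for example with inverse document frequency) is a common refinement that is left out here for brevity.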

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Wouter Heijke
vidual (product) pages. A deeper URL results in a Lucene BooleanQuery with more clauses. Hope this is enough (ab)use... Wouter > Hi All, > > I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." > (http://na11.apachecon.com/talks/18396). It'

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Sujit Pal
be quite useful. -sujit On Sat, 2011-10-22 at 11:11 +0200, Grant Ingersoll wrote: > Hi All, > > I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." > (http://na11.apachecon.com/talks/18396). It's based on my observation, that > over th

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Paul Libbrecht
Tell me if you need more details, I am sure the pure storage option is something very common. paul Le 22 oct. 2011 à 11:11, Grant Ingersoll a écrit : > Hi All, > > I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." > (http://na11.apa

Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll
Hi All, I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." (http://na11.apachecon.com/talks/18396). It's based on my observation, that over the years, a number of us in the community have done some pretty cool things using Lucene that don't