Re: Measuring document similarity

2012-03-12 Koji Sekiguchi
(12/03/13 2:38), Hassane Cabir wrote: Hi guys, I'm using Lucene for my project and I need to calculate how similar two (or more) documents are, using TF-IDF. How can I get TF-IDF values with Lucene? Any insights on this? Solr has TermVectorComponent which can return tf, df and tf-idf of each term in a docu

Re: TO Mike McCandless : ToParentBlockJoinQuery inconsistent return

2012-03-12 Michael McCandless
Hi, Actually, this is a hard requirement for BlockJoinQuery: the parent document must always be last in the doc block; the package.html describes this I think? Mike McCandless http://blog.mikemccandless.com On Mon, Mar 12, 2012 at 12:57 PM, Jean-Marc MORAS wrote: > Dear > > Bravo for your work
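The ordering rule above can be illustrated without Lucene itself. This is a minimal sketch, assuming documents are modeled as plain maps and that a hypothetical `type` field marks the parent; `buildBlock` and `isValidBlock` are illustrative helpers, not Lucene APIs. In a real index the whole block would typically be written in one `IndexWriter.addDocuments(...)` call so the documents get adjacent doc IDs, with the parent last.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Models the document-block layout a parent/child block join relies on:
 * child documents first, the parent document LAST.
 * Documents are plain maps; "type" is a hypothetical marker field.
 */
public class BlockOrder {

    /** Builds one block: the children in order, then the parent appended at the end. */
    static List<Map<String, String>> buildBlock(Map<String, String> parent,
                                                List<Map<String, String>> children) {
        List<Map<String, String>> block = new ArrayList<>(children);
        block.add(parent); // the parent must be the final document of the block
        return block;
    }

    /** True only if the block's single parent is its last document. */
    static boolean isValidBlock(List<Map<String, String>> block) {
        if (block.isEmpty()) return false;
        for (int i = 0; i < block.size(); i++) {
            boolean isParent = "parent".equals(block.get(i).get("type"));
            boolean isLast = i == block.size() - 1;
            if (isParent != isLast) return false; // parent elsewhere, or last doc not a parent
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> parent = Map.of("type", "parent", "id", "p1");
        List<Map<String, String>> children = List.of(
            Map.of("type", "child", "id", "c1"),
            Map.of("type", "child", "id", "c2"));
        List<Map<String, String>> block = buildBlock(parent, children);
        System.out.println("valid block: " + isValidBlock(block));
    }
}
```

A block with the parent first would fail `isValidBlock`, which mirrors the inconsistent results reported in this thread.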

Measuring document similarity

2012-03-12 Hassane Cabir
Hi guys, I'm using Lucene for my project and I need to calculate how similar two (or more) documents are, using TF-IDF. How can I get TF-IDF values with Lucene? Any insights on this? Thank you for your support. -- Hassane
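To make the question concrete, TF-IDF similarity can be computed by hand once per-document term counts are available. The sketch below assumes documents are already tokenized into term lists and uses textbook weights (tf = raw count, idf = ln(N / df)) with cosine similarity. This is not Lucene's own scoring formula, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/**
 * Textbook TF-IDF vectors plus cosine similarity over pre-tokenized documents.
 * Weighting: tf = raw term count, idf = ln(N / df). Not Lucene's scoring formula.
 */
public class TfIdfSimilarity {

    /** Counts how often each term occurs in one document. */
    static Map<String, Integer> termFreqs(List<String> doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    /** Builds one sparse TF-IDF vector per document in the collection. */
    static List<Map<String, Double>> tfIdfVectors(List<List<String>> docs) {
        // df: number of documents that contain each term at least once
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> vec = new HashMap<>();
            for (Map.Entry<String, Integer> e : termFreqs(doc).entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                vec.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    /** Cosine of the angle between two sparse vectors; 0 if either has zero length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("lucene", "index", "search"),
            List.of("lucene", "index", "similarity"),
            List.of("cooking", "recipes"));
        List<Map<String, Double>> v = tfIdfVectors(docs);
        System.out.printf("d0 vs d1: %.3f%n", cosine(v.get(0), v.get(1)));
        System.out.printf("d0 vs d2: %.3f%n", cosine(v.get(0), v.get(2)));
    }
}
```

With this idf a term that appears in every document gets weight 0 and so contributes nothing to similarity. In a Lucene setting, the per-document term counts could presumably come from stored term vectors rather than re-tokenizing the text.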

Re: Preserving TokenFilters

2012-03-12 Brandon Mintern
Everything that we've read seems to indicate that heavy Lucene users inevitably write their own Filter streams. We just did this ourselves a month or two ago, and it really wasn't too bad. Just make sure that you reference the latest Lucene release when you're writing your own filter. There's a spl

Preserving TokenFilters

2012-03-12 Alan Woodward
Hello, I have a number of operations that I want to apply to a TokenStream, supplementing the original tokens with modified forms. For example, I want to reverse tokens, to allow prefix wildcard queries, and I want to index both lowercased and original terms. I initially tried to wrap Reverse
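A common way to supplement tokens without losing the originals is to emit each extra form at the same position as the original, with a position increment of 0. The sketch below models that idea in plain Java on pre-split input; `Token` and `expand` are hypothetical names, not Lucene classes. A real filter would instead set the equivalent attributes (for example `PositionIncrementAttribute`) on the `TokenStream`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Plain-Java model of a "preserving" token filter: each input token is kept,
 * and its lowercased and reversed forms are stacked at the same position.
 */
public class StackedVariants {

    static final class Token {
        final String term;
        final int positionIncrement; // 1 = new position, 0 = same position as previous token

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /** Emits original, lowercased and reversed forms; duplicate forms are emitted once. */
    static List<Token> expand(List<String> input) {
        List<Token> out = new ArrayList<>();
        for (String original : input) {
            Set<String> forms = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
            forms.add(original);
            forms.add(original.toLowerCase(Locale.ROOT));
            forms.add(new StringBuilder(original).reverse().toString());
            boolean first = true;
            for (String form : forms) {
                // only the original advances the position; variants stack on top of it
                out.add(new Token(form, first ? 1 : 0));
                first = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("Lucene", "index")));
    }
}
```

In practice the reversed forms would probably need some marker (such as a prefix character) so they cannot collide with ordinary terms in the same field.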

RE: MemoryIndex "field must not be added more than once"

2012-03-12 Dave Seltzer
Thanks for your help, Uwe. I've created an issue: https://issues.apache.org/jira/browse/LUCENE-3865 -Dave -----Original Message----- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Wednesday, March 07, 2012 3:39 PM To: java-user@lucene.apache.org Subject: RE: MemoryIndex "field must not be add
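One possible workaround for the "field must not be added more than once" error is to merge all values that share a field name into a single string, so each name is added only once. This is an assumption, not the fix tracked in LUCENE-3865, and `mergeFields` is a hypothetical helper. Each merged entry could then be passed once to something like `MemoryIndex.addField(name, text, analyzer)`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Collapses repeated field names into one value per name, for consumers
 * that accept each field name only once.
 */
public class FieldMerger {

    /** Joins all values sharing a field name, preserving first-seen field order. */
    static Map<String, String> mergeFields(List<Map.Entry<String, String>> fields, String separator) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> f : fields) {
            merged.merge(f.getKey(), f.getValue(), (prev, next) -> prev + separator + next);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> fields = List.of(
            Map.entry("title", "Lucene notes"),
            Map.entry("tag", "search"),
            Map.entry("tag", "indexing"));
        // Each merged (name, text) pair would be added exactly once.
        System.out.println(mergeFields(fields, " "));
    }
}
```

Concatenation can let phrase or proximity queries match across the boundary between two original values, so this is only suitable where that is acceptable.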