compare paragraphs of text - which Query Class to use?

2013-06-14 Thread Malgorzata Urbanska
Hello, I've just started using Lucene and I'm not sure which Query Classes I should use in my project. My goal is to compare paragraphs of text. Paragraph A is a query and paragraph B is a document for which I would like to calculate a similarity score. The paragraphs A and B can be in some situat

Re: compare paragraphs of text - which Query Class to use?

2013-06-14 Thread Malgorzata Urbanska
ok at > the parsed query that edismax generates and do the same in your Lucene Java > code. > > -- Jack Krupansky > > -Original Message- From: Malgorzata Urbanska > Sent: Friday, June 14, 2013 12:23 PM > To: java-user@lucene.apache.org > Subject: compare paragraphs of tex

ngrams in Lucene 4.3.0

2013-07-15 Thread Malgorzata Urbanska
Hi, I've been trying to figure out how to use ngrams in Lucene 4.3.0. I found some examples for an earlier version, but I'm still confused. As I understand it, I should: 1. create a new analyzer which uses ngrams, 2. apply it to my indexer, 3. search using the same analyzer. I found in the documentation:
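The point of steps 1 to 3 is that index-time and query-time text go through the same n-gram splitting, so the terms line up. The plain-Java sketch below illustrates that idea without Lucene. It generates every character n-gram of length minGram..maxGram and counts how many distinct query grams also appear in the indexed text. The class and method names are hypothetical. The sketch does not reproduce the token order, offsets, or positions that Lucene's own n-gram tokenizer/filter emit.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CharNGramSketch {
    // Hypothetical helper (not a Lucene API): all character n-grams of
    // length minGram..maxGram, after lowercasing.
    static List<String> charNGrams(String text, int minGram, int maxGram) {
        String s = text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                grams.add(s.substring(i, i + n));
            }
        }
        return grams;
    }

    // The same "analysis" is applied to both sides, mirroring the advice to
    // index and search with the same analyzer; returns the number of distinct
    // query grams that also occur in the indexed text.
    static int sharedGrams(String indexed, String query, int minGram, int maxGram) {
        Set<String> docGrams = new LinkedHashSet<>(charNGrams(indexed, minGram, maxGram));
        int hits = 0;
        for (String g : new LinkedHashSet<>(charNGrams(query, minGram, maxGram))) {
            if (docGrams.contains(g)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(charNGrams("Rave", 2, 3));
        System.out.println(sharedGrams("Apache Rave", "rave", 2, 3));
    }
}
```

If the query side were analyzed differently (for example, left as one whole word), almost none of its terms would match the indexed grams. That is one common reason an n-gram index returns no hits.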

Re: ngrams in Lucene 4.3.0

2013-07-15 Thread Malgorzata Urbanska
thanks !! On Mon, Jul 15, 2013 at 1:31 PM, Ivan Krišto wrote: > On 07/15/2013 07:50 PM, Malgorzata Urbanska wrote: >> Hi, >> >> I've been trying to figure out how to use ngrams in Lucene 4.3.0 >> I found some examples for earlier version but I'm stil

Re: ngrams in Lucene 4.3.0

2013-07-16 Thread Malgorzata Urbanska
Looks like everything works perfectly; however, my searcher does not find any "hits". I suspect my indexer code, so I tried to check the index, but Luke does not work with Lucene 4.3.0 :( Could someone give me a hint about what is happening? Thanks, gosia On Mon, Jul 15, 2013 at 1:45 PM, Malgorzata Urban

Re: ngrams in Lucene 4.3.0

2013-07-16 Thread Malgorzata Urbanska
OK, I solved it. I figured out that instead of NGramQuery, I was using a String in IndexSearcher :) gosia On Tue, Jul 16, 2013 at 12:28 PM, Malgorzata Urbanska wrote: > Hi, > > I built Indexer with NGramAnalizer which uses ShingleFilter > > Next I built Searcher with NGramQuery which us

ShingleFilter

2013-07-18 Thread Malgorzata Urbanska
Hello, For some time I have been trying to apply ShingleFilter. I have a string: "The users get program in the User RPC API in Apache Rave" and I would like to get: [the users get] [users get program] [get program in] [program in the] [in the user] [the user rpc] [user rpc api] [rpc api in] [a
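The desired output is word trigrams only, with no single words and no bigrams. In Lucene this is usually expressed as a ShingleFilter with minimum and maximum shingle size 3 and unigram output turned off (setOutputUnigrams(false)). The plain-Java sketch below only mimics that result so the expected list is easy to check. It lowercases the text, splits it on whitespace, and joins each run of three consecutive tokens. The helper name is hypothetical. Real Lucene analysis goes through a Tokenizer and TokenStream, not String.split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class WordShingleSketch {
    // Hypothetical helper (not a Lucene API): word shingles of exactly n tokens.
    static List<String> shingles(String text, int n) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            // Join tokens i .. i+n-1 with single spaces, as ShingleFilter does by default.
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "The users get program in the User RPC API in Apache Rave";
        StringBuilder sb = new StringBuilder();
        for (String sh : shingles(s, 3)) {
            sb.append('[').append(sh).append("] ");
        }
        System.out.println(sb.toString().trim());
    }
}
```

The 12 input tokens produce 10 trigrams, from [the users get] to [in apache rave].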

Re: ShingleFilter

2013-07-18 Thread Malgorzata Urbanska
,StopAnalyzer.ENGLISH_STOP_WORDS_SET); > > return new Analyzer.TokenStreamComponents(source, sf); > > > Not sure the stopFilter will do you any good if you're extracting only > trigrams. > -Original Message- > From: murba...@rams.colostate.edu [mailto:murba...@rams.co

raw cosine similarity

2013-07-21 Thread Malgorzata Urbanska
Hi, I would like to calculate the raw cosine similarity between a query and a document. I read the documentation about Lucene scoring, but I'm still confused. Does any implementation exist in Lucene 4.3.0 to do that? If not, what is the easiest way to do this? So far I'm retrieving a TermVector for document
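Raw cosine similarity is the dot product of the two term-frequency vectors divided by the product of their Euclidean lengths: cos(q, d) = (q · d) / (|q| |d|). This differs from Lucene's default TF-IDF scoring, which also applies idf weighting and length normalization. The plain-Java sketch below computes the raw value over term-count maps. It assumes the counts have already been extracted, for example from the document's term vector and from the analyzed query text. The whitespace tokenizer and the helper names are hypothetical stand-ins used only for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class RawCosineSketch {
    // Hypothetical stand-in for analysis: lowercase + whitespace split, counting terms.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    // cos(q, d) = sum over shared terms of q_t * d_t, divided by |q| * |d|.
    static double cosine(Map<String, Integer> q, Map<String, Integer> d) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : q.entrySet()) {
            Integer dv = d.get(e.getKey());
            if (dv != null) {
                dot += (double) e.getValue() * dv;
            }
        }
        double qNorm = norm(q);
        double dNorm = norm(d);
        if (qNorm == 0.0 || dNorm == 0.0) {
            return 0.0; // an empty vector has no direction; treat similarity as 0
        }
        return dot / (qNorm * dNorm);
    }

    public static void main(String[] args) {
        Map<String, Integer> q = termFreqs("apache rave api");
        Map<String, Integer> d = termFreqs("the user rpc api in apache rave");
        System.out.println(cosine(q, d));
    }
}
```

In this example, the query shares 3 terms with a 7-term document, so the score is 3 / (sqrt(3) * sqrt(7)). With Lucene, the document-side counts would come from a term vector, which is only available if the field was indexed with term vectors enabled.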