Calculating an IDF value for a document collection

2013-01-15 Thread Kasun Perera
newly created documents; it would be very computationally intensive. -- Regards Kasun Perera

How to create a Lucene in-memory index at webapp deployment time

2012-09-06 Thread Kasun Perera
be created only once, and I can access the in-memory index as long as the web app is live? -- Regards Kasun Perera

Different Weights to Lucene fields with Okapi Similarity

2012-07-16 Thread Kasun Perera
Resending again, since my question didn't get much attention -- Forwarded message -- From: Kasun Perera Date: Tue, Jun 19, 2012 at 3:26 PM Subject: Different Weights to Lucene fields with Okapi Similarity To: java-user@lucene.apache.org Based on this link http://www200

Re: with regarding to the blog post reply done on [Anonymous June 28, 2012 5:19 AM]

2012-07-02 Thread Kasun Perera
tested it some time back and the code worked for me, but I think it needed Lucene-core-2.9.jar. Hope this helps. I can't see any Java code files in your attached ZIP file; it only contains some text files. Regards Kasun Perera On Mon, Jul 2, 2012 at 12:09 PM, nadeesha meththananda < neranja

Different Weights to Lucene fields with Okapi Similarity

2012-06-19 Thread Kasun Perera
freq(t, doc) is the frequency of term t in document doc. Choosing b = 0.25 and k = 1.2, you get w(t, doc) = idf(t) * 2.2 * freq(t, doc) / (1.2 * (0.25 + 0.75 * ls(doc)) + freq(t, doc)) -- Regards Kasun Perera
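The quoted weight can be sketched in plain Java as follows. This is a minimal sketch, not the poster's code: it assumes ls(doc) is the document length divided by the average document length (the definition is cut off in the snippet), and it uses the common BM25 parameterization in which the length normalization is (1 - b) + b * ls(doc). Under that assumption the constants 0.25 and 0.75 in the quoted expression correspond to b = 0.75; the snippet's "b = 0.25" would match a form where b is the constant term. The class and method names are hypothetical.

```java
public final class OkapiWeight {

    /**
     * Okapi-style term weight.
     * idf:  inverse document frequency of the term (computed elsewhere)
     * freq: frequency of the term in the document
     * ls:   ASSUMED to be document length / average document length
     * k, b: tuning constants; k = 1.2 and b = 0.75 reproduce the
     *       constants 2.2, 1.2, 0.25 and 0.75 in the quoted expression
     */
    static double weight(double idf, double freq, double ls, double k, double b) {
        double lengthNorm = (1.0 - b) + b * ls;
        return idf * (k + 1.0) * freq / (k * lengthNorm + freq);
    }

    public static void main(String[] args) {
        // A term occurring 3 times in a document of exactly average length.
        System.out.println(weight(1.0, 3.0, 1.0, 1.2, 0.75));
    }
}
```

Giving fields different weights on top of this is a separate design choice; one option (an assumption here, not something the thread confirms) is to scale freq per field before calling weight.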

Re: Calculating Average Document Length with Lucene

2012-06-19 Thread Kasun Perera
On Mon, Jun 18, 2012 at 8:48 AM, Kasun Perera wrote: > I want to calculate the average document length for a document collection in which > each document has 3 different fields (field1, field2, field3) > > This is the program to calculate the average length when only one field is > th

Calculating Average Document Length with Lucene

2012-06-17 Thread Kasun Perera
calculating the average document length for 3 fields is correct? -- Regards Kasun Perera
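For the three-field case asked about here, one possible approach is to average each field separately and, if a single figure is needed, sum the per-field averages. The sketch below is plain Java with no Lucene calls; it assumes the token count of every field has already been collected into an array, and the class and method names are hypothetical. Whether a summed length or per-field lengths is the right input depends on the scoring model in use.

```java
public final class AverageLength {

    /**
     * lengths[d][f] is the token count of field f in document d
     * (ASSUMED to be collected beforehand, e.g. while indexing).
     */
    static double[] perFieldAverage(int[][] lengths, int numFields) {
        double[] avg = new double[numFields];
        if (lengths.length == 0) {
            return avg; // no documents: all averages are 0
        }
        for (int[] doc : lengths) {
            for (int f = 0; f < numFields; f++) {
                avg[f] += doc[f];
            }
        }
        for (int f = 0; f < numFields; f++) {
            avg[f] /= lengths.length;
        }
        return avg;
    }

    /** Average total length per document, i.e. the sum of the per-field averages. */
    static double overallAverage(int[][] lengths, int numFields) {
        double sum = 0.0;
        for (double a : perFieldAverage(lengths, numFields)) {
            sum += a;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] lengths = { {2, 4, 6}, {4, 6, 8} };
        System.out.println(overallAverage(lengths, 3));
    }
}
```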

Better Way of calculating Cosine Similarity between documents

2012-05-18 Thread Kasun Perera
equation that I can use for calculating cosine similarity between documents? -- Regards Kasun Perera
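For reference, cosine similarity between two documents is the dot product of their term-weight vectors divided by the product of their Euclidean norms. The sketch below is a minimal plain-Java illustration, assuming each document has already been reduced to a map from term to weight (for example TF-IDF values); it is not taken from the thread, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public final class CosineSimilarity {

    /** Cosine of the angle between two sparse term-weight vectors. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other; // only shared terms contribute
            }
        }
        double na = norm(a);
        double nb = norm(b);
        // An empty or all-zero vector has no direction; report 0 similarity.
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    /** Euclidean norm of a sparse vector. */
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Double> d1 = new HashMap<>();
        d1.put("lucene", 2.0);
        d1.put("index", 1.0);
        Map<String, Double> d2 = new HashMap<>();
        d2.put("lucene", 1.0);
        d2.put("search", 3.0);
        System.out.println(cosine(d1, d2));
    }
}
```

Iterating over only one map's entries keeps the cost proportional to the size of that vector rather than to the whole vocabulary.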

Re: Getting the frequencies by corresponding order of documents were indexed

2012-05-11 Thread Kasun Perera
Lucene? Thanks > -- > Ian. > > > On Fri, May 11, 2012 at 8:58 AM, Kasun Perera > wrote: > > I have a collection of documents (say 10 documents) and I'm indexing them > this > > way, by storing the term vector > > > > StringReader strRdElt = new Str

Getting the frequencies by corresponding order of documents were indexed

2012-05-11 Thread Kasun Perera
I have a collection of documents (say 10 documents) and I'm indexing them this way, by storing the term vector: StringReader strRdElt = new StringReader(content); Document doc = new Document(); String docname = docNames[docNo]; doc.add(new Field("doccontent", strRdElt, Field.TermVector.Y

Indexing with Semantics

2012-04-27 Thread Kasun Perera
ne that can be used to index by semantics, so that it indexes "owe", "owed" and "owing" as one word "owe" with term frequency = 3? If not, I'd welcome any suggestions for achieving this task. -- Regards Kasun Perera

Calculating IDF value more efficiently

2012-04-27 Thread Kasun Perera
eDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true); searcher.search(q, collector); ScoreDoc[] hits = collector.topDocs().scoreDocs; return hits.length; } -- Regards Kasun Perera
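Two observations on the fragment above, offered as a sketch rather than a fix for the poster's full program. First, the length of collector.topDocs().scoreDocs cannot exceed hitsPerPage, so returning hits.length undercounts terms that match more documents than that. Second, running a search per term is not required to get a document count: Lucene's IndexReader exposes docFreq(Term), which reads the count from the index. Once the document frequency is known, the IDF itself is simple arithmetic. The plain-Java sketch below uses the formula 1 + ln(numDocs / (docFreq + 1)), which is the form used by Lucene's classic default similarity as far as I know; the class and method names are hypothetical.

```java
public final class IdfSketch {

    /**
     * Inverse document frequency.
     * docFreq: number of documents containing the term
     *          (ASSUMED to come from the index, e.g. a docFreq lookup)
     * numDocs: total number of documents in the collection
     */
    static double idf(int docFreq, int numDocs) {
        // The +1 in the denominator avoids division by zero for unseen terms.
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        System.out.println(idf(3, 1000));   // rare term, higher value
        System.out.println(idf(900, 1000)); // common term, lower value
    }
}
```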

Re: Weighted cosine similarity calculation using Lucene

2012-04-20 Thread Kasun Perera
eight to Taxonomy and Ontology terms in > > document similarity calculation? > > > > > > Are there Lucene functions that can be used to give higher weights to > > certain fields when calculating TF-IDF values using TermFreqVector? Can I > > jus

Weighted cosine similarity calculation using Lucene

2012-04-20 Thread Kasun Perera
y calculation? Are there Lucene functions that can be used to give higher weights to certain fields when calculating TF-IDF values using TermFreqVector? Can I just use the setBoost() function for this purpose, and if so, how? -- Regards Kasun Perera
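One way to weight fields when the similarity is computed by hand from term vectors is to scale each field's TF-IDF values by a per-field weight and merge them into a single vector before comparing documents. The sketch below shows only that merging step in plain Java. It assumes the per-field TF-IDF maps have already been built, the field names and weights are hypothetical, and it is not a statement about what setBoost() does; as far as I know, stored term vectors hold raw frequencies, so a field boost would not change them.

```java
import java.util.HashMap;
import java.util.Map;

public final class FieldWeightedVector {

    /**
     * Merge per-field term-weight vectors into one vector.
     * fieldVectors: field name -> (term -> TF-IDF value)
     * fieldWeights: field name -> multiplier; fields without an entry use 1.0
     */
    static Map<String, Double> combine(Map<String, Map<String, Double>> fieldVectors,
                                       Map<String, Double> fieldWeights) {
        Map<String, Double> combined = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> field : fieldVectors.entrySet()) {
            double w = fieldWeights.getOrDefault(field.getKey(), 1.0);
            for (Map.Entry<String, Double> term : field.getValue().entrySet()) {
                // A term appearing in several fields accumulates its weighted values.
                combined.merge(term.getKey(), w * term.getValue(), Double::sum);
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<String, Double> taxonomy = new HashMap<>();
        taxonomy.put("mammal", 1.5);
        Map<String, Double> body = new HashMap<>();
        body.put("mammal", 0.5);
        body.put("habitat", 2.0);

        Map<String, Map<String, Double>> fields = new HashMap<>();
        fields.put("taxonomy", taxonomy); // hypothetical field names
        fields.put("body", body);

        Map<String, Double> weights = new HashMap<>();
        weights.put("taxonomy", 2.0);     // hypothetical weight

        System.out.println(combine(fields, weights));
    }
}
```

The combined vectors can then be passed to whatever cosine similarity routine is already in use.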