Term/Phrase frequencies

2010-05-06 Thread manjula wijewickrema
Hi, I am new to Lucene. If I want to know the term or phrase frequency of an input document, will it be possible through Lucene? Thanks, Manjula

Re: Term/Phrase frequencies

2010-05-06 Thread manjula wijewickrema
http://people.apache.org/~hossman/#xyproblem). > > Best > Erick > > On Thu, May 6, 2010 at 6:39 AM, manjula wijewickrema >wrote: > > > Hi, > > > > I am new to Lucene. If I want to know the term or phrase frequency of an > > input document, will it be possible through Lucene? > > > > Thanks, > > Manjula > > >

Trace only exactly matching terms!

2010-05-07 Thread manjula wijewickrema
Hi, I am using Lucene 2.9.1. I have downloaded and run the 'HelloLucene.java' class, modifying the input document and user query in various ways. Once I put the document sentence as 'Lucene in actions' instead of 'Lucene in action', gave the query as 'action', and ran the programme. But it

Re: Trace only exactly matching terms!

2010-05-10 Thread manjula wijewickrema
here belong to everybody, the opinions to me. The > > distinction is yours to draw > > > > > > On Fri, May 7, 2010 at 2:22 PM, manjula wijewickrema < > manjul...@gmail.com > > >wrote: > > > > > Hi, > > > > > > I am usi

Class_for_HighFrequencyTerms

2010-05-10 Thread manjula wijewickrema
Hi, If I index a document (single document) in Lucene, then how can I get the term frequencies (even the first and second highest occurring terms) of that document? Is there any class/method to do that? If anybody knows, pls. help me. Thanks Manjula

Re: Class_for_HighFrequencyTerms

2010-05-11 Thread manjula wijewickrema
qVector? > > Best > Erick > > On Mon, May 10, 2010 at 8:10 AM, manjula wijewickrema > wrote: > > > Hi, > > > > If I index a document (single document) in Lucene, then how can I get the > > term frequencies (even the first and second highest occurring terms) of

Re: Class_for_HighFrequencyTerms

2010-05-13 Thread manjula wijewickrema
ckBerry® from Orange > > -Original Message- > From: manjula wijewickrema > Date: Tue, 11 May 2010 15:13:12 > To: > Subject: Re: Class_for_HighFrequencyTerms > > Dear Erick, > > I looked for it and even added IndexReader.java and TermFreqVector.java > from > &

Error of the code

2010-05-13 Thread manjula wijewickrema
Dear All, I am trying to get the term frequencies (through TermFreqVector) of a document (using Lucene 2.9.1). In order to do that I have used the following code. But there is a compile time error in the code and I can't figure it out. Could somebody tell me what's wrong with it? Compile time

Re: Error of the code

2010-05-13 Thread manjula wijewickrema
; TermFreqVector vector = IndexReader.getTermFreqVector(0, "fieldname" ); > > with > > IndexReader ir = whatever(...); > TermFreqVector vector = ir.getTermFreqVector(0, "fieldname" ); > > And you'll need to move it to after the writer.close() call if

Re: Error of the code

2010-05-14 Thread manjula wijewickrema
You don't appear to be doing anything > with the String term in "for ( String term : vector.getTerms() )" - > presumably you intend to. > > > -- > Ian. > > On Thu, May 13, 2010 at 1:16 PM, manjula wijewickrema > wrote: > > Dear Ian, > > > >

Access indexed terms

2010-05-14 Thread manjula wijewickrema
Hi, Is it possible to put the indexed terms into an array in Lucene? For example, imagine I have indexed a single document in Lucene and now I want to access those terms in the index. Is it possible to retrieve (call) those terms as array elements? If it is possible, then how? Thanks, Manjula

Re: Access indexed terms

2010-05-14 Thread manjula wijewickrema
, Andrzej Bialecki wrote: > On 2010-05-14 11:35, manjula wijewickrema wrote: > > Hi, > > > > Is it possible to put the indexed terms into an array in Lucene? For > > example, imagine I have indexed a single document in Lucene and now I > want > > to access those t

Re: Access indexed terms

2010-05-14 Thread manjula wijewickrema
class in my code. But I was unable to find any guidance on how to do it. If you can, pls. be kind enough to tell me how I can use this class in my code. Thanx Manjula On Fri, May 14, 2010 at 6:16 PM, Andrzej Bialecki wrote: > On 2010-05-14 14:24, manjula wijewickrema wrote: > > H

How to call high fre. terms using HighFreTerms class

2010-05-14 Thread manjula wijewickrema
Hi, I am struggling to use the HighFreTerms class to find high-frequency terms in my index. My target is to get the high-frequency terms in an indexed document (single document). To do that I have added the org.apache.lucene.misc package to my project. I think up to that point I am correct

Re: How to call high fre. terms using HighFreTerms class

2010-05-17 Thread manjula wijewickrema
nstructions here for getting the source: > http://wiki.apache.org/lucene-java/HowToContribute > > HTH > Erick > > On Sat, May 15, 2010 at 1:49 AM, manjula wijewickrema > wrote: > > > Hi, > > > > I am struggling with using HighFreTerms class for the purpose

Problem of getTermFrequencies()

2010-05-17 Thread manjula wijewickrema
Hi, I wrote some code to display the indexed terms of a single document and get their term frequencies. Although it displays the terms in the index, it does not give the term frequencies. Instead it displays ' frequencies are:[...@80fa6f '. What's the reason for this? The code I have wri

Re: Problem of getTermFrequencies()

2010-05-17 Thread manjula wijewickrema
Dear Ian, I changed it as you said and now it is working nicely. Thanks a lot for your kind help. Manjula On Mon, May 17, 2010 at 6:46 PM, Ian Lea wrote: > terms and freqs are arrays. Try terms[i] and freqs[i]. > > > -- > Ian. > > > On Mon, May 17, 2010 at 12:23
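
A minimal sketch of the fix Ian describes, using plain Java arrays as stand-ins for what a TermFreqVector would return (the sample values and the class name PrintTermFreqs are made up for illustration). Concatenating an int[] into a string prints a type tag and hash code, which is consistent with the output reported in the question; indexing the arrays or using Arrays.toString prints the contents.

```java
import java.util.Arrays;

// Stand-in data: in the thread these arrays would come from a TermFreqVector
// (getTerms() / getTermFrequencies()); the values below are invented.
class PrintTermFreqs {
    // Builds "term/freq" pairs by indexing both arrays in step.
    static String describe(String[] terms, int[] freqs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(terms[i]).append('/').append(freqs[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] terms = {"code", "lucen", "term"};
        int[] freqs = {2, 4, 4};
        // Printing the array object itself shows a type tag and hash code, not the values.
        System.out.println("frequencies are:" + freqs);
        // Arrays.toString prints the elements.
        System.out.println("frequencies are:" + Arrays.toString(freqs));
        // Element-by-element access, as suggested in the reply.
        System.out.println(describe(terms, freqs));
    }
}
```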

Re: Problem of getTermFrequencies()

2010-05-20 Thread manjula wijewickrema
> > > terms and freqs are arrays. Try terms[i] and freqs[i]. > > > > > > -- > > Ian. > > > > > > On Mon, May 17, 2010 at 12:23 PM, manjula wijewickrema > > wrote: > >> Hi, > >> > >> I wrote a code with a view to

Arrange terms[i]

2010-05-20 Thread manjula wijewickrema
Hi, I wrote a program to get the frequencies and terms of an indexed document. The output comes as follows; If I print : +tfv[0] Output: array terms are:{title: capabl/1, code/2, frequenc/1, lucen/4, over/1, sampl/1, term/4, test/1} In the same way I can print terms[i] and freqs[i], but the pr
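
The rest of this message is cut off, so the following is only a guess at the goal: assuming the aim is to list the terms by descending frequency (which would also give the first and second most frequent terms asked about earlier), a plain-Java sketch over the terms and frequencies shown in the output above could look like this. The class and method names are hypothetical, and no Lucene API is used.

```java
import java.util.ArrayList;
import java.util.List;

class SortByFrequency {
    // Returns "term/freq" strings ordered by descending frequency;
    // ties are broken alphabetically by term.
    static List<String> sorted(String[] terms, int[] freqs) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) order.add(i);
        order.sort((a, b) -> {
            int byFreq = Integer.compare(freqs[b], freqs[a]);
            return byFreq != 0 ? byFreq : terms[a].compareTo(terms[b]);
        });
        List<String> out = new ArrayList<>();
        for (int i : order) out.add(terms[i] + "/" + freqs[i]);
        return out;
    }

    public static void main(String[] args) {
        // Terms and frequencies copied from the output quoted in the message.
        String[] terms = {"capabl", "code", "frequenc", "lucen", "over", "sampl", "term", "test"};
        int[] freqs = {1, 2, 1, 4, 1, 1, 4, 1};
        System.out.println(sorted(terms, freqs));
    }
}
```

Sorting a list of indices rather than the arrays themselves keeps each term paired with its frequency.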

Re: Arrange terms[i]

2010-05-25 Thread manjula wijewickrema
Dear Grant, Thanks for your reply. Manjula On Mon, May 24, 2010 at 4:37 PM, Grant Ingersoll wrote: > > On May 20, 2010, at 5:15 AM, manjula wijewickrema wrote: > > > Hi, > > > > I wrote a program to get the frequencies and terms of an indexed document. >

How to get file names instead of paths?

2010-06-11 Thread manjula wijewickrema
Hi, Using the following programme I was able to get the entire file path of indexed files which matched the given queries. But my intention is to get only the file names, without even the .txt extension, as I need to send these file names as labels to another application. So, pls. let me know how c

Re: How to get file names instead of paths?

2010-06-15 Thread manjula wijewickrema
.")); > > > -- > Ian. > > > On Fri, Jun 11, 2010 at 11:20 AM, manjula wijewickrema > wrote: > > Hi, > > > > Using the following programme I was able to get the entire file path of > > indexed files which matched with the given queries. But my intention
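
The reply above is truncated, so this is not a reconstruction of it. As a hedged sketch of one common way to reduce a stored path to a bare label, the directory can be dropped with java.io.File and the extension cut at the last dot. The class name BaseName and the sample paths are assumptions for illustration.

```java
import java.io.File;

class BaseName {
    // Strips the directory part and the final extension, e.g. ".txt".
    static String baseName(String path) {
        String name = new File(path).getName();
        int dot = name.lastIndexOf('.');
        // dot > 0 leaves names without an extension, or starting with a dot, untouched.
        return dot > 0 ? name.substring(0, dot) : name;
    }

    public static void main(String[] args) {
        System.out.println(baseName("filesToIndex/doc1.txt"));
        System.out.println(baseName("notes"));
    }
}
```

In the searching code, the argument would be whatever path string was stored in the index for each hit.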

Lucene Scoring

2010-07-05 Thread manjula wijewickrema
Hi, In my application, I input only a single-term query at a time and get back the corresponding scores for those queries. But I am struggling a little to understand Lucene scoring. I have referred to http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html and some other

Re: Lucene Scoring

2010-07-05 Thread manjula wijewickrema
hit.score,doc.get(FIELD_CONTENTS)); System.out.println(hit.score); Searcher.explain("rice",0); } Iterator it = hits.iterator(); while (it.hasNext()) { Hit hit = it.next(); Document document = hit.getDocument(); String path = document.get(FIELD_PATH); System.out.println(&qu

Re: Lucene Scoring

2010-07-07 Thread manjula wijewickrema
omething like > > System.out.println(indexSearcher.explain(query, 0)); > > > See the javadocs for details. > > > -- > Ian. > > > On Tue, Jul 6, 2010 at 7:39 AM, manjula wijewickrema > wrote: > > Dear Grant, > > > > Thanks a lot for your guidance.

Why not normalization?

2010-07-07 Thread manjula wijewickrema
Hi, In my application, I input only one index file and enter only a single-term query to check the Lucene score. I used the explain method to see how the result is obtained, and the system gave me the result as the product of tf, idf, and fieldNorm. 1) Although Lucene uses tf to calculate scoring it seems to me t
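
To make the "product of tf, idf, fieldNorm" concrete, here is a rough sketch of the three factors as I recall them from the default Similarity of Lucene in that era: tf is the square root of the raw frequency, idf is 1 + ln(numDocs / (docFreq + 1)), and the length norm is 1 / sqrt(number of terms in the field). This is an illustration and not the library's code; the fieldNorm that explain() prints is additionally stored in a lossy one-byte encoding, so real values will differ slightly from this arithmetic.

```java
class ClassicScoreSketch {
    // Term-frequency factor: grows with the square root of the raw count.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency for a term found in docFreq of numDocs documents.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Length normalization: longer fields get a smaller norm.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Field weight as the product of the three factors (norm encoding ignored here).
    static float fieldWeight(float freq, int docFreq, int numDocs, int numTerms) {
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A single indexed document containing the term: docFreq = 1, numDocs = 1.
        System.out.println("idf = " + idf(1, 1));
        System.out.println("weight = " + fieldWeight(4f, 1, 1, 100));
    }
}
```

With a single indexed document, idf is a constant, so only tf and the length norm vary from query to query. The length norm is the place where document length is taken into account.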

Re: Why not normalization?

2010-07-08 Thread manjula wijewickrema
Hi Rebecca, Thanks for your valuable comments. Yes, I observed that once the number of terms of the document goes up, the fieldNorm value goes down correspondingly. I think, therefore there won't be any default due to the variation of total number of terms in the document. Am I right? Manjula. On Thu, Jul 8, 2

scoring and index size

2010-07-09 Thread manjula wijewickrema
Hi, I ran a single programme to see how Lucene scores a single indexed document. The explain() method gave me the following results. *** Searching for 'metaphysics' Number of hits: 1 0.030706111 0.030706111 = (MATCH) fieldWeight(contents:metaphys in 0), product of:

Re: scoring and index size

2010-07-09 Thread manjula wijewickrema
ldLength.LIMITED instead of UNLIMITED? Then the number > of terms per document is limited. > > The calculation precision is limited by the float norm encoding, but also > if > your analyzer removed stop words, so the norm is not what you expect? > > - > Uwe Schindler >

Re: Why not normalization?

2010-07-09 Thread manjula wijewickrema
Thanx On Fri, Jul 9, 2010 at 1:10 PM, Uwe Schindler wrote: > > Thanks for your valuable comments. Yes, I observed that once the number of > > terms of the document goes up, the fieldNorm value goes down correspondingly. I think, > > therefore there won't be any default due to the variation of total number > of

Re: scoring and index size

2010-07-12 Thread manjula wijewickrema
Hi Koji, Thanks for your information Manjula On Fri, Jul 9, 2010 at 5:04 PM, Koji Sekiguchi wrote: > (10/07/09 19:30), manjula wijewickrema wrote: > >> Uwe, thanx for your comments. Following is the code I used in this case. >> Could you pls. let me know where I have t

MaxFieldLength

2010-07-12 Thread manjula wijewickrema
Hi, I have seen that, once the field length of a document goes over a certain limit ( http://lucene.apache.org/java/2_9_3/api/all/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH gives it as 10,000 terms-default) Lucene truncates those documents. Is there any possibility to trunc

Re: MaxFieldLength

2010-07-12 Thread manjula wijewickrema
rms will take up just as much space > with any MaxfieldLength > 5,000. > > HTH > Erick > > On Mon, Jul 12, 2010 at 4:00 AM, manjula wijewickrema > wrote: > > > Hi, > > > > I have seen that, once the field length of a document goes over a > cer

Databases

2010-07-22 Thread manjula wijewickrema
Hi, Normally, when I am building my index directory for indexed documents, I keep my indexed files simply in a directory called 'filesToIndex'. So in this case, I do not use any standard database management system such as MySQL or any other. 1) Will it be possible to use MySQL or any other

Re: Databases

2010-07-27 Thread manjula wijewickrema
Hi, Thanks a lot for your information. Regards, Manjula. On Fri, Jul 23, 2010 at 12:48 PM, tarun sapra wrote: > You can use HibernateSearch to maintain the synchronization between Lucene > index and Mysql RDBMS. > > On Fri, Jul 23, 2010 at 11:16 AM, manjula wijewickrema >

Analyzer

2010-11-29 Thread manjula wijewickrema
Hi, In my work, I am using Lucene and two java classes. In the first one, I index a document and in the second one, I try to search the most relevant document for the indexed document in the first one. In the first java class, I use the SnowballAnalyzer in the createIndex method and StandardAnalyz

Re: Analyzer

2010-11-29 Thread manjula wijewickrema
ery terms. You'll likely get better > results using WhitespaceAnalyzer, which tokenizes on whitespace and does no > further analysis, rather than StandardAnalyzer. > > Steve > > > -Original Message- > > From: manjula wijewickrema [mailto:manjul...@gmail.com] &

Re: Analyzer

2010-12-02 Thread manjula wijewickrema
plication, please consider copying this source code directory > to > your project and maintaining your own grammar-based tokenizer. > > > Best > > Erick > > On Tue, Nov 30, 2010 at 12:06 AM, manjula wijewickrema > wrote: > > > Hi Steve, > > > > Than

Editing StopWordList

2010-12-20 Thread manjula wijewickrema
Hi, 1) In my application, I need to add more words to the stop word list. Therefore, is it possible to add more words to the default Lucene stop word list? 2) If it is possible, then how can I do this? Appreciate any comment from you. Thanks, Manjula.

Re: Editing StopWordList

2010-12-21 Thread manjula wijewickrema
2010 at 10:36 AM, Anshum wrote: > Hi Manjula, > You could initialize the Analyzer using a modified stop word set. Use > the StopAnalyzer.ENGLISH_STOP_WORDS_SET > to get the default stopset and then add your own words to it. You could > then initialize the analyzer using this
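
A small plain-Java sketch of the "copy the default set, then add your own words" idea from Anshum's reply. DEFAULT_STOP_WORDS here is a made-up stand-in for the library-provided set and not the Lucene constant itself. The copy matters because a shared default set may be unmodifiable, in which case adding to it directly would throw.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class ExtendStopWords {
    // Hypothetical stand-in for a default stop set supplied by a library.
    static final Set<String> DEFAULT_STOP_WORDS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "an", "and", "the")));

    // Returns a new, modifiable set holding the defaults plus the extra words.
    static Set<String> extended(Set<String> defaults, String... extra) {
        Set<String> copy = new HashSet<>(defaults);
        copy.addAll(Arrays.asList(extra));
        return copy;
    }

    public static void main(String[] args) {
        Set<String> stopWords = extended(DEFAULT_STOP_WORDS, "lucene", "index");
        System.out.println(stopWords.contains("the") + " " + stopWords.contains("lucene"));
    }
}
```

The resulting set would then be handed to whichever analyzer constructor accepts a stop word set.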

Phrase indexing and searching

2013-12-18 Thread Manjula Wijewickrema
Dear list, My Lucene programme is able to index single words and search a corpus for the documents that best match the input document (based on term frequencies). Now I want to index two-word phrases and search the matching corpus documents (based on phrase frequencies) to the input do

Phrase indexing and searching

2013-12-22 Thread Manjula Wijewickrema
Dear All, My Lucene programme is able to index single words and search a corpus for the documents that best match the input document (based on term frequencies). Now I want to index two-word phrases and search the matching corpus documents (based on phrase frequencies) to the input doc

Re: Phrase indexing and searching

2013-12-23 Thread Manjula Wijewickrema
wrote: > Hi Manjula, > > Sounds like ShingleFilter will do what you want: < > > http://lucene.apache.org/core/4_6_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html > > > > Steve > www.lucidworks.com > On Dec 22, 2013 11:25 PM, "Manju
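
To illustrate what two-word shingles look like, here is a plain-Java sketch that joins each pair of adjacent tokens with a space. It only mimics the idea behind ShingleFilter and does not use the Lucene class; the class name WordBigrams and the sample tokens are chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordBigrams {
    // Joins each pair of adjacent tokens with a single space.
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample tokens, assumed to be already analyzed (lowercased, stemmed).
        System.out.println(bigrams(Arrays.asList("manjula", "assist", "librarian")));
    }
}
```

Each resulting string can then be counted like an ordinary term, which is what makes phrase frequencies comparable to term frequencies.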

Re: Is it wrong to create index writer on each query request.

2014-06-05 Thread Manjula Wijewickrema
Hi, What are the other disadvantages (other than the time factor) of creating an index for every request? Manjula. On Thu, Jun 5, 2014 at 2:34 PM, Aditya wrote: > Hi Rajendra > > You should NOT create index writer for every request. > > >>Whether it is time consuming to update index writer when

ShingleAnalyzerWrapper question

2014-06-10 Thread Manjula Wijewickrema
Hi, In my programme, I can index and search a document based on unigrams. I modified the code as follows to obtain the results based on bigrams. However, it did not give me the desired output. public static void createIndex() throws CorruptIndexException, LockObtainFail

Re: ShingleAnalyzerWrapper question

2014-06-16 Thread Manjula Wijewickrema
Dear Steve, It works. Thanks. On Wed, Jun 11, 2014 at 6:18 PM, Steve Rowe wrote: > You should give sw rather than analyzer in the IndexWriter constructor. > > Steve > www.lucidworks.com > On Jun 11, 2014 2:24 AM, "Manjula Wijewickrema" > wrote: > > > Hi, >

Why bigram tf-idf is 0?

2014-06-24 Thread Manjula Wijewickrema
Hi, In my programme, I tried to select the most relevant document based on bigrams. System gives me the following output. {contents: /1, assist librarian/1, assist manjula/2, assist sabaragamuwa/1, fine manjula/1, librari manjula/1, librarian sabaragamuwa/1, main librari/2, manjula assist/4, man

bigram problem

2014-07-02 Thread Manjula Wijewickrema
Hi, Could someone please explain to me how to determine the tf-idf score for bigrams? My program is able to index and search bigrams correctly, but it does not calculate the tf-idf for bigrams. If someone can, please help me to resolve this. Regards, Manjula.

Re: bigram problem

2014-07-02 Thread Manjula Wijewickrema
ocs having the bigram. I hope this is fine. > > Alternatively, use NGramTokenizer (where n=2 in your case) while > indexing. In such a case, each bigram can be interpreted as a normal Lucene > term. > > Thanks, > Parnab > > > On Wed, Jul 2, 2014 at 8:45 AM, Manjula Wi

Why hit is 0 for bigrams?

2014-07-07 Thread Manjula Wijewickrema
Hi, I tried to index bigrams from a document and the system gave me the following output with the frequencies of the bigrams (output 1): array size:15 array terms are:{contents: /1, assist librarian/1, assist manjula/2, assist sabaragamuwa/1, fine manjula/1, librari manjula/1, librarian

hit.score

2017-03-27 Thread Manjula Wijewickrema
Hi, Can someone help me to understand the value given by 'hit.score' in Lucene? I indexed a single document with five different words with different frequencies and tried to understand this value. However, it doesn't seem to be normalized term frequency or tf-idf. I am using Lucene 2.91. Any help w

Re: hit.score

2017-03-27 Thread Manjula Wijewickrema
Thanks Adrien. On Mon, Mar 27, 2017 at 6:56 PM, Adrien Grand wrote: > You can use IndexSearcher.explain to see how the score was computed. > > On Mon, 27 Mar 2017 at 14:46, Manjula Wijewickrema > wrote: > > > Hi, > > > > Can someone help me to understand

Only term frequencies

2017-04-06 Thread Manjula Wijewickrema
Hi, I have a document collection with hundreds of documents. I need to know the term frequency for a given query term in each document. I know that 'hit.score' will give me the Lucene score for each document (and it includes term frequency as well). But I need to call only term frequencies in e

Total of term frequencies

2017-04-16 Thread Manjula Wijewickrema
Hi, Is there any way to get the total count of terms in the Term Frequency Vector (tfv)? I need to calculate the normalized term frequency of each term in my tfv. I know how to obtain the length of the tfv, but it doesn't work since I need to count duplicate occurrences as well. Highly appreciat
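
One way to read this request, sketched in plain Java: the total number of term occurrences is the sum of the frequency array (the array length counts distinct terms only), and a normalized term frequency is each count divided by that sum. The frequencies below are copied from the term vector output quoted in the earlier "Arrange terms[i]" message; the class and method names are hypothetical.

```java
class NormalizedTf {
    // Sum of all frequencies, so repeated occurrences are counted.
    static int total(int[] freqs) {
        int sum = 0;
        for (int f : freqs) sum += f;
        return sum;
    }

    // Each frequency divided by the total; all zeros if the vector is empty.
    static double[] normalized(int[] freqs) {
        int sum = total(freqs);
        double[] out = new double[freqs.length];
        for (int i = 0; i < freqs.length; i++) {
            out[i] = sum == 0 ? 0.0 : (double) freqs[i] / sum;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] freqs = {1, 2, 1, 4, 1, 1, 4, 1};
        System.out.println("distinct terms: " + freqs.length);
        System.out.println("total occurrences: " + total(freqs));
        System.out.println("normalized tf of index 3: " + normalized(freqs)[3]);
    }
}
```

Note that this total covers only the terms that survived analysis, so removed stop words are not counted.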

TermFrequency for a String

2017-04-28 Thread Manjula Wijewickrema
IndexReader.getTermFreqVectors(2)[0].getTermFrequencies()[5]; In the above example, Lucene gives me the term frequency of the term at index 5 (say, "planet") in the tfv of the corpus document "2". But I need to get the term frequency for a specified term using its string value. E.g.: term frequency
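
A hedged sketch of looking up a frequency by the term's string value rather than by position, using plain parallel arrays as stand-ins for getTerms() and getTermFrequencies(). The sample data and names are invented. To my knowledge the TermFreqVector interface of that Lucene generation also offers an indexOf(String) method that returns the position of a term, which could replace the loop below; treat that as an assumption to check against the javadocs for your version.

```java
class TermLookup {
    // Returns the frequency stored for the given term, or 0 if the term is absent.
    static int frequencyOf(String term, String[] terms, int[] freqs) {
        for (int i = 0; i < terms.length; i++) {
            if (terms[i].equals(term)) {
                return freqs[i];
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Invented stand-in data for one document's term vector.
        String[] terms = {"earth", "mars", "planet"};
        int[] freqs = {3, 1, 7};
        System.out.println(frequencyOf("planet", terms, freqs));
        System.out.println(frequencyOf("venus", terms, freqs));
    }
}
```

The term passed in has to be in its analyzed form (for example stemmed and lowercased, if the index was built that way), otherwise it will not match.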