Hi,
I am new to Lucene. If I want to know the term or phrase frequencies of an
input document, is that possible with Lucene?
Thanks,
Manjula
> …http://people.apache.org/~hossman/#xyproblem).
>
> Best
> Erick
>
> On Thu, May 6, 2010 at 6:39 AM, manjula wijewickrema wrote:
>
> > Hi,
> >
> > I am new to Lucene. If I want to know the term or phrase frequency of an
> > input document, will it be possible through Lucene?
> >
> > Thanks,
> > Manjula
> >
>
Hi,
I am using Lucene 2.9.1. I have downloaded and run the 'HelloLucene.java'
class, modifying the input document and user query in various ways. Once I
set the document sentence to 'Lucene in actions' instead of 'Lucene in
action', gave the query as 'action', and ran the programme. But it
Hi,
If I index a document (a single document) in Lucene, then how can I get the
term frequencies (even the first- and second-highest occurring terms) of that
document? Is there any class/method to do that? If anybody knows, please help
me.
Thanks
Manjula
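In Lucene 2.9, a document's TermFreqVector exposes two parallel arrays via getTerms() and getTermFrequencies(). Given those arrays, the first- and second-highest occurring terms fall out of a single scan. A minimal plain-Java sketch of that scan; the sample terms and counts below are made up, not from a real index:

```java
public class TopTerms {
    // Given parallel arrays like those returned by TermFreqVector.getTerms()
    // and getTermFrequencies(), return the indices of the two most frequent
    // terms. Assumes at least two entries; ties keep the earlier index.
    static int[] topTwo(int[] freqs) {
        int best = 0, second = -1;
        for (int i = 1; i < freqs.length; i++) {
            if (freqs[i] > freqs[best]) { second = best; best = i; }
            else if (second == -1 || freqs[i] > freqs[second]) { second = i; }
        }
        return new int[] { best, second };
    }

    public static void main(String[] args) {
        // Hypothetical vector contents.
        String[] terms = { "action", "index", "lucene", "term" };
        int[] freqs = { 1, 3, 4, 2 };
        int[] top = topTwo(freqs);
        System.out.println(terms[top[0]] + "/" + freqs[top[0]]); // most frequent
        System.out.println(terms[top[1]] + "/" + freqs[top[1]]); // second most
    }
}
```

The same index works against both arrays, since Lucene keeps them aligned.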
> …TermFreqVector?
>
> Best
> Erick
>
> -Original Message-
> From: manjula wijewickrema
> Date: Tue, 11 May 2010 15:13:12
> To:
> Subject: Re: Class_for_HighFrequencyTerms
>
> Dear Erick,
>
> I looked for it and even added IndexReader.java and TermFreqVector.java
> from
>
Dear All,
I am trying to get the term frequencies (through TermFreqVector) of a
document (using Lucene 2.9.1). In order to do that I have used the following
code. But there is a compile-time error in the code and I can't figure it
out. Could somebody guide me as to what's wrong with it?
Compile time
> Replace
>
> TermFreqVector vector = IndexReader.getTermFreqVector(0, "fieldname");
>
> with
>
> IndexReader ir = whatever(...);
> TermFreqVector vector = ir.getTermFreqVector(0, "fieldname" );
>
> And you'll need to move it to after the writer.close() call if
> You don't appear to be doing anything
> with the String term in "for ( String term : vector.getTerms() )" -
> presumably you intend to.
>
>
> --
> Ian.
>
> On Thu, May 13, 2010 at 1:16 PM, manjula wijewickrema
> wrote:
> > Dear Ian,
> >
> >
Hi,
Is it possible to put the indexed terms into an array in Lucene? For
example, imagine I have indexed a single document in Lucene and now I want
to access those terms in the index. Is it possible to retrieve (call) those
terms as array elements? If it is possible, then how?
Thanks,
Manjula
class in my
code. But I was unable to find any guidance on how to do it. If you can, please
be kind enough to tell me how I can use this class in my code.
Thanks
Manjula
Hi,
I am struggling with using the HighFreqTerms class for the purpose of finding
high-frequency terms in my index. My target is to get the high-frequency terms
in an indexed document (single document). To do that I have added the
org.apache.lucene.misc package to my project. I think up to that point I am
correct
> …instructions here for getting the source:
> http://wiki.apache.org/lucene-java/HowToContribute
>
> HTH
> Erick
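Conceptually, a high-frequency-terms utility like the one discussed above keeps the N largest counts with a small min-heap. A plain-Java sketch of that idea, independent of the Lucene contrib class; the inputs mirror TermFreqVector's parallel arrays and the sample data is hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class HighFreqDemo {
    // Return the n most frequent terms, most frequent first, by keeping an
    // {index, freq} pair per term in a min-heap bounded at size n.
    static List<String> topN(String[] terms, int[] freqs, int n) {
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[1]));
        for (int i = 0; i < terms.length; i++) {
            heap.offer(new int[] { i, freqs[i] });
            if (heap.size() > n) heap.poll(); // evict the current minimum
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) out.add(0, terms[heap.poll()[0]]);
        return out;
    }
}
```

For a single document's vector this is overkill next to a plain scan, but it scales to whole-index term counts.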
Hi,
I wrote a code with a view to displaying the indexed terms and getting their
term frequencies for a single document. Although it displays those terms in the
index, it does not give the term frequencies. Instead it displays 'frequencies
are:[...@80fa6f'. What's the reason for this? The code I have wri
Dear Ian,
I changed it as you said and now it is working nicely. Thanks a lot for your
kind help.
Manjula
On Mon, May 17, 2010 at 6:46 PM, Ian Lea wrote:
> terms and freqs are arrays. Try terms[i] and freqs[i].
>
>
> --
> Ian.
>
>
> On Mon, May 17, 2010 at 12:23
Hi,
I wrote a programme to get the frequencies and terms of an indexed document.
The output comes as follows:
If I print : +tfv[0]
Output:
array terms are:{title: capabl/1, code/2, frequenc/1, lucen/4, over/1,
sampl/1, term/4, test/1}
In the same way I can print terms[i] and freqs[i], but the pr
Dear Grant,
Thanks for your reply.
Manjula
On Mon, May 24, 2010 at 4:37 PM, Grant Ingersoll wrote:
Hi,
Using the following programme I was able to get the entire file paths of
indexed files which matched the given queries. But my intention is to get
only the file names, even without the .txt extension, as I need to send these
file names as labels to another application. So, please let me know how c
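One way to reduce a stored path to a bare label is plain string slicing: cut everything up to the last path separator, then drop the extension. A sketch, assuming the indexed field holds a full path such as 'filesToIndex/doc1.txt' (the path and field contents are hypothetical):

```java
public class FileNameOnly {
    // Strip the directory part and the extension from a full path, e.g. the
    // value of document.get(FIELD_PATH), so only the bare name remains.
    static String baseName(String path) {
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        return dot == -1 ? name : name.substring(0, dot);
    }

    public static void main(String[] args) {
        System.out.println(baseName("filesToIndex/doc1.txt")); // prints "doc1"
    }
}
```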
Hi,
In my application, I input only a single-term query (at one time) and get back
the corresponding scores for those queries. But I am struggling a little to
understand Lucene scoring. I have referred to
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
and
some other
hit.score, doc.get(FIELD_CONTENTS));
System.out.println(hit.score);
Searcher.explain("rice", 0);
}
Iterator it = hits.iterator();
while (it.hasNext()) {
Hit hit = it.next();
Document document = hit.getDocument();
String path = document.get(FIELD_PATH);
System.out.println("
> …something like
>
> System.out.println(indexSearcher.explain(query, 0));
>
>
> See the javadocs for details.
>
>
> --
> Ian.
>
>
> On Tue, Jul 6, 2010 at 7:39 AM, manjula wijewickrema
> wrote:
> > Dear Grant,
> >
> > Thanks a lot for your guidence.
Hi,
In my application, I input only one index file and enter only a single-term
query to check the Lucene score. I used the explain method to see how the
results are obtained, and the system gave me the result as the product of tf,
idf, and fieldNorm.
1) Although Lucene uses tf to calculate scoring, it seems to me t
Hi Rebecca,
Thanks for your valuable comments. Yes, I observed that, once the number of
terms of the document goes up, the fieldNorm value goes down correspondingly.
I think, therefore, there won't be any drawback due to the variation of the
total number of terms in the document. Am I right?
Manjula.
On Thu, Jul 8, 2
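For reference, the three factors can be multiplied out by hand. The formulas below follow Lucene 2.9's DefaultSimilarity; note the stored norm is quantized to a single byte, so explain() may show a slightly rounded fieldNorm. The sample numbers are hypothetical:

```java
public class ScoreSketch {
    // Reproduces the fieldWeight product that explain() prints, using the
    // DefaultSimilarity formulas:
    //   tf(freq)        = sqrt(freq)
    //   idf(df, nDocs)  = ln(nDocs / (df + 1)) + 1
    //   lengthNorm(len) = 1 / sqrt(len)   (encoded to one byte when stored)
    static double fieldWeight(int freq, int docFreq, int numDocs, int fieldLength) {
        double tf = Math.sqrt(freq);
        double idf = Math.log((double) numDocs / (docFreq + 1)) + 1.0;
        double fieldNorm = 1.0 / Math.sqrt(fieldLength);
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        // One doc in the index, term occurs once, field has 4 terms.
        System.out.println(fieldWeight(1, 0, 1, 4)); // prints 0.5
    }
}
```

This is why a longer field pushes the score down: only the 1/sqrt(length) factor changes.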
Hi,
I ran a single programme to see the way Lucene scores a single indexed
document. The explain() method gave me the following results.
***
Searching for 'metaphysics'
Number of hits: 1
0.030706111
0.030706111 = (MATCH) fieldWeight(contents:metaphys in 0), product of:
> …MaxFieldLength.LIMITED instead of UNLIMITED? Then the number
> of terms per document is limited.
>
> The calculation precision is limited by the float norm encoding; also, if
> your analyzer removed stop words, the norm may not be what you expect.
>
> -
> Uwe Schindler
>
Thanks
On Fri, Jul 9, 2010 at 1:10 PM, Uwe Schindler wrote:
Hi Koji,
Thanks for your information
Manjula
On Fri, Jul 9, 2010 at 5:04 PM, Koji Sekiguchi wrote:
> (10/07/09 19:30), manjula wijewickrema wrote:
>
>> Uwe, thanx for your comments. Following is the code I used in this case.
>> Could you pls. let me know where I have t
Hi,
I have seen that, once the field length of a document goes over a certain
limit (
http://lucene.apache.org/java/2_9_3/api/all/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH
gives
it as 10,000 terms by default), Lucene truncates those documents. Is there any
possibility to trunc
> …terms will take up just as much space
> with any MaxFieldLength > 5,000.
>
> HTH
> Erick
>
Hi,
Normally, when I am building my index directory for indexed documents, I
keep my indexed files simply in a directory called 'filesToIndex'.
So in this case, I do not use any standard database management system such
as MySQL or any other.
1) Will it be possible to use MySQL or any other
Hi,
Thanks a lot for your information.
Regards,
Manjula.
On Fri, Jul 23, 2010 at 12:48 PM, tarun sapra wrote:
> You can use HibernateSearch to maintain the synchronization between Lucene
> index and Mysql RDBMS.
>
Hi,
In my work, I am using Lucene and two Java classes. In the first one, I
index a document, and in the second one, I try to search for the most relevant
document for the one indexed in the first. In the first Java class,
I use the SnowballAnalyzer in the createIndex method and StandardAnalyzer
> …query terms. You'll likely get better
> results using WhitespaceAnalyzer, which tokenizes on whitespace and does no
> further analysis, rather than StandardAnalyzer.
>
> Steve
>
> …application, please consider copying this source code directory
> to
> your project and maintaining your own grammar-based tokenizer.
>
>
> Best
>
> Erick
>
Hi,
1) In my application, I need to add more words to the stop word list. Is it
possible to add more words to the default Lucene stop word list?
2) If it is possible, then how can I do this?
I would appreciate any comments.
Thanks,
Manjula.
On … 2010 at 10:36 AM, Anshum wrote:
> Hi Manjula,
> You could initialize the Analyzer using a modified stop word set. Use
> the StopAnalyzer.ENGLISH_STOP_WORDS_SET to get the default stop set and
> then add your own words to it. You could then initialize the analyzer
> using this
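A sketch of that suggestion with plain collections standing in for the Lucene types: copy the default set (in Lucene 2.9, StopAnalyzer.ENGLISH_STOP_WORDS_SET, which is immutable), add your own words, and hand the result to an analyzer constructor that accepts a stop set. The word lists below are made up:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CustomStops {
    // Build a stop-word set that extends a default list with extra words.
    // In real code, `defaults` would be StopAnalyzer.ENGLISH_STOP_WORDS_SET
    // and the returned set would go to the analyzer's constructor.
    static Set<String> withExtraStops(Set<String> defaults, String... extra) {
        Set<String> combined = new HashSet<>(defaults); // copy: default set is read-only
        combined.addAll(Arrays.asList(extra));
        return combined;
    }
}
```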
Dear list,
My Lucene programme is able to index single words and search for the most
closely matching documents (based on term frequencies) from a corpus for the
input document.
Now I want to index two-word phrases and search for the matching corpus
documents (based on phrase frequencies) for the input do
> Hi Manjula,
>
> Sounds like ShingleFilter will do what you want: <
>
> http://lucene.apache.org/core/4_6_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html
> >
>
> Steve
> www.lucidworks.com
> On Dec 22, 2013 11:25 PM, "Manju
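For intuition, ShingleFilter with a shingle size of 2 emits every pair of adjacent tokens joined by a space, so each bigram behaves like one indexable term. A plain-Java illustration of that output shape (the token list is hypothetical, and real ShingleFilter operates on a TokenStream, not a List):

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {
    // Join each pair of adjacent tokens with a space, mimicking the bigrams
    // a size-2 shingle filter would emit.
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }
}
```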
Hi,
What are the other disadvantages (other than the time factor) of creating an
index for every request?
Manjula.
On Thu, Jun 5, 2014 at 2:34 PM, Aditya wrote:
> Hi Rajendra
>
> You should NOT create index writer for every request.
>
> >>Whether it is time consuming to update index writer when
Hi,
In my programme, I can index and search a document based on unigrams. I
modified the code as follows to obtain results based on bigrams.
However, it did not give me the desired output.
public static void createIndex() throws CorruptIndexException,
LockObtainFailedException
Dear Steve,
It works. Thanks.
On Wed, Jun 11, 2014 at 6:18 PM, Steve Rowe wrote:
> You should give sw rather than analyzer in the IndexWriter constructor.
>
> Steve
> www.lucidworks.com
> On Jun 11, 2014 2:24 AM, "Manjula Wijewickrema"
> wrote:
>
> > Hi,
>
Hi,
In my programme, I tried to select the most relevant document based on
bigrams.
System gives me the following output.
{contents: /1, assist librarian/1, assist manjula/2, assist sabaragamuwa/1,
fine manjula/1, librari manjula/1, librarian sabaragamuwa/1, main
librari/2, manjula assist/4, man
Hi,
Could you please explain how to determine the tf-idf score for bigrams? My
program is able to index and search bigrams correctly, but it does not
calculate the tf-idf for bigrams. If someone can, please help me to resolve
this.
Regards,
Manjula.
> …docs having the bigram. I hope this is fine.
>
> Alternatively, use NGramTokenizer where ( n=2 in your case) while
> indexing. In such a case, each bigram can be interpreted as a normal Lucene
> term.
>
> Thanks,
> Parnab
>
>
> On Wed, Jul 2, 2014 at 8:45 AM, Manjula Wi
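The point above can be written out as arithmetic: once a bigram is indexed as a single term (e.g. via shingles), tf-idf applies unchanged. A sketch using one common tf-idf variant (normalized tf times log idf), not necessarily Lucene's internal formula; all counts are hypothetical:

```java
public class BigramTfIdf {
    // tf  = occurrences of the bigram in the doc / total bigrams in the doc
    // idf = ln(total docs / docs containing the bigram)
    static double tfIdf(int count, int docLen, int numDocs, int docFreq) {
        double tf = (double) count / docLen;
        double idf = Math.log((double) numDocs / docFreq);
        return tf * idf;
    }
}
```

For example, a bigram occurring twice among 10 bigrams, in 10 of 100 docs, gives 0.2 × ln(10).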
Hi,
I tried to index bigrams from a document, and the system gave me the
following output with the frequencies of the bigrams (output 1):
array size:15
array terms are:{contents: /1, assist librarian/1, assist manjula/2, assist
sabaragamuwa/1, fine manjula/1, librari manjula/1, librarian
Hi,
Can someone help me to understand the value given by 'hit.score' in Lucene.
I indexed a single document with five different words with different
frequencies and tried to understand this value. However, it doesn't seem to
be the normalized term frequency or tf-idf. I am using Lucene 2.9.1.
Any help w
Thanks Adrien.
On Mon, Mar 27, 2017 at 6:56 PM, Adrien Grand wrote:
> You can use IndexSearcher.explain to see how the score was computed.
>
> On Mon, Mar 27, 2017 at 14:46, Manjula Wijewickrema
> wrote:
Hi,
I have a document collection with hundreds of documents. I need to know
the term frequency for a given query term in each document. I know that
'hit.score' will give me the Lucene score for each document (and it
includes term frequency as well), but I need to call only the term frequencies
in e
Hi,
Is there any way to get the total count of terms in the term frequency
vector (tfv)? I need to calculate the normalized term frequency of each
term in my tfv. I know how to obtain the length of the tfv, but it doesn't
work since I need to count duplicate occurrences as well.
Highly appreciat
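The total is the sum of getTermFrequencies(), not the array length (which counts distinct terms only). A plain-Java sketch of the sum and the resulting normalized tf; the sample counts are hypothetical:

```java
public class NormalizedTf {
    // Total occurrences in a term frequency vector = sum of the per-term
    // frequencies (duplicate occurrences included), not freqs.length.
    static int totalOccurrences(int[] freqs) {
        int total = 0;
        for (int f : freqs) total += f;
        return total;
    }

    // Normalized tf of one term = its raw frequency / total occurrences.
    static double normalizedTf(int freq, int[] freqs) {
        return (double) freq / totalOccurrences(freqs);
    }
}
```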
IndexReader.getTermFreqVectors(2)[0].getTermFrequencies()[5];
In the above example, Lucene gives me the term frequency of the 5th term
(e.g. say "planet") in the tfv of the corpus document "2".
But I need to get the term frequency for a specified term using its string
value.
E.g.:
term frequency
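Since getTerms() returns the terms in sorted order, a binary search maps a term string to its position in the parallel arrays; Lucene 2.9's TermFreqVector also provides indexOf(String) for exactly this. A plain-Java sketch with hypothetical sample data:

```java
import java.util.Arrays;

public class FreqByTerm {
    // Look up the frequency of a term by its string value in the parallel,
    // sorted arrays of a term frequency vector; 0 if the term is absent.
    static int freqOf(String term, String[] terms, int[] freqs) {
        int i = Arrays.binarySearch(terms, term); // terms must be sorted
        return i >= 0 ? freqs[i] : 0;
    }

    public static void main(String[] args) {
        String[] terms = { "earth", "planet", "star" };
        int[] freqs = { 2, 5, 1 };
        System.out.println(freqOf("planet", terms, freqs)); // prints 5
    }
}
```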