[
https://issues.apache.org/jira/browse/LUCENE-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-2393:
---------------------------------------
Attachment: LUCENE-2393.patch
Patch looks good, Tom!
I cleaned things up a bit; e.g., you don't need to use the class members when
interacting with the bulk DocsEnum API.
I think it's ready to go in!
> Utility to output total term frequency and df from a lucene index
> -----------------------------------------------------------------
>
> Key: LUCENE-2393
> URL: https://issues.apache.org/jira/browse/LUCENE-2393
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/*
> Reporter: Tom Burton-West
> Priority: Trivial
> Attachments: LUCENE-2393, LUCENE-2393.patch, LUCENE-2393.patch,
> LUCENE-2393.patch, LUCENE-2393.patch, LUCENE-2393.patch, LUCENE-2393.patch
>
>
> This is a pair of command line utilities that provide information on the
> total number of occurrences of a term in a Lucene index. The first takes a
> field name, term, and index directory and outputs the document frequency for
> the term and the total number of occurrences of the term in the index (i.e.
> the sum of the tf of the term for each document). The second reads the
> index to determine the top N most frequent terms (by document frequency) and
> then outputs a list of those terms along with the document frequency and the
> total number of occurrences of the term. Both utilities are useful for
> estimating the size of a term's entry in the *prx files and the consequent
> disk I/O demands.
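
To make the two statistics in the description concrete, here is a minimal sketch of what the utilities report. It is not the code from the attached patch and does not use any Lucene API: documents are modelled as plain in-memory token lists, and the names TermStatsSketch, TermStats, statsFor, and topByDocFreq are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TermStatsSketch {

    /** Hypothetical holder for the two numbers the utilities output per term. */
    static final class TermStats {
        final String term;
        final int docFreq;        // number of documents containing the term
        final long totalTermFreq; // sum of the per-document tf of the term

        TermStats(String term, int docFreq, long totalTermFreq) {
            this.term = term;
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** First utility: document frequency and total occurrences for one term. */
    static TermStats statsFor(String term, List<List<String>> docs) {
        int df = 0;
        long total = 0;
        for (List<String> doc : docs) {
            int tf = Collections.frequency(doc, term); // tf of the term in this document
            if (tf > 0) {
                df++;
                total += tf;
            }
        }
        return new TermStats(term, df, total);
    }

    /** Second utility: the top n terms ranked by document frequency. */
    static List<TermStats> topByDocFreq(int n, List<List<String>> docs) {
        Set<String> vocabulary = new TreeSet<>();
        for (List<String> doc : docs) {
            vocabulary.addAll(doc);
        }
        List<TermStats> all = new ArrayList<>();
        for (String term : vocabulary) {
            all.add(statsFor(term, docs));
        }
        // Highest document frequency first; ties are broken alphabetically.
        all.sort((a, b) -> a.docFreq != b.docFreq
                ? Integer.compare(b.docFreq, a.docFreq)
                : a.term.compareTo(b.term));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        // Toy "index": each inner list is the token stream of one document.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene"),
                Arrays.asList("lucene", "term"),
                Arrays.asList("lucene", "index", "index", "index"));

        for (TermStats s : topByDocFreq(2, docs)) {
            System.out.println(s.term + " df=" + s.docFreq + " totalTf=" + s.totalTermFreq);
        }
    }
}
```

In this toy data, "index" appears in only two documents but occurs as often in total as "lucene", which is the kind of gap between df and total term frequency the utilities are meant to expose. A real implementation would read these counts from the index postings rather than from raw token lists.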