[ https://issues.apache.org/jira/browse/LUCENE-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867164#action_12867164 ]
Michael McCandless commented on LUCENE-2393:
--------------------------------------------
Patch looks good, Tom! I'll re-merge my small changes from the prior patch, add
a CHANGES entry, and commit.
I don't think we need to upgrade to a command-line processing lib...
> Utility to output total term frequency and df from a lucene index
> -----------------------------------------------------------------
>
> Key: LUCENE-2393
> URL: https://issues.apache.org/jira/browse/LUCENE-2393
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/*
> Reporter: Tom Burton-West
> Priority: Trivial
> Attachments: LUCENE-2393.patch, LUCENE-2393.patch, LUCENE-2393.patch,
> LUCENE-2393.patch, LUCENE-2393.patch, LUCENE-2393.patch, LUCENE-2393.patch
>
>
> This is a pair of command line utilities that provide information on the
> total number of occurrences of a term in a Lucene index. The first takes a
> field name, term, and index directory and outputs the document frequency for
> the term and the total number of occurrences of the term in the index (i.e.
> the sum of the tf of the term for each document). The second reads the
> index to determine the top N most frequent terms (by document frequency) and
> then outputs a list of those terms along with the document frequency and the
> total number of occurrences of the term. Both utilities are useful for
> estimating the size of the term's entry in the *prx files and the resulting
> disk I/O demands.
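
To illustrate the two statistics described above, here is a minimal sketch in plain Java over an in-memory list of tokenized documents. It does not use the Lucene API and is not the code from the patch; the names TermStatsSketch, Stats, collect, and topByDocFreq are hypothetical and exist only for this example. Document frequency counts the documents that contain the term, while total term frequency sums the term's occurrences across all documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: a toy in-memory model, not the Lucene API.
public class TermStatsSketch {

    /** Per-term statistics: document frequency and total term frequency. */
    static final class Stats {
        final String term;
        int docFreq;          // number of documents containing the term
        long totalTermFreq;   // sum of the term's tf over all documents
        Stats(String term) { this.term = term; }
    }

    /** Computes df and total tf for every term; each document is a list of tokens. */
    static Map<String, Stats> collect(List<List<String>> docs) {
        Map<String, Stats> stats = new HashMap<>();
        for (List<String> doc : docs) {
            // Within-document term frequency (tf) for this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String token : doc) {
                tf.merge(token, 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                Stats s = stats.computeIfAbsent(e.getKey(), Stats::new);
                s.docFreq++;                      // one more document has the term
                s.totalTermFreq += e.getValue();  // add this document's tf
            }
        }
        return stats;
    }

    /** Returns up to n terms ordered by descending df, ties broken alphabetically. */
    static List<Stats> topByDocFreq(Map<String, Stats> stats, int n) {
        List<Stats> sorted = new ArrayList<>(stats.values());
        sorted.sort((a, b) -> a.docFreq != b.docFreq
                ? Integer.compare(b.docFreq, a.docFreq)
                : a.term.compareTo(b.term));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("the", "index", "the", "term"),
                Arrays.asList("the", "term", "term", "term"),
                Arrays.asList("the"));
        for (Stats s : topByDocFreq(collect(docs), 2)) {
            System.out.println(s.term + " df=" + s.docFreq + " totalTf=" + s.totalTermFreq);
        }
    }
}
```

In this toy data, "term" appears in fewer documents than "the" but has the same total number of occurrences, which is the kind of gap between df and total tf that matters when estimating the size of a term's positions data.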
--
This message is automatically generated by JIRA.