[ 
https://issues.apache.org/jira/browse/HIVE-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11444:
----------------------------------
    Description: 
Compaction should generate stats about the number of files it reads, their 
min/max/avg size, etc.  It should also generate alerts if it looks like the 
system is not configured correctly.

For example, if there are lots of delta files that are very small, it's a 
good sign that the Streaming API is configured with batches that are too small.
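
A minimal sketch of the kind of file-size stats and small-file check described 
above, assuming the compactor can collect the sizes of the files it read into 
an array. DeltaFileStats, looksLikeSmallBatches and both thresholds are 
hypothetical names and values for illustration, not part of Hive.

```java
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

/** Hypothetical helper (not part of Hive): summarizes file sizes seen by one compaction. */
final class DeltaFileStats {
    private DeltaFileStats() {}

    /** Returns count, min, max and average of the given file sizes in bytes. */
    static LongSummaryStatistics summarize(long[] fileSizesBytes) {
        return LongStream.of(fileSizesBytes).summaryStatistics();
    }

    /**
     * True when at least minFiles files were read and their average size is
     * below smallFileBytes. Both thresholds are illustrative assumptions.
     */
    static boolean looksLikeSmallBatches(long[] fileSizesBytes, int minFiles, long smallFileBytes) {
        LongSummaryStatistics stats = summarize(fileSizesBytes);
        return stats.getCount() >= minFiles && stats.getAverage() < smallFileBytes;
    }
}
```

The compactor could log the summary at info and log at warn when 
looksLikeSmallBatches returns true.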

Simplest idea is to add another periodic task to AcidHouseKeeperService to
        //periodically do select count(*), min(txnid), max(txnid), type from txns group by type.
        //1. dump that to log file at info
        //2. could also keep counts for last 10min, hour, 6 hours, 24 hours, etc
        //2.2 if a large increase is detected - issue alert (at least to the log for now) at warn/error
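
One way the windowed counts and the alert on a large increase might be 
sketched, assuming the periodic task feeds in one count per run (for example, 
one row of the group-by query above). TxnCountMonitor and its methods are 
hypothetical and not part of AcidHouseKeeperService; timestamps are passed in 
explicitly so the logic can be exercised without a clock or a database.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper (not part of Hive): tracks one counter over time. */
final class TxnCountMonitor {
    /** One observation of the counter, taken at timeMillis. */
    private static final class Sample {
        final long timeMillis;
        final long count;

        Sample(long timeMillis, long count) {
            this.timeMillis = timeMillis;
            this.count = count;
        }
    }

    // Samples in the order they were recorded, oldest first.
    private final Deque<Sample> samples = new ArrayDeque<>();
    private final long retentionMillis;

    TxnCountMonitor(Duration retention) {
        this.retentionMillis = retention.toMillis();
    }

    /** Records a sample and drops samples older than the retention period. */
    void record(long nowMillis, long count) {
        samples.addLast(new Sample(nowMillis, count));
        while (!samples.isEmpty()
                && nowMillis - samples.peekFirst().timeMillis > retentionMillis) {
            samples.removeFirst();
        }
    }

    /**
     * Latest count minus the count of the oldest sample that falls within
     * the given window before the latest sample; 0 with fewer than two samples.
     */
    long increaseOver(Duration window) {
        if (samples.size() < 2) {
            return 0;
        }
        Sample latest = samples.peekLast();
        long cutoff = latest.timeMillis - window.toMillis();
        for (Sample s : samples) { // iterates oldest first
            if (s.timeMillis >= cutoff) {
                return latest.count - s.count;
            }
        }
        return 0;
    }

    /** True if the increase over the window exceeds the (assumed) threshold. */
    boolean shouldAlert(Duration window, long threshold) {
        return increaseOver(window) > threshold;
    }
}
```

A caller would log increaseOver(...) at info on each run for the 10min, hour, 
6 hour and 24 hour windows, and log at warn/error when shouldAlert(...) 
returns true.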


  was:
Compaction should generate stats about number of files it reads, min/max/avg 
size etc.  It should also generate alerts if it looks like the system is not 
configured correctly.

For example, if there are lots of delta files with very small files, it's a 
good sign that Streaming API is configured with batches that are too small.


> ACID Compactor should generate stats/alerts
> -------------------------------------------
>
>                 Key: HIVE-11444
>                 URL: https://issues.apache.org/jira/browse/HIVE-11444
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> Compaction should generate stats about the number of files it reads, their 
> min/max/avg size, etc.  It should also generate alerts if it looks like the 
> system is not configured correctly.
> For example, if there are lots of delta files that are very small, it's a 
> good sign that the Streaming API is configured with batches that are too small.
> Simplest idea is to add another periodic task to AcidHouseKeeperService to
>         //periodically do select count(*), min(txnid), max(txnid), type from txns group by type.
>         //1. dump that to log file at info
>         //2. could also keep counts for last 10min, hour, 6 hours, 24 hours, etc
>         //2.2 if a large increase is detected - issue alert (at least to the log for now) at warn/error



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
