[ 
https://issues.apache.org/jira/browse/HIVE-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11444:
----------------------------------
    Description: 
Compaction should generate stats about the number of files it reads, their 
min/max/avg size, etc.  It should also generate alerts if the system appears 
to be misconfigured.

For example, if there are lots of deltas made up of very small files, it's a 
good sign that the Streaming API is configured with batches that are too small.
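
A minimal sketch of the kind of file-size statistics and small-file check described above. The class name DeltaFileStats and both thresholds are hypothetical, chosen only for illustration; nothing here is existing Hive code, and the caller is assumed to have already collected the sizes of the files the compactor read.

```java
import java.util.List;
import java.util.LongSummaryStatistics;

/** Hypothetical helper, not part of Hive: summarizes the sizes of files read by a compaction. */
final class DeltaFileStats {
    // Assumed thresholds for illustration only; real values would need tuning.
    static final long SMALL_FILE_BYTES = 1L << 20; // 1 MiB
    static final int MANY_FILES = 100;

    private DeltaFileStats() {}

    /** Computes count/min/max/avg over the given file sizes (in bytes). */
    static LongSummaryStatistics summarize(List<Long> fileSizes) {
        return fileSizes.stream().mapToLong(Long::longValue).summaryStatistics();
    }

    /** True when there are many files and their average size is small. */
    static boolean looksLikeSmallBatches(LongSummaryStatistics stats) {
        return stats.getCount() >= MANY_FILES && stats.getAverage() < SMALL_FILE_BYTES;
    }

    /** Formats the statistics as a single log-friendly line. */
    static String describe(LongSummaryStatistics stats) {
        return String.format("files=%d min=%d max=%d avg=%.1f",
                stats.getCount(), stats.getMin(), stats.getMax(), stats.getAverage());
    }
}
```

The compactor could log describe(...) at info and emit a warning when looksLikeSmallBatches(...) returns true.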

Simplest idea is to add another periodic task to AcidHouseKeeperService to
        //periodically do select count(*), min(txnid), max(txnid), type from txns group by type.
        //1. dump that to log file at info
        //2. could also keep counts for last 10min, hour, 6 hours, 24 hours, etc
        //2.2 if a large increase is detected - issue alert (at least to the log for now) at warn/error
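
The windowed-count idea in the steps above could look roughly like the following sketch. TxnCountMonitor is a hypothetical class, not an existing Hive API; it assumes the periodic task has already run the count query and passes in the count for one transaction type together with a timestamp. The window length and growth factor are assumed parameters.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical monitor, not part of Hive: flags a large increase in a periodically sampled count. */
final class TxnCountMonitor {
    private static final class Sample {
        final long timeMs;
        final long count;
        Sample(long timeMs, long count) { this.timeMs = timeMs; this.count = count; }
    }

    private final Deque<Sample> samples = new ArrayDeque<>();
    private final long windowMs;       // how far back to compare, e.g. 10 min or 1 hour
    private final double growthFactor; // assumed alert threshold, e.g. 2.0 means "more than doubled"

    TxnCountMonitor(long windowMs, double growthFactor) {
        this.windowMs = windowMs;
        this.growthFactor = growthFactor;
    }

    /**
     * Records a new sample and returns true if the count has grown by more than
     * growthFactor relative to the oldest sample still inside the window.
     */
    boolean record(long nowMs, long count) {
        // Drop samples that have fallen out of the window.
        while (!samples.isEmpty() && nowMs - samples.peekFirst().timeMs > windowMs) {
            samples.pollFirst();
        }
        Sample oldest = samples.peekFirst();
        boolean alert = oldest != null && oldest.count > 0
                && count > oldest.count * growthFactor;
        samples.addLast(new Sample(nowMs, count));
        return alert;
    }
}
```

One monitor per window (10 minutes, 1 hour, 6 hours, 24 hours) would cover the intervals listed above; the caller would log at warn/error when record(...) returns true.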

It should also alert if there is ACID activity but no compactions are running.
One way to do this is to add logic to TxnHandler to periodically check the 
contents of the COMPACTION_QUEUE table and keep a simple histogram of 
compactions over the last few hours.
Similarly, it can run a periodic check of transactions started (or 
committed/aborted) and keep a simple histogram.  The two histograms can then 
be used to detect that there is ACID write activity but no compaction activity.
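
A rough sketch of the two-histogram comparison described above. ActivityHistogram and the minTxns threshold are hypothetical, not existing Hive code; the sketch assumes the periodic checks feed in the number of transactions and compactions they observed along with a timestamp.

```java
import java.util.TreeMap;

/** Hypothetical time-bucketed counter, not part of Hive. */
final class ActivityHistogram {
    private final long bucketMs;
    private final TreeMap<Long, Long> buckets = new TreeMap<>();

    ActivityHistogram(long bucketMs) {
        this.bucketMs = bucketMs;
    }

    /** Adds n events to the bucket containing timeMs. */
    void add(long timeMs, long n) {
        buckets.merge(timeMs / bucketMs, n, Long::sum);
    }

    /** Sums events in all buckets from the one containing sinceMs onward (whole buckets only). */
    long totalSince(long sinceMs) {
        long total = 0;
        for (long v : buckets.tailMap(sinceMs / bucketMs, true).values()) {
            total += v;
        }
        return total;
    }

    /** True when at least minTxns transactions were seen since sinceMs but no compactions. */
    static boolean writesWithoutCompaction(ActivityHistogram txns,
                                           ActivityHistogram compactions,
                                           long sinceMs,
                                           long minTxns) {
        return txns.totalSince(sinceMs) >= minTxns && compactions.totalSince(sinceMs) == 0;
    }
}
```

A real implementation would also need to evict old buckets so the maps do not grow without bound.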

  was:
Compaction should generate stats about number of files it reads, min/max/avg 
size etc.  It should also generate alerts if it looks like the system is not 
configured correctly.

For example, if there are lots of delta files with very small files, it's a 
good sign that Streaming API is configured with batches that are too small.

Simplest idea is to add another periodic task to AcidHouseKeeperService to
        //periodically do select count(*), min(txnid),max(txnid), type from 
txns group by type.
        //1. dump that to log file at info
        //2. could also keep counts for last 10min, hour, 6 hours, 24 hours, etc
        //2.2 if a large increase is detected - issue alert (at least to the 
log for now) at warn/error



> ACID Compactor should generate stats/alerts
> -------------------------------------------
>
>                 Key: HIVE-11444
>                 URL: https://issues.apache.org/jira/browse/HIVE-11444
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> Compaction should generate stats about the number of files it reads, their 
> min/max/avg size, etc.  It should also generate alerts if the system appears 
> to be misconfigured.
> For example, if there are lots of deltas made up of very small files, it's a 
> good sign that the Streaming API is configured with batches that are too small.
> Simplest idea is to add another periodic task to AcidHouseKeeperService to
>         //periodically do select count(*), min(txnid), max(txnid), type from txns group by type.
>         //1. dump that to log file at info
>         //2. could also keep counts for last 10min, hour, 6 hours, 24 hours, etc
>         //2.2 if a large increase is detected - issue alert (at least to the log for now) at warn/error
> It should also alert if there is ACID activity but no compactions are running.
> One way to do this is to add logic to TxnHandler to periodically check the 
> contents of the COMPACTION_QUEUE table and keep a simple histogram of 
> compactions over the last few hours.
> Similarly, it can run a periodic check of transactions started (or 
> committed/aborted) and keep a simple histogram.  The two histograms can then 
> be used to detect that there is ACID write activity but no compaction activity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
