[ 
https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160813#comment-14160813
 ] 

Lefty Leverenz commented on HIVE-6500:
--------------------------------------

Good catch, [~szehon].  Yes, the "Newly Created Tables" section of the StatsDev 
wikidoc needs to be updated, keeping in mind that releases 0.7 through 0.12 have 
"jdbc:derby" as the default for *hive.stats.dbclass* so we can't just swap in 
the new default value.  Linking to/from *hive.stats.dbclass* in the 
Configuration Properties doc will help with future maintenance.

* [StatsDev -- Newly Created Tables | 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-NewlyCreatedTables]
* [Configuration Properties -- hive.stats.dbclass | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.dbclass]

Also, the HiveConf.java description of *hive.stats.dbclass* omits the "fs" 
value.  I can correct that in the next patch for HIVE-6586, perhaps using the 
wiki description or a variant of it:

{quote}
The storage that stores temporary Hive statistics. In FS based statistics 
collection, each task writes statistics it has collected in a file on the 
filesystem, which will be aggregated after the job has finished. Supported 
values are fs (filesystem), jdbc(:.*), hbase, counter and custom (HIVE-6500).
{quote}

Suggested changes to that description:  (1) change "FS" to "filesystem (fs)", 
(2) remove or move "(HIVE-6500)" so it doesn't imply that HIVE-6500 added 
"custom", (3) change "jdbc(:.*)" to "jdbc:<database>" and explain that 
<database> can be derby, mysql, ... and what others -- is there a complete list 
anywhere?
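
To make suggestion (3) concrete, here is a minimal sketch of how the documented values could be recognized, with "jdbc:<database>" treated as a prefix plus a database name. This is an illustration only, not Hive's own validation code: the function name, the descriptions, and the decision to reject a bare "jdbc" are all assumptions made for the example.

```python
import re

# Values named in the wiki description of hive.stats.dbclass.
# The descriptive strings are illustrative, not taken from HiveConf.java.
SIMPLE_VALUES = {
    "fs": "filesystem",
    "hbase": "HBase",
    "counter": "job counters",
    "custom": "user-supplied implementation",
}

# "jdbc:<database>", for example jdbc:derby or jdbc:mysql.
JDBC_PATTERN = re.compile(r"jdbc:(?P<database>.+)")


def describe_dbclass(value):
    """Return a short description of a dbclass value, or None if unrecognized."""
    value = value.strip()
    if value in SIMPLE_VALUES:
        return SIMPLE_VALUES[value]
    match = JDBC_PATTERN.fullmatch(value)
    if match:
        return "JDBC database: " + match.group("database")
    return None


for candidate in ["fs", "jdbc:derby", "jdbc:mysql", "counter", "bogus"]:
    print(candidate, "->", describe_dbclass(candidate))
```

Spelling the JDBC form as "jdbc:<database>" in the description would let readers see at a glance that the part after the colon is a database name rather than an arbitrary regular expression.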

P.S.  What do you mean by "It is actually not linked from the top"?  Top of 
what?  Maybe you mean it belongs on the Home page.  Currently it's listed on 
the LanguageManual page, but that's easy to change -- we can even list it in 
both places.

> Stats collection via filesystem
> -------------------------------
>
>                 Key: HIVE-6500
>                 URL: https://issues.apache.org/jira/browse/HIVE-6500
>             Project: Hive
>          Issue Type: New Feature
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>              Labels: TODOC14
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6500.2.patch, HIVE-6500.3.patch, HIVE-6500.patch
>
>
> Recently, support for stats gathering via counter was [added | 
> https://issues.apache.org/jira/browse/HIVE-4632]. Although it's useful, it has 
> the following issues:
> * [Length of counter group name is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
> * [Length of counter name is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
> * [Number of distinct counter groups is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
> * [Number of distinct counters is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
> Although these limits are configurable, setting them to higher values 
> implies increased memory load on the AM and the job history server.
> Whether these limits make sense or not is [debatable | 
> https://issues.apache.org/jira/browse/MAPREDUCE-5680], but it is desirable that 
> Hive doesn't use the framework's counter feature, so that we can 
> evolve this feature without relying on support from the framework. 
> Filesystem-based counter collection is a step in that direction.
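
The filesystem-based approach described in the issue (each task writes the statistics it collected to a file, and the files are aggregated after the job has finished) can be sketched roughly as follows. This is a toy model under stated assumptions, not Hive's implementation: the JSON file format, the function names, and the task identifiers are hypothetical, and the stat keys are only examples.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def write_task_stats(stats_dir, task_id, stats):
    """Each task writes the statistics it collected to its own file."""
    path = Path(stats_dir) / f"{task_id}.json"
    path.write_text(json.dumps(stats))
    return path


def aggregate_stats(stats_dir):
    """After the job finishes, sum the per-task files into one result."""
    totals = Counter()
    for path in sorted(Path(stats_dir).glob("*.json")):
        # Counter.update adds the values for matching keys.
        totals.update(json.loads(path.read_text()))
    return dict(totals)


with tempfile.TemporaryDirectory() as stats_dir:
    write_task_stats(stats_dir, "task_0", {"numRows": 100, "rawDataSize": 2048})
    write_task_stats(stats_dir, "task_1", {"numRows": 50, "rawDataSize": 1024})
    totals = aggregate_stats(stats_dir)
    print(totals)
```

Because each task writes its own file, nothing in this scheme depends on the limits on counter names, counter groups, or counter counts listed above.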



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
