[ https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sahil Takiar updated HIVE-15396:
--------------------------------
    Description:
Basic stats are not collected when a managed table is created with a specified {{LOCATION}} clause.

{code}
0: jdbc:hive2://localhost:10000> create table hdfs_1 (col int);
0: jdbc:hive2://localhost:10000> describe formatted hdfs_1;
+-------------------------------+----------------------------------------------------+-----------------------------+
| col_name                      | data_type                                          | comment                     |
+-------------------------------+----------------------------------------------------+-----------------------------+
| # col_name                    | data_type                                          | comment                     |
|                               | NULL                                               | NULL                        |
| col                           | int                                                |                             |
|                               | NULL                                               | NULL                        |
| # Detailed Table Information  | NULL                                               | NULL                        |
| Database:                     | default                                            | NULL                        |
| Owner:                        | anonymous                                          | NULL                        |
| CreateTime:                   | Wed Mar 22 18:09:19 PDT 2017                       | NULL                        |
| LastAccessTime:               | UNKNOWN                                            | NULL                        |
| Retention:                    | 0                                                  | NULL                        |
| Location:                     | file:/Users/stakiar/Documents/idea/apache-hive/warehouse/hdfs_2 | NULL                        |
| Table Type:                   | MANAGED_TABLE                                      | NULL                        |
| Table Parameters:             | NULL                                               | NULL                        |
|                               | COLUMN_STATS_ACCURATE                              | {\"BASIC_STATS\":\"true\"}  |
|                               | numFiles                                           | 0                           |
|                               | numRows                                            | 0                           |
|                               | rawDataSize                                        | 0                           |
|                               | totalSize                                          | 0                           |
|                               | transient_lastDdlTime                              | 1490231359                  |
|                               | NULL                                               | NULL                        |
| # Storage Information         | NULL                                               | NULL                        |
| SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                        |
| InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                        |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                        |
| Compressed:                   | No                                                 | NULL                        |
| Num Buckets:                  | -1                                                 | NULL                        |
| Bucket Columns:               | []                                                 | NULL                        |
| Sort Columns:                 | []                                                 | NULL                        |
| Storage Desc Params:          | NULL                                               | NULL                        |
|                               | serialization.format                               | 1                           |
+-------------------------------+----------------------------------------------------+-----------------------------+

0: jdbc:hive2://localhost:10000> create table s3_1 (col int) location 's3a://[bucket]/test-tables/s3-1';
0: jdbc:hive2://localhost:10000> describe formatted s3_1;
+-------------------------------+----------------------------------------------------+-----------------------+
| col_name                      | data_type                                          | comment               |
+-------------------------------+----------------------------------------------------+-----------------------+
| # col_name                    | data_type                                          | comment               |
|                               | NULL                                               | NULL                  |
| col                           | int                                                |                       |
|                               | NULL                                               | NULL                  |
| # Detailed Table Information  | NULL                                               | NULL                  |
| Database:                     | default                                            | NULL                  |
| Owner:                        | anonymous                                          | NULL                  |
| CreateTime:                   | Wed Mar 22 18:10:01 PDT 2017                       | NULL                  |
| LastAccessTime:               | UNKNOWN                                            | NULL                  |
| Retention:                    | 0                                                  | NULL                  |
| Location:                     | s3a://cloudera-dev-hive-on-s3/test-tables/s3-6     | NULL                  |
| Table Type:                   | MANAGED_TABLE                                      | NULL                  |
| Table Parameters:             | NULL                                               | NULL                  |
|                               | transient_lastDdlTime                              | 1490231401            |
|                               | NULL                                               | NULL                  |
| # Storage Information         | NULL                                               | NULL                  |
| SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                  |
| InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                  |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                  |
| Compressed:                   | No                                                 | NULL                  |
| Num Buckets:                  | -1                                                 | NULL                  |
| Bucket Columns:               | []                                                 | NULL                  |
| Sort Columns:                 | []                                                 | NULL                  |
| Storage Desc Params:          | NULL                                               | NULL                  |
|                               | serialization.format                               | 1                     |
+-------------------------------+----------------------------------------------------+-----------------------+
{code}

  was:
{{numRows}} is not collected when running {{INSERT ... INTO ...}} commands against tables backed by S3 (and maybe even other blobstores). The COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"} entry is missing from the {{describe extended}} output.

Repro steps:

{code}
hive> drop table s3_table;
OK
Time taken: 1.87 seconds
hive> create table s3_table (col int) location 's3a://[bucket-name]/stats-test/';
OK
Time taken: 3.069 seconds
hive> insert into s3_table values (1), (2), (3);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = stakiar_20161208160105_fb3df340-d5fb-4ad6-8776-4f3cae02216d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-12-08 16:01:12,741 Stage-1 map = 0%, reduce = 0%
2016-12-08 16:01:16,759 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local688636529_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Loading data to table default.s3_table
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 23.0 seconds
hive> select * from s3_table;
OK
1
2
3
Time taken: 0.096 seconds, Fetched: 3 row(s)
hive> describe extended s3_table;
OK
col int

Detailed Table Information Table(tableName:s3_table, dbName:default, owner:stakiar, createTime:1481241657, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)], location:s3a://[bucket-name]/stats-test, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1481241687, totalSize=6, numFiles=1}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.037 seconds, Fetched: 3 row(s)
{code}


> Basic Stats are not collected when for managed tables with LOCATION specified
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-15396
>                 URL:
https://issues.apache.org/jira/browse/HIVE-15396
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15396.1.patch



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)