[ https://issues.apache.org/jira/browse/HIVE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796283#comment-13796283 ]
Jason Dere commented on HIVE-5559:
----------------------------------

In the case that the path gets too long for the stats publishing table, HIVE-3750 allows the stats publisher to generate a hash of the path to insert into the table instead. It looks like there are a couple of issues here with regard to list bucketing:

1) The stats publishing adds an extra '/' to the path, whereas the stats aggregation does not. Thus, the hashed path value is different when publishing and aggregating:

2013-10-15 14:05:51,785 INFO jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(137)) - Stats publishing for key w???Z?u =?7?!?/col2=82/col4=val_82000000
2013-10-15 14:05:51,787 INFO jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(137)) - Stats publishing for key w???Z?u =?7?!?/col2=466/col4=val_466000000
2013-10-15 14:05:51,787 INFO jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(137)) - Stats publishing for key w???Z?u =?7?!?/col2=287/col4=val_287000000
2013-10-15 14:05:51,788 INFO jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(137)) - Stats publishing for key w???Z?u =?7?!?/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME000000
2013-10-15 14:05:53,224 INFO exec.StatsTask (StatsTask.java:aggregateStats(380)) - Stats aggregator : #)5?tc?k??^:N??/
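A minimal standalone sketch of the mismatch (the class, the helper, and the example paths below are made up for illustration, and MD5 just stands in for whatever digest the publisher actually uses; this is not Hive's code):

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Shows why publish and aggregate disagree: hashing the key prefix with a
// trailing '/' yields a different digest than hashing it without one, so the
// two sides compute different row keys for the same output directory.
public class PrefixHashMismatch {

    // Hex-encode an MD5 digest of the given key prefix.
    static String hash(String prefix) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest(prefix.getBytes(StandardCharsets.UTF_8))) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String aggregatorPrefix = "pfile:/scratchdir/-ext-10000/ds=2008-04-08/hr=11";
        String publisherPrefix  = aggregatorPrefix + "/";  // the extra '/' added on the publish side
        System.out.println("publisher  hash: " + hash(publisherPrefix));
        System.out.println("aggregator hash: " + hash(aggregatorPrefix));
        // The hashes differ, so the aggregator never finds the rows the publisher inserted.
    }
}
{code}

Since the hashed prefix is what ends up as the row key in the stats table, the aggregator's lookup simply misses the rows the publisher wrote, and numRows/rawDataSize never get updated.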
2) hive.stats.key.prefix.max.length is set too high (200 chars): it's possible for the values appended to the key prefix to exceed the 55 chars that are left, which can result in the error below (a quick sketch of the arithmetic follows the stack trace):

2013-10-15 14:50:12,541 INFO jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(137)) - Stats publishing for key pfile:/home/jdere/grid/0/workspace/UnitTest-hive-bigwheel-GA/label/sles1116/hdp-BUILDS/hive/build/ql/scratchdir/hive_2013-10-15_14-50-09_561_1692411440620554593-1/-ext-10000/ds=2008-04-08/hr=11//HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME000000
2013-10-15 14:50:12,544 ERROR jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(193)) - Error during publishing statistics.
java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/jdere/grid/0/workspace/UnitTest-hive-bigwheel-GA&' to length 255.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
        at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:142)
        at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:139)
        at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2473)
        at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:155)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1048)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:911)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:596)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:233)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/jdere/grid/0/workspace/UnitTest-hive-bigwheel-GA&' to length 255.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
        ... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/jdere/grid/0/workspace/UnitTest-hive-bigwheel-GA&' to length 255.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
        at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
        at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
        at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown Source)
        at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown Source)
        at org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.DMLWriteResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
        at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
        ... 24 more
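To make that arithmetic concrete, a rough standalone sketch (the class and constant names are invented and this is not Hive's implementation; the only figures taken from above are the VARCHAR(255) key column and the 200-char hive.stats.key.prefix.max.length):

{code:java}
// Rough sketch of the length arithmetic behind issue 2. With
// hive.stats.key.prefix.max.length at 200, a prefix of up to 200 chars is kept
// unhashed, leaving only 255 - 200 = 55 chars of the stats table's key column
// for whatever gets appended to it.
public class StatsKeyLengthCheck {
    static final int KEY_COLUMN_WIDTH = 255;   // VARCHAR(255) key column in the JDBC stats table
    static final int PREFIX_MAX_LENGTH = 200;  // hive.stats.key.prefix.max.length

    public static void main(String[] args) {
        // Suffix taken from the log above: two list-bucketing default dirs plus the trailing digits.
        String suffix =
            "/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME000000";
        int roomForSuffix = KEY_COLUMN_WIDTH - PREFIX_MAX_LENGTH;
        System.out.println("room left for the suffix : " + roomForSuffix);    // 55
        System.out.println("actual suffix length     : " + suffix.length());  // 80
        System.out.println("overflows the key column : " + (suffix.length() > roomForSuffix));
        // A prefix just under 200 chars plus this suffix is well past 255 chars,
        // hence the Derby VARCHAR truncation error in the stack trace above.
    }
}
{code}

So a prefix just under the 200-char limit is never hashed, and appending the list-bucketing directories pushes the key past the 255-char column, which is the "set too high" problem described above.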
> Stats publisher fails for list bucketing when IDs are too long
> --------------------------------------------------------------
>
>                 Key: HIVE-5559
>                 URL: https://issues.apache.org/jira/browse/HIVE-5559
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>
> Several of the list_bucket_* q files fail if the hive source path gets too long. It looks like the numRows and rawDataSize stats aren't getting updated properly in this situation.

--
This message was sent by Atlassian JIRA
(v6.1#6144)