[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Damien Carol updated HIVE-7669:
-------------------------------
    Description:
The source table has 600 million rows and a String column "l_shipinstruct" with 4 unique values (i.e., these 4 values are repeated across the 600 million rows).
We are sorting it on this string column "l_shipinstruct", as shown in the HiveQL below, with the following parameters.

{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=10000000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from
  lineitem
order by l_shipinstruct;
{code}

Stack Trace
Diagnostic Messages for this Task:
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.<init>(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}

  was:
The source table has 600 million rows and a String column "l_shipinstruct" with 4 unique values (i.e., these 4 values are repeated across the 600 million rows).
We are sorting it on this string column "l_shipinstruct", as shown in the HiveQL below, with the following parameters.
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=10000000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from
  lineitem
order by l_shipinstruct;

Stack Trace
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.<init>(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more


> parallel order by clause on a string column fails with IOException: Split points are out of order
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7669
>                 URL: https://issues.apache.org/jira/browse/HIVE-7669
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Query Processor, SQL
>    Affects Versions: 0.12.0
>         Environment: Hive 0.12.0-cdh5.0.0
>                      OS: Redhat linux
>            Reporter: Vishal Kamath
>              Labels: orderby
>
> The source table has 600 million rows and a String column "l_shipinstruct"
> with 4 unique values (i.e., these 4 values are repeated across the
> 600 million rows).
> We are sorting it on this string column "l_shipinstruct", as shown in the
> HiveQL below, with the following parameters.
> {code:sql}
> set hive.optimize.sampling.orderby=true;
> set hive.optimize.sampling.orderby.number=10000000;
> set hive.optimize.sampling.orderby.percent=0.1f;
>
> insert overwrite table lineitem_temp_report
> select
>   l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity,
>   l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate,
>   l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment
> from
>   lineitem
> order by l_shipinstruct;
> {code}
>
> Stack Trace
> Diagnostic Messages for this Task:
> {noformat}
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask$OldOutputCollector.<init>(MapTask.java:569)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 10 more
> Caused by: java.lang.IllegalArgumentException: Can't read partitions file
>     at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
>     at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
>     at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
>     ... 15 more
> Caused by: java.io.IOException: Split points are out of order
>     at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
>     ... 17 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
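
Editorial note on the "Split points are out of order" error: the Python sketch below models one plausible cause, which the report itself does not confirm. It assumes that split points are picked from a sorted sample of the sort key and that the partitioner rejects any sequence of split points that is not strictly increasing. The function names, the partition count of 8, and the four column values are illustrative assumptions, not Hive or Hadoop code.

```python
import random


def pick_split_points(sample, num_partitions):
    """Pick num_partitions - 1 evenly spaced keys from the sorted sample."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def check_strictly_increasing(split_points):
    """Simplified model of a total-order partitioner's validation:
    every split point must be strictly greater than the previous one."""
    for left, right in zip(split_points, split_points[1:]):
        if left >= right:
            raise ValueError("Split points are out of order")


# Four distinct values, as in the report. The actual strings are assumed
# here for illustration only.
values = ["COLLECT COD", "DELIVER IN PERSON", "NONE", "TAKE BACK RETURN"]
rng = random.Random(0)
sample = [rng.choice(values) for _ in range(1000)]

# Hypothetical partition count: 8 partitions need 7 split points.
splits = pick_split_points(sample, num_partitions=8)

try:
    check_strictly_increasing(splits)
    result = "ok"
except ValueError as exc:
    result = str(exc)

print(result)
```

Under these assumptions the check must fail: 7 split points drawn from only 4 distinct values necessarily contain at least one repeated value, so no strictly increasing ordering exists. In this model, the failure disappears only when the number of partitions is at most the number of distinct keys in the sample.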