Hi, I have a hive script [given below] which calls a python script using transform and for large datasets [ > 100M rows ], the reducer is not able to start the python process and the error message is "argument list too long". The detailed error stack is given below.
The python script takes only 1 static argument "hbase". For small datasets [ 1M rows ], the script works fine. 1. Is this problem related to the number of open file handles on the reducer box? 2. How do I get the correct error message? 3. Is there a way to get to the actual unix process with arguments it is instantiating? Thanks, Irfan >>>>>>>>>>>>>> script >>>>>>>>>>>>>>>>>>>>>>>>>>> true && echo " set hive.exec.compress.output=true; set io.seqfile.compression.type=BLOCK; set mapred.output.compression.type=BLOCK; set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec; set hive.merge.mapfiles=false; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.exec.max.dynamic.partitions=20000; set hive.map.aggr=true; set mapred.reduce.tasks=200; add file /home/irfan/scripts/user_id_output.py; from ( select user_id, source_id, load_id, user_id_json, pid from user_id_bucket where 1 = 1 and length(user_id) = 40 distribute by pid sort by user_id, load_id desc ) T1 insert overwrite table user_id_hbase_stg1 partition (pid) SELECT transform ( T1.user_id , T1.source_id , T1.load_id , T1.user_id_json , T1.pid ) using 'python2.6 user_id_output.py hbase' as user_id, user_id_info1, user_id_info2, user_id_info3, user_id_info4, user_id_info5, user_id_info3, pid ; " > ${SQL_FILE}; true && ${HHIVE} -f ${SQL_FILE} 1>${TXT_FILE} 2>${LOG_FILE}; >>>>>>>>>>>>>> error log >>>>>>>>>>>>>>>>>>>>>>>>> 2011-03-01 14:46:13,705 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Initializing Self 5 OP 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Operator 5 OP initialized 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Initializing children of 5 OP 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing child 6 SEL 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self 6 SEL 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:string,_col1:string,_col2:string,_col3:string,_col4:string> 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Operator 6 SEL initialized 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children of 6 SEL 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initializing child 7 SCR 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initializing Self 7 SCR 2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Operator 7 SCR initialized 2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initializing children of 7 SCR 2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 8 FS 2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 8 FS 2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 8 FS initialized 2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 8 FS 2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initialization Done 7 SCR 2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done 6 SEL 2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Initialization Done 5 OP 2011-03-01 14:46:13,733 INFO ExecReducer: ExecReducer: processing 1 rows: used memory = 89690888 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: 5 forwarding 1 rows 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 forwarding 1 rows 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/python2.6, user_id_output.py, hbase] 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: tablename=null 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: partname=null 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: alias=null 2011-03-01 14:46:13,777 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415) at org.apache.hadoop.mapred.Child$4.run(Child.java:217) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063) at org.apache.hadoop.mapred.Child.main(Child.java:211) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244) ... 7 more Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.6": java.io.IOException: error=7, Argument list too long at java.lang.ProcessBuilder.start(ProcessBuilder.java:459) at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279) ... 15 more Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long at java.lang.UNIXProcess.<init>(UNIXProcess.java:148) at java.lang.ProcessImpl.start(ProcessImpl.java:65) at java.lang.ProcessBuilder.start(ProcessBuilder.java:452) ... 16 more 2011-03-01 14:46:13,779 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:265) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415) at org.apache.hadoop.mapred.Child$4.run(Child.java:217) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063) at org.apache.hadoop.mapred.Child.main(Child.java:211) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253) ... 7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244) ... 7 more Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.6": java.io.IOException: error=7, Argument list too long at java.lang.ProcessBuilder.start(ProcessBuilder.java:459) at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279) ... 15 more Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long at java.lang.UNIXProcess.<init>(UNIXProcess.java:148) at java.lang.ProcessImpl.start(ProcessImpl.java:65) at java.lang.ProcessBuilder.start(ProcessBuilder.java:452) ... 16 more 2011-03-01 14:46:13,784 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task