Re: Hive MAP/REDUCE/TRANSFORM output creates many small files

2011-08-16 Thread Dave Brondsema
…tkey, tvalue

> In my case, 32 reducers are launched, and dest1 always ends up with 32 files. If I set hive.exec.reducers.max=1, it does launch only 1 reducer (instead of 32), but I still get 32 teeny output files. Setting the various "hive.merge.*" options does not seem…
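For reference, a minimal sketch of the settings discussed in this thread, assuming a 2011-era Hive (0.6/0.7); `dest1` is the table name from the thread, and the final rewrite pass is a blunt workaround of my own suggestion, not the poster's query:

```sql
-- Settings named in the thread:
SET hive.exec.reducers.max=1;     -- caps reducer count, but each reducer still writes its own file
SET hive.merge.mapfiles=true;     -- merge small files from map-only jobs
SET hive.merge.mapredfiles=true;  -- merge small files from map-reduce jobs

-- If the merge pass still does not fire for TRANSFORM/script output,
-- an explicit second pass rewriting the table forces one output set:
INSERT OVERWRITE TABLE dest1
SELECT * FROM dest1;
```

The rewrite pass costs a full extra job over the data, so it only makes sense when the small-file count is genuinely a problem downstream.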

Re: cannot start the transform script. reason : "argument list too long"

2011-03-02 Thread Dave Brondsema
…{"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"x","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:265)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"x","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253)
    ... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
    ... 7 more
Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.6": java.io.IOException: error=7, Argument list too long
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
    ... 15 more
Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
    ... 16 more

2011-03-01 14:46:13,784 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net
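The `error=7, Argument list too long` (E2BIG) happens at the `fork`/`exec` of the transform script: Hive's ScriptOperator exports job configuration entries as environment variables for the child process, and a very large query plan or conf can exceed the kernel's argument-plus-environment size limit. A sketch of one mitigation, with the caveat that the property below was added in later Hive releases than the 0.x version in this thread, so treat the name as an assumption and check your version's configuration list:

```sql
-- Truncate each environment variable Hive exports to the script,
-- instead of passing arbitrarily large conf values through execve:
SET hive.script.operator.truncate.env=true;
```

On older versions, the practical workarounds are shrinking the query (the serialized plan travels in the conf) or trimming oversized configuration values before the TRANSFORM stage.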

Re: Stopping Hive Metastore Service

2011-01-27 Thread Dave Brondsema

Re: hive not stable,weird exception

2010-11-29 Thread Dave Brondsema
> …deal with it as a failed job, but hive can't return the correct result.
>
> 2010-11-29
> --
> shangan

Re: Hive produces very small files despite hive.merge...=true settings

2010-11-19 Thread Dave Brondsema
> …InputFormat, which is used for the new merge job. Someone reported previously merge was not successful because of this. If that's the case, you can turn off CombineHiveInputFormat and use the old HiveInputFormat (though slower) by setting hive.mergejob.maponly=false.
>
> Ning
>
> On Nov 17, 2010, at 6:00 PM, Leo Alekseyev wrote:
>> I have jobs that sample (or generate) a small amount of data from a
>> large table. At the end, I get e.g. about 3000 or more files of 1kb
>> or so. This becomes a nuisance. How can I make Hive do another pass
>> to merge the output? I have the following settings:
>>
>>   hive.merge.mapfiles=true
>>   hive.merge.mapredfiles=true
>>   hive.merge.size.per.task=25600
>>   hive.merge.size.smallfiles.avgsize=1600
>>
>> After setting hive.merge* to true, Hive started indicating "Total
>> MapReduce jobs = 2". However, after generating the
>> lots-of-small-files table, Hive says:
>>   Ended Job = job_201011021934_1344
>>   Ended Job = 781771542, job is filtered out (removed at runtime).
>>
>> Is there a way to force the merge, or am I missing something?
>> --Leo
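A sketch combining Ning's suggestion with a likely second issue in Leo's settings: the size thresholds are in bytes, so 25600 and 1600 ask Hive to build ~25 KB merge tasks and only trigger merging when files average under ~1.6 KB, which is almost certainly not what was intended. Values in the megabyte range (close to the usual defaults) are more typical; the exact numbers below are illustrative, not from the thread:

```sql
-- Ning's workaround: use the old HiveInputFormat for the merge job
-- (slower, but avoids the CombineHiveInputFormat problem):
SET hive.mergejob.maponly=false;

-- Merge thresholds are in bytes; megabyte-scale values are typical:
SET hive.merge.size.per.task=256000000;          -- target ~256 MB per merged output
SET hive.merge.size.smallfiles.avgsize=16000000; -- merge when avg file size < ~16 MB
```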

Re: Merging small files with dynamic partitions

2010-11-12 Thread Dave Brondsema
I copied Hadoop19Shims' implementation of getCombineFileInputFormat (HIVE-1121) into Hadoop18Shims and it worked, if anyone is interested. And hopefully we can upgrade our Hadoop version soon :)

On Fri, Nov 12, 2010 at 12:44 PM, Dave Brondsema wrote:
> It seems that I can't use this…

Re: Merging small files with dynamic partitions

2010-11-12 Thread Dave Brondsema
…implementation of getCombineFileInputFormat into Hadoop18Shims?

On Wed, Nov 10, 2010 at 4:31 PM, yongqiang he wrote:
> I think the problem was solved in hive trunk. You can just try hive trunk.
>
> On Wed, Nov 10, 2010 at 10:05 AM, Dave Brondsema wrote:
>> Hi, has there been any resolution to this…

Re: Merging small files with dynamic partitions

2010-11-10 Thread Dave Brondsema
>> …raised hive.merge.smallfiles.avgsize. I'm wondering if the filtering
>> at runtime is causing the merge process to be skipped. Attached are
>> the hive output and log files.
>>
>> Thanks,
>> Sammy
>
> --
> Chief Architect, BrightEdge
> email: s...@brightedge.com

USING .. AS column names

2010-10-13 Thread Dave Brondsema
…'foo', 'bar', 'baz') USING '/bin/cat' AS (x, y, z) LIMIT 1

select * from test2
> ['foo', 'bar', 'baz']

I'd recommend that Hive either support column reordering with the AS clause, or make it completely optional (although this may be backwards-incompatible with the docs at the link above).
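A reconstructed sketch of the behavior being reported (the `FROM src` source and `test2` schema are assumptions filled in around the truncated snippet): the names in `AS (...)` bind to the script's output fields strictly by position, so renaming them does not reorder anything to match the target table's column names.

```sql
-- /bin/cat echoes its input unchanged, so the three literals come back
-- in their original order regardless of what AS calls them:
INSERT OVERWRITE TABLE test2
SELECT TRANSFORM('foo', 'bar', 'baz')
USING '/bin/cat'
AS (x, y, z)   -- x, y, z bind to output fields 1, 2, 3 positionally
FROM src
LIMIT 1;

select * from test2;
-- ['foo', 'bar', 'baz']
```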

Re: boolean types thru a transform script

2010-10-13 Thread Dave Brondsema
> …When I log the 'folder' value from inside reduce.py, it shows:
>
>   2010-10-12 15:32:10,914 - dstat - INFO - reduce to stdout, h[folder]:
>
> i.e., an empty string. But when the INSERT executes, it seems to treat the
> value as TRUE (or string 'true')?
>
>   select folder from dl_day
>   ['true']
>   ['true']
>   ['true']
>   ['true']
>   ...
>
> How can I preserve the FALSE value thru the transform script?
>
> Thanks,
> -L
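Everything crossing the TRANSFORM boundary travels as tab-separated text, and in this 2010-era Hive a FALSE boolean evidently arrived in the script as an empty string and then re-parsed as TRUE on the way back. A defensive sketch of the workaround (the `folder` column and `dl_day` table are from the thread; `reduce.py` and the extra column are assumptions): serialize the boolean explicitly going in, keep it a string coming out, and convert it back with an explicit comparison.

```sql
-- Going in: make the textual form explicit rather than relying on
-- the default boolean-to-text serialization:
SELECT TRANSFORM(CAST(folder AS STRING), other_col)
USING 'python reduce.py'
AS (folder_str STRING, other_col STRING)
FROM dl_day;

-- Coming back out: only the literal string 'true' becomes TRUE,
-- so an empty string maps to FALSE instead of silently flipping:
-- SELECT (folder_str = 'true') AS folder, other_col FROM ...;
```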