> ...SELECT tkey, tvalue
>
> In my case, 32 reducers are launched, and dest1 always ends up with 32
> files. If I set hive.exec.reducers.max=1, it does launch only 1 reducer
> (instead of 32), but I still get 32 teeny output files. Setting the
> various "hive.merge.*" options does not seem to help.
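A minimal HiveQL sketch of the settings in play here (dest1, tkey, and tvalue appear in the thread; the src table and the DISTRIBUTE BY clause are hypothetical). hive.merge.mapredfiles is the switch that covers files written by the reduce stage:

    -- Sketch only: cap the reducer count and enable the post-job merge pass.
    set hive.exec.reducers.max=1;
    set hive.merge.mapredfiles=true;   -- merge files produced by the reduce stage
    INSERT OVERWRITE TABLE dest1
    SELECT tkey, tvalue
    FROM src                           -- hypothetical source table
    DISTRIBUTE BY tkey;                -- hypothetical; routes rows through the reducers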
{"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"x","_col1":"m1","_col2":"20110210_02","_col3":"{'m07':
>> 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
>>
>> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:265)
>>
>> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467)
>>
>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
>>
>> at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>>
>> at java.security.AccessController.doPrivileged(Native Method)
>>
>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>>
>> at org.apache.hadoop.mapred.Child.main(Child.java:211)
>>
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
>> Error while processing row
>> (tag=0)
>> {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"x","_col1":"m1","_col2":"20110210_02","_col3":"{'m07':
>> 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
>>
>> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253)
>>
>> ... 7 more
>>
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot
>> initialize ScriptOperator
>>
>> at
>> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320)
>>
>> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>>
>> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>>
>> at
>> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>>
>> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>>
>> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>>
>> at
>> org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
>>
>> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>>
>> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
>>
>> ... 7 more
>>
>> Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.6":
>> java.io.IOException: error=7, Argument list too long
>>
>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
>>
>> at
>> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
>>
>> ... 15 more
>>
>> Caused by: java.io.IOException: java.io.IOException: error=7, Argument
>> list too long
>>
>> at java.lang.UNIXProcess.(UNIXProcess.java:148)
>>
>> at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>>
>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
>>
>> ... 16 more
>>
>> 2011-03-01 14:46:13,784 INFO org.apache.hadoop.mapred.Task: Runnning
>> cleanup for the task
>>
>>
>>
>>
>>
>>
>>
>
>
--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net
--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net
> ...deal with it as a failed job, but
> Hive can't return the correct result.
>
> 2010-11-29
> --
> shangan
>
--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net
> ...CombineHiveInputFormat, which is used for the new merge job. Someone
> reported previously that the merge was not successful because of this. If that's the
> case, you can turn off CombineHiveInputFormat and use the old
> HiveInputFormat (though slower) by setting hive.mergejob.maponly=false.
> >>>
> >>> Ning
> >>> On Nov 17, 2010, at 6:00 PM, Leo Alekseyev wrote:
> >>>
> >>>> I have jobs that sample (or generate) a small amount of data from a
> >>>> large table. At the end, I get e.g. about 3000 or more files of 1kb
> >>>> or so. This becomes a nuisance. How can I make Hive do another pass
> >>>> to merge the output? I have the following settings:
> >>>>
> >>>> hive.merge.mapfiles=true
> >>>> hive.merge.mapredfiles=true
> >>>> hive.merge.size.per.task=25600
> >>>> hive.merge.size.smallfiles.avgsize=1600
> >>>>
> >>>> After setting the hive.merge.* options to true, Hive started indicating "Total
> >>>> MapReduce jobs = 2". However, after generating the
> >>>> lots-of-small-files table, Hive says:
> >>>> Ended Job = job_201011021934_1344
> >>>> Ended Job = 781771542, job is filtered out (removed at runtime).
> >>>>
> >>>> Is there a way to force the merge, or am I missing something?
> >>>> --Leo
> >>>
> >>>
> >
> >
>
--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net
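For reference, a minimal sketch of the merge knobs discussed in this thread, shown at their default values. Note the property name in HiveConf is hive.merge.smallfiles.avgsize, and both size thresholds are interpreted as bytes, so values like 25600 and 1600 quoted above amount to only a few kilobytes:

    -- Sketch only: run the merge as a map-reduce job with the old
    -- HiveInputFormat instead of a map-only CombineHiveInputFormat job.
    set hive.mergejob.maponly=false;
    set hive.merge.mapfiles=true;
    set hive.merge.mapredfiles=true;
    set hive.merge.size.per.task=256000000;      -- target merged file size, in bytes
    set hive.merge.smallfiles.avgsize=16000000;  -- merge when the average output file is smaller than this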
I copied Hadoop19Shims' implementation of getCombineFileInputFormat
(HIVE-1121) into Hadoop18Shims and it worked, if anyone is interested.
And hopefully we can upgrade our Hadoop version soon :)
On Fri, Nov 12, 2010 at 12:44 PM, Dave Brondsema wrote:
> It seems that I can't use this...
...the implementation of getCombineFileInputFormat into
Hadoop18Shims?
On Wed, Nov 10, 2010 at 4:31 PM, yongqiang he wrote:
> I think the problem was solved in hive trunk. You can just try hive trunk.
>
> On Wed, Nov 10, 2010 at 10:05 AM, Dave Brondsema
> wrote:
> > Hi, has there been any resolution to this?
> >> ...raised hive.merge.smallfiles.avgsize. I'm wondering if the filtering
> >> at runtime is causing the merge process to be skipped. Attached are
> >> the hive output and log files.
> >>
> >>
> >> Thanks,
> >> Sammy
> >>
> >
> >
>
>
>
> --
> Chief Architect, BrightEdge
> email: s...@brightedge.com | mobile: 650.539.4867 | fax:
> 650.521.9678 | address: 1850 Gateway Dr Suite 400, San Mateo, CA
> 94404
>
--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net
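Related to the workaround above, a minimal sketch of selecting the input format explicitly (both class names are in Hive's org.apache.hadoop.hive.ql.io package):

    -- Sketch only: choose the input format used for the job.
    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    -- or fall back to the older, slower per-file format:
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;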
...SELECT TRANSFORM ('foo', 'bar', 'baz')
USING '/bin/cat' AS (x, y, z) limit 1
select * from test2
> ['foo', 'bar', 'baz']
I'd recommend that Hive either support column reordering with the AS
clause, or make it completely optional (although this may be
backwards-incompatible with the docs at the link above); see the sketch below.
--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net
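A minimal sketch of the positional binding described above (the constants mirror the thread's example; the one-row src table is hypothetical). The AS clause names the script's output columns in the order the script emits them; swapping the names does not reorder the values:

    -- Sketch only: with /bin/cat echoing its input, z='foo', y='bar',
    -- x='baz': the columns are renamed, not reordered.
    SELECT TRANSFORM ('foo', 'bar', 'baz')
           USING '/bin/cat'
           AS (z, y, x)
    FROM src                 -- hypothetical one-row table
    LIMIT 1;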
...When I log the 'folder' value from inside reduce.py, it shows:
>
> 2010-10-12 15:32:10,914 - dstat - INFO - reduce to stdout, h[folder]:
>
> i.e., an empty string. But when the INSERT executes, it seems to treat the
> value as TRUE (or string 'true')?
>
> > select folder from dl_day
> ['true']
> ['true']
> ['true']
> ['true']
> ...
>
> How can I preserve the FALSE value through the transform script?
>
> Thanks,
> -L
>
--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net
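Since TRANSFORM streams rows to and from the script as tab-delimited text, one workaround sketch for the issue above (dl_day and reduce.py come from the thread; the destination table and the python invocation are assumptions) is to leave the script's output column as a string and convert explicitly, rather than relying on an implicit string-to-boolean conversion:

    -- Sketch only: untyped TRANSFORM output columns default to STRING,
    -- so compare explicitly instead of letting '' be read as a boolean.
    ADD FILE reduce.py;                    -- assumes the script is shipped this way
    FROM (
      SELECT TRANSFORM (folder)
             USING 'python reduce.py'
             AS (folder)                   -- defaults to STRING
      FROM dl_day
    ) t
    INSERT OVERWRITE TABLE dl_day_fixed    -- hypothetical destination table
    SELECT t.folder = 'true';              -- explicit, well-defined test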