Hi Avrilia,
In org.apache.hadoop.hive.ql.io.orc.WriterImpl, the block size is
determined by Math.min(1.5GB, 2 * stripeSize). Also, you can use
"orc.block.padding" in the table property to control whether the writer to
pad HDFS blocks to prevent stripes from straddling blocks. The default
value of
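For reference, here is a hedged sketch of setting that property per table
(the table definition itself is hypothetical):

    -- Hypothetical table; shown only to illustrate where the property goes.
    CREATE TABLE orc_padded (id INT, val STRING)
    STORED AS ORC
    TBLPROPERTIES ("orc.block.padding"="true");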
Hi Avrilia,
It is caused by distinct aggregations in TPC-H Q21. Because Hive adds those
distinct columns to the key columns of ReduceSinkOperators, and the
correlation optimizer currently only checks for exactly matching key columns,
this query will not be optimized. The jira of this issue is
https://issues.apach
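For illustration, a hedged sketch (TPC-H-style table and columns, simplified
from Q21) of why the key columns stop matching:

    -- Both aggregations partition by l_orderkey, so their ReduceSinks could be merged:
    SELECT l_orderkey, COUNT(*) FROM lineitem GROUP BY l_orderkey;
    -- Adding DISTINCT puts l_suppkey into the ReduceSink key columns
    -- (key: l_orderkey, l_suppkey), so the keys are no longer identical:
    SELECT l_orderkey, COUNT(DISTINCT l_suppkey) FROM lineitem GROUP BY l_orderkey;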
I remember that textfiles are used in those scripts. With 0.12, I think ORC
should be used. Also, I think those sub-queries should be merged into a
single query. With a single query, if a reduce join is converted to a map
join, this map join can be merged into its child job. But, if this join is
eval
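For reference, these are the settings involved (a sketch; whether the
conversion actually happens depends on table sizes):

    set hive.auto.convert.join=true;                    -- convert reduce joins to map joins
    set hive.auto.convert.join.noconditionaltask=true;  -- allow merging map joins into one task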
> That is exactly the type of explanation of settings I'd like to
> see. More than just what it does, but the tradeoffs, and how things are
> applied in the real world. Have you played with the stride length at all?
>
>
> On Wed, Nov 13, 2013 at 1:13 PM, Yin Huai wrote:
>
>
Hi John,
Here is my experience on the stripe size. For a given table, when the
stripe size is increased, the size of a column in a stripe increases, which
means the ORC reader can read a column from disk more efficiently
because the reader can sequentially read more data (assuming the read
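For reference, both the stripe size and the row index stride (the "stride
length" mentioned above) can be set per table; a hedged sketch with
illustrative values, not recommendations:

    CREATE TABLE orc_tuned (id INT, val STRING)
    STORED AS ORC
    TBLPROPERTIES (
      "orc.stripe.size"="268435456",    -- stripe size in bytes (here, 256MB)
      "orc.row.index.stride"="10000"    -- rows between row-index entries
    );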
Congratulations, Brock and Thejas!
On Thu, Oct 24, 2013 at 6:36 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:
> Congrats Thejas and Brock!!
>
> Thanks
> Prasanth Jayachandran
>
> On Oct 24, 2013, at 3:29 PM, Vaibhav Gumashta
> wrote:
>
> Congrats Brock and Thejas!
>
>
> On T
Seems you did not set the number of
columns (RCFileOutputFormat.setColumnNumber(Configuration conf, int
columnNum)). Can you set it in your main method and see if your MR program
works?
Thanks,
Yin
On Mon, Oct 21, 2013 at 2:38 PM, Krishnan K wrote:
> Hi All,
>
> I have a scenario where I've t
Can you try to set serde properties?
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddSerDeProperties
I have not tried it, but it seems to be the right way to pass configurations
to the serde class.
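For example (untested; the table name and property key/value are hypothetical):

    ALTER TABLE my_table
    SET SERDEPROPERTIES ("my.serde.config.key"="my-value");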
Thanks,
Yin
On Mon, Oct 14, 2013 at 8:20 AM, Rui Martins wrot
Hello Xinyang,
Can you attach the query plan (the output of "EXPLAIN")? I think a bad plan
caused the error.
Also, can you try hive trunk? It looks like a bug that was fixed after the
0.11 release.
Thanks,
Yin
On Fri, Oct 11, 2013 at 9:21 AM, xinyan Yang wrote:
> Development environment,hive 0
Hello Keith,
Hive will not launch a MapReduce job for your query because it simply reads
all columns from a table. Hive will fetch the data for you directly from the
underlying filesystem.
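For example, a query of this shape is normally served by a fetch task instead
of a MapReduce job (the table name is hypothetical; in recent releases this
behavior is governed by hive.fetch.task.conversion):

    -- No projection, filter, or aggregation, so Hive fetches rows directly:
    SELECT * FROM my_table LIMIT 10;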
Thanks,
Yin
On Wed, Oct 2, 2013 at 2:48 PM, Keith Wiley wrote:
> I'm trying to create a subset of a large
eKeyTextOutputFormat
>
> Stage: Stage-0
> Fetch Operator
> limit: -1
>
> Using set hive.optimize.reducededuplication=false;
> I get 2 mapreduce jobs and the correct number of rows (24).
>
> Can I verify somehow, maybe through looking in the source code, tha
ncorrectly assumes one job is enough?
>
> I will get back with results from your suggestions ASAP; unfortunately I
> don't have the machines available until Thursday.
>
> / Sincerely Mikael
>
>*From:* Yin Huai
> *To:* user@hive.apache.org; Mikael Öhman
> *Sent:*
Hello Mikael,
Seems your case is related to the bug reported in
https://issues.apache.org/jira/browse/HIVE-5149. Basically, when hive uses
a single MapReduce job to evaluate your query, "c.Symbol" and "c.catid" are
used to partition data, and thus rows with the same value of "c.Symbol"
are not
Hi,
Can you also attach the query plan (the result of EXPLAIN)? It may help to
find where the problem is.
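For example (the query here is just a placeholder):

    EXPLAIN
    SELECT symbol, COUNT(*) FROM trades GROUP BY symbol;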
Thanks,
Yin
On Thu, Sep 12, 2013 at 1:00 PM, Chuck Hardin wrote:
> Please bear with me, because this is a pretty large query.
>
> TL;DR: I'm doing a UNION ALL on a bunch of subqueries.
> set hive.auto.convert.join.noconditionaltask=false;
>
>
> makes it work (though it does way more map reduce jobs than it should).
> When I get some time I will test against the latest trunk.
>
> Thanks,
> Nate
>
>
> On Sep 3, 2013, at 6:09 PM, Yin Huai wrote:
>
Based on the log, it may also be related to
https://issues.apache.org/jira/browse/HIVE-4927. To make it work (in a not
very optimized way), can you try "set
hive.auto.convert.join.noconditionaltask=false;" ? If you still get the
error, give "set hive.auto.convert.join=false;" a try (it will turn of
Forgot to add in my last reply: to generate correct results, you can
set hive.optimize.reducededuplication to false to turn off
ReduceSinkDeDuplication.
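That is:

    set hive.optimize.reducededuplication=false;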
On Sun, Aug 25, 2013 at 9:35 PM, Yin Huai wrote:
> Created a jira https://issues.apache.org/jira/browse/HIVE-5149
>
>
> On
Created a jira https://issues.apache.org/jira/browse/HIVE-5149
On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai wrote:
> Seems ReduceSinkDeDuplication picked the wrong partitioning columns.
>
>
> On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP wrote:
>
>> I think the problem lies
Seems ReduceSinkDeDuplication picked the wrong partitioning columns.
On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP wrote:
> I think the problem lies within the group by operation. For this
> optimization to work, the group by's partitioning should be on column 1
> only.
>
> It won't affect th
If the join is a reduce side join,
https://issues.apache.org/jira/browse/HIVE-2206 will optimize this query
and generate a single MR job. The optimizer introduced by HIVE-2206 is in
trunk. Currently, it only handles operators that share exactly the same key column(s).
If the join is a MapJoin, hive 0.11 can generate a single MR j
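To try the optimizer introduced by HIVE-2206, it is gated by a config flag
(a sketch; check the default in your build):

    set hive.optimize.correlation=true;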
I just uploaded a patch to https://issues.apache.org/jira/browse/HIVE-4968.
You can try it and see if the problem has been resolved for your query.
On Wed, Jul 31, 2013 at 11:21 AM, Yin Huai wrote:
> Seems it is another problem.
> Can you try
>
>
> SELECT *
> FROM
My Hadoop version is 1.0.1. I use the default Hive configuration.
>
>
> --
> wzc1...@gmail.com
> Sent with Sparrow <http://www.sparrowmailapp.com/?sig>
>
> On Monday, July 29, 2013, at 1:08 PM, Yin Huai wrote:
>
> Hi,
>
> Can yo
Hi,
Can you also post the output of EXPLAIN? The execution plan may be helpful
to locate the problem.
Thanks,
Yin
On Sun, Jul 28, 2013 at 8:06 PM, wrote:
> What I mean by "not pass the testcase in HIVE-4650" is that I compiled the
> trunk code and ran the query in HIVE-4650:
> SELECT *
> FROM
ces, but is cpu efficient.
> Your tests align with our internal tests from a long time ago.
>
> On Tue, Mar 6, 2012 at 8:58 AM, Yin Huai wrote:
> > Hi,
> >
> > Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in
> > general?
> >
> > Let me make my
Hi,
Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in
general?
Let me make my question more specific.
I generated two tables from the table lineitem of TPC-H
using ColumnarSerDe and LazyBinaryColumnarSerDe as follows...
CREATE TABLE lineitem_rcfile_lazybinary
ROW FORMAT SERDE
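For context, a hedged reconstruction of what such a comparison setup may look
like (using CTAS so the column list is implicit; the serde class names are the
standard Hive ones, and the second table name is assumed):

    CREATE TABLE lineitem_rcfile_lazybinary
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
    STORED AS RCFILE
    AS SELECT * FROM lineitem;

    CREATE TABLE lineitem_rcfile
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    STORED AS RCFILE
    AS SELECT * FROM lineitem;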
I have some experience using RCFile with the new MapReduce API from the
HCatalog project ( http://incubator.apache.org/hcatalog/ ).
For the output part, in your main method you need ...
> job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
>
> RCFileMapReduceOutputFormat.setColumnNumber(job.getCo