red here, although there is no definitive guidance
>> as far as I know:
>>
>> https://cwiki.apache.org/confluence/display/Hive/Unit+Testing+Hive+SQL#UnitTestingHiveSQL-Modularisation
>>
>> On 15 December 2016 at 17:08, Saumitra Shahapure <
>> s
Hello,
We are currently running and maintaining a fairly large and complex Hive
SELECT query. It is basically a single SELECT query that performs a JOIN of
about ten other SELECT query outputs.
The simplest way to refactor it that we can think of is to break this query down
into multiple views and then join
Hello,
I am using Hive 0.13.1 in EMR and trying to create a Hive table on top
of our custom file system (which is a thin wrapper on top of S3), and I am
getting an error while accessing the data in the table. The stack trace and
command history are below.
I had a doubt that CombineFileInputFormat is tryi
would generate quite similar
execution plans for this query, what exactly is making the difference? My
question is from the point of view of understanding both systems,
Answering your questions inline,
--
Regards,
Saumitra Shahapure
On Fri, Jan 23, 2015 at 5:01 AM, Gopal V wrote:
> On 1/22/15, 3:03
Hello,
We were comparing the performance of some of our production Hive queries
between Hive and Spark. We compared Hive (0.13) + Hadoop (1.2.1) against both
Spark 0.9 and 1.1. We could see that the performance gains have been good
in Spark.
We tried a very simple query,
select count(*) from T where col
ce job to create data hierarchies. In our case, the
hierarchy is already created.
--
Regards,
Saumitra Shahapure
Hi Rahman,
These are a few lines from hadoop fsck / -blocks -files -locations
/mnt/hadoop/hive/warehouse/user.db/table1/000255_0 44323326 bytes, 1
block(s): OK
0. blk_-7919979022650423857_446500 len=44323326 repl=3 [ip1:50010,
ip2:50010, ip3:50010]
/mnt/hadoop/hive/warehouse/user.db/table1/000256
creating a partition on the dt field and
creating a Hive index/view on the *generated_by* field.
If anyone has insights around these, they would be really helpful.
Meanwhile we will try to solve our problem by buckets/indices.
--
Regards,
Saumitra Shahapure
On Tue, Mar 25, 2014 at 7:44 PM, Prasan Samtani
over-partitioning our table. Over-partitioning gives us the benefit
that a query on 1-2 partitions is very fast. Its side effect is
that if we try to query a large number of partitions, the query is too slow. Is
there a way to get good performance in both scenarios?
--
Regards,
Saumitra Shahapure
1000s of partitions to Hive. So queries on
analyze on one month are slowed down.
Is there any way to get rid of partitions, and at the same time maintain
good performance of queries which are fired on a specific day and
*generated_by*?
--
Regards,
Saumitra Shahapure