Re: Issue while inserting data in the hive table using map side join

2014-04-23 Thread Db-Blog
Hi Anirudh, Below are some links suggesting that the problem might be related to the data nodes. Please go through them and let us know if they were useful. 1. http://hansmire.tumblr.com 2. http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo Hive Experts- Kindly share your suggestions/findings on the sa

Re: large small files vs one big file in hive table

2014-05-05 Thread Db-Blog
In general it is recommended to have millions of large files rather than billions of small files in Hadoop. Please describe your issues in detail. For example: - How are you planning to consume the data stored in this partitioned table? - Are you looking for storage and performance optimizations? E

Re: largest table last in joins

2014-05-05 Thread Db-Blog
Hi, If we have one big table joining with a small table and the MAPJOIN hint is specified on the smaller table, is the ordering still required? We can always forcefully set the auto convert join property to false and enable mapjoin hints. Please let me know if I am off base on this topic.
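For readers following the question, the idea behind a map-side join can be sketched in a few lines. This is a toy model in Python, not Hive's implementation: the function name `map_side_join` and the sample tables are hypothetical. It only shows the general technique of holding the hinted (small) table in memory as a hash table while the big table is streamed past it.

```python
def map_side_join(big_rows, small_rows, key):
    """Inner-join two lists of dicts on `key`, keeping only the small side in memory."""
    # Build an in-memory hash table from the small table (the "hinted" side).
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    # Stream the big table row by row and probe the hash table.
    for big in big_rows:
        for small in lookup.get(big[key], []):
            yield {**small, **big}


# Hypothetical sample data: a larger fact table and a small dimension table.
orders = [
    {"cust_id": 1, "amount": 30},
    {"cust_id": 2, "amount": 45},
    {"cust_id": 1, "amount": 12},
]
customers = [{"cust_id": 1, "name": "A"}, {"cust_id": 2, "name": "B"}]

joined = list(map_side_join(orders, customers, "cust_id"))
```

In this model the side that is streamed is decided by which table is loaded into the hash table, not by where it appears in the call. The properties mentioned in the message are presumably `hive.auto.convert.join` and `hive.ignore.mapjoin.hint`; that mapping is an assumption, since the message does not name them.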

Re: Hive Table : Read or load data in Hive Table from plural subdirectories

2014-05-26 Thread Db-Blog
Implement dynamic partitioning on a daily cadence. Example: ParentDirectory/partition=Day1/Day1_n_files.gz ParentDirectory/partition=Day2/Day2_n_files.gz ParentDirectory/partition=Day30/Day30_n_files.gz And so on... You can also opt for monthly partitions rather than daily by comparing the file

Re: Hive huge 'startup time'

2014-07-18 Thread Db-Blog
Hello everyone, Thanks for sharing valuable inputs. I am working on a similar kind of task; it would be really helpful if you could share the command for increasing the heap size of the hive-cli/launching process. Thanks, Saurabh Sent from my iPhone, please avoid typos. > On 18-Jul-2014, at 8:23 pm

Re: Handling blob in hive

2014-08-11 Thread Db-Blog
You can store the blob data type as a string in Hive. Thanks, Saurabh Sent from my iPhone, please avoid typos. > On 08-Aug-2014, at 9:10 am, Chhaya Vishwakarma > wrote: > > Hi, > > I want to store and retrieve blob in hive. Is it possible to store blob in > hive? > If it is not supported what al

Learn Java for Hadoop

2014-08-14 Thread Db-Blog
Greetings to everyone. I am a newbie in Java and seek guidance in learning "Java specifically required for Hadoop". It would be really helpful if someone could pass on links/topics/online courses that can help me get started on it. I come from ETL & DB- SQL background and currently wo

Re: Learn Java for Hadoop

2014-08-15 Thread Db-Blog
he group once again, and hope you'll be able to start > contributing to the open-source community real quick! :) > > Best Regards, > Nishant Kelkar > > >> On Thu, Aug 14, 2014 at 3:27 PM, Db-Blog wrote: >> Greetings to everyone. >> >> I am a newb

Tez Optimisation Parameters

2015-08-22 Thread Db-Blog
Hi, I am trying to load aggregate data from one massive table containing historical data of ONE year. Partitioning is implemented on the historical table, however the number of files is huge (>#100) and they are gz compressed. I am trying to load it using the Tez execution engine. Can someone suggest s

Bucketing- Identify Number of Buckets

2015-09-06 Thread Db-Blog
Hi, I need to join two big tables in Hive. The join key is the grain of both these tables, hence clustering and sorting on the same key will provide significant performance optimisation while joining. However, I am not sure how to calculate the exact number of buckets while creating these table

Re: Bucketing- Identify Number of Buckets

2015-09-06 Thread Db-Blog
Details of Hive Version: I am using Hive -14.0 with Tez as execution engine. Thanks, Saurabh Sent from my iPhone, please avoid typos. > On 07-Sep-2015, at 1:51 am, Db-Blog wrote: > > Hi, > > I need to join two big tables in hive. The join key is the grain of both > t

Re: Hive update operation

2016-09-01 Thread Db-Blog
Hi Mich, Nice explanation! Does the update operation in Hive work row by row, or is it performed in batches? We also observed multiple temp files being generated in HDFS while performing the update operation. It would be really helpful if you could share details of what Hive does in the background.

Re: Controlling Number of small files while inserting into Hive table

2017-06-25 Thread Db-Blog
Hi Arpan, Include the partition column in the DISTRIBUTE BY clause of the DML; it will then generate only one file per day. Hope this will resolve the issue. > "insert into 'target_table' select a,b,c from x where ... distribute by (date)" > PS: Backdated processing will generate additional file(s).
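The effect of distributing by the partition column can be illustrated with a small simulation. This is a toy model in Python under stated assumptions, not Hive's actual behaviour in detail: `files_per_partition` is a hypothetical helper, CRC32 stands in for whatever hash Hive uses, and each writing task is assumed to produce one file per partition value it receives rows for.

```python
import zlib
from collections import defaultdict


def files_per_partition(rows, num_reducers, distribute_by_date):
    """Count output files per date in a toy model of a partitioned insert."""
    writers = defaultdict(set)  # date -> ids of tasks that wrote a file for it
    for i, (date, _payload) in enumerate(rows):
        if distribute_by_date:
            # All rows with the same date hash to the same task.
            task = zlib.crc32(date.encode()) % num_reducers
        else:
            # Round-robin stand-in for rows being spread over tasks arbitrarily.
            task = i % num_reducers
        writers[date].add(task)
    return {date: len(tasks) for date, tasks in writers.items()}


# Hypothetical input: three days of data, eight rows per day.
rows = [(f"2017-06-{day:02d}", n) for day in (23, 24, 25) for n in range(8)]

spread = files_per_partition(rows, num_reducers=4, distribute_by_date=False)
packed = files_per_partition(rows, num_reducers=4, distribute_by_date=True)
```

In this model `spread` ends up with several files for each day, while `packed` has exactly one per day. A later run that touches an already loaded day would add further files in the same way, which matches the PS about backdated processing.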