Hi Anirudh,
Below are some links suggesting the problem might be related to the data nodes.
Please go through them and let us know whether they were useful.
1. http://hansmire.tumblr.com
2. http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
Hive experts, kindly share your suggestions/findings on the same.
In general it is recommended to have millions of large files rather than
billions of small files in Hadoop.
Please describe your issue in detail. For example:
- How are you planning to consume the data stored in this partitioned table?
- Are you looking for storage and performance optimizations?
Hi,
If we have one big table joining with a small table and the MAPJOIN hint is
specified on the smaller table, is the ordering still required?
We can always set the auto-convert-join property to false and enable
MAPJOIN hints explicitly.
Please let me know if I am off base on this topic.
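For reference, honoring an explicit MAPJOIN hint while disabling automatic conversion could look like the sketch below. The properties are standard Hive settings; the table names are hypothetical. With a map-side join the small table is loaded into an in-memory hash table and the big table is simply streamed, so no sort order on the inputs is needed.

```sql
-- Disable automatic map-join conversion and honor explicit hints.
SET hive.auto.convert.join=false;
SET hive.ignore.mapjoin.hint=false;

-- small_t is built into a hash table on each mapper; big_t is streamed.
SELECT /*+ MAPJOIN(s) */ b.id, b.amount, s.label
FROM big_t b
JOIN small_t s ON b.id = s.id;
```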
Implement dynamic partitioning on a daily cadence.
Example:
ParentDirectory/partition=Day1/Day1_n_files.gz
ParentDirectory/partition=Day2/Day2_n_files.gz
ParentDirectory/partition=Day30/Day30_n_files.gz
And so on...
You can also opt for monthly partitions rather than daily ones by comparing the file
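The daily layout above can be produced with a dynamic-partition insert along these lines. The settings are standard Hive properties; the table and column names (`events_by_day`, `dt`, `staging_events`) are placeholders for illustration. The partition column becomes the directory name, e.g. `ParentDirectory/dt=Day1/`.

```sql
-- Enable dynamic partitioning (standard Hive settings).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical target table; dt is the daily partition column.
CREATE TABLE events_by_day (a STRING, b STRING)
PARTITIONED BY (dt STRING);

-- The last column of the SELECT feeds the dynamic partition.
INSERT OVERWRITE TABLE events_by_day PARTITION (dt)
SELECT a, b, day_col AS dt
FROM staging_events;
```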
Hello everyone,
Thanks for sharing valuable inputs. I am working on a similar kind of task; it
would be really helpful if you could share the command for increasing the heap
size of the Hive CLI launching process.
Thanks,
Saurabh
Sent from my iPhone, please avoid typos.
> On 18-Jul-2014, at 8:23 pm
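For what it's worth, one common way to raise the client-side heap is via the standard Hadoop environment variables, either per session or in hive-env.sh. The variable names are real Hadoop/Hive knobs; the 2 GB values are only illustrative.

```shell
# Per-session: set the client JVM options before launching the CLI.
export HADOOP_CLIENT_OPTS="-Xmx2048m"

# Or globally (e.g. in hive-env.sh); the value is in megabytes.
export HADOOP_HEAPSIZE=2048

# Then launch the CLI as usual:  hive
```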
You can store the BLOB data type as a string in Hive.
Thanks,
Saurabh
Sent from my iPhone, please avoid typos.
> On 08-Aug-2014, at 9:10 am, Chhaya Vishwakarma
> wrote:
>
> Hi,
>
> I want to store and retrieve a BLOB in Hive. Is it possible to store a BLOB
> in Hive?
> If it is not supported, what al
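Hive has no BLOB type as such; the usual stand-ins are the BINARY type (available since Hive 0.8) or a base64-encoded STRING, as suggested above. A minimal sketch, with hypothetical table and column names:

```sql
-- payload holds raw bytes; alternatively declare it STRING and
-- store the blob base64-encoded.
CREATE TABLE documents (
  doc_id  INT,
  payload BINARY
)
STORED AS ORC;
```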
Greetings to everyone.
I am a newbie in Java and seek guidance on learning "Java specifically
required for Hadoop". It would be really helpful if someone could pass on
links/topics/online courses that would help me get started on it.
I come from an ETL & DB-SQL background and currently wo
he group once again, and hope you'll be able to start
> contributing to the open-source community real quick! :)
>
> Best Regards,
> Nishant Kelkar
>
>
>> On Thu, Aug 14, 2014 at 3:27 PM, Db-Blog wrote:
>> Greetings to everyone.
>>
>> I am a newb
Hi,
I am trying to load aggregate data from one massive table containing one year
of historical data. Partitioning is implemented on the historical table; however,
the number of files is huge (>100) and they are gz-compressed.
I am trying to load it using the Tez execution engine. Can someone suggest s
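One thing worth knowing here: gzip files are not splittable, so each .gz file is read by a single task. For the output side, Hive has standard merge settings that consolidate small files produced by a Tez job; the property names below are real, the size values are only illustrative.

```sql
-- Merge small output files at the end of a Tez job.
SET hive.merge.tezfiles=true;
SET hive.merge.smallfiles.avgsize=134217728;  -- merge if avg file < 128 MB
SET hive.merge.size.per.task=268435456;       -- target ~256 MB per merged file
```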
Hi,
I need to join two big tables in Hive. The join key is the grain of both
tables, hence clustering and sorting on it will provide a significant
performance optimisation for the join.
However, I am not sure how to calculate the exact number of buckets while
creating these tables.
Details of Hive version:
I am using Hive 0.14.0 with Tez as the execution engine.
Thanks,
Saurabh
Sent from my iPhone, please avoid typos.
> On 07-Sep-2015, at 1:51 am, Db-Blog wrote:
>
> Hi,
>
> I need to join two big tables in hive. The join key is the grain of both
> t
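On the bucket-count question above: a common rule of thumb (an informal heuristic, not an official formula) is to size buckets so each bucket file is roughly an HDFS-block-sized chunk, e.g. total table size divided by ~256 MB, rounded to a power of two. Both tables must be bucketed and sorted on the join key with the same bucket count (or one a multiple of the other) for a sort-merge-bucket join to apply. Table and column names below are hypothetical; the SET properties are standard Hive settings.

```sql
-- Both sides of the join need matching CLUSTERED BY / SORTED BY
-- definitions on the join key.
CREATE TABLE orders_bucketed (order_id BIGINT, amount DOUBLE)
CLUSTERED BY (order_id) SORTED BY (order_id) INTO 64 BUCKETS
STORED AS ORC;

-- Ensure inserts actually honor the bucketing (needed on older Hive).
SET hive.enforce.bucketing=true;

-- Enable the bucketed sort-merge join at query time.
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
```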
Hi Mich,
Nice explanation!
Does the UPDATE operation in Hive work row by row, or is it performed in
batches? We also observed multiple temp files being generated in HDFS while
performing the update operation.
It would be really helpful if you could share details of what Hive does in the
background.
Hi Arpan,
Include the partition column in the DISTRIBUTE BY clause of the DML; it will
generate only one file per day. Hope this resolves the issue.
> "insert into target_table select a, b, c from x where ... distribute by
> date_col"
>
PS: Backdated processing will generate additional file(s).