RE: optimize joins in hive 1.2.1

2016-01-18 Thread Mich Talebzadeh
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 18 January 2016 08:37
To: user@hive.apache.org
Subject: Re: optimize joins in hive 1.2.1
Do you have a data model? Basically, modern technologies such as Hive, and also relational databases, favor pre-joining tables and

Re: optimize joins in hive 1.2.1

2016-01-18 Thread Jörn Franke
Do you have a data model? Basically, modern technologies such as Hive, and also relational databases, favor pre-joining tables and working on big flat tables. The reason is that they are distributed systems, and you should avoid transferring a lot of data between nodes for each query. Hence,
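A minimal HiveQL sketch of the pre-join idea, in case it helps. The table and column names (`orders`, `customers`, `orders_flat` and their columns) are hypothetical and not from the thread; the ORC storage format is likewise just an assumption for illustration.

```sql
-- Hypothetical source tables: orders (large fact table) and
-- customers (dimension table). Names are illustrative only.

-- Pay the join cost once and materialize a flat table.
CREATE TABLE orders_flat
STORED AS ORC
AS
SELECT o.order_id,
       o.order_date,
       o.amount,
       c.customer_id,
       c.customer_name,
       c.country
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id;

-- Later queries read the flat table and need no join at query time.
SELECT country,
       SUM(amount) AS total_amount
FROM orders_flat
GROUP BY country;
```

The trade-off is extra storage and the need to rebuild or refresh the flat table when the source tables change.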

Re: optimize joins in hive 1.2.1

2016-01-18 Thread Richa Sharma
Hi Divya,
Below are some quick tips that always help:
1. Partition your data set and filter on the partition keys when selecting data, to reduce the amount of data read.
2. Also, if both data sets can be joined on the same partition key, then use it in the join.
3. If one of the tables being joined is a small table then you can
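Tips 1 and 2 could look roughly like this in HiveQL. This is only a sketch: the tables `sales` and `refunds`, their columns, and the partition key `sale_date` are made up for the example.

```sql
-- Hypothetical tables, both partitioned by the same key (sale_date).
CREATE TABLE sales (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

CREATE TABLE refunds (
  order_id BIGINT,
  reason   STRING
)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

-- Tip 1: filter on the partition key so only the matching
--        partitions are read.
-- Tip 2: include the shared partition key in the join condition.
SELECT s.order_id,
       s.amount,
       r.reason
FROM sales s
JOIN refunds r
  ON s.sale_date = r.sale_date
 AND s.order_id  = r.order_id
WHERE s.sale_date = '2016-01-17'
  AND r.sale_date = '2016-01-17';
```

Prefixing the query with `EXPLAIN DEPENDENCY` is one way to check which partitions Hive actually plans to read.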

optimize joins in hive 1.2.1

2016-01-18 Thread Divya Gehlot
Hi, I need tips/guidance to optimize (increase the performance of) joins over billions of rows in Hive. Any help would be appreciated. Thanks, Divya