Saquib Khan, to unsubscribe you need to send a message to user-unsubscr...@hive.apache.org as described here: Mailing Lists <http://hive.apache.org/mailing_lists.html>.
Thanks. -- Lefty

On Sun, Jun 25, 2017 at 7:14 PM, saquib khan <skhan...@gmail.com> wrote:

> Please remove me from the user list.
>
> On Sun, Jun 25, 2017 at 5:10 PM Db-Blog <mpp.databa...@gmail.com> wrote:
>
>> Hi Arpan,
>> Include the partition column in the DISTRIBUTE BY clause of the DML; it
>> will generate only one file per day. Hope this resolves the issue.
>>
>> "insert into 'target_table' select a,b,c from x where ... distribute by (date)"
>>
>> PS: Backdated processing will generate additional file(s), one file per load.
>>
>> Thanks,
>> Saurabh
>>
>> Sent from my iPhone, please avoid typos.
>>
>> On 22-Jun-2017, at 11:30 AM, Arpan Rajani <arpan.raj...@whishworks.com> wrote:
>>
>> Hello everyone,
>>
>> I am sure many of you have faced a similar issue.
>>
>> We run "insert into 'target_table' select a,b,c from x where .." style
>> queries for a nightly load. Each insert goes into a new partition of the
>> target_table.
>>
>> The concern is: *this insert loads hardly any data* (less than 128 MB per
>> day), *but the data is fragmented into 1200 files*, each only a few
>> kilobytes in size. This is slowing down performance. How can we make sure
>> this load does not generate lots of small files?
>>
>> I have already set *hive.merge.mapfiles* and *hive.merge.mapredfiles* to
>> true in the custom/advanced hive-site.xml, but the load job still writes
>> the data as 1200 small files.
>>
>> I know where 1200 comes from: it is the maximum number of
>> reducers/containers configured in one of the hive-site files. (I do not
>> think it is a good idea to change this setting cluster-wide, as it can
>> affect other jobs that use the cluster when it has free containers.)
>>
>> *What other settings or approaches would stop the Hive insert from taking
>> 1200 slots and generating lots of small files?*
>>
>> I also have another question, partly contrary to the above (relatively
>> less important):
>>
>> When I reload this table by creating a new table from a select on the
>> target table, the newly created table does not contain many small files;
>> its file count drops from 1200 to about 50. What could be the reason?
>>
>> PS: I did go through
>> http://www.openkb.info/2014/12/how-to-control-file-numbers-of-hive.html
>>
>> Regards,
>> Arpan
>>
>> The contents of this e-mail are confidential and for the exclusive use of
>> the intended recipient. If you receive this e-mail in error please delete
>> it from your system immediately and notify us either by e-mail or
>> telephone. You should not copy, forward or otherwise disclose the content
>> of the e-mail. The views expressed in this communication may not
>> necessarily be the view held by WHISHWORKS.
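Putting Saurabh's DISTRIBUTE BY suggestion together with the merge settings Arpan already tried, a per-query version might look like the sketch below. This is only a sketch: the table and column names (target_table, x, a, b, c) are the placeholders from the thread, the partition column name load_date is an assumption, the size thresholds are illustrative, and the nightly WHERE filter is left out. Everything is set at session scope, so no cluster-wide defaults change.

    -- Assumed: the daily partition is written via dynamic partitioning.
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    -- Merge small output files at the end of the job; the thresholds below
    -- are illustrative (trigger a merge when average file size is under
    -- 128 MB, aim for ~256 MB per merged file).
    SET hive.merge.mapfiles=true;
    SET hive.merge.mapredfiles=true;
    SET hive.merge.smallfiles.avgsize=134217728;
    SET hive.merge.size.per.task=268435456;

    -- DISTRIBUTE BY the partition column routes all rows for a given day to
    -- a single reducer, so each daily partition is written as one file
    -- instead of one file per reducer slot (1200 in Arpan's case).
    INSERT INTO TABLE target_table PARTITION (load_date)
    SELECT a, b, c, load_date
    FROM x
    -- add the original nightly filter here
    DISTRIBUTE BY load_date;

The trade-off is that a single reducer writes the whole day's data, which is reasonable for a load well under 128 MB per day; for much larger partitions, distributing by the partition column plus an additional bucketing expression would spread the write across a few reducers while still keeping the file count small.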