Re: Optimize hive external tables with serde

2014-10-22 Thread Sanjay Subramanian
WHERE attribute_X1='1' AND attribute_X2='1') at ON jt.customerId = at.customerId From: ptrst To: user@hive.apache.org; Sanjay Subramanian Sent: Wednesday, October 22, 2014 1:02 AM Subject: Re: Optimize hive external tables with serde ad
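The query fragment above joins a table aliased `jt` to a subquery `at` that keeps only rows where `attribute_X1='1' AND attribute_X2='1'`, matching on `customerId`. The sketch below simulates that logic in plain Python so the shape of the join is easy to see: filter the attribute side first, build a hash table on it, then probe it with the other side. This is similar in spirit to how a map-side join works, but it is an in-memory illustration only; the sample rows and the `code` field values are hypothetical, and the real work happens inside Hive.

```python
# Illustrative in-memory simulation of the join in the query fragment.
# attribute_X1, attribute_X2, customerId, jt and at come from the email;
# the sample rows are hypothetical.

def filter_attributes(attr_rows):
    """Mimic the subquery: keep rows WHERE attribute_X1='1' AND attribute_X2='1'."""
    return [r for r in attr_rows
            if r["attribute_X1"] == "1" and r["attribute_X2"] == "1"]


def hash_join(left_rows, right_rows, key="customerId"):
    """Inner join left_rows to right_rows on `key`, building a hash table
    from right_rows (the filtered, usually smaller side) and probing it."""
    index = {}
    for r in right_rows:
        index.setdefault(r[key], []).append(r)
    joined = []
    for left in left_rows:
        for right in index.get(left[key], []):
            joined.append({**left, **right})  # merge columns from both sides
    return joined


if __name__ == "__main__":
    jt = [
        {"customerId": "c1", "code": 7},
        {"customerId": "c2", "code": 12},
        {"customerId": "c3", "code": 7},
    ]
    attrs = [
        {"customerId": "c1", "attribute_X1": "1", "attribute_X2": "1"},
        {"customerId": "c2", "attribute_X1": "1", "attribute_X2": "0"},
        {"customerId": "c3", "attribute_X1": "1", "attribute_X2": "1"},
    ]
    at = filter_attributes(attrs)
    for row in hash_join(jt, at):
        print(row["customerId"], row["code"])
```

Filtering before the join keeps the hash table small, which is the same reason the email pushes the attribute conditions into the subquery.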

Re: Optimize hive external tables with serde

2014-10-22 Thread ptrstpppp
ing to do a select count(*) without a where clause might make Hive crawl. ------ From: Ja Sam To: user@hive.apache.org Sent: Tuesday, October 21, 2014 10:37 AM Subject: Optimize hive external tables with serde

Re: Optimize hive external tables with serde

2014-10-21 Thread Sanjay Subramanian
you have date-wise partitions and you have 5 years of data, i.e. about 1825 partitions. -- Trying to do a select count(*) without a where clause might make Hive crawl. From: Ja Sam To: user@hive.apache.org Sent: Tuesday, October 21, 2014 10:37 AM Subject: Optimize hive external tables with
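The arithmetic behind the warning is 5 years × 365 days ≈ 1825 daily partitions. A `count(*)` with no WHERE clause has to read every one of them, while a filter on the partition column lets Hive skip non-matching partitions before reading any data (partition pruning). The sketch below models that with a dict standing in for the partition directories; the start date, the per-day row count and the idea of a daily partition key are hypothetical, chosen only to reproduce the 1825 figure.

```python
from datetime import date, timedelta

# Hypothetical layout: one partition per day, each holding `rows_per_day` rows.
# In Hive each partition is a directory; a dict stands in for that here.
def build_partitions(start, days, rows_per_day):
    return {start + timedelta(days=i): rows_per_day for i in range(days)}


def count_rows(partitions, predicate=None):
    """Return (row_count, partitions_scanned).

    With no predicate every partition is read, like SELECT count(*)
    without a WHERE clause. With a predicate on the partition key,
    non-matching partitions are skipped, which models partition pruning."""
    scanned = 0
    total = 0
    for day, rows in partitions.items():
        if predicate is not None and not predicate(day):
            continue  # pruned: this partition's files would never be opened
        scanned += 1
        total += rows
    return total, scanned


if __name__ == "__main__":
    parts = build_partitions(date(2009, 10, 22), 5 * 365, rows_per_day=1000)
    print(len(parts))  # 1825
    print(count_rows(parts))                                  # scans all partitions
    print(count_rows(parts, lambda d: d == date(2014, 1, 15)))  # scans one partition
```

The point of the model is the `partitions_scanned` value: the full scan touches all 1825 partitions, while the filtered query touches a single one.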

Optimize hive external tables with serde

2014-10-21 Thread Ja Sam
*Part 1: my environment* I have the following files uploaded to Hadoop: 1. They are plain text. 2. Each line contains JSON like: {code:[int], customerId:[string], data:{[something more here]}} 1. code values are numbers from 1 to 3000, 2. customerIds total up to 4 million, daily up to 0.5 mil
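The record layout described above (one JSON object per line with `code`, `customerId` and a nested `data` object) is what a JSON SerDe would map onto Hive columns. The sketch below parses such lines with Python's standard `json` module and counts distinct customers per code, which is a quick way to sanity-check the stated ranges. It assumes the real files contain valid JSON with quoted keys; the `{code:[int], ...}` in the email reads as a schema description rather than literal text, and the sample lines are hypothetical.

```python
import json
from collections import defaultdict

def parse_line(line):
    """Parse one text line into (code, customerId, data).

    Assumes valid JSON with quoted keys, e.g.
    {"code": 17, "customerId": "abc", "data": {...}}."""
    rec = json.loads(line)
    code = int(rec["code"])
    if not 1 <= code <= 3000:  # range stated in the email
        raise ValueError(f"code out of expected range 1..3000: {code}")
    return code, rec["customerId"], rec.get("data", {})


def distinct_customers_per_code(lines):
    """Map each code to the set of customerIds seen with it."""
    result = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        code, customer_id, _ = parse_line(line)
        result[code].add(customer_id)
    return result


if __name__ == "__main__":
    sample = [
        '{"code": 5, "customerId": "c1", "data": {"x": 1}}',
        '{"code": 5, "customerId": "c2", "data": {}}',
        '{"code": 5, "customerId": "c1", "data": {"x": 2}}',
        '{"code": 42, "customerId": "c3", "data": {}}',
    ]
    counts = distinct_customers_per_code(sample)
    for code in sorted(counts):
        print(code, len(counts[code]))
```

Raising on an out-of-range `code` is a deliberate choice for a sanity check; for bulk processing you might prefer to log and skip bad lines instead.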