WHERE
attribute_X1='1' AND attribute_X2='1' ) at ON jt.customerId
= at.customerId
From: ptrst
To: user@hive.apache.org; Sanjay Subramanian
Sent: Wednesday, October 22, 2014 1:02 AM
Subject: Re: Optimize hive external tables with serde
> Trying to do a select count(*) without where clause might make hive
> crawl.
>
>
> ------
> *From:* Ja Sam
> *To:* user@hive.apache.org
> *Sent:* Tuesday, October 21, 2014 10:37 AM
> *Subject:* Optimize hive external tables with serde
>
>
you have datewise partitions and you have 5 years of data, i.e. about
1825 partitions. Trying to do a select count(*) without a where clause might
make hive crawl.
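
As a rough sketch of the point above, assuming a hypothetical table `events` partitioned by a string column `dt` in yyyy-MM-dd form (neither name comes from this thread), the difference would look something like this:

```sql
-- Hypothetical table `events`, partitioned by a string column `dt` (yyyy-MM-dd).

-- No filter: Hive has to read every partition
-- (about 1825 of them for 5 years of daily data).
SELECT COUNT(*)
FROM events;

-- Filter on the partition column: only the matching
-- daily partitions are scanned.
SELECT COUNT(*)
FROM events
WHERE dt >= '2014-10-01' AND dt <= '2014-10-21';
```

The second query should let Hive prune partitions, since the predicate is on the partition column itself rather than on a column stored inside the files.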
From: Ja Sam
To: user@hive.apache.org
Sent: Tuesday, October 21, 2014 10:37 AM
Subject: Optimize hive external tables with serde
*Part 1: my environment*
I have the following files uploaded to Hadoop:
1. They are plain text
2. Each line contains JSON like:
{code:[int], customerId:[string], data:{[something more here]}}
1. code values are numbers from 1 to 3000,
2. customerId values total up to 4 million, daily up to 0.5 mil