Re: Optimize hive external tables with serde

2014-10-22 Thread Sanjay Subramanian

WHERE attribute_X1='1' AND attribute_X2='1' ) at ON jt.customerId = at.customerId
From: ptrst To: user@hive.apache.org; Sanjay Subramanian Sent: Wednesday, October 22, 2014 1:02 AM Subject: Re: Optimize hive external tables with serde ad

Re: Optimize hive external tables with serde

2014-10-22 Thread ptrstpppp

ad 1) My files are not bigger than the block size. To be precise, all data from one day is up to 2GB gzipped; unzipped it is ~15GB. The data is split within one folder into files smaller than the block size (the block size in my case is 128MB; the files are ~100MB). I can transform them to another format if you think it wil

Re: Optimize hive external tables with serde

2014-10-21 Thread Sanjay Subramanian

1. The gzip files are not splittable, so gzip itself will make the queries slower.
2. As a reference for JSON serdes, here is an example from my blog http://bigdatalatte.wordpress.com/2014/08/21/denormalizing-json-arrays-in-hive/
3. I need to see your query first to try to optimize it.
4. Even if y
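
To illustrate point 1 about gzip not being splittable, here is a minimal Python sketch (not part of the original thread). It compresses some made-up JSON-like lines and then tries to start a gzip decoder at the beginning of the file and at an arbitrary mid-file offset. The helper name `can_decode_from` and the sample records are assumptions for illustration only; the field names are borrowed from the query fragment quoted earlier in the thread.

```python
import gzip
import zlib


def can_decode_from(blob: bytes, offset: int) -> bool:
    """Return True if a gzip decoder can start reading at `offset`."""
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(wbits=31)
    try:
        decoder.decompress(blob[offset:])
        return True
    except zlib.error:
        # No gzip header at this position, so decoding cannot begin here.
        return False


# Hypothetical sample data standing in for one log file.
lines = b"".join(
    b'{"customerId": %d, "attribute_X1": "1"}\n' % i for i in range(50000)
)
blob = gzip.compress(lines)

print(can_decode_from(blob, 0))               # start of the file
print(can_decode_from(blob, len(blob) // 2))  # arbitrary mid-file offset
```

Decoding should work from offset 0 but not from the middle of the stream, which is why a single .gz file has to be read from the start by one task rather than being divided among several.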