Re: Index not getting used for the queries

2014-02-04 Thread Thilina Gunarathne

Index not getting used for the queries

2014-02-03 Thread Thilina Gunarathne
Dear all, I created a compact index for a table with several hundred million records as follows. The table is partitioned by the month. The index on A and B was created successfully, but I can't see it getting used in the queries. It would be great if one of you experts can shed some light on what

Re: Using Hive generated SequenceFiles and RC files with Java MapReduce and PIG

2014-01-28 Thread Thilina Gunarathne
et hive to produce exactly what you want by mixing and matching serde > and output format options. > > > On Tue, Jan 28, 2014 at 8:05 PM, Thilina Gunarathne wrote: > >> Hi, >> We have a requirement to store a large data set (more than 5TB) mapped to >> a Hive tabl

Using Hive generated SequenceFiles and RC files with Java MapReduce and PIG

2014-01-28 Thread Thilina Gunarathne
Hi, We have a requirement to store a large data set (more than 5TB) mapped to a Hive table. This Hive table would be populated (and appended periodically) using a Hive query from another Hive table. In addition to the Hive queries, we need to be able to run Java MapReduce and preferably Pig jobs as

Re: RCFile vs SequenceFile vs text files

2014-01-27 Thread Thilina Gunarathne
in a very direct row oriented form and then > there first map reduce job buckets/partitions/columnar-izes it. > > > On Mon, Jan 27, 2014 at 2:44 PM, Thilina Gunarathne wrote: > >> Thanks Eric and Sharath for the pointers to ORC. Unfortunately ORC would >> not be an option f

Re: RCFile vs SequenceFile vs text files

2014-01-27 Thread Thilina Gunarathne
//hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/ > > > > > On Mon, Jan 27, 2014 at 1:29 PM, Eric Hanson (BIG DATA) < > eric.n.han...@microsoft.com> wrote: > >> It sounds like ORC would be best. >> >> >> >>

RCFile vs SequenceFile vs text files

2014-01-27 Thread Thilina Gunarathne
Dear all, We are trying to pick the right data storage format for the Hive table with the following requirement and would really appreciate any insights you can provide to help our decision. 1. ~50 billion records per month, ~14 columns per record, and each record is ~100 bytes. Table is partitione
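
For a rough sense of scale, the figures quoted in the post above can be multiplied out. This is only a back-of-the-envelope sketch: it assumes the stated ~50 billion records and ~100 bytes per record are uncompressed averages, and it ignores any per-file or per-block overhead that a storage format would add. The constant and function names are made up for illustration.

```python
# Back-of-the-envelope sizing from the figures quoted in the post.
# Assumption: both numbers are uncompressed averages; format overhead is ignored.
RECORDS_PER_MONTH = 50_000_000_000  # ~50 billion records per month
BYTES_PER_RECORD = 100              # ~100 bytes per record


def monthly_raw_bytes(records: int = RECORDS_PER_MONTH,
                      record_bytes: int = BYTES_PER_RECORD) -> int:
    """Return the uncompressed size in bytes of one month's records."""
    return records * record_bytes


if __name__ == "__main__":
    raw = monthly_raw_bytes()
    print(f"{raw / 10**12:.1f} TB (decimal) per month")
    print(f"{raw / 2**40:.2f} TiB (binary) per month")
```

Under these assumptions one month of data comes to about 5 TB before compression, which is in line with the "more than 5TB" figure mentioned in the SequenceFile/RCFile thread above.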