Dear all,
I created a compact index for a table with several hundred million records
as follows. The table is partitioned by the month. The index on columns A and B
was created successfully, but I can't see it getting used in my queries. It
would be great if one of you experts could shed some light on why the index
isn't being picked up.
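For reference, this is roughly what that setup looks like (just a sketch; the
table and column names are placeholders, not the real ones). One thing worth
checking is that the optimizer only rewrites queries against a compact index
when hive.optimize.index.filter is enabled, and it also respects a minimum
input-size threshold:

CREATE INDEX idx_a_b
ON TABLE my_table (a, b)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- the index data has to be (re)built before it can be used,
-- and again whenever new partitions are loaded
ALTER INDEX idx_a_b ON my_table REBUILD;

-- off by default; without it the index is never consulted automatically
SET hive.optimize.index.filter=true;
-- defaults to roughly 5GB; lower it if test queries scan less data than that
SET hive.optimize.index.filter.compact.minsize=0;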
> ...get Hive to produce exactly what you want by mixing and matching SerDe
> and output format options.
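To make that concrete, here is a sketch of one such combination (the table
name and columns are invented for illustration). The SerDe and the
input/output formats are declared separately in the DDL, so a non-default
SerDe can be paired with, for example, the RCFile formats:

CREATE TABLE events_rc (
  ts BIGINT,
  user_id STRING,
  payload STRING
)
PARTITIONED BY (month STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat';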
>
>
> On Tue, Jan 28, 2014 at 8:05 PM, Thilina Gunarathne wrote:
>
>> Hi,
>> We have a requirement to store a large data set (more than 5TB) mapped to
>> a Hive table.
Hi,
We have a requirement to store a large data set (more than 5TB) mapped to a
Hive table. This Hive table would be populated (and appended periodically)
using a Hive query from another Hive table. In addition to the Hive
queries, we need to be able to run Java MapReduce and preferably Pig jobs
as well on this data.
> ...in a very direct row-oriented form and then
> their first MapReduce job buckets/partitions/columnar-izes it.
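In HiveQL terms that pattern might look like the sketch below (the column
list, bucket count, and names are all invented for illustration): the ingest
job writes into a plain row-oriented staging table, and a single INSERT then
partitions, buckets, and columnar-izes it in one pass:

-- row-oriented landing table, written directly by the ingest process
CREATE TABLE events_staging (
  ts BIGINT,
  user_id STRING,
  payload STRING
)
STORED AS TEXTFILE;

-- columnar, partitioned and bucketed target table
CREATE TABLE events (
  ts BIGINT,
  user_id STRING,
  payload STRING
)
PARTITIONED BY (month STRING)
CLUSTERED BY (user_id) INTO 64 BUCKETS
STORED AS RCFILE;

-- the "first MapReduce job": one pass from staging into the columnar layout
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.enforce.bucketing=true;
INSERT OVERWRITE TABLE events PARTITION (month)
SELECT ts, user_id, payload, substr(from_unixtime(ts), 1, 7) AS month
FROM events_staging;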
>
>
> On Mon, Jan 27, 2014 at 2:44 PM, Thilina Gunarathne wrote:
>
>> Thanks Eric and Sharath for the pointers to ORC. Unfortunately ORC would
>> not be an option for us.
http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/
>
> On Mon, Jan 27, 2014 at 1:29 PM, Eric Hanson (BIG DATA) <
> eric.n.han...@microsoft.com> wrote:
>
>> It sounds like ORC would be best.
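For anyone comparing options, a minimal ORC-backed definition of such a table
(Hive 0.11+) is sketched below; the names are placeholders and the compression
codec is just one common choice, set through TBLPROPERTIES:

CREATE TABLE events_orc (
  ts BIGINT,
  user_id STRING,
  payload STRING
)
PARTITIONED BY (month STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "ZLIB");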
Dear all,
We are trying to pick the right data storage format for a Hive table with
the following requirements, and would really appreciate any insights you can
provide to help our decision.
1. ~50 billion records per month, ~14 columns per record, and each record is
~100 bytes. The table is partitioned by the month.