Re: Parsing and moving data to ORC from HDFS

2015-04-23 Thread Kjell Tore Fossbakk
Hello. Thank you for your information and tips. I will try a UDF with inspiration from get_json_object(). Thanks, Kjell Tore On 22 Apr 2015 at 22:00, "Gopal Vijayaraghavan" wrote: >> In production we run HDP 2.2.4. Any thought when crazy stuff like bloom filters might move to GA? > I'd say

Re: Parsing and moving data to ORC from HDFS

2015-04-22 Thread Gopal Vijayaraghavan
> In production we run HDP 2.2.4. Any thought when crazy stuff like bloom filters might move to GA? I'd say that it will be in the next release, considering it is already checked into hive-trunk. Bloom filters aren't too crazy today. They are written within the ORC file right next to the row-in

Re: Parsing and moving data to ORC from HDFS

2015-04-22 Thread Kjell Tore Fossbakk
Hey Gopal. Thanks for your answers. I have some follow-ups: On Wed, Apr 22, 2015 at 3:46 PM, Gopal Vijayaraghavan wrote: >> I have about 100 TB of data, approximately 180 billion events, in my HDFS cluster. It is my raw data stored as GZIP files. At the time of setup this was due to "sav

Re: Parsing and moving data to ORC from HDFS

2015-04-22 Thread Gopal Vijayaraghavan
> I have about 100 TB of data, approximately 180 billion events, in my HDFS cluster. It is my raw data stored as GZIP files. At the time of setup this was due to "saving the data" until we figured out what to do with it. > After attending @t3rmin4t0r's ORC 2015 session @hadoopsummit in Brusse

Re: Parsing and moving data to ORC from HDFS

2015-04-22 Thread Kjell Tore Fossbakk
It is worth mentioning that this is 100 TB raw size, approximately 19 TB with gzip -9 (best/slowest compression). On Wed, Apr 22, 2015 at 2:50 PM, Kjell Tore Fossbakk wrote: > Hello user@hive.apache.org > I have about 100 TB of data, approximately 180 billion events, in my HDFS cluster. It is my raw da

Parsing and moving data to ORC from HDFS

2015-04-22 Thread Kjell Tore Fossbakk
Hello user@hive.apache.org I have about 100 TB of data, approximately 180 billion events, in my HDFS cluster. It is my raw data stored as GZIP files. At the time of setup this was due to "saving the data" until we figured out what to do with it. After attending @t3rmin4t0r's ORC 2015 session @had