Re: ORC file across multiple HDFS blocks

2015-04-28 Thread Demai Ni

Re: ORC file across multiple HDFS blocks

2015-04-28 Thread Owen O'Malley

Re: ORC file across multiple HDFS blocks

2015-04-28 Thread Demai Ni

Re: ORC file across multiple HDFS blocks

2015-04-28 Thread Grant Overby (groverby)

Re: ORC file across multiple HDFS blocks

2015-04-27 Thread Alan Gates
No, you don't want to be designing ORC files to not cross block boundaries. Engines in Hadoop (MapReduce, Tez, etc.) are all built to handle the fact that files tend to cross blocks and hence nodes. There is value in lining up stripe size and HDFS block size so that your stripes don't straddl
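To illustrate the point about lining up stripe size and HDFS block size, here is a minimal sketch of the arithmetic. It assumes equal-sized stripes laid out back to back from offset 0, which is a simplification: real ORC files also have a header and footer, and stripe sizes vary. The function name `straddling_stripes` and its parameters are hypothetical and not part of any ORC or HDFS API.

```python
def straddling_stripes(stripe_size, block_size, num_stripes):
    """Return indices of stripes whose byte range crosses an HDFS block boundary.

    Assumption (not from the thread): stripes are uniform in size and
    packed contiguously starting at byte 0 of the file.
    """
    straddlers = []
    for i in range(num_stripes):
        start = i * stripe_size          # first byte of stripe i
        end = start + stripe_size - 1    # last byte of stripe i
        # A stripe straddles when its first and last bytes fall in different blocks.
        if start // block_size != end // block_size:
            straddlers.append(i)
    return straddlers


MB = 1024 * 1024

# Stripe size divides the block size evenly: no stripe crosses a boundary.
aligned = straddling_stripes(64 * MB, 256 * MB, 8)

# Stripe size does not divide the block size: some stripes span two blocks.
misaligned = straddling_stripes(96 * MB, 256 * MB, 8)
```

Under these assumptions, a stripe size that divides the block size evenly gives no straddling stripes, while a size that does not divide it leaves some stripes split across two blocks, and therefore possibly across two nodes.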

ORC file across multiple HDFS blocks

2015-04-24 Thread Demai Ni
Hi guys, I am working on reading ORC files directly from an HDFS cluster, and hope to leverage HDFS short-circuit local reads ( http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html) as much as possible. According to the ORC design, each ORC file usually contain