No, the size is closer to 10 GB; the difference between the tables is only around 2000 bytes. I will try to get exact numbers for you soon. I am traveling right now, but I'll get you better data to work with shortly.
Thanks!

On Mon, Feb 3, 2014 at 12:22 AM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

> Hi John
>
> The number of mappers is equal to the number of splits generated. The
> following factors go into split generation:
> 1) HDFS block size
> 2) Max split size
>
> A split is cut when
> 1) the cumulative size of all adjacent stripes is greater than the HDFS
> block size
> 2) the cumulative size of all adjacent stripes is greater than the max
> split size
>
> HDFS block size for ORC files will be min(1.5GB, 2*stripe_size) in the
> current version of hive (and probably hive 0.12 too). In older versions,
> HDFS block size = min(2GB, 2*stripe_size).
>
> The other important thing to note is that an ORC split is generated only
> when HiveInputFormat is used. By default hive uses CombineHiveInputFormat,
> which uses a different strategy to generate splits. In
> CombineHiveInputFormat, many small files are combined together to form a
> large logical split.
>
> In any case, for the size you had mentioned (2000 bytes) there should be
> only one mapper. Can you provide the values of the following configs so
> that we can understand it better?
>
> 1) hive.input.format
> 2) hive.min.split.size
> 3) hive.max.split.size
> 4) total size on disk for the table
>
> Thanks
> Prasanth Jayachandran
>
> On Feb 2, 2014, at 5:25 PM, John Omernik <j...@omernik.com> wrote:
>
> > I have two clusters, both small dev clusters, and I loaded the same
> > dataset into both of them. The data size on disk is within 2000 bytes.
> > Both are ORC, one is Hive 11 and one is Hive 12. One is allocating about
> > 8 more mappers to the exact same query. I am just curious what settings
> > would change that. I checked through all my settings, but can't see what
> > would cause the discrepancy. Is this an ORC v11 vs v12 thing?
> >
> > I'd be curious on the thoughts of the group.
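
For anyone following the thread, here is a rough Python sketch of the split rules as Prasanth describes them above. It is only a toy model built from that description, not Hive's actual split-generation code: the function names (`orc_block_size`, `count_splits`), the stripe sizes, and the choice to start a fresh split right after a cut are all assumptions made for illustration. It ignores CombineHiveInputFormat entirely.

```python
GB = 1024 ** 3

def orc_block_size(stripe_size, cap):
    """Block size rule quoted in the thread: min(cap, 2 * stripe_size).

    cap is 1.5 GB for the 'current' rule and 2 GB for the 'older' rule.
    """
    return min(cap, 2 * stripe_size)

def count_splits(stripe_sizes, block_size, max_split_size):
    """Count splits for one file, given its stripe sizes in bytes.

    Toy model: walk the stripes in order and cut a split as soon as the
    cumulative size exceeds the block size or the max split size.
    """
    limit = min(block_size, max_split_size)
    splits = 0
    current = 0
    for size in stripe_sizes:
        current += size
        if current > limit:
            splits += 1
            current = 0  # assumption: the next stripe starts a new split
    if current > 0:
        splits += 1  # leftover stripes form the final split
    return splits

# Hypothetical 10 GB file made of ten 1 GB stripes, with a max split
# size large enough that only the block size matters.
stripes = [1 * GB] * 10
huge_max_split = 100 * GB

newer = count_splits(stripes, orc_block_size(1 * GB, int(1.5 * GB)), huge_max_split)
older = count_splits(stripes, orc_block_size(1 * GB, 2 * GB), huge_max_split)
tiny = count_splits([2000], orc_block_size(1 * GB, int(1.5 * GB)), huge_max_split)
print(newer, older, tiny)
```

In this toy model the two block size rules can yield different split counts for the same stripes, and a 2000-byte file always yields a single split. Whether that accounts for the 8 extra mappers depends on the actual stripe sizes and config values requested above.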