Re: RDD Partitions on HDFS file in Hive on Spark Query

2016-11-22 Thread yeshwanth kumar
…data. This is really not a Spark thing, but a Hadoop input format discussion. HTH. On Wed, Nov 23, 2016 at 10:00 AM, yeshwanth kumar wrote: > Hi Ayan, > we have default rack topology. > -Yeshwanth > Can you Imagine what I would do if I could do all I can - A…
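For context, a minimal sketch of the Hadoop-side decision being referred to, assuming a TextInputFormat-style check (the helper name is hypothetical): a codec that does not implement SplittableCompressionCodec forces the whole compressed file into a single input split, no matter how many HDFS blocks it spans.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.compress.{CompressionCodecFactory, SplittableCompressionCodec}

    // Mirrors the isSplitable logic of text-based input formats: an
    // uncompressed file (no codec) is splittable; a compressed one only if
    // its codec implements SplittableCompressionCodec (bzip2 does, snappy
    // does not).
    def isSplittable(conf: Configuration, path: Path): Boolean = {
      val codec = new CompressionCodecFactory(conf).getCodec(path)
      codec == null || codec.isInstanceOf[SplittableCompressionCodec]
    }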

Re: RDD Partitions on HDFS file in Hive on Spark Query

2016-11-22 Thread yeshwanth kumar
…5 is in a different rack than 227 or 228? What does your topology file say? > On 22 Nov 2016 10:14, "yeshwanth kumar" wrote: >> Thanks for your reply, >> I can definitely change the underlying compression format, >> but I am trying to understand the loc…

Re: RDD Partitions on HDFS file in Hive on Spark Query

2016-11-21 Thread yeshwanth kumar
…. Another alternative would be bzip2 (but slower in general) or LZO (usually not included by default in many distributions). > On 21 Nov 2016, at 23:17, yeshwanth kumar wrote: > Hi, > we are running Hive on Spark; we have an external table over a snappy-compressed…
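A minimal sketch of the bzip2 suggestion, assuming an existing SparkContext `sc` and placeholder paths: rewriting the text data with a splittable codec lets later reads produce multiple splits.

    import org.apache.hadoop.io.compress.BZip2Codec

    // Rewrite the CSV with bzip2, which is splittable (unlike snappy), so a
    // later read can produce one task per HDFS block instead of one per file.
    val lines = sc.textFile("hdfs:///data/raw/table.csv.snappy")
    lines.saveAsTextFile("hdfs:///data/bzip2/table", classOf[BZip2Codec])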

RDD Partitions on HDFS file in Hive on Spark Query

2016-11-21 Thread yeshwanth kumar
Hi, we are running Hive on Spark, and we have an external table over a snappy-compressed CSV file of size 917.4 MB. The HDFS block size is set to 256 MB. As per my understanding, if I run a query over that external table, it should launch 4 tasks, one for each block, but I am seeing one executor and one task…
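A minimal sketch of the observed behaviour, with a placeholder path and an existing SparkContext `sc`: plain .snappy text is not splittable, so the whole file arrives as a single partition rather than ceil(917.4 MB / 256 MB) = 4.

    val rdd = sc.textFile("hdfs:///warehouse/mytable/data.csv.snappy")
    // Expected 4 partitions (one per 256 MB block), but a non-splittable
    // codec collapses the whole file into a single input split:
    println(rdd.getNumPartitions)  // prints 1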

Re: How to generate a sequential key in rdd across executors

2016-08-03 Thread yeshwanth kumar
…the record? > On Jul 23, 2016, at 7:53 PM, yeshwanth kumar wrote: >> Hi, >> I am doing a bulk load to HBase using Spark, >> in which I need to generate a sequential key for each record; >> the key should be sequential across all…

How to generate a sequential key in rdd across executors

2016-07-23 Thread yeshwanth kumar
Hi, I am doing a bulk load to HBase using Spark, in which I need to generate a sequential key for each record; the key should be sequential across all the executors. I tried zipWithIndex; it didn't work because zipWithIndex gives an index per executor, not across all executors. Looking for some suggestions…
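A point worth noting, with a minimal sketch below: RDD.zipWithIndex does assign globally sequential indices, 0 through n-1, ordered by partition and then by position within the partition; the behaviour described above (unique but not consecutive ids) matches zipWithUniqueId instead.

    // Assumes an existing SparkContext `sc`; data and slice count are
    // placeholders. zipWithIndex triggers a job to count each partition,
    // then numbers elements 0..n-1 in partition order -- sequential across
    // all executors.
    val records = sc.parallelize(Seq("a", "b", "c", "d", "e"), numSlices = 3)
    val keyed = records.zipWithIndex().map { case (rec, idx) => (idx, rec) }
    keyed.collect().foreach(println)  // (0,a) (1,b) (2,c) (3,d) (4,e)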

Spark HBase bulk load using hfile format

2016-07-13 Thread yeshwanth kumar
Hi, I am doing a bulk load into HBase in HFile format, using saveAsNewAPIHadoopFile. When I try to write, I am getting an exception: java.io.IOException: Added a key not lexically larger than previous. The following is the code snippet: case class HBaseRow(rowKey: ImmutableBytesWritable, kv: KeyValue)…
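The usual cause, sketched here as an assumption since only the case class is visible: HFileOutputFormat2 requires cells in total key order, so the (rowKey, KeyValue) pairs must be sorted before the write. The function name, RDD parameter, and output path below are placeholders.

    import org.apache.hadoop.hbase.KeyValue
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
    import org.apache.spark.rdd.RDD

    // Sort by row key before writing: the IOException above is thrown
    // whenever a cell arrives out of order (this includes unsorted column
    // qualifiers within the same row). A real bulk load would also call
    // HFileOutputFormat2.configureIncrementalLoad so partitions line up
    // with region boundaries.
    def writeHFiles(rows: RDD[(ImmutableBytesWritable, KeyValue)], out: String): Unit = {
      val sorted = rows.sortBy(_._1)  // ImmutableBytesWritable is Comparable
      sorted.saveAsNewAPIHadoopFile(
        out,
        classOf[ImmutableBytesWritable],
        classOf[KeyValue],
        classOf[HFileOutputFormat2])
    }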