Re: unit testing with spark

2014-02-27 Thread Ameet Kini
serially, my tests are running fine with a shared spark context. That begs the question: is anyone running suites in parallel with a shared spark context? If so, I'd love to learn how, as it would cut down my test time considerably. Thanks, Ameet
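
A minimal sketch of the shared-context pattern discussed in this thread, assuming ScalaTest; the suite name and test are made up for illustration, and suites are still expected to run serially:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.scalatest.{BeforeAndAfterAll, FunSuite}

    // Hypothetical suite; the point is one SparkContext shared by every test in the suite.
    class SharedContextSuite extends FunSuite with BeforeAndAfterAll {
      @transient private var sc: SparkContext = _

      override def beforeAll(): Unit = {
        // Created once, before any test in this suite runs.
        sc = new SparkContext("local", "unit-tests")
      }

      override def afterAll(): Unit = {
        // Stopped once, so the next (serially run) suite can create its own context.
        sc.stop()
      }

      test("reduceByKey over a small in-memory RDD") {
        val counts = sc.parallelize(Seq("a", "b", "a"))
          .map((_, 1)).reduceByKey(_ + _).collectAsMap()
        assert(counts("a") === 2)
      }
    }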

sort order after reduceByKey / groupByKey

2014-03-20 Thread Ameet Kini
val rdd2 = rdd.partitionBy(my partitioner).reduceByKey(some function) I see that rdd2's partitions are not internally sorted. Can someone confirm that this is expected behavior? And if so, the only way to get partitions internally sorted is to follow it with something like this val rdd2 = rdd.par
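
The preview is cut off, but the usual follow-up is to sort each partition explicitly, since reduceByKey preserves the partitioning without promising any order inside a partition. A minimal runnable sketch (the partitioner and reduce function here are stand-ins, not the poster's):

    import org.apache.spark.{HashPartitioner, SparkContext}
    import org.apache.spark.SparkContext._

    object SortedPartitions {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "sorted-partitions")
        val rdd = sc.parallelize(Seq(("b", 1), ("a", 2), ("b", 3), ("c", 4)))

        val rdd2 = rdd
          .partitionBy(new HashPartitioner(2))   // stand-in for "my partitioner"
          .reduceByKey(_ + _)                    // stand-in for "some function"
          // Sort within each partition; needs an Ordering on the key type.
          .mapPartitions(iter => iter.toArray.sortBy(_._1).iterator,
                         preservesPartitioning = true)

        // glom shows each partition's contents as one array, now internally sorted.
        rdd2.glom().collect().foreach(p => println(p.mkString(", ")))
        sc.stop()
      }
    }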

Re: sort order after reduceByKey / groupByKey

2014-03-20 Thread Ameet Kini
www.sigmoidanalytics.com @mayur_rustagi <https://twitter.com/mayur_rustagi> On Thu, Mar 20, 2014 at 3:20 PM, Ameet Kini wrote: > val rdd2 = rdd.partitionBy(my partitioner).reduceByKey(some function) > I see that rdd2's partitions are not

shuffle memory requirements

2014-04-09 Thread Ameet Kini
val hrdd = sc.hadoopRDD(..) val res = hrdd.partitionBy(myCustomPartitioner).reduceByKey(..).mapPartitionsWithIndex( some code to save those partitions ) I'm getting OutOfMemoryErrors on the read side of the partitionBy shuffle. My custom partitioner generates over 20,000 partitions, so there are 20,000
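
For reference, a sketch of the shape of that job; the input, the partitioner, and the per-partition save logic below are placeholders, not the poster's actual code (the real job read from sc.hadoopRDD and wrote out each of the ~20,000 partitions):

    import org.apache.spark.{HashPartitioner, SparkContext}
    import org.apache.spark.SparkContext._

    object ManyPartitionShuffle {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[4]", "shuffle-memory")
        // Placeholder for sc.hadoopRDD(..): any keyed RDD illustrates the shape.
        val hrdd = sc.parallelize(0 until 100000).map(i => (i % 50000, 1))

        // 20,000 reduce-side partitions, as in the message; each shuffle-read
        // partition is pulled and processed by the reducer that owns it.
        val myCustomPartitioner = new HashPartitioner(20000)

        val res = hrdd
          .partitionBy(myCustomPartitioner)
          .reduceByKey(_ + _)
          .mapPartitionsWithIndex { (idx, iter) =>
            // Placeholder for "some code to save those partitions".
            Iterator((idx, iter.size))
          }

        println(res.count())
        sc.stop()
      }
    }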

Re: shuffle memory requirements

2014-04-11 Thread Ameet Kini
or.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Thanks, Ameet On Wed, Apr 9, 2014 at 10:48 PM, Ameet Kini wrote: >

Re: shuffle memory requirements

2014-04-11 Thread Ameet Kini
A typo - I meant section 2.1.2.5 "ulimit and nproc" of https://hbase.apache.org/book.html Ameet On Fri, Apr 11, 2014 at 10:32 AM, Ameet Kini wrote: > Turns out that my ulimit settings were too low. I bumped them up and the job successfully completes. Here's wha

Re: question on setup() and cleanup() methods for map() and reduce()

2014-04-28 Thread Ameet Kini
I don't think there is a setup() or cleanup() in Spark but you can usually achieve the same using mapPartitions and having the "setup" code at the top of the mapPartitions and "cleanup" at the end. The reason why this usually works is that in Hadoop map/reduce, each map task runs over an input spl
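
A minimal sketch of that pattern; the Resource class is a hypothetical stand-in for a DB connection, file handle, or similar:

    import org.apache.spark.SparkContext

    object SetupCleanupDemo {
      // Hypothetical per-partition resource, opened and closed around the data.
      class Resource {
        def write(s: String): Unit = println(s)
        def close(): Unit = ()
      }

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "setup-cleanup")
        val records = sc.parallelize(Seq("a", "b", "c"), numSlices = 2)

        val written = records.mapPartitions { iter =>
          val res = new Resource()        // "setup": once per partition, on the executor
          // Force the lazy iterator so all writes happen before cleanup runs.
          val out = iter.map { r => res.write(r); r }.toList
          res.close()                     // "cleanup": once per partition, after the data
          out.iterator
        }

        written.count()
        sc.stop()
      }
    }

Forcing the iterator (the toList above) matters: mapPartitions hands you a lazy iterator, so without it the close() call would run before any record had actually been processed.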

customized comparator in groupByKey

2014-05-06 Thread Ameet Kini
I'd like to override the logic of comparing keys for equality in groupByKey. Kinda like how combineByKey allows you to pass in the combining logic for "values", I'd like to do the same for keys. My code looks like this: val res = rdd.groupBy(myPartitioner) Here, rdd is of type RDD[(MyKey, MyValue)
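
The preview is truncated, but one way to get this effect (a sketch, not necessarily what the thread settled on): groupByKey groups by the key's equals/hashCode, so wrapping the key in a class that defines the equality you want, plus a Partitioner consistent with that hashCode, gives you custom key comparison. The MyKey below keys on id only and ignores tag:

    import org.apache.spark.{Partitioner, SparkContext}
    import org.apache.spark.SparkContext._

    case class MyKey(id: Int, tag: String) {
      // Two keys are "equal" if their ids match, regardless of tag.
      override def equals(other: Any): Boolean = other match {
        case MyKey(otherId, _) => id == otherId
        case _                 => false
      }
      override def hashCode: Int = id.hashCode
    }

    // The partitioner must agree with hashCode so equal keys land in the same partition.
    class MyKeyPartitioner(parts: Int) extends Partitioner {
      def numPartitions: Int = parts
      def getPartition(key: Any): Int =
        math.abs(key.asInstanceOf[MyKey].id.hashCode) % parts
    }

    object GroupByCustomEquality {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "custom-key-equality")
        val rdd = sc.parallelize(Seq(
          MyKey(1, "a") -> "x", MyKey(1, "b") -> "y", MyKey(2, "a") -> "z"))
        // MyKey(1, "a") and MyKey(1, "b") end up in the same group.
        rdd.groupByKey(new MyKeyPartitioner(2)).collect().foreach(println)
        sc.stop()
      }
    }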