Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread Abhishek R. Singh
You had:

> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)

Maybe try:

> rdd2 = RDD.reduceByKey((x,y) => x+y)
> rdd2.take(3)

-Abhishek-

On Aug 20, 2015, at 3:05 AM, satish chandra j wrote:
> HI All,
> I have data in RDD as mentioned below:
>
> RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20)
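A quick sketch of why the original snippet shows no change: `reduceByKey` is a transformation that returns a *new* RDD rather than mutating the one it is called on, so `RDD.take(3)` still sees the raw pairs. The by-key reduction semantics can be illustrated on a plain Scala collection (no SparkContext needed; this is only a local analogy, not Spark's implementation):

```scala
object ReduceByKeyDemo {
  def main(args: Array[String]): Unit = {
    // The data from the original mail: Array((0,1), (0,2), (1,20), ...)
    val data = Seq((0, 1), (0, 2), (1, 20))

    // reduceByKey((x, y) => x + y) semantics: group by key, then fold the
    // values of each key with +
    val reduced = data
      .groupBy(_._1)
      .map { case (k, vs) => (k, vs.map(_._2).reduce(_ + _)) }
      .toList
      .sortBy(_._1)

    // Note: `data` itself is unchanged -- the result lives in `reduced`,
    // just as rdd2 holds the result of RDD.reduceByKey in Spark.
    println(reduced) // List((0,3), (1,20))
  }
}
```

As in Spark, the input collection is left untouched; you always capture the result of the operation in a new name.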

Re: tachyon

2015-08-07 Thread Abhishek R. Singh
ed Yu wrote:
> Looks like you would get better response on Tachyon's mailing list:
>
> https://groups.google.com/forum/?fromgroups#!forum/tachyon-users
>
> Cheers
>
> On Fri, Aug 7, 2015 at 9:56 AM, Abhishek R. Singh wrote:
> Do people use Tac

tachyon

2015-08-07 Thread Abhishek R. Singh
Do people use Tachyon in production, or is it experimental grade still?

Regards,
Abhishek

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: How to increase parallelism of a Spark cluster?

2015-08-02 Thread Abhishek R. Singh
I don't know if your assertion/expectation that workers will process multiple partitions in parallel is really valid, or whether having more partitions than workers will necessarily help (unless you are memory bound - in that case extra partitions essentially shrink your per-task work size rather than increase execution parallelism
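The point above can be sketched with a toy model: if a cluster exposes W cores, at most W partitions run at once, and any partitions beyond that only change task granularity. The round-robin assignment below is a deliberate simplification (real Spark scheduling also considers locality and task timing):

```scala
object PartitionSketch {
  // Round-robin assignment of partition indices to cores -- a simplification
  // of Spark's scheduler, used only to illustrate parallelism vs granularity.
  def assign(numPartitions: Int, numCores: Int): Map[Int, Seq[Int]] =
    (0 until numPartitions)
      .groupBy(_ % numCores)
      .map { case (core, parts) => (core, parts.toSeq) }

  def main(args: Array[String]): Unit = {
    // 8 partitions on 4 cores: still only 4 tasks run at a time,
    // but each core processes 2 smaller tasks in sequence.
    println(assign(8, 4).toList.sortBy(_._1))

    // 4 partitions on 4 cores: same parallelism, bigger per-task work.
    println(assign(4, 4).toList.sortBy(_._1))
  }
}
```

In both cases the degree of parallelism is capped at the core count; the extra partitions in the first case only reduce how much data each task holds in memory at once, which is exactly the memory-bound scenario described above.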

Re: spark streaming disk hit

2015-07-21 Thread Abhishek R. Singh
son for end-to-end performance. You could
> take a look at this.
>
> https://spark-summit.org/2015/events/towards-benchmarking-modern-distributed-streaming-systems/
>
> On Tue, Jul 21, 2015 at 11:57 AM, Abhishek R. Singh wrote:
> Is it fair to say that Storm stream

spark streaming disk hit

2015-07-21 Thread Abhishek R. Singh
Is it fair to say that Storm stream processing is completely in memory, whereas Spark Streaming would take a disk hit because of how shuffle works? Does Spark Streaming try to avoid disk usage out of the box?

-Abhishek-

Re: Grouping runs of elements in a RDD

2015-06-30 Thread Abhishek R. Singh
Could you use a custom partitioner to preserve boundaries, such that all related tuples end up on the same partition?

On Jun 30, 2015, at 12:00 PM, RJ Nowling wrote:
> Thanks, Reynold. I still need to handle incomplete groups that fall between
> partition boundaries. So, I need a two-pass appr
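The custom-partitioner idea can be sketched without Spark: pick boundary keys and map each tuple's key to the partition whose range contains it, so a run of related keys never straddles two partitions. The boundary values and data below are made up for illustration; in real Spark this logic would live in a subclass of `org.apache.spark.Partitioner` whose `getPartition` does the same lookup:

```scala
object BoundaryPartitioner {
  // Sorted upper bounds of each partition's key range (illustrative values).
  val bounds: Vector[Int] = Vector(10, 20, 30)

  // Partition id = index of the first bound the key does not exceed;
  // keys past the last bound fall into the final partition.
  def partitionFor(key: Int): Int = {
    val i = bounds.indexWhere(key <= _)
    if (i >= 0) i else bounds.length - 1
  }

  def main(args: Array[String]): Unit = {
    val tuples = Seq((5, "a"), (7, "b"), (15, "c"), (17, "d"), (25, "e"))
    val byPartition = tuples.groupBy { case (k, _) => partitionFor(k) }
    // The runs 5..7, 15..17, and 25 each land in a single partition.
    println(byPartition.toList.sortBy(_._1))
  }
}
```

The catch, as the quoted reply notes, is choosing boundaries that actually align with the groups: if a run of related keys crosses a chosen bound, it is still split, which is why a second pass (or a data-driven choice of bounds) may be needed.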

Re: Dataframes Question

2015-04-18 Thread Abhishek R. Singh
I am no expert myself, but from what I understand DataFrame is grandfathering SchemaRDD. This was done for API stability as Spark SQL matured out of alpha as part of the 1.3.0 release. It is forward looking and brings (DataFrame-like) syntax that was not available with the older SchemaRDD.

On Ap

spark sql error with proto/parquet

2015-04-18 Thread Abhishek R. Singh
I have created a bunch of protobuf-based parquet files that I want to read/inspect using Spark SQL. However, I am running into exceptions and am not able to proceed much further: This succeeds (probably because there is no action yet). I can also printSchema() and count() without any