You had:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
Maybe try:
> val rdd2 = RDD.reduceByKey((x,y) => x+y)
> rdd2.take(3)
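For reference, a minimal spark-shell sketch of the whole sequence (variable names are mine, data taken from your sample below):

// transformations return a new RDD; the original RDD is left unchanged
val pairs = sc.parallelize(Array((0, 1), (0, 2), (1, 20)))
val summed = pairs.reduceByKey((x, y) => x + y)
summed.take(3)   // e.g. Array((0,3), (1,20)); ordering may vary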
-Abhishek-
On Aug 20, 2015, at 3:05 AM, satish chandra j wrote:
> Hi All,
> I have data in RDD as mentioned below:
>
> RDD : Array[(Int, Int)] = Array((0,1), (0,2), (1,20))
Ted Yu wrote:
> Looks like you would get better response on Tachyon's mailing list:
>
> https://groups.google.com/forum/?fromgroups#!forum/tachyon-users
>
> Cheers
>
> On Fri, Aug 7, 2015 at 9:56 AM, Abhishek R. Singh
> wrote:
Do people use Tachyon in production, or is it experimental grade still?
Regards,
Abhishek
I don't know if (your assertion/expectation that) workers will process things
(multiple partitions) in parallel is really valid. Or if having more partitions
than workers will necessarily help (unless you are memory bound - so partitions
is essentially helping your work size rather than execution parallelism).
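A minimal sketch of explicitly controlling the partition count, which mainly bounds each task's working set (the path and numbers are illustrative, not a recommendation):

// ask for more input partitions so each task handles a smaller slice
val data = sc.textFile("hdfs:///some/path", 64)   // hypothetical path, 64 input partitions
println(data.partitions.length)                   // check the actual partition count
val wider = data.repartition(128)                 // forces a shuffle to redistribute work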
> ...son for end-to-end performance. You could take a look at this.
>
> https://spark-summit.org/2015/events/towards-benchmarking-modern-distributed-streaming-systems/
>
> On Tue, Jul 21, 2015 at 11:57 AM, Abhishek R. Singh
> wrote:
Is it fair to say that Storm stream processing is completely in memory, whereas
Spark Streaming would take a disk hit because of how shuffle works?
Does Spark Streaming try to avoid disk usage out of the box?
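A rough sketch of the kind of in-memory persistence in question (the source and batch interval are hypothetical); note that shuffle output for operations like reduceByKey is still written to the executors' local directories:

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf().setAppName("streaming-sketch").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)            // hypothetical source
val counts = lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
counts.persist(StorageLevel.MEMORY_ONLY_SER)                   // keep derived data in memory
counts.print()
ssc.start()
ssc.awaitTermination()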
-Abhishek-
Could you use a custom partitioner to preserve boundaries, such that all related
tuples end up on the same partition?
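A minimal sketch of that idea (the key type and partition count are hypothetical):

import org.apache.spark.Partitioner

// every tuple with the same group id lands on the same partition,
// so a group is never split across a partition boundary
class GroupPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = {
    val h = key.hashCode % numPartitions
    if (h < 0) h + numPartitions else h   // keep the index non-negative
  }
}

// usage, assuming an RDD keyed by group id:
// val grouped = keyedRdd.partitionBy(new GroupPartitioner(16))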
On Jun 30, 2015, at 12:00 PM, RJ Nowling wrote:
> Thanks, Reynold. I still need to handle incomplete groups that fall between
> partition boundaries. So, I need a two-pass approach.
I am no expert myself, but from what I understand DataFrame is grandfathering
SchemaRDD. This was done for API stability as Spark SQL matured out of alpha as
part of the 1.3.0 release.
It is forward-looking and brings (DataFrame-like) syntax that was not available
with the older SchemaRDD.
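For illustration, a rough sketch of the DataFrame-style syntax (the reader API shown is the slightly later 1.4+ form; file and column names are hypothetical):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.json("people.json")             // hypothetical input
df.select("name").where(df("age") > 21).show()           // DataFrame operations
df.registerTempTable("people")                           // SQL still works alongside
sqlContext.sql("SELECT name FROM people WHERE age > 21").show()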
On Ap
I have created a bunch of protobuf based parquet files that I want to
read/inspect using Spark SQL. However, I am running into exceptions and not
able to proceed much further:
This succeeds (probably because there is no action yet). I can
also printSchema() and count() without any issues.
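For context, a minimal sketch of the read/inspect flow being described (the path is hypothetical; shown with the 1.4+ reader API):

// load protobuf-derived Parquet files with Spark SQL
val df = sqlContext.read.parquet("/data/protobuf-parquet")   // hypothetical path
df.printSchema()   // schema inspection runs no job
df.count()         // an action; triggers actual execution
df.show(5)         // conversion problems typically surface in actions like this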