first time I'm trying this and seemingly doing it incorrectly.
Can anyone show me how to correct this?
Thank you! =:)
nmv
--
Sincerely yours,
Team PRISMALYTICS
PRISMALYTICS, LLC. <http://www.prismalytics
,
if the data was spread across partitions. The result is a tuple of sum
and count for each key.
Use mapValues to keep the partitioning by key intact and avoid a
full shuffle in downstream keyed operations. It simply computes the
average for each key from that (sum, count) tuple.
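
To make the (sum, count) idea concrete, here is a minimal plain-Python sketch that mimics what the three functions passed to combineByKey would do, followed by the mapValues step. The sample values, the two-partition split, and the helper names (create_combiner, merge_value, merge_combiners, combine_by_key) are assumptions for illustration only, not the data or code from the thread.

```python
# Hypothetical sample data, split into two lists to mimic RDD partitions.
partitions = [
    [("2013-10-09", 7.0), ("2013-10-10", 9.0)],
    [("2013-10-10", 28.0), ("2013-10-07", 10.0), ("2013-10-07", 12.0)],
]

def create_combiner(value):
    # First value seen for a key within a partition: start a (sum, count) pair.
    return (value, 1)

def merge_value(acc, value):
    # Another value for the same key within the same partition.
    return (acc[0] + value, acc[1] + 1)

def merge_combiners(a, b):
    # Combine (sum, count) pairs for the same key from different partitions.
    return (a[0] + b[0], a[1] + b[1])

def combine_by_key(parts):
    """Plain-Python imitation of combineByKey semantics (illustrative only)."""
    merged = {}
    for part in parts:
        # Step 1: combine values per key inside one partition.
        local = {}
        for key, value in part:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        # Step 2: merge this partition's result into the global result.
        for key, acc in local.items():
            merged[key] = merge_combiners(merged[key], acc) if key in merged else acc
    return merged

sum_count = combine_by_key(partitions)

# Equivalent of mapValues: keys are untouched, only the values change.
averages = {key: total / count for key, (total, count) in sum_count.items()}
```

With a real Pair RDD such as rdd1, the same three functions could be passed as rdd1.combineByKey(create_combiner, merge_value, merge_combiners), followed by .mapValues(lambda p: p[0] / p[1]) to get the per-key average.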
From: Todd Nist
Date: Tuesday, Ap
Hello Friends:
I generated a Pair RDD with K/V pairs, like so:
>>>
>>> rdd1.take(10) # Show a small sample.
[(u'2013-10-09', 7.60117302052786),
(u'2013-10-10', 9.322709163346612),
(u'2013-10-10', 28.264462809917358),
(u'2013-10-07', 9.664429530201343),
(u'2013-10-07', 12.461538461538463),
caused when crossing Python
versions.
Thank you Marcelo!
On March 3, 2015 7:39:03 PM Marcelo Vanzin wrote:
Weird python errors like this generally mean you have different
versions of python in the nodes of your cluster. Can you check that?
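
One possible way to run that check from the pyspark shell is sketched below. It assumes an existing SparkContext named sc; the helper names collect_worker_versions and find_mismatches and the num_tasks default are hypothetical. Spark does not guarantee that the tasks land on every node, so an empty result is a hint rather than proof.

```python
import platform

def collect_worker_versions(sc, num_tasks=100):
    """Return the distinct Python versions reported by executor tasks.

    `sc` is assumed to be an existing SparkContext; `num_tasks` is an
    arbitrary number of small tasks used to reach as many workers as possible.
    """
    def worker_version(_):
        import platform  # imported on the worker, where the task runs
        return platform.python_version()

    rdd = sc.parallelize(range(num_tasks), num_tasks)
    return rdd.map(worker_version).distinct().collect()

def find_mismatches(driver_version, worker_versions):
    """List the worker versions that differ from the driver's version."""
    return sorted(v for v in set(worker_versions) if v != driver_version)

# Usage in a pyspark session (not executed here, since it needs a cluster):
#   find_mismatches(platform.python_version(), collect_worker_versions(sc))
```

An empty list suggests the sampled workers run the same Python version as the driver; any entries are versions worth investigating.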
On Tue, Mar 3, 2015 at 4:21 PM, subscripti...@prism
Hi Friends:
We noticed that the following happens in 'pyspark' when running in
distributed Standalone Mode (MASTER=spark://vps00:7077),
but not in Local Mode (MASTER=local[n]).
See the following, particularly what is highlighted in *Red* (again the
problem only happens in Standalone Mode).
Any id