Help with datetime comparison in SparkSQL statement ...

2015-05-05 Thread subscripti...@prismalytics.io
first time I'm trying this and seemingly doing it incorrectly. Can anyone show me how to correct this? Thank you! =:) nmv -- PRISMALYTICS Sincerely yours, Team PRISMALYTICS PRISMALYTICS, LLC. <http://www.prismalytics

Re: Calculating the averages for each KEY in a Pairwise (K,V) RDD ...

2015-04-28 Thread subscripti...@prismalytics.io
, if the data was spread across partitions. The result is a tuple of sum and count for each key. Use mapValues to keep your partitioning by keys intact and minimize a full shuffle for downstream keyed operations. It just calculates the avg for each key. From: Todd Nist Date: Tuesday, Ap
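The sum-and-count approach described in this reply can be sketched without a cluster. The three combiner functions below have the shape that PySpark's `RDD.combineByKey` takes. The helper `average_by_key` and the `sample_partitions` layout are hypothetical stand-ins for what Spark would do across partitions, not code from the thread. The sample values are taken from the `rdd1.take(10)` output in the original question.

```python
# Combiner functions shaped for RDD.combineByKey (sum, count) accumulation.
def create_combiner(value):
    # First value seen for a key within a partition: sum=value, count=1.
    return (value, 1)

def merge_value(acc, value):
    # Fold another value for the same key into the partition-local accumulator.
    total, count = acc
    return (total + value, count + 1)

def merge_combiners(a, b):
    # Merge accumulators for the same key coming from different partitions.
    return (a[0] + b[0], a[1] + b[1])

def average_by_key(partitions):
    """Hypothetical local stand-in for combineByKey followed by mapValues."""
    per_partition = []
    for part in partitions:
        combined = {}
        for key, value in part:
            if key in combined:
                combined[key] = merge_value(combined[key], value)
            else:
                combined[key] = create_combiner(value)
        per_partition.append(combined)

    merged = {}
    for combined in per_partition:
        for key, acc in combined.items():
            merged[key] = merge_combiners(merged[key], acc) if key in merged else acc

    # Equivalent of mapValues: only the value changes, keys stay untouched.
    return {key: total / count for key, (total, count) in merged.items()}

# Assumed split of the sample rows into two partitions.
sample_partitions = [
    [("2013-10-09", 7.60117302052786), ("2013-10-10", 9.322709163346612)],
    [("2013-10-10", 28.264462809917358), ("2013-10-07", 9.664429530201343),
     ("2013-10-07", 12.461538461538463)],
]
averages = average_by_key(sample_partitions)

# With a real pair RDD (assumed to be named rdd1, as in the question):
#   sums = rdd1.combineByKey(create_combiner, merge_value, merge_combiners)
#   avgs = sums.mapValues(lambda sc: sc[0] / sc[1])
```

Because `mapValues` never touches the keys, Spark can keep the existing partitioner, which is the point made in the reply about later keyed operations.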

Calculating the averages for each KEY in a Pairwise (K,V) RDD ...

2015-04-28 Thread subscripti...@prismalytics.io
Hello Friends: I generated a Pair RDD with K/V pairs, like so: >>> rdd1.take(10) # Show a small sample. [(u'2013-10-09', 7.60117302052786), (u'2013-10-10', 9.322709163346612), (u'2013-10-10', 28.264462809917358), (u'2013-10-07', 9.664429530201343), (u'2013-10-07', 12.461538461538463),

Re: ImportError: No module named iter ... (on CDH5 v1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch) ...

2015-03-03 Thread subscripti...@prismalytics.io
aused when crossing Python versions. Thank you Marcelo! On March 3, 2015 7:39:03 PM Marcelo Vanzin wrote: Weird python errors like this generally mean you have different versions of python in the nodes of your cluster. Can you check that? On Tue, Mar 3, 2015 at 4:21 PM, subscripti...@prism
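One way to act on the suggestion to check Python versions across the cluster is to have each task report its host name and interpreter version, then look for more than one distinct version. This is only a sketch: the function names are hypothetical, and `sc` in the comment is assumed to be an existing SparkContext.

```python
import platform
import socket

def host_and_python_version(_=None):
    # Runs wherever it is called: on the driver locally, or on an executor
    # when passed to RDD.map.
    return (socket.gethostname(), platform.python_version())

def mismatched_versions(reports):
    # Return the set of distinct versions if the hosts disagree, else an empty set.
    versions = {version for _, version in reports}
    return versions if len(versions) > 1 else set()

# Local check of the driver only.
driver_report = host_and_python_version()

# On a cluster (assuming a SparkContext named sc), spread many small tasks
# so that every worker is likely to run at least one:
#   reports = sc.parallelize(range(100), 100).map(host_and_python_version).distinct().collect()
#   print(mismatched_versions(reports + [driver_report]))
```

An empty result only shows that the workers which happened to run a task agree with the driver; a worker that received no task is not covered.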

ImportError: No module named iter ... (on CDH5 v1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch) ...

2015-03-03 Thread subscripti...@prismalytics.io
Hi Friends: We noticed that the following happens in 'pyspark' when running in distributed Standalone Mode (MASTER=spark://vps00:7077), but not in Local Mode (MASTER=local[n]). See the following, particularly what is highlighted in *Red* (again, the problem only happens in Standalone Mode). Any id