Spark MOOC - early access

2015-05-21 Thread Marco Shaw
Hi Spark Devs and Users, BerkeleyX and Databricks are currently developing two Spark-related MOOCs on edX (intro, ml), the first of which

Re: Spark Team - Paco Nathan said that your team can help

2015-01-22 Thread Marco Shaw
Hi, Let me reword your request so you understand how (too) generic your question is: "Hi, I have $10,000, please find me some means of transportation so I can get to work." Please provide (a lot) more details. If you can't, consider using one of the pre-built express VMs from either Cloude

Re: Spark Team - Paco Nathan said that your team can help

2015-01-22 Thread Marco Shaw
(Starting over...) The best place to look for the requirements would be at the individual pages of each technology. As for absolute minimum requirements, I would suggest 50GB of disk space and at least 8GB of memory. This is the absolute minimum. "Architecting" a solution like you are looking f

Re: Spark Team - Paco Nathan said that your team can help

2015-01-22 Thread Marco Shaw
>> WHAT IS THE MINIMUM HARDWARE CONFIGURATION REQUIRED TO BUILD HDFS + MAPREDUCE + SPARK + YARN on a system? >> Please let me know if you need any further information, and if you don't know, >> please drive across with the $1 to Sir Pac

Need some guidance

2015-04-13 Thread Marco Shaw
**Learning the ropes** I'm trying to grasp the concept of using the pipeline in pySpark... Simplified example: >>> list=[(1,"alpha"),(1,"beta"),(1,"foo"),(1,"alpha"),(2,"alpha"),(2,"alpha"),(2,"bar"),(3,"foo")] Desired outcome: [(1,3),(2,2),(3,1)] Basically for each key, I want the number of un
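
For what it's worth, a minimal sketch of one way to get that result with plain RDD operations (assuming a local SparkContext; the app name and variable names are illustrative, not from the original thread):

    from pyspark import SparkContext

    sc = SparkContext("local", "unique-values-per-key")

    data = [(1, "alpha"), (1, "beta"), (1, "foo"), (1, "alpha"),
            (2, "alpha"), (2, "alpha"), (2, "bar"), (3, "foo")]

    # Drop duplicate (key, value) pairs, then count the remaining values per key.
    result = (sc.parallelize(data)
                .distinct()
                .map(lambda kv: (kv[0], 1))
                .reduceByKey(lambda a, b: a + b)
                .sortByKey()
                .collect())

    print(result)  # [(1, 3), (2, 2), (3, 1)]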

Re: Spark vs Google cloud dataflow

2014-06-27 Thread Marco Shaw
Dean: Some interesting information... Do you know where I can read more about these coming changes to Scalding/Cascading? > On Jun 27, 2014, at 9:40 AM, Dean Wampler wrote: > > ... and to be clear on the point, Summingbird is not limited to MapReduce. It > abstracts over Scalding (which abstra

Re: Spark vs Google cloud dataflow

2014-06-27 Thread Marco Shaw
Sorry. Never mind... I guess that's what "Summingbird" is all about. Never heard of it. > On Jun 27, 2014, at 7:10 PM, Marco Shaw wrote: > > Dean: Some interesting information... Do you know where I can read more about > these coming changes to Scalding/Cascading

Re: Spark Summit 2014 Day 2 Video Streams?

2014-07-01 Thread Marco Shaw
They are recorded... For example, 2013: http://spark-summit.org/2013 I'm assuming the 2014 videos will be up in 1-2 weeks. Marco On Tue, Jul 1, 2014 at 3:18 PM, Soumya Simanta wrote: > Are these sessions recorded ? > > > On Tue, Jul 1, 2014 at 9:47 AM, Alexis Roos wrote: > >> >> >> >> >> >>

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Marco Shaw
Can you provide links to the sections that are confusing? My understanding is that the HDP1 binaries do not need YARN, while the HDP2 binaries do. Now, you can also install the Hortonworks Spark RPM... For production, in my opinion, RPMs are better for manageability. > On Jul 6, 2014, at 5:39 PM, Konst

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Marco Shaw
> "If you have a Hadoop 2 cluster, you can run Spark without any installation needed." > > And this is confusing for me... do I need the rpm installation or not?... > > Thank you, > Konstantin Kudryavtsev > >> On Sun, Jul 6, 2014 at 10:56 PM, Marco Shaw wro

Re: Running Spark on Microsoft Azure HDInsight

2014-07-14 Thread Marco Shaw
I'm a Spark and HDInsight novice, so I could be wrong... HDInsight is based on HDP2, so my guess is that you have the option of installing/configuring Spark in cluster mode (YARN), or running in standalone mode and packaging the Spark binaries with your job. Everything I seem to look at is related to U

Re: Running Spark on Microsoft Azure HDInsight

2014-07-14 Thread Marco Shaw
Looks like going with cluster mode is not a good idea: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-administer-use-management-portal/ Seems like a non-HDInsight VM might be needed to serve as the Spark master node. Marco On Mon, Jul 14, 2014 at 12:43 PM, Marco Shaw wrote

Re: Starting with spark

2014-07-24 Thread Marco Shaw
First thing... Go into the Cloudera Manager and make sure that the Spark service (master?) is started. Marco On Thu, Jul 24, 2014 at 7:53 AM, Sameer Sayyed wrote: > Hello All, > > I am new user of spark, I am using *cloudera-quickstart-vm-5.0.0-0-vmware* > for execute sample examples of Spark
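
If the Spark master service is running, a quick smoke test from pyspark can confirm it is reachable; a minimal sketch, assuming a standalone master on the quickstart VM (the host name and port below are hypothetical, check the Cloudera Manager Spark service page for the real URL):

    from pyspark import SparkContext

    # Hypothetical master URL for the CDH quickstart VM; adjust to whatever
    # Cloudera Manager reports for the Spark master.
    sc = SparkContext("spark://quickstart.cloudera:7077", "connectivity-check")

    # Trivial job: if this prints 45, the master and at least one worker are up.
    print(sc.parallelize(range(10)).sum())

    sc.stop()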

Re: when will the spark 1.3.0 be released?

2014-12-16 Thread Marco Shaw
When it is ready. > On Dec 16, 2014, at 11:43 PM, 张建轶 wrote: > > Hi! > > when will the spark 1.3.0 be released? > I want to use the new LDA feature. > Thank you!

Re: DeepLearning and Spark ?

2015-01-09 Thread Marco Shaw
Pretty vague on details: http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A227199 > On Jan 9, 2015, at 11:39 AM, Jaonary Rabarisoa wrote: > > Hi all, > > DeepLearning algorithms are popular and achieve many state of the art > performance in several real world machine learn

Express VMs - good idea?

2014-05-14 Thread Marco Shaw
Hi, I've been wanting to play with Spark. I wanted to fast-track things and just use one of the vendors' "express VMs". I've tried Cloudera CDH 5.0 and Hortonworks HDP 2.1. I haven't written down all of my issues, but for certain, when I try to run spark-shell it doesn't work. Cloudera seems to crash

Re: How to Run Machine Learning Examples

2014-05-22 Thread Marco Shaw
About run-example: I've tried the MapR, Hortonworks and Cloudera distributions with their Spark packages and none seem to include it. Am I missing something? Is this only provided with the Spark project pre-built binaries or from source installs? Marco > On May 22, 2014, at 5:04 PM, Stephen Boes