Re: Running Spark on EMR

2017-01-15 Thread Andrew Holway
use yarn :) "spark-submit --master yarn" On Sun, Jan 15, 2017 at 7:55 PM, Darren Govoni wrote: > So what was the answer? > > > > Sent from my Verizon, Samsung Galaxy smartphone > > Original message > From: Andrew Holway > Date: 1/1

Re: Running Spark on EMR

2017-01-15 Thread Andrew Holway
Darn. I didn't respond to the list. Sorry. On Sun, Jan 15, 2017 at 5:29 PM, Marco Mistroni wrote: > thanks Neil. I followed original suggestion from Andrew and everything is > working fine now > kr > > On Sun, Jan 15, 2017 at 4:27 PM, Neil Jonkers wrote: > >> Hello, >> >> Can you drop the url:

python environments with "local" and "yarn-client" - Boto failing on HDP2.5

2016-11-29 Thread Andrew Holway
Hey, I am making some calls with Boto3 in my pyspark which is working fine in master=local mode but when I switch to master=yarn I am getting "NoCredentialsError: Unable to locate credentials" which is a bit annoying as I cannot work out why! I have been running this application fine on Mesos and
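A likely explanation: under `master=yarn` the executors run on other hosts and do not inherit the driver's environment variables or instance profile, so boto3 finds no credentials there even though `master=local` works. One workaround is to collect the credentials on the driver and pass them into the closure explicitly. A minimal sketch, not from the thread itself — the helper name and the use of the standard `AWS_*` environment variables are illustrative assumptions:

```python
import os

def collect_aws_creds():
    """Read AWS credentials from the driver's environment (standard AWS_*
    variable names) so they can be shipped inside a Spark closure to
    executors, which do not inherit the driver's environment."""
    return {
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    }

# On the driver:
#   creds = collect_aws_creds()
#   rdd.mapPartitions(lambda part: do_work(part, creds))
# Inside do_work, build the client per partition (hypothetical usage):
#   session = boto3.session.Session(**creds)
```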

Re: createDataFrame causing a strange error.

2016-11-29 Thread Andrew Holway
> .add("timezone", StringType).add("day", StringType) > .add("minute", StringType) > > val jsonContentWithSchema = sqlContext.jsonRDD(jsonRdd, schema) > println(s"- And the Json withSchema has > ${jsonConten

Re: createDataFrame causing a strange error.

2016-11-28 Thread Andrew Holway
n a spark code > > 2 - try to replace your distributedJsonRead. instead of reading from s3, > generate a string out of a snippet of your json object > > 3 - Spark can read data from s3 as well , just do a > sc.textFile('s3://) ==> http://www.sparktutorials. > net/r

Re: createDataFrame causing a strange error.

2016-11-27 Thread Andrew Holway
.protocol.Py4JError: An error occurred while calling o33.__getnewargs__. Trace: py4j.Py4JException: Method __getnewargs__([]) does not exist at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318) at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326) at py4j.Ga

createDataFrame causing a strange error.

2016-11-27 Thread Andrew Holway
Hi, Can anyone tell me what is causing this error Spark 2.0.0 Python 2.7.5 df = sqlContext.createDataFrame(foo, schema) https://gist.github.com/mooperd/368e3453c29694c8b2c038d6b7b4413a Traceback (most recent call last): File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", li

javac - No such file or directory

2016-11-09 Thread Andrew Holway
I'm getting this error trying to build spark on Centos7. It is not googling very well: [error] (tags/compile:compileIncremental) java.io.IOException: Cannot run program "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/bin/javac" (in directory "/home/spark/spark"): error=2, No such fil

Re: Save a spark RDD to disk

2016-11-08 Thread Andrew Holway
That's around 750MB/s which seems quite respectable even in this day and age! How many and what kind of disks do you have attached to your nodes? What are you expecting? On Tue, Nov 8, 2016 at 11:08 PM, Elf Of Lothlorein wrote: > Hi > I am trying to save a RDD to disk and I am using the > saveAs

Live data visualisations with Spark

2016-11-08 Thread Andrew Holway
something that could be accomplished with shiny server for instance? Thanks, Andrew Holway

Re: sanboxing spark executors

2016-11-04 Thread Andrew Holway
I think running it on a Mesos cluster could give you better control over this kinda stuff. On Fri, Nov 4, 2016 at 7:41 AM, blazespinnaker wrote: > Is there a good method / discussion / documentation on how to sandbox a > spark > executor? Assume the code is untrusted and you don't want it to

Re: Python - Spark Cassandra Connector on DC/OS

2016-11-01 Thread Andrew Holway
Sorry: Spark 2.0.0 On Tue, Nov 1, 2016 at 10:04 AM, Andrew Holway < andrew.hol...@otternetworks.de> wrote: > Hello, > > I've been getting pretty serious with DC/OS which I guess could be > described as a somewhat polished distribution of Mesos. I'm not sure how

Python - Spark Cassandra Connector on DC/OS

2016-11-01 Thread Andrew Holway
Hello, I've been getting pretty serious with DC/OS which I guess could be described as a somewhat polished distribution of Mesos. I'm not sure how relevant DC/OS is to this problem. I am using this pyspark program to test the cassandra connection: http://bit.ly/2eWAfxm (github) I can that the df

ERROR SparkContext: Error initializing SparkContext.

2016-05-09 Thread Andrew Holway
Hi, I am having a hard time getting to the bottom of this problem. I'm really not sure where to start with it. Everything works fine in local mode. Cheers, Andrew [testing@instance-16826 ~]$ /opt/mapr/spark/spark-1.5.2/bin/spark-submit --num-executors 21 --executor-cores 5 --master yarn-client

[OT] Apache Spark Jobs in Kochi, India

2016-02-11 Thread Andrew Holway
Hello, I'm not sure how appropriate job postings are to a user group. We're getting deep into spark and are looking for some talent in our Kochi office. http://bit.ly/Spark-Eng - Apache Spark Engineer / Architect - Kochi http://bit.ly/Spark-Dev - Lead Apache Spark Developer - Kochi Sorry for th

Re: Writing to jdbc database from SparkR (1.5.2)

2016-02-06 Thread Andrew Holway
> > df <- read.df(sqlContext, source="jdbc", > url="jdbc:mysql://hostname:3306?user=user&password=pass", > dbtable="database.table") > I got a bit further but am now getting the following error. This error is being thrown without the database being touched. I tested this by making the database una

Writing to jdbc database from SparkR (1.5.2)

2016-02-06 Thread Andrew Holway
I'm managing to read data via JDBC using the following but I can't work out how to write something back to the Database. df <- read.df(sqlContext, source="jdbc", url="jdbc:mysql://hostname:3306?user=user&password=pass", dbtable="database.table") Does this functionality exist in 1.5.2? Thanks,

Concatenating tables

2016-01-23 Thread Andrew Holway
Is there a data frame operation to do this?

+---------+
| A B C D |
+---------+
| 1 2 3 4 |
| 5 6 7 8 |
+---------+

+---------+
| A B C D |
+---------+
| 3 5 6 8 |
| 0 0 0 0 |
+---------+

+---------+
| A B C D |
+---------+
| 8 8 8 8 |
| 1 1 1 1 |
+---------+

Concatenated together to make this.

python - list objects in HDFS directory

2016-01-23 Thread Andrew Holway
Hello, I would like to make a list of files (parquet or json) in a specific HDFS directory with python so I can do some logic on which files to load into a dataframe. Any ideas? Thanks, Andrew
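One approach that needs no extra libraries: shell out to the `hdfs` CLI and filter the listing in Python. A sketch under the assumption that the `hdfs` binary is on the PATH; the helper names are illustrative:

```python
import subprocess

def parse_ls_output(text, suffixes):
    """Pull file paths out of `hdfs dfs -ls` output, keeping only the
    wanted extensions. File lines have 8 whitespace-separated fields and
    end with the full path; the 'Found N items' header is skipped."""
    paths = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 8 and parts[-1].endswith(tuple(suffixes)):
            paths.append(parts[-1])
    return paths

def list_hdfs_files(path, suffixes=(".parquet", ".json")):
    out = subprocess.check_output(["hdfs", "dfs", "-ls", path])
    return parse_ls_output(out.decode("utf-8"), suffixes)

# Hypothetical usage, then load whichever files the logic picks:
#   files = list_hdfs_files("/data/events")
#   df = sqlContext.read.json(files[0])
```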

Re: Date / time stuff with spark.

2016-01-21 Thread Andrew Holway
P.S. We are working with Python. On Thu, Jan 21, 2016 at 8:24 PM, Andrew Holway wrote: > Hello, > > I am importing this data from HDFS into a data frame with > sqlContext.read.json(). > > {"a": 42, "a": 56, "Id": "621368e2f829f230", "smunkId

Date / time stuff with spark.

2016-01-21 Thread Andrew Holway
Hello, I am importing this data from HDFS into a data frame with sqlContext.read.json(). {"a": 42, "a": 56, "Id": "621368e2f829f230", "smunkId": "CKm26sDMucoCFReRGwodbHAAgw", "popsicleRange": "17610", "time": "2016-01-20T23:59:53+00:00"} I want to do some date/time operations on this json data b