[POWERED BY] Please add our organization

2015-09-23 Thread barmaley
Name: Frontline Systems Inc. URL: www.solver.com Description: We built an interface between Microsoft Excel and Apache Spark - bringing Big Data from the clusters to Excel, enabling tools ranging from simple charts and Power View dashboards to add-ins for machine learning and predictive analytics…

Akka failures: Driver Disassociated

2015-06-24 Thread barmaley
I'm running Spark 1.3.1 on AWS... I have a long-running application (Spark context) which accepts and completes jobs fine. However, it crashes at seemingly random times (anywhere from 1 hour up to 6 days)... In the latter case, the context ran and finished hundreds of jobs without an issue and then…
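One mitigation commonly suggested for 1.x-era "Driver Disassociated" errors is loosening the Akka failure-detector and frame-size settings on the driver. A minimal sketch, assuming Spark 1.3.x (the spark.akka.* properties were removed in later releases) and purely illustrative values:

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative values only; tune for your cluster and workload.
    val conf = new SparkConf()
      .setAppName("long-running-app")  // placeholder name
      .set("spark.akka.heartbeat.interval", "100")   // seconds between Akka heartbeats
      .set("spark.akka.heartbeat.pauses", "6000")    // acceptable pause before a peer is marked dead
      .set("spark.akka.frameSize", "128")            // MB; large task results need a larger frame
    val sc = new SparkContext(conf)

Long GC pauses on the driver are another frequent cause of disassociation, so raising driver memory is worth trying alongside these settings.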

takeSample() results in two stages

2015-06-11 Thread barmaley
I've observed interesting behavior in Spark 1.3.1, the reason for which is not clear. Doing something as simple as sc.textFile("...").takeSample(...) always results in two stages…
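For context, this is expected behavior: takeSample() first runs a count() job to learn the RDD's size, then runs a second job that does the actual sampling (re-sampling if too few elements come back), so two stages appear even for a trivial pipeline. A minimal sketch, assuming a spark-shell session with sc available and a placeholder file path:

    // Job/stage 1: takeSample counts the RDD to size the sample fraction.
    // Job/stage 2: the sampling pass itself.
    val rdd = sc.textFile("hdfs:///data/example.txt")  // placeholder path
    val sample = rdd.takeSample(withReplacement = false, num = 100, seed = 42L)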

Can't access Ganglia on EC2 Spark cluster

2015-06-10 Thread barmaley
Launching with the spark-ec2 script results in: Setting up ganglia RSYNC'ing /etc/ganglia to slaves... <...> Shutting down GANGLIA gmond: [FAILED] Starting GANGLIA gmond: [ OK ] Shutting down GANGLIA gmond: …

Re: Required settings for permanent HDFS Spark on EC2

2015-06-04 Thread barmaley
Hi - I'm having a similar problem with switching from ephemeral to persistent HDFS - it always looks for port 9000 regardless of the options I set for persistent HDFS on port 9010. Have you figured out a solution? Thanks
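For reference, spark-ec2 sets up the ephemeral HDFS namenode on port 9000 and the persistent HDFS namenode on port 9010, so one workaround is to address the persistent filesystem explicitly instead of relying on the default filesystem setting. A minimal sketch, assuming a spark-shell on the cluster with sc available and a placeholder master hostname:

    // Point directly at the persistent-HDFS namenode (port 9010 under spark-ec2)
    // rather than the ephemeral HDFS that answers on port 9000.
    val masterHost = "ec2-xx-xx-xx-xx.compute-1.amazonaws.com"  // placeholder
    val data = sc.textFile(s"hdfs://$masterHost:9010/user/root/input.txt")
    data.count()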

Re: Adding new Spark workers on AWS EC2 - access error

2015-06-04 Thread barmaley
The issue was that the SSH key generated on the Spark master was not transferred to the new slave. The spark-ec2 script's `start` command omits this step. The solution is to use the `launch` command with the `--resume` option. Then the SSH key is transferred to the new slave and everything goes smoothly.
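A minimal command sketch, assuming the cluster was originally created with spark-ec2 and using placeholder names for the key pair, identity file, and cluster:

    ./spark-ec2 -k my-keypair -i ~/.ssh/my-keypair.pem --resume launch my-cluster

With `--resume`, the launch action skips creating new instances and re-runs the setup phase on the existing cluster, which includes pushing the master's SSH key out to every slave.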

Adding new Spark workers on AWS EC2 - access error

2015-06-03 Thread barmaley
I have an existing, operating Spark cluster that was launched with the spark-ec2 script. I'm trying to add a new slave by following these instructions: stop the cluster; on the AWS console, use "launch more like this" on one of the slaves; start the cluster. Although the new instance is added to the same security group…

Spark SQL: STDDEV working in Spark Shell but not in a standalone app

2015-05-08 Thread barmaley
Given a table registered from a data frame, I'm able to execute queries like sqlContext.sql("SELECT STDDEV(col1) FROM table") from the Spark shell just fine. However, when I run exactly the same code in a standalone app on a cluster, it throws an exception: "java.util.NoSuchElementException: key not found…"
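For what it's worth, STDDEV is not implemented in the basic SQLContext SQL parser in Spark 1.x; Hive-enabled builds of the Spark shell expose sqlContext as a HiveContext, which is likely why the query works there but not in a standalone app. One workaround is to use a HiveContext in the app; another is to compute the standard deviation from AVG aggregates. A minimal sketch of the latter, assuming a registered table myTable with a numeric column col1 (names are illustrative):

    // Population standard deviation via sqrt(E[x^2] - E[x]^2), using only
    // aggregates that the plain SQLContext parser supports.
    val row = sqlContext.sql(
      "SELECT AVG(col1 * col1) - AVG(col1) * AVG(col1) AS variance FROM myTable").first()
    val stddev = math.sqrt(row.getDouble(0))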

Spark-csv data source: infer data types

2015-04-18 Thread barmaley
I'm experimenting with the spark-csv package (https://github.com/databricks/spark-csv) for reading CSV files into Spark DataFrames. Everything works, but all columns are assumed to be of StringType. As shown in the Spark SQL documentation (https://spark.apache.org/docs/latest/sql-programming-guide.html)…
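Two approaches that come up for this: pass an explicit schema when loading, or (in later spark-csv releases) ask the package to infer types. A minimal sketch of both, assuming the Spark 1.3-style sqlContext.load API, a placeholder file path, and illustrative column names:

    import org.apache.spark.sql.types._

    // Option A: supply the schema explicitly so columns get proper types.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("price", DoubleType, nullable = true),
      StructField("name", StringType, nullable = true)))

    val dfA = sqlContext.load(
      "com.databricks.spark.csv",
      schema,
      Map("path" -> "data.csv", "header" -> "true"))

    // Option B: let the package infer types (supported in later spark-csv versions).
    val dfB = sqlContext.load(
      "com.databricks.spark.csv",
      Map("path" -> "data.csv", "header" -> "true", "inferSchema" -> "true"))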

Re: Add row IDs column to data frame

2015-04-08 Thread barmaley
Hi Bojan, Could you please expand on your idea of how to append to an RDD? I can think of how to append a constant value to each row of an RDD: // oldRDD: RDD[Array[String]]  val c = "const"; val newRDD = oldRDD.map(r => c +: r)  But how do I append a custom column to an RDD? Something like: val colToAppend = sc.ma…
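One way to append a row-ID column without building a separate RDD and joining is zipWithIndex (or zipWithUniqueId), which pairs every row with an index in a single pass. A minimal sketch, assuming oldRDD: RDD[Array[String]] as in the snippet above:

    // Append the element's index as an extra (string) column on each row.
    val withId = oldRDD.zipWithIndex().map { case (row, idx) => row :+ idx.toString }

    // zipWithUniqueId() avoids the extra job that zipWithIndex() runs to compute
    // per-partition offsets, at the cost of non-contiguous IDs.
    val withUniqueId = oldRDD.zipWithUniqueId().map { case (row, id) => row :+ id.toString }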