Re: Missing shuffle files

2015-02-22 Thread Sameer Farooqui
Do you guys have dynamic allocation turned on for YARN? Anders, was Task 450 in your job acting like a Reducer and fetching the Map spill output data from a different node? If a Reducer task can't read the remote data it needs, that could cause the stage to fail. Sometimes this forces the previou
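
If dynamic allocation is in play, a minimal sketch of the relevant 1.2 settings (values are illustrative; the external shuffle service is what keeps map output readable after executors are released):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")      // serve shuffle files even after an executor is removed
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")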

Spark SQL odbc on Windows

2015-02-22 Thread Francisco Orchard
Hello, I work at an MS consulting company and we are evaluating including Spark in our Big Data offering. We are particularly interested in testing Spark as a ROLAP engine for SSAS, but we cannot find a way to activate the ODBC server (thrift) on a Windows cluster. There is no start-thriftserver.sh com

RE: Spark SQL odbc on Windows

2015-02-22 Thread Ashic Mahtab
Hi Francisco, While I haven't tried this, have a look at the contents of start-thriftserver.sh - all it's doing is setting up a few variables and calling: /bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 and passing some additional parameters. Perhaps doing the s
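
An alternative sketch, assuming the thrift-server module is on the classpath (HiveThriftServer2.startWithContext is available in recent 1.x releases), is to start the JDBC/ODBC endpoint from Scala code instead of the shell script:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    val sc = new SparkContext(new SparkConf().setAppName("thrift-on-windows"))
    val hiveContext = new HiveContext(sc)
    // Starts a HiveServer2-compatible (JDBC/ODBC) endpoint inside this JVM
    HiveThriftServer2.startWithContext(hiveContext)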

Broadcasting Large Objects Fails?

2015-02-22 Thread TJ Klein
Hi, I am trying to broadcast large objects (on the order of a couple of hundred MB). However, I keep getting errors when trying to do so: Traceback (most recent call last): File "/LORM_experiment.py", line 510, in broadcast_gradient_function = sc.broadcast(gradient_function) File "/scratch/users/2

Running Example Spark Program

2015-02-22 Thread Surendran Duraisamy
Hello All, I am new to Apache Spark and I am trying to run JavaKMeans.java from the Spark examples on my Ubuntu system. I downloaded spark-1.2.1-bin-hadoop2.4.tgz and started sbin/start-master.sh After starting

Re: Running Example Spark Program

2015-02-22 Thread VISHNU SUBRAMANIAN
Try restarting your Spark cluster: ./sbin/stop-all.sh ./sbin/start-all.sh Thanks, Vishnu On Sun, Feb 22, 2015 at 7:30 PM, Surendran Duraisamy < 2013ht12...@wilp.bits-pilani.ac.in> wrote: > Hello All, > > I am new to Apache Spark, I am trying to run JavaKMeans.java from Spark > Examples in my U

Re: Running Example Spark Program

2015-02-22 Thread Jason Bell
If you would like a more detailed walkthrough, I wrote one recently: https://dataissexy.wordpress.com/2015/02/03/apache-spark-standalone-clusters-bigdata-hadoop-spark/ Regards Jason Bell On 22 Feb 2015 14:16, "VISHNU SUBRAMANIAN" wrote: > Try restarting your Spark cluster . > ./sbin/stop-all.sh

Re: Spark performance tuning

2015-02-22 Thread Akhil Das
You can simply follow the tuning guide: http://spark.apache.org/docs/1.2.0/tuning.html Thanks Best Regards On Sun, Feb 22, 2015 at 1:14 AM, java8964 wrote: > Can someone share some ideas about how to tune the GC time? > > Thanks > > -- > From: java8...@hotmail.com > To: user@spa
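
For example, one sketch of the kind of setting that guide discusses - passing GC options to the executors (the specific flags here are illustrative, not a recommendation):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.executor.extraJavaOptions",
           "-XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")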

Re: Broadcasting Large Objects Fails?

2015-02-22 Thread Akhil Das
Did you try with the torrent broadcast factory? Thanks Best Regards On Sun, Feb 22, 2015 at 3:29 PM, TJ Klein wrote: > Hi, > > I am trying to broadcast large objects (order of a couple of 100 MBs). > However, I keep getting errors when trying to do so: > > Traceback (most recent call last): > Fil

Re: Spark SQL odbc on Windows

2015-02-22 Thread Denny Lee
Hi Francisco, Out of curiosity - why ROLAP mode using multi-dimensional mode (vs tabular) from SSAS to Spark? As a past SSAS guy you've definitely piqued my interest. The one thing that you may run into is that the SQL generated by SSAS can be quite convoluted. When we were doing the same thing t

Re: Spark SQL odbc on Windows

2015-02-22 Thread Denny Lee
Back to thrift, there was an earlier thread on this topic at http://mail-archives.apache.org/mod_mbox/spark-user/201411.mbox/%3CCABPQxsvXA-ROPeXN=wjcev_n9gv-drqxujukbp_goutvnyx...@mail.gmail.com%3E that may be useful as well. On Sun Feb 22 2015 at 8:42:29 AM Denny Lee wrote: > Hi Francisco, > >

Re: Running Example Spark Program

2015-02-22 Thread Surendran Duraisamy
Thank you Jason, I got the program working after setting SPARK_WORKER_CORES and SPARK_WORKER_MEMORY. While running the program from Eclipse, I got a strange ClassNotFoundException. In JavaKMeans.java, ParsePoint is a static inner class, and when running the program I got ClassNotFound for ParsePoint. I have t
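
One common workaround, sketched below on the assumption that the ClassNotFound comes from the executors not having the application classes (the jar path and master URL are hypothetical), is to ship the application jar explicitly when building the context from the IDE:

    val conf = new org.apache.spark.SparkConf()
      .setAppName("JavaKMeans")
      .setMaster("spark://master-host:7077")            // hypothetical master URL
      .setJars(Seq("target/my-examples-assembly.jar"))  // jar containing JavaKMeans$ParsePoint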

Re: Broadcasting Large Objects Fails?

2015-02-22 Thread Tassilo Klein
Hi Akhil, thanks for your reply. I am using the latest version of Spark 1.2.1 (also tried 1.3 developer branch). If I am not mistaken the TorrentBroadcast is the default there, isn't it? Thanks, Tassilo On Sun, Feb 22, 2015 at 10:59 AM, Akhil Das wrote: > Did you try with torrent broadcast fa

Re: Broadcasting Large Objects Fails?

2015-02-22 Thread Akhil Das
Yes it is; there are some more customizable options here: http://spark.apache.org/docs/1.2.0/configuration.html#compression-and-serialization Thanks Best Regards On Sun, Feb 22, 2015 at 11:47 PM, Tassilo Klein wrote: > Hi Akhil, > > thanks for your reply. I am using the latest version of Sp
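
A sketch of the 1.2 settings that page covers (values shown are documented defaults or plain illustrations, not tuned recommendations):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
      .set("spark.broadcast.blockSize", "4096")   // size of each torrent block, in KB
      .set("spark.broadcast.compress", "true")
      .set("spark.io.compression.codec", "lz4")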

Re: Broadcasting Large Objects Fails?

2015-02-22 Thread Tassilo Klein
I see, thanks. Yes, I have already tried all sorts of changes to these parameters. Unfortunately, none of them seemed to have any impact. Thanks, Tassilo On Sun, Feb 22, 2015 at 1:24 PM, Akhil Das wrote: > Yes it is, you have some more customizable options over here > http://spark.apache.org/docs/1.2.0/c

Re: [Spark SQL]: Convert SchemaRDD back to RDD

2015-02-22 Thread stephane.collot
Hi Michael, I think that the feature (converting a SchemaRDD to an RDD of a structured class) is now available, but I didn't understand from the PR how exactly to do this. Can you give an example or doc links? Best regards -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.

How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread hnahak
Hi, I have written a custom InputFormat and RecordReader for Spark, and I need to use user variables from the Spark client program. I added them to SparkConf: val sparkConf = new SparkConf().setAppName(args(0)).set("developer","MyName") *and in the InputFormat class* protected boolean isSplitabl

Re: Posting to the list

2015-02-22 Thread hnahak
I'm also facing the same issue. This is the third time: whenever I post anything it is never accepted by the community, and at the same time I get a failure mail at my registered mail id. And when I click the "subscribe to this mailing list" link, I didn't get any new subscription mail in my inbox. Please anyone

Re: Posting to the list

2015-02-22 Thread Ted Yu
bq. i didnt get any new subscription mail in my inbox. Have you checked your Spam folder ? Cheers On Sun, Feb 22, 2015 at 2:36 PM, hnahak wrote: > I'm also facing the same issue, this is third time whenever I post anything > it never accept by the community and at the same time got a failure m

Re: [Spark SQL]: Convert SchemaRDD back to RDD

2015-02-22 Thread Ted Yu
Haven't found the method in http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SchemaRDD The new DataFrame has this method: /** * Returns the content of the [[DataFrame]] as an [[RDD]] of [[Row]]s. * @group rdd */ def rdd: RDD[Row] = { FYI On Sun, Feb 22,
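
In 1.2 a SchemaRDD is itself an RDD[Row], so one way back to a typed RDD is a plain map over the rows. A minimal sketch, assuming some existing schemaRDD and a hypothetical Person case class whose fields line up with its schema:

    case class Person(name: String, age: Int)

    // schemaRDD is already an RDD[Row]; map each Row into the case class
    val people: org.apache.spark.rdd.RDD[Person] =
      schemaRDD.map(row => Person(row.getString(0), row.getInt(1)))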

Re: How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread Tom Vacek
The SparkConf doesn't allow you to set arbitrary variables. You can use SparkContext's HadoopRDD and create a JobConf (with whatever variables you want), and then grab them out of the JobConf in your RecordReader. On Sun, Feb 22, 2015 at 4:28 PM, hnahak wrote: > Hi, > > I have written custom In
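
A sketch of Tom's suggestion, given a SparkContext sc and assuming a hypothetical old-API (org.apache.hadoop.mapred) MyInputFormat with key/value classes MyKey and MyValue:

    import org.apache.hadoop.mapred.JobConf

    val jobConf = new JobConf(sc.hadoopConfiguration)
    jobConf.set("developer", "MyName")   // arbitrary key/value the RecordReader can read back

    val rdd = sc.hadoopRDD(jobConf, classOf[MyInputFormat],
                           classOf[MyKey], classOf[MyValue])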

Launching Spark cluster on EC2 with Ubuntu AMI

2015-02-22 Thread olegshirokikh
I'm trying to launch Spark cluster on AWS EC2 with custom AMI (Ubuntu) using the following: ./ec2/spark-ec2 --key-pair=*** --identity-file='/home/***.pem' --region=us-west-2 --zone=us-west-2b --spark-version=1.2.1 --slaves=2 --instance-type=t2.micro --ami=ami-29ebb519 --user=ubuntu launch spark-ub

Re: Launching Spark cluster on EC2 with Ubuntu AMI

2015-02-22 Thread Ted Yu
bq. bash: git: command not found Looks like the AMI doesn't have git pre-installed. Cheers On Sun, Feb 22, 2015 at 4:29 PM, olegshirokikh wrote: > I'm trying to launch Spark cluster on AWS EC2 with custom AMI (Ubuntu) > using > the following: > > ./ec2/spark-ec2 --key-pair=*** --identity-file=

Re: Posting to the list

2015-02-22 Thread haihar nahak
I checked it but I didn't see any mail from the user list. Let me do it one more time. [image: Inline image 1] --Harihar On Mon, Feb 23, 2015 at 11:50 AM, Ted Yu wrote: > bq. i didnt get any new subscription mail in my inbox. > > Have you checked your Spam folder ? > > Cheers > > On Sun, Feb 22, 2

Re: Any sample code for Kafka consumer

2015-02-22 Thread mykidong
In Java, you can see this example: https://github.com/mykidong/spark-kafka-simple-consumer-receiver - Kidong. -- Original Message -- From: "icecreamlc [via Apache Spark User List]" To: "mykidong" Sent: 2015-02-21 11:16:37 AM Subject: Any sample code for Kafka consumer >Dear all, > >Do

Re: Use Spark Streaming for Batch?

2015-02-22 Thread Tobias Pfeiffer
Hi, On Sat, Feb 21, 2015 at 1:05 AM, craigv wrote: > > /Might it be possible to perform "large batches" processing on HDFS time > > series data using Spark Streaming?/ > > > > 1.I understand that there is not currently an InputDStream that could do > > what's needed. I would have to create such

Re: How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread haihar nahak
Thanks. I extracted the Hadoop configuration, set my arbitrary variable, and was able to read it inside the InputFormat from JobContext.configuration On Mon, Feb 23, 2015 at 12:04 PM, Tom Vacek wrote: > The SparkConf doesn't allow you to set arbitrary variables. You can use > SparkContext's HadoopRDD and cre

Re: How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread hnahak
Instead of setting it in SparkConf, set it with SparkContext.hadoopConfiguration.set(key, value) and extract the same key from the JobContext. --Harihar -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-send-user-variables-from-Spark-client-to-custom-InputForma
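
A minimal sketch of the approach the thread settled on, given a SparkContext sc and reusing the "developer" key from the original post:

    // Driver side: put the value into the Hadoop configuration carried by the SparkContext
    sc.hadoopConfiguration.set("developer", "MyName")

    // Inside the custom InputFormat / RecordReader (new mapreduce API), read it back, e.g. in Java:
    //   String developer = context.getConfiguration().get("developer");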

Re: Use Spark Streaming for Batch?

2015-02-22 Thread Soumitra Kumar
See if https://issues.apache.org/jira/browse/SPARK-3660 helps you. My patch has been accepted, and this enhancement is scheduled for 1.3.0. It lets you specify an initialRDD for the updateStateByKey operation. Let me know if you need any information. On Sun, Feb 22, 2015 at 5:21 PM, Tobias Pfeiffer w
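
A sketch of the 1.3 overload described in SPARK-3660, assuming a StreamingContext ssc and a hypothetical wordDStream of (String, Int) pairs with a running-count state:

    import org.apache.spark.HashPartitioner

    val initialRDD = ssc.sparkContext.parallelize(Seq(("hello", 1), ("world", 1)))

    // Fold new values into the previous count for each key
    val updateFunc = (values: Seq[Int], state: Option[Int]) =>
      Some(values.sum + state.getOrElse(0))

    val stateDStream = wordDStream.updateStateByKey[Int](
      updateFunc,
      new HashPartitioner(ssc.sparkContext.defaultParallelism),
      initialRDD)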

Re: Any sample code for Kafka consumer

2015-02-22 Thread Tathagata Das
Spark Streaming already directly supports Kafka http://spark.apache.org/docs/latest/streaming-programming-guide.html#advanced-sources Is there any reason why that is not sufficient? TD On Sun, Feb 22, 2015 at 5:18 PM, mykidong wrote: > In java, you can see this example: > https://github.com/my
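
A minimal sketch of the built-in receiver-based consumer, assuming an existing SparkContext sc and the spark-streaming-kafka artifact on the classpath (the ZooKeeper address, group id and topic below are placeholders):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(sc, Seconds(10))
    val lines = KafkaUtils
      .createStream(ssc, "zk-host:2181", "my-consumer-group", Map("my-topic" -> 1))
      .map(_._2)   // keep only the message payloads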

Re: cannot run spark shell in yarn-client mode

2015-02-22 Thread quangnguyenbh
Did anyone fix this error? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/cannot-run-spark-shell-in-yarn-client-mode-tp4013p21761.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --

How to integrate HBASE on Spark

2015-02-22 Thread sandeep vura
Hi, I have installed Spark on a 3-node cluster. The Spark services are up and running, but I want to integrate HBase with Spark. Do I need to install HBase on the Hadoop cluster or the Spark cluster? Please let me know asap. Regards, Sandeep.v

Re: How to integrate HBASE on Spark

2015-02-22 Thread Akhil Das
If both clusters are on the same network, then I'd suggest installing it on the Hadoop cluster. If you install it on the Spark cluster itself, HBase might take up a few CPU cycles and there's a chance the job will lag. Thanks Best Regards On Mon, Feb 23, 2015 at 12:48 PM
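
Once HBase is reachable from the Spark nodes, a common read pattern (a sketch only, assuming the HBase client jars are on the classpath; "my_table" is a hypothetical table name) is to go through TableInputFormat:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")   // hypothetical table name

    val hbaseRDD = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])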

Submitting jobs to Spark EC2 cluster remotely

2015-02-22 Thread olegshirokikh
I've set up the EC2 cluster with Spark. Everything works: all master/slaves are up and running. I'm trying to submit a sample job (SparkPi). When I ssh to the cluster and submit it from there, everything works fine. However, when the driver is created on a remote host (my laptop), it doesn't work. I've tr
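
One thing to check (a sketch only; the master URL and driver address below are placeholders) is that the driver advertises an address the EC2 workers can actually reach, since the executors connect back to the driver:

    val conf = new org.apache.spark.SparkConf()
      .setAppName("SparkPi")
      .setMaster("spark://ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com:7077")  // placeholder master URL
      .set("spark.driver.host", "laptop-public-ip")  // executors must be able to connect back to this address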