Re: How to use spark to access HBase with Security enabled

2015-05-23 Thread donhoff_h
Hi, The exception is the same as before. Just like the following: 2015-05-23 18:01:40,943 ERROR [hconnection-0x14027b82-shared--pool1-t1] ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'. javax.security.sasl.SaslExcep

Re: [Streaming] Non-blocking recommendation in custom receiver documentation and KinesisReceiver's worker.run blocking call

2015-05-23 Thread Aniket Bhatnagar
Hi TD Unfortunately, I am off for a week so I won't be able to test this until next week. Will keep you posted. Aniket On Sat, May 23, 2015, 6:16 AM Tathagata Das wrote: > Hey Aniket, I just checked in the fix in Spark master and branch-1.4. > Could you download Spark and test it out? > > > >

split function on spark sql created rdd

2015-05-23 Thread kali.tumm...@gmail.com
Hi All, I am trying to do a word count on a number of tweets. My first step is to get data from a table using Spark SQL and then run the split function on top of it to calculate the word count. Error:- value split is not a member of org.apache.spark.sql.SchemaRDD Spark Code that doesn't work to do word coun

Is anyone using Amazon EC2?

2015-05-23 Thread Joe Wass
I used Spark on EC2 a while ago

Is anyone using Amazon EC2? (second attempt!)

2015-05-23 Thread Joe Wass
I used Spark on EC2 a while ago, but recent revisions seem to have broken the functionality. Is anyone actually using Spark on EC2 at the moment? The bug in question is: https://issues.apache.org/jira/browse/SPARK-5008 It makes it impossible to use persistent HDFS without a workaround on each sl

Re: SparkSQL query plan to Stage wise breakdown

2015-05-23 Thread ayan guha
I think you are looking for Df.explain On 23 May 2015 12:51, "Pramod Biligiri" wrote: > Hi, > Is there an easy way to see how a SparkSQL query plan maps to different > stages of the generated Spark job? The WebUI is entirely in terms of RDD > stages and I'm having a hard time mapping it back to m

Not able to run SparkPi locally

2015-05-23 Thread Sujit Pal
Hello all, This is probably me doing something obviously wrong, would really appreciate some pointers on how to fix this. I installed spark-1.3.1-bin-hadoop2.6.tgz from the Spark download page [ https://spark.apache.org/downloads.html] and just untarred it on a local drive. I am on Mac OSX 10.9.5

Re: Not able to run SparkPi locally

2015-05-23 Thread Sujit Pal
Replying to my own email in case someone has the same or similar issue. On a hunch I ran this against my Linux (Ubuntu 14.04 with JDK 8) box. Not only did "bin/run-example SparkPi" run without any problems, it also provided a very helpful message in the output. 15/05/23 08:35:15 WARN Utils: Your

Re: split function on spark sql created rdd

2015-05-23 Thread Ted Yu
BTW flatmap is misspelled. See RDD.scala: def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U] = withScope { On Sat, May 23, 2015 at 8:52 AM, Ted Yu wrote: > hiveCtx.sql() returns DataFrame which doesn't have split method. > > The columns of a row in the result can be accessed by fiel
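The spelling point above can be illustrated without a Spark cluster, because plain Scala collections expose a flatMap with the same camel-case name as the RDD method quoted from RDD.scala. The following is only a minimal sketch under stated assumptions: the tweet text is assumed to have already been pulled out of each result row into plain strings, and the names WordCountSketch, tweets, and wordCount are hypothetical, not taken from the original thread.

```scala
object WordCountSketch {
  // Hypothetical sample data standing in for the tweet text column;
  // the original thread does not show the real table contents.
  val tweets: Seq[String] = Seq("spark sql rocks", "spark streaming")

  // Split every line into words and count occurrences of each word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // capital M: flatMap, not flatmap
      .filter(_.nonEmpty)         // drop empty tokens from leading whitespace
      .groupBy(identity)          // word -> all occurrences of that word
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit =
    wordCount(tweets).toSeq.sortBy(_._1).foreach(println)
}
```

On an RDD of strings the split-then-flatMap step has the same shape, but the counting step would normally use Spark's own pair-RDD operations rather than groupBy on a local collection.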

Doubts about SparkSQL

2015-05-23 Thread Renato Marroquín Mogrovejo
Hi all, I have some doubts about the latest SparkSQL. 1. In the paper about SparkSQL it has been stated that "The physical planner also performs rule-based physical optimizations, such as pipelining projections or filters into one Spark map operation. ..." If dealing with a query of the form: s

Re: Help reading Spark UI tea leaves..

2015-05-23 Thread Shay Seng
Thanks! I was getting a little confused by this partitioner business, I thought that by default a pairRDD would be partitioned by a HashPartitioner? Was this possibly the case in 0.9.3 but not in 1.x? In any case, I tried your suggestion and the shuffle was removed. Cheers. One small question tho

Re: Doubts about SparkSQL

2015-05-23 Thread Ram Sriharsha
Yes it does ... you can try out the following example (the People dataset that comes with Spark). There is an inner query that filters on age and an outer query that filters on name. The physical plan applies a single composite filter on name and age as you can see below sqlContext.sql("select * f

Re: spark.executor.extraClassPath - Values not picked up by executors

2015-05-23 Thread Todd Nist
Hi Yana, Yes, typo in the email; the file name is correct, "spark-defaults.conf"; thanks though. So it appears to work if in the driver I specify it as part of the sparkConf: val conf = new SparkConf().setAppName(getClass.getSimpleName) .set("spark.executor.extraClassPath", "/projects/spark-cassan

Re: Is anyone using Amazon EC2?

2015-05-23 Thread Johan Beisser
Yes. We're looking at bootstrapping in EMR... On Sat, May 23, 2015 at 07:21 Joe Wass wrote: > I used Spark on EC2 a while ago >

Re: Is anyone using Amazon EC2?

2015-05-23 Thread Shafaq
Yes-Spark EC2 cluster . Looking into migrating to spark emr. Adding more ec2 is not possible afaik. On May 23, 2015 11:22 AM, "Johan Beisser" wrote: > Yes. > > We're looking at bootstrapping in EMR... > On Sat, May 23, 2015 at 07:21 Joe Wass wrote: > >> I used Spark on EC2 a while ago >> >

Re: Spark Streaming: all tasks running on one executor (Kinesis + Mongodb)

2015-05-23 Thread Mike Trienis
Yup, and since I have only one core per executor it explains why there was only one executor utilized. I'll need to investigate which EC2 instance type is going to be the best fit. Thanks Evo. On Fri, May 22, 2015 at 3:47 PM, Evo Eftimov wrote: > A receiver occupies a cpu core, an executor is s

Re: Migrate Relational to Distributed

2015-05-23 Thread Dmitry Tolpeko
Hi Brant, Let me partially address your concerns: please follow a new open source project, PL/HQL (www.plhql.org), aimed at allowing you to reuse existing logic and leverage existing skills to some extent, so you do not need to rewrite everything in Scala/Java and can do this gradually. I hope it

Re: Is anyone using Amazon EC2?

2015-05-23 Thread Joe Wass
Sorry guys, my email submitted before I finished writing it. Check my other message (with the same subject)! On 23 May 2015 at 20:25, Shafaq wrote: > Yes-Spark EC2 cluster . Looking into migrating to spark emr. > Adding more ec2 is not possible afaik. > On May 23, 2015 11:22 AM, "Johan Beisser"

Re: Is anyone using Amazon EC2?

2015-05-23 Thread Vadim Bichutskiy
Yes, we're running Spark on EC2. Will transition to EMR soon. -Vadim ᐧ On Sat, May 23, 2015 at 2:22 PM, Johan Beisser wrote: > Yes. > > We're looking at bootstrapping in EMR... > > On Sat, May 23, 2015 at 07:21 Joe Wass wrote: > >> I used Spark on EC2 a while ago >> >

Re: spark.executor.extraClassPath - Values not picked up by executors

2015-05-23 Thread wesley.miao
In my experience, it is best not to put any application-specific settings into spark-defaults.conf, which is applied to all applications. Instead, you can either set them programmatically, as you did below, or through spark-submit. Also, if you would still like to do it via spark-defaults.conf, you will have

Re: DataFrame groupBy vs RDD groupBy

2015-05-23 Thread ayan guha
Hi Michael This is great info. I am currently using repartitionandsort function to achieve the same. Is this the recommended way till 1.3 or is there any better way? On 23 May 2015 07:38, "Michael Armbrust" wrote: > DataFrames have a lot more information about the data, so there is a whole > cla

Strange ClassNotFound exception

2015-05-23 Thread boci
Hi guys! I have a small Spark application. It queries some data from Postgres, enriches it, and writes it to Elasticsearch. When I deployed it into a Spark container I got a very frustrating error: https://gist.github.com/b0c1/66527e00bada1e4c0dc3 Spark version: 1.3.1 Hadoop version: 2.6.0 Additional info:

Re: Strange ClassNotFound exception

2015-05-23 Thread Ted Yu
In my local maven repo, I found: $ jar tvf /Users/tyu/.m2/repository//org/spark-project/akka/akka-actor_2.10/2.3.4-spark/akka-actor_2.10-2.3.4-spark.jar | grep SelectionPath 521 Mon Sep 29 12:05:36 PDT 2014 akka/actor/SelectionPathElement.class Is the above jar in your classpath ? On Sat, May

SparkSQL can't read S3 path for hive external table

2015-05-23 Thread ogoh
Hello, I am using Spark 1.3 in AWS. SparkSQL can't recognize a Hive external table on S3. The following is the error message. I appreciate any help. Thanks, Okehee -- 15/05/24 01:02:18 ERROR thriftserver.SparkSQLDriver: Failed in [select count(*) from api_search where pdate='2015-05-08'] java

Re: SparkSQL failing while writing into S3 for 'insert into table'

2015-05-23 Thread Cheolsoo Park
>> It seems it generated query results into tmp dir firstly, and tries to rename it into the right folder finally. But, it failed while renaming it. This problem exists not only in SparkSQL but also in any Hadoop tool (e.g. Hive, Pig, etc.) when used with S3. Usually, it is better to write task o

SparkSQL errors in 1.4 rc when using with Hive 0.12 metastore

2015-05-23 Thread Cheolsoo Park
Hi, I've been testing SparkSQL in 1.4 rc and found two issues. I wanted to confirm whether these are bugs or not before opening a jira. *1)* I can no longer compile SparkSQL with -Phive-0.12.0. I noticed that in 1.4, IsolatedClientLoader is introduced, and different versions of Hive metastore jar