Hi,
The exception is the same as before, and looks like the following:
2015-05-23 18:01:40,943 ERROR [hconnection-0x14027b82-shared--pool1-t1]
ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is
missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException
Hi TD
Unfortunately, I am off for a week so I won't be able to test this until
next week. Will keep you posted.
Aniket
On Sat, May 23, 2015, 6:16 AM Tathagata Das wrote:
> Hey Aniket, I just checked in the fix in Spark master and branch-1.4.
> Could you download Spark and test it out?
Hi All,
I am trying to do a word count on a number of tweets. My first step is to get
the data from a table using Spark SQL and then run a split function on top of
it to calculate the word count.
Error: value split is not a member of org.apache.spark.sql.SchemaRDD
Spark code that doesn't work to do word count
I used Spark on EC2 a while ago
I used Spark on EC2 a while ago, but recent revisions seem to have broken
the functionality.
Is anyone actually using Spark on EC2 at the moment?
The bug in question is:
https://issues.apache.org/jira/browse/SPARK-5008
It makes it impossible to use persistent HDFS without a workaround on each
slave
I think you are looking for Df.explain
On 23 May 2015 12:51, "Pramod Biligiri" wrote:
> Hi,
> Is there an easy way to see how a SparkSQL query plan maps to different
> stages of the generated Spark job? The WebUI is entirely in terms of RDD
> stages and I'm having a hard time mapping it back to m
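For reference, a minimal sketch of what that looks like, assuming a registered
table named logs (a hypothetical name; sqlContext is the one from the shell):

  // Print the query plans for a SQL query; the operators in the physical
  // plan are what the RDD stages in the WebUI correspond to.
  val df = sqlContext.sql("SELECT status, count(*) FROM logs GROUP BY status")
  df.explain(true) // true also prints the parsed/analyzed/optimized logical plans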
Hello all,
This is probably me doing something obviously wrong, would really
appreciate some pointers on how to fix this.
I installed spark-1.3.1-bin-hadoop2.6.tgz from the Spark download page [
https://spark.apache.org/downloads.html] and just untarred it on a local
drive. I am on Mac OS X 10.9.5.
Replying to my own email in case someone has the same or similar issue.
On a hunch I ran this against my Linux (Ubuntu 14.04 with JDK 8) box. Not
only did "bin/run-example SparkPi" run without any problems, it also
provided a very helpful message in the output.
15/05/23 08:35:15 WARN Utils: Your
BTW, "flatmap" is misspelled; the method is flatMap.
See RDD.scala:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U] = withScope {
On Sat, May 23, 2015 at 8:52 AM, Ted Yu wrote:
> hiveCtx.sql() returns DataFrame which doesn't have split method.
>
> The columns of a row in the result can be accessed by field
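A minimal sketch of what Ted describes, with hypothetical table and column
names (tweets, text); sc is the SparkContext from the shell:

  import org.apache.spark.sql.hive.HiveContext
  val hiveCtx = new HiveContext(sc)
  // sql() returns a DataFrame; drop to the underlying RDD[Row] and read the
  // column by position before splitting.
  val words = hiveCtx.sql("SELECT text FROM tweets")
    .rdd
    .flatMap(row => row.getString(0).split("\\s+"))
  val counts = words.map(w => (w, 1)).reduceByKey(_ + _)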
Hi all,
I have some doubts about the latest SparkSQL.
1. In the paper about SparkSQL it has been stated that "The physical
planner also performs rule-based physical optimizations, such as pipelining
projections or filters into one Spark map operation. ..."
If dealing with a query of the form:
s
Thanks!
I was getting a little confused by this partitioner business; I thought
that by default a pairRDD would be partitioned by a HashPartitioner. Was
this possibly the case in 0.9.3 but not in 1.x?
In any case, I tried your suggestion and the shuffle was removed. Cheers.
One small question tho
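For anyone following the thread, a minimal sketch of the pattern in question
(with made-up data):

  import org.apache.spark.HashPartitioner
  // Partition the pair RDD once up front; key-based operations that reuse
  // the same partitioner can then run without another shuffle.
  val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
  val partitioned = pairs.partitionBy(new HashPartitioner(8)).cache()
  val counts = partitioned.reduceByKey(_ + _) // no shuffle: partitioner is preserved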
Yes it does ... you can try out the following example (the People dataset
that comes with Spark). There is an inner query that filters on age and an
outer query that filters on name.
The physical plan applies a single composite filter on name and age as you
can see below
sqlContext.sql("select * f
Hi Yana,
Yes, typo in the email; the file name is correct, "spark-defaults.conf"; thanks
though. So it appears to work if in the driver I specify it as part of
the SparkConf:
val conf = new SparkConf().setAppName(getClass.getSimpleName)
  .set("spark.executor.extraClassPath",
    "/projects/spark-cassan
Yes.
We're looking at bootstrapping in EMR...
On Sat, May 23, 2015 at 07:21 Joe Wass wrote:
> I used Spark on EC2 a while ago
>
Yes, Spark EC2 cluster. Looking into migrating to Spark EMR.
Adding more EC2 instances is not possible, AFAIK.
On May 23, 2015 11:22 AM, "Johan Beisser" wrote:
> Yes.
>
> We're looking at bootstrapping in EMR...
> On Sat, May 23, 2015 at 07:21 Joe Wass wrote:
>
>> I used Spark on EC2 a while ago
>>
>
Yup, and since I have only one core per executor, that explains why only one
executor was utilized. I'll need to investigate which EC2 instance
type is going to be the best fit.
Thanks Evo.
On Fri, May 22, 2015 at 3:47 PM, Evo Eftimov wrote:
> A receiver occupies a cpu core, an executor is s
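A minimal sketch of the core accounting Evo is describing (hosts, ports, and
core counts are made up for illustration):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  // With 4 cores total, 2 receivers pin 2 cores, leaving 2 for processing.
  // If the receivers took all the cores, no tasks could run at all.
  val conf = new SparkConf().setAppName("ReceiverCores").setMaster("local[4]")
  val ssc = new StreamingContext(conf, Seconds(10))
  val a = ssc.socketTextStream("host1", 9999) // receiver 1 occupies a core
  val b = ssc.socketTextStream("host2", 9999) // receiver 2 occupies a core
  a.union(b).count().print()
  ssc.start()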
Hi Brant,
Let me partially answer your concerns: please take a look at a new open source
project, PL/HQL (www.plhql.org), aimed at letting you reuse existing
logic and leverage existing skills to some extent, so you do not need to
rewrite everything in Scala/Java and can do this gradually. I hope it
Sorry guys, my email got sent before I finished writing it. Check my other
message (with the same subject)!
On 23 May 2015 at 20:25, Shafaq wrote:
> Yes-Spark EC2 cluster . Looking into migrating to spark emr.
> Adding more ec2 is not possible afaik.
> On May 23, 2015 11:22 AM, "Johan Beisser"
Yes, we're running Spark on EC2. Will transition to EMR soon. -Vadim
On Sat, May 23, 2015 at 2:22 PM, Johan Beisser wrote:
> Yes.
>
> We're looking at bootstrapping in EMR...
>
> On Sat, May 23, 2015 at 07:21 Joe Wass wrote:
>
>> I used Spark on EC2 a while ago
>>
>
My experience is: don't put application-specific settings into
spark-defaults.conf, which is applied to all applications.
Instead, you can either set them programmatically as what you did below or
through spark-submit.
Also, if you still like to do it via spark-defaults.conf, you will have
Hi Michael
This is great info. I am currently using the repartitionandsort function to
achieve the same thing. Is this the recommended way up to 1.3, or is there a
better way?
On 23 May 2015 07:38, "Michael Armbrust" wrote:
> DataFrames have a lot more information about the data, so there is a whole
> cla
Hi guys!
I have a small Spark application. It queries some data from Postgres,
enriches it, and writes to Elasticsearch. When I deployed it into the Spark
container I got a very frustrating error:
https://gist.github.com/b0c1/66527e00bada1e4c0dc3
Spark version: 1.3.1
Hadoop version: 2.6.0
Additional info:
In my local maven repo, I found:
$ jar tvf
/Users/tyu/.m2/repository//org/spark-project/akka/akka-actor_2.10/2.3.4-spark/akka-actor_2.10-2.3.4-spark.jar
| grep SelectionPath
521 Mon Sep 29 12:05:36 PDT 2014 akka/actor/SelectionPathElement.class
Is the above jar in your classpath ?
On Sat, May
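If it helps, one quick way to dump the classpath the driver JVM actually sees
(a generic JVM check, nothing Spark-specific):

  // Print each classpath entry on its own line.
  println(System.getProperty("java.class.path").split(":").mkString("\n"))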
Hello,
I am using Spark 1.3 on AWS.
SparkSQL can't recognize a Hive external table on S3.
The following is the error message.
I appreciate any help.
Thanks,
Okehee
--
15/05/24 01:02:18 ERROR thriftserver.SparkSQLDriver: Failed in [select
count(*) from api_search where pdate='2015-05-08']
java
>> It seems it generates query results into a tmp dir first, and tries to
rename it into the final folder at the end. But it failed while renaming.
This problem exists not only in SparkSQL but also in other Hadoop tools (e.g.
Hive, Pig, etc.) when used with S3. Usually, it is better to write task
o
Hi,
I've been testing SparkSQL in the 1.4 RC and found two issues. I wanted to
confirm whether these are bugs or not before opening a JIRA.
*1)* I can no longer compile SparkSQL with -Phive-0.12.0. I noticed that in
1.4, IsolatedClientLoader is introduced, and different versions of Hive
metastore jar