Hi Divya,
Can you please provide the full logs or stack trace?
Thanks,
Ankit Jindal | Lead Engineer
GlobalLogic
On Wed, Oct 5, 2016 at 10:29 AM, Divya Gehlot wrote:
Hi,
One of my long-running Spark Streaming jobs stopped all of a sudden, and I could see this in the logs:
16/10/04 11:18:25 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
Can anybody point out the reason behind the driver-commanded shutdown?
Thanks,
Divya
First of all, if you want to read a text file in Spark, you should use
sc.textFile. Because you are using Source.fromFile, you are reading it with the
Scala standard API, so it will be read sequentially (on the driver, not
distributed across the cluster).
Furthermore, you will need to create a schema if you want to use DataFrames.
On 5/10/2016
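For what it's worth, a minimal sketch of that suggestion, reusing the file path from the original message (the column name "animal_type" and the temp table name are assumptions for the example):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Read the file in a distributed way instead of with Source.fromFile.
val lines = sc.textFile("/home/ajay/dataset/animal_types.txt")

// Build an explicit schema so the data can be used as a DataFrame.
// "animal_type" is a hypothetical column name.
val schema = StructType(Seq(StructField("animal_type", StringType, nullable = true)))
val animalsDF = sqlContext.createDataFrame(lines.map(l => Row(l.trim)), schema)
animalsDF.registerTempTable("animal_types")  // hypothetical table name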
I'm not getting any support in this group; is the question not valid? I need
someone to reply to this question. We have a huge dependency on SAS, which we
want to eliminate, and we want to know if Spark can help.
Hello guys,
I'm new here. I'm using Spark 1.6.0, and I'm trying to programmatically
access a YARN cluster from my Scala app.
I create a SparkContext as usual, with the following code:
val sc = SparkContext.getOrCreate(new SparkConf().setMaster("yarn-client"))
My yarn-site.xml is being read corre
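For reference, a minimal sketch of that kind of programmatic setup (the app name is a placeholder; yarn-client mode also needs HADOOP_CONF_DIR/YARN_CONF_DIR visible to the driver so yarn-site.xml is picked up):

import org.apache.spark.{SparkConf, SparkContext}

// yarn-client mode: the driver runs locally and resolves the ResourceManager
// from the yarn-site.xml found via HADOOP_CONF_DIR / YARN_CONF_DIR.
val conf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName("yarn-client-example")  // placeholder name
val sc = SparkContext.getOrCreate(conf)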
Right now, I am doing it like below,
import scala.io.Source
val animalsFile = "/home/ajay/dataset/animal_types.txt"
val animalTypes = Source.fromFile(animalsFile).getLines.toArray
for ( anmtyp <- animalTypes ) {
val distinctAnmTypCount = sqlContext.sql("select count(distinct(" + anmtyp + ")) f
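A sketch of the same idea with the DataFrame API instead of string-built SQL; the table name "animal_data" is hypothetical, and animalTypes is assumed to hold valid column names of that table:

import org.apache.spark.sql.functions.countDistinct

val animalData = sqlContext.table("animal_data")  // hypothetical registered table
for (anmtyp <- animalTypes) {
  // countDistinct avoids building SQL strings by hand.
  val distinctCount = animalData.agg(countDistinct(animalData(anmtyp))).first().getLong(0)
  println(s"$anmtyp: $distinctCount distinct values")
}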
Hi Everyone,
I have a use case where I have two DataFrames, like below:
1) The first DataFrame (DF1) contains:
*ANIMALS*
Mammals
Birds
Fish
Reptiles
Amphibians
2) The second DataFrame (DF2) contains:
*ID, Mammals, Birds, Fish, Reptiles, Amphibians*
1, Dogs, Eagle, Goldfish,
Hi,
On a secure Hadoop cluster, the Spark shuffle service is enabled (Spark 1.6.0, the shuffle
jar is spark-1.6.0-yarn-shuffle.jar). A client connecting using
spark-assembly_2.11-1.6.1.jar
gets errors starting executors, with the following trace.
Could this be due to a Spark version mismatch? Any thoughts?
Thanks i
Yep... I was thinking about that... but it seems to work with JSON.
jg
> On Oct 4, 2016, at 19:17, Peter Figliozzi wrote:
It's pretty clear that df.col(xpath) is looking for a column named xpath in
your df, not executing an xpath over an XML document as you wish. Try
constructing a UDF which applies your xpath query, and give that as the
second argument to withColumn.
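A rough sketch of that UDF approach, assuming the XML for each row lives in a string column (the column names, output column name, and XPath expression below are all made up):

import java.io.StringReader
import javax.xml.xpath.XPathFactory
import org.xml.sax.InputSource
import org.apache.spark.sql.functions.udf

// Evaluate a (hypothetical) XPath expression against an XML string column.
val xpathUdf = udf { xml: String =>
  val xpath = XPathFactory.newInstance().newXPath()
  xpath.evaluate("/root/some/element", new InputSource(new StringReader(xml)))
}

// The UDF result goes in as the second argument to withColumn.
val enriched = df.withColumn("xpathValue", xpathUdf(df("xmlPayload")))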
On Tue, Oct 4, 2016 at 4:35 PM, Jean Georges Per
Spark 2.0.0
XML parser 0.4.0
Java
Hi,
I am trying to create a new column in my DataFrame, based on the value of a
sub-element. I have done that several times with JSON, but have not been very
successful with XML.
(I know a world with fewer formats would be easier :) )
Here is the code:
df.withColumn("Fulfill
It only exists in the latest docs, not in versions <= 1.6.
From: Sean Owen
Sent: Tuesday, October 4, 2016 1:51:49 PM
To: Sesterhenn, Mike; user@spark.apache.org
Subject: Re: Time-unit of RDD.countApprox timeout parameter
The API docs already say: "maximum time to
They were published yesterday, but it can take a while to propagate.
On Tue, Oct 4, 2016 at 12:58 PM, Prajwal Tuladhar wrote:
> Hi,
>
> It seems like the 2.0.1 artifact hasn't been published to Maven Central. Can
> anyone confirm?
>
> On Tue, Oct 4, 2016 at 5:39 PM, Reynold Xin wrote:
>
>> We a
Hi all
my mvn build of Spark 2.1 using Java 1.8 is spinning out of memory, with an
error saying it cannot allocate enough memory during Maven compilation.
The instructions (on the Spark 2.0 page) say that MAVEN_OPTS is not needed for
Java 1.8 and, according to my understanding, the Spark build process wil
This should be fixed now; let me know if you see any more problems with
these download links.
On Tue, Oct 4, 2016 at 12:12 PM Sean Owen wrote:
Yeah I think the issue is possibly that the final real announcement is on
the mailing list, after the site is in order. Not sure. In any event the
downlo
Hello guys,
I have a job that reads compressed (Snappy) data, but when I run the job it
throws an error: "native snappy library not available: this version
of libhadoop was built without snappy support".
I followed this instruction but it did not resolve the issue:
https://community.hortonwo
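One thing that sometimes helps (a sketch only; the native-library path is a guess and depends on your Hadoop distribution and layout):

import org.apache.spark.SparkConf

// Point the driver and executors at the Hadoop native libs that include
// libsnappy; "/usr/hdp/current/hadoop-client/lib/native" is a hypothetical
// HDP-style location, adjust to your cluster.
val conf = new SparkConf()
  .set("spark.driver.extraLibraryPath", "/usr/hdp/current/hadoop-client/lib/native")
  .set("spark.executor.extraLibraryPath", "/usr/hdp/current/hadoop-client/lib/native")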
Yeah I think the issue is possibly that the final real announcement is on
the mailing list, after the site is in order. Not sure. In any event the
download should of course work by the time it's really released and it
doesn't now, not the direct download. This may be the reason it's not yet
announc
According to the official webpage it was released yesterday:
http://spark.apache.org/downloads.html
Our latest stable version is Apache Spark 2.0.1, released on Oct 3, 2016
2016-10-04 21:01 GMT+02:00 Sean Owen :
> Unless I totally missed it, 2.0.1 has not been formally released, but is
> abou
Unless I totally missed it, 2.0.1 has not been formally released, but is
about to be. I would not be surprised if it's literally being uploaded as
we speak and you're seeing an inconsistent state this hour.
On Tue, Oct 4, 2016 at 7:56 PM Daniel wrote:
> When you try download Spark 2.0.1 from off
confirmed
On Tue, Oct 4, 2016 at 11:56 AM, Daniel wrote:
> When you try download Spark 2.0.1 from official webpage you get this error:
>
> NoSuchKey: The specified key does not exist. (key: spark-2.0.1-bin-hadoop2.7.tgz)
It's still there on master. It is in the "spark-tags" module however
(under common/tags), maybe something changed in the build environment
and it isn't made available as a dependency to your project? What
happens if you include the module as a direct dependency?
--Jakob
On Tue, Oct 4, 2016 at 10:
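If it helps, a sketch of pulling it in as a direct dependency in sbt (the version shown is an assumption; match it to your Spark version):

// build.sbt snippet; versions are placeholders, align with your Spark build.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-tags" % "2.0.0"
)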
When you try to download Spark 2.0.1 from the official webpage you get this error:
NoSuchKey: The specified key does not exist. (key: spark-2.0.1-bin-hadoop2.7.tgz)
Just saying, to let the Spark people know about it.
No, they're just in a separate module now, called spark-tags
On Tue, Oct 4, 2016 at 6:34 PM Liren Ding
wrote:
> I just upgraded from Spark 1.6.1 to 2.0 and got a Java compile error:
> *error: cannot access DeveloperApi*
> * class file for org.apache.spark.annotation.DeveloperApi not found*
>
The API docs already say: "maximum time to wait for the job, in
milliseconds"
On Tue, Oct 4, 2016 at 7:14 PM Sesterhenn, Mike
wrote:
> Nevermind. Through testing it seems it is MILLISECONDS. This should be
> added to the docs.
> --
> *From:* Sesterhenn, Mike
> *Sent
Hi,
When I start Spark v1.6 (CDH 5.8.0) in YARN cluster mode I don't see the REST API
(http://localhost:4040/api/v1/applications is unavailable) on port 4040.
I started the Spark application like this:
spark-submit --master yarn-cluster --class
org.apache.spark.examples.SparkPi
/usr/lib/spark/examples
Nevermind. Through testing it seems it is MILLISECONDS. This should be added
to the docs.
From: Sesterhenn, Mike
Sent: Tuesday, October 4, 2016 1:02:25 PM
To: user@spark.apache.org
Subject: Time-unit of RDD.countApprox timeout parameter
Hi all,
Does anyone k
Hi all,
Does anyone know what the unit is on the 'timeout' parameter to the
RDD.countApprox() function?
(ie. is that seconds, milliseconds, nanoseconds, ...?)
I was searching through the source but it got hairy pretty quickly.
Thanks
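For reference, a small sketch of calling it with the timeout treated as milliseconds (the RDD and numbers are made up):

// timeout is in milliseconds; confidence defaults to 0.95.
val nums = sc.parallelize(1 to 1000000)           // hypothetical data
val approx = nums.countApprox(timeout = 2000L)    // wait at most ~2 seconds
println(approx.initialValue)                      // BoundedDouble: mean [low, high]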
I think you just hit https://issues.apache.org/jira/browse/SPARK-15899
Could you try 2.0.1?
On Tue, Oct 4, 2016 at 7:52 AM, Denis Bolshakov
wrote:
> I think you are wrong with port for hdfs file, as I remember default value
> is 8020, and not 9000.
>
> On 4 Oct 2016 at 17:29, "Hafiz Mu
We are happy to announce the availability of Spark 2.0.1!
Apache Spark 2.0.1 is a maintenance release containing 300 stability and
bug fixes. This release is based on the branch-2.0 maintenance branch of
Spark. We strongly recommend that all 2.0.0 users upgrade to this stable
release.
To download A
I just upgraded from Spark 1.6.1 to 2.0 and got a Java compile error:
*error: cannot access DeveloperApi*
* class file for org.apache.spark.annotation.DeveloperApi not found*
From the Spark 2.0 documentation (
https://spark.apache.org/docs/2.0.0/api/java/overview-summary.html), the
package org.apac
Hi there,
I'm currently working on a custom Encoder for a kind of schema-based Java
object. For the object's schema, field positions and types are isomorphic
to SQL column ordinals and types. The implementation should be quite
similar to the JavaBean Encoder, but as we have a schema, class-based
refl
A few pointers in addition:
1) Executors can also get lost if they hang on GC and can't respond to the
driver within the timeout. That should be in the executor logs, though.
2) --conf "spark.shuffle.memoryFraction=0.8" is a very high shuffle
fraction. You should check the UI for the Event Timeline and the executor logs
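For example, a more conservative setting might look like this (a sketch; 0.2 is simply the legacy default, not a tuned value for your workload):

import org.apache.spark.SparkConf

// Lower the shuffle fraction so more heap is left for storage and user objects
// (relevant for the legacy memory manager; values are illustrative only).
val conf = new SparkConf()
  .set("spark.shuffle.memoryFraction", "0.2")
  .set("spark.storage.memoryFraction", "0.6")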
I think you have the wrong port for the HDFS file; as I remember, the default
value is 8020, not 9000.
On 4 Oct 2016 at 17:29, "Hafiz Mujadid"
wrote:
> Hi,
>
> I am trying example of structured streaming in spark using following piece
> of code,
>
> val spark = SparkSession
> .builder
> .a
Hi,
I am trying an example of Structured Streaming in Spark using the following
piece of code:
val spark = SparkSession
.builder
.appName("testingSTructuredQuery")
.master("local")
.getOrCreate()
import spark.implicits._
val userSchema = new StructType()
.add("name", "string").add("age", "integer")
val
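For context, a sketch of how such a snippet typically continues with a file source and sink (the HDFS URL and path are assumptions; note the reply above about the default NameNode port being 8020, not 9000):

// Hypothetical CSV directory; 8020 is the usual NameNode RPC port.
val users = spark.readStream
  .schema(userSchema)
  .csv("hdfs://localhost:8020/user/data/")

val query = users.writeStream
  .format("console")
  .outputMode("append")
  .start()
query.awaitTermination()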
You should check your executor log to identify the reason. My guess is that the
executor died due to OOM.
If that is the reason, then you need to tune your executor memory setting or,
more importantly, your partition count, to make sure you have enough memory to
handle the correct size of partitio
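A rough sketch of both knobs (all values are placeholders, not recommendations):

import org.apache.spark.SparkConf

// More heap per executor...
val conf = new SparkConf()
  .set("spark.executor.memory", "6g")   // placeholder value
  .set("spark.executor.cores", "2")     // placeholder value

// ...and/or more (hence smaller) partitions, which lowers per-task memory.
// val repartitioned = bigRdd.repartition(2000)  // hypothetical RDD and count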
Got any solution for this?
On Tuesday 04 October 2016 05:37 AM, Punit Naik wrote:
Hi All
I am trying to run a program on a large dataset (~1 TB). I have
already tested the code on a small amount of data and it works fine. But
what I noticed is that the job fails if the size of the input is large. It
Yes, I did set spark.sql.hive.thriftServer.singleSession to true in the
spark-defaults.conf of both Spark sessions. After starting the 2nd Spark
session, I manually set hive.server2.thrift.port to the Spark Thrift port
started within the 1st Spark session, but the temporary table is still not visible.
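One pattern that sometimes works around this (a sketch, not a confirmed fix for the setup above) is to start the Thrift server from the same context that registers the temp table, so both share it:

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Hypothetical example: expose a temp table over JDBC from the same context
// that created it, instead of from a second, separate session.
val hiveContext = new HiveContext(sc)
hiveContext.sql("select 1 as id").registerTempTable("my_temp_table")  // hypothetical table
hiveContext.setConf("hive.server2.thrift.port", "10001")              // hypothetical port
HiveThriftServer2.startWithContext(hiveContext)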
Hi,
I have the following schema:
-root
|-timestamp
|-date
|-year
|-month
|-day
|-some_column
|-some_other_column
I'd like to achieve either of these:
1) Use the timestamp field to partition by year, month and day.
This looks weird though, as Spark wouldn't magically know how to lo
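A common way to express option 1 on the write path (a sketch; df, the output path, and the Parquet format are assumptions):

import org.apache.spark.sql.functions.{col, year, month, dayofmonth}

// Derive the partition columns from the timestamp, then partition on write.
val partitioned = df
  .withColumn("year", year(col("timestamp")))
  .withColumn("month", month(col("timestamp")))
  .withColumn("day", dayofmonth(col("timestamp")))

partitioned.write
  .partitionBy("year", "month", "day")
  .parquet("/path/to/output")  // hypothetical output path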
Hi All!
I'm using Spark 1.6.1 and I'm trying to transform my DStream as follows:
myStream.transform { rdd =>
val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
import sqlContext.implicits._
val j = rdd.toDS()
j.map {
case a => Some(...)
case _ =
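For reference, a minimal sketch of that transform pattern that hands an RDD back to the DStream; the element type and the filtering logic are placeholders, since the original snippet is cut off:

import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream

case class Event(id: Long, value: String)  // hypothetical element type

def cleaned(myStream: DStream[Event]): DStream[Event] =
  myStream.transform { rdd =>
    val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
    import sqlContext.implicits._
    val ds = rdd.toDS()
    // Placeholder logic; transform must return an RDD, hence .rdd at the end.
    ds.filter(e => e.value.nonEmpty).rdd
  }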
2.0.1 just passed the vote and should be available within this week.
2016-10-04 9:03 GMT+02:00 Aseem Bansal :
> Hi
>
> I looked at Maven Central releases and guessed that spark has something
> like 2 months release cycle or sometimes even monthly. But the release of
> Spark 2.0.0 was in July so m
Hi,
Sorry for the late reply; it was a public holiday in Germany.
Yes, it's using a unique group ID which no other job or consumer group is
using. I have increased session.timeout to 1 minute and set
max.poll.rate to 1000. The processing takes ~1 second.
2016-09-29 4:41 GMT+02:00 Cody Koeninger :
Hi
I looked at the Maven Central releases and guessed that Spark has something
like a 2-month release cycle, or sometimes even monthly. But the release of
Spark 2.0.0 was in July, so maybe that is wrong. When will the next version
be released or is it more on an ad-hoc basis?
Asking as there are some fi