Re: spark streaming job stopped

2016-10-04 Thread Ankit Jindal
Hi Divya, can you please provide the full logs or a stack trace? Thanks, Ankit Jindal | Lead Engineer, GlobalLogic

spark streaming job stopped

2016-10-04 Thread Divya Gehlot
Hi, One of my long-running Spark Streaming jobs stopped all of a sudden, and I could see 16/10/04 11:18:25 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown Can anybody point me to the reason behind the driver-commanded shutdown? Thanks, Divya

Re: UseCase_Design_Help

2016-10-04 Thread Daniel
First of all, if you want to read a text file in Spark, you should use sc.textFile; with "Source.fromFile" you are reading it through the Scala standard API, so it will be read sequentially. Furthermore, you are going to need to create a schema if you want to use DataFrames.
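A minimal sketch of this suggestion, assuming a spark-shell style session where sc and sqlContext already exist; the path and column name are illustrative:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructType, StructField, StringType}

    // Read the file as a distributed RDD instead of sequentially via Source.fromFile
    val lines = sc.textFile("/home/ajay/dataset/animal_types.txt")

    // An explicit schema so the lines can be used as a DataFrame
    val schema = StructType(Seq(StructField("animal_type", StringType)))
    val animalsDF = sqlContext.createDataFrame(lines.map(l => Row(l.trim)), schema)
    animalsDF.show()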

Re: MLib : Non Linear Optimization

2016-10-04 Thread nsareen
I'm not getting any support in this group; is the question not valid? I need someone to reply to this question. We have a huge dependency on SAS which we want to eliminate, and we want to know if Spark can help.

Problem creating SparkContext to connect to YARN cluster

2016-10-04 Thread Alberto Andreotti
Hello guys, I'm new here. I'm using Spark 1.6.0 and I'm trying to programmatically access a YARN cluster from my Scala app. I create a SparkContext as usual, with the following code: val sc = SparkContext.getOrCreate(new SparkConf().setMaster("yarn-client")) My yarn-site.xml is being read correctly…
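A hedged sketch of the pattern being described (Spark 1.6); it assumes HADOOP_CONF_DIR or YARN_CONF_DIR points at the directory containing yarn-site.xml, and the app name and resource settings are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("yarn-client")
      .setAppName("my-yarn-app")          // illustrative
      .set("spark.executor.memory", "2g") // illustrative
    val sc = SparkContext.getOrCreate(conf)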

Re: UseCase_Design_Help

2016-10-04 Thread Ajay Chander
Right now, I am doing it like this: import scala.io.Source val animalsFile = "/home/ajay/dataset/animal_types.txt" val animalTypes = Source.fromFile(animalsFile).getLines.toArray for ( anmtyp <- animalTypes ) { val distinctAnmTypCount = sqlContext.sql("select count(distinct(" + anmtyp + ")) f…
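For reference, the per-column counts could also be computed in a single DataFrame pass instead of one SQL query per column; a sketch, assuming DF2 is available as a DataFrame named df2 (the name is illustrative):

    import org.apache.spark.sql.functions.{col, countDistinct}
    import scala.io.Source

    val animalTypes = Source.fromFile("/home/ajay/dataset/animal_types.txt").getLines.toArray

    // One job computing every distinct count at once
    val counts = df2.select(animalTypes.map(c => countDistinct(col(c)).alias(c)): _*)
    counts.show()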

UseCase_Design_Help

2016-10-04 Thread Ajay Chander
Hi Everyone, I have a use case with two DataFrames: 1) the first DataFrame (DF1) contains *ANIMALS*: Mammals, Birds, Fish, Reptiles, Amphibians; 2) the second DataFrame (DF2) contains *ID, Mammals, Birds, Fish, Reptiles, Amphibians*: 1, Dogs, Eagle, Goldfish, …

Any issues if spark 1.6.1 client connects to spark 1.6.0 external shuffle services

2016-10-04 Thread Manoj Samel
Hi, On a secure Hadoop cluster, the Spark shuffle service is enabled (Spark 1.6.0; the shuffle jar is spark-1.6.0-yarn-shuffle.jar). A client connecting with spark-assembly_2.11-1.6.1.jar gets errors starting executors, with the following trace. Could this be due to a Spark version mismatch? Any thoughts? Thanks in advance…

Re: Parsing XML

2016-10-04 Thread Jean Georges Perrin
Yep... I was thinking about that... but it seems to work with JSON. jg

Re: Parsing XML

2016-10-04 Thread Peter Figliozzi
It's pretty clear that df.col(xpath) is looking for a column named xpath in your df, not executing an XPath query over an XML document as you wish. Try constructing a UDF which applies your XPath query, and give that as the second argument to withColumn.
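A sketch of that UDF approach, assuming each row's XML sits in a string column named "payload"; the XPath expression and column names are illustrative:

    import java.io.StringReader
    import javax.xml.xpath.XPathFactory
    import org.xml.sax.InputSource
    import org.apache.spark.sql.functions.udf

    // Evaluate an XPath expression against the XML string of each row
    val extractFulfillment = udf { xml: String =>
      val xpath = XPathFactory.newInstance().newXPath()
      xpath.evaluate("/order/fulfillment/text()", new InputSource(new StringReader(xml)))
    }

    val result = df.withColumn("Fulfillment", extractFulfillment(df("payload")))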

Parsing XML

2016-10-04 Thread Jean Georges Perrin
Spark 2.0.0, XML parser 0.4.0, Java. Hi, I am trying to create a new column in my data frame based on the value of a sub-element. I have done that several times with JSON, but have not been very successful with XML. (I know a world with fewer formats would be easier :) ) Here is the code: df.withColumn("Fulfill…

Re: Time-unit of RDD.countApprox timeout parameter

2016-10-04 Thread Sesterhenn, Mike
It only exists in the latest docs, not in versions <= 1.6.

Re: [ANNOUNCE] Announcing Spark 2.0.1

2016-10-04 Thread Reynold Xin
They were published yesterday, but it can take a while to propagate. On Tue, Oct 4, 2016 at 12:58 PM, Prajwal Tuladhar wrote: > Hi, it seems the 2.0.1 artifact hasn't been published to Maven Central. Can anyone confirm?

building Spark 2.1 vs Java 1.8 on Ubuntu 16/06

2016-10-04 Thread Marco Mistroni
Hi all, my Maven build of Spark 2.1 with Java 1.8 is running out of memory, with an error saying it cannot allocate enough memory during compilation. The instructions (on the Spark 2.0 page) say that MAVEN_OPTS is not needed for Java 1.8 and, according to my understanding, the Spark build process will…

Re: Error downloading Spark 2.0.1

2016-10-04 Thread Josh Rosen
This should be fixed now; let me know if you see any more problems with these download links.

[Spark] native snappy library not available: this version of libhadoop was built without snappy support.

2016-10-04 Thread Uthayan Suthakar
Hello guys, I have a job that reads compressed (Snappy) data, but when I run it, it throws the error "native snappy library not available: this version of libhadoop was built without snappy support". I followed these instructions but they did not resolve the issue: https://community.hortonwo…

Re: Error downloading Spark 2.0.1

2016-10-04 Thread Sean Owen
Yeah, I think the issue is possibly that the final real announcement goes out on the mailing list after the site is in order. Not sure. In any event, the download should of course work by the time it's really released, and it doesn't now; not the direct download. This may be the reason it's not yet announced…

Re: Error downloading Spark 2.0.1

2016-10-04 Thread Daniel
According to the official webpage it was released yesterday: http://spark.apache.org/downloads.html "Our latest stable version is Apache Spark 2.0.1, released on Oct 3, 2016"

Re: Error downloading Spark 2.0.1

2016-10-04 Thread Sean Owen
Unless I totally missed it, 2.0.1 has not been formally released, but it is about to be. I would not be surprised if it's literally being uploaded as we speak and you're seeing an inconsistent state this hour.

Re: Error downloading Spark 2.0.1

2016-10-04 Thread Jakob Odersky
Confirmed.

Re: Package org.apache.spark.annotation no longer exists in Spark 2.0?

2016-10-04 Thread Jakob Odersky
It's still there on master. It is in the "spark-tags" module, however (under common/tags); maybe something changed in the build environment and it isn't made available as a dependency to your project? What happens if you include the module as a direct dependency? --Jakob
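For example, with sbt (a sketch; the version should match the Spark build in use):

    // Declares spark-tags directly so org.apache.spark.annotation.DeveloperApi resolves
    libraryDependencies += "org.apache.spark" %% "spark-tags" % "2.0.0"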

Error downloading Spark 2.0.1

2016-10-04 Thread Daniel
When you try to download Spark 2.0.1 from the official webpage you get this error (an S3 NoSuchKey response): The specified key does not exist. Key: spark-2.0.1-bin-hadoop2.7.tgz, RequestId: 6EA5F8FFFE6CCAEF, HostId: g8UIuHetxWoGE0J/w2UtHn7DjKwATRKtHHHKu/2Mj2SmUPhPBZ+aoDPb+2uwn5J4Uj2voQa8WKg= Just reporting it so the Spark people know.

Re: Package org.apache.spark.annotation no longer exists in Spark 2.0?

2016-10-04 Thread Sean Owen
No, they're just in a separate module now, called spark-tags.

Re: Time-unit of RDD.countApprox timeout parameter

2016-10-04 Thread Sean Owen
The API docs already say: "maximum time to wait for the job, in milliseconds"
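Usage sketch, against any RDD (the confidence value is illustrative):

    // timeout is in milliseconds; confidence defaults to 0.95
    val approx = rdd.countApprox(timeout = 1000L, confidence = 0.90)
    println(approx.initialValue) // a BoundedDouble with mean, low and high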

Re: Spark metrics when running with YARN?

2016-10-04 Thread Vladimir Tretyakov
Hi, When I start Spark v1.6 (cdh5.8.0) in YARN cluster mode, I don't see the API (http://localhost:4040/api/v1/applications is unavailable) on port 4040. I started the Spark application like this: spark-submit --master yarn-cluster --class org.apache.spark.examples.SparkPi /usr/lib/spark/examples…

Re: Time-unit of RDD.countApprox timeout parameter

2016-10-04 Thread Sesterhenn, Mike
Never mind. Through testing it seems it is MILLISECONDS. This should be added to the docs.

Time-unit of RDD.countApprox timeout parameter

2016-10-04 Thread Sesterhenn, Mike
Hi all, Does anyone know what the unit is for the 'timeout' parameter of the RDD.countApprox() function? (i.e., is it seconds, milliseconds, nanoseconds, ...?) I was searching through the source but it got hairy pretty quickly. Thanks

Re: java.net.URISyntaxException

2016-10-04 Thread Shixiong(Ryan) Zhu
I think you just hit https://issues.apache.org/jira/browse/SPARK-15899. Could you try 2.0.1?

[ANNOUNCE] Announcing Spark 2.0.1

2016-10-04 Thread Reynold Xin
We are happy to announce the availability of Spark 2.0.1! Apache Spark 2.0.1 is a maintenance release containing 300 stability and bug fixes. This release is based on the branch-2.0 maintenance branch of Spark. We strongly recommend all 2.0.0 users upgrade to this stable release. To download Apache Spark 2.0.1…

Package org.apache.spark.annotation no longer exists in Spark 2.0?

2016-10-04 Thread Liren Ding
I just upgraded from Spark 1.6.1 to 2.0 and got a Java compile error: *error: cannot access DeveloperApi* *class file for org.apache.spark.annotation.DeveloperApi not found* From the Spark 2.0 documentation (https://spark.apache.org/docs/2.0.0/api/java/overview-summary.html), the package org.apache.spark.annotation…

Extracting Row Value for Deserializer Expression

2016-10-04 Thread Aleksander Eskilson
Hi there, I'm currently working on a custom Encoder for a kind of schema-based Java object. In the object's schema, field positions and types are isomorphic to SQL column ordinals and types. The implementation should be quite similar to the JavaBean Encoder, but since we have a schema, class-based reflection…

Re: Executor Lost error

2016-10-04 Thread Nirav Patel
A few pointers in addition: 1) Executors can also get lost if they hang during GC and can't respond to the driver within the timeout; that should be in the executor logs, though. 2) --conf "spark.shuffle.memoryFraction=0.8" is a very high shuffle fraction. You should check the UI's Event Timeline and the executor logs…

Re: java.net.URISyntaxException

2016-10-04 Thread Denis Bolshakov
I think you are wrong about the port for the HDFS file; as I remember, the default value is 8020, not 9000.

java.net.URISyntaxException

2016-10-04 Thread Hafiz Mujadid
Hi, I am trying the structured streaming example in Spark using the following piece of code: val spark = SparkSession .builder .appName("testingSTructuredQuery") .master("local") .getOrCreate() import spark.implicits._ val userSchema = new StructType() .add("name", "string").add("age", "integer") val…
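A completed sketch of that example for reference, with the truncated tail filled in by assumption; the HDFS path is illustrative and uses port 8020 per the reply above (the thread also suggests trying 2.0.1 for SPARK-15899):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.StructType

    val spark = SparkSession.builder
      .appName("testingSTructuredQuery")
      .master("local")
      .getOrCreate()

    val userSchema = new StructType()
      .add("name", "string")
      .add("age", "integer")

    // Stream CSV files from a well-formed HDFS URI
    val users = spark.readStream
      .schema(userSchema)
      .csv("hdfs://localhost:8020/user/data/")

    val query = users.writeStream
      .format("console")
      .outputMode("append")
      .start()
    query.awaitTermination()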

Re: Executor Lost error

2016-10-04 Thread Yong Zhang
You should check your executor logs to identify the reason. My guess is that the executor died due to OOM. If that is the reason, then you need to tune your executor memory settings or, more importantly, your partition count, to make sure you have enough memory to handle the correct size of partitions…
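A sketch of the knobs mentioned in this thread; the values are illustrative and would need tuning against the real ~1 TB input, and inputRDD stands in for the job's actual data:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "8g")         // more heap per executor
      .set("spark.shuffle.memoryFraction", "0.2") // down from the 0.8 mentioned above

    // More, smaller partitions so each task's working set fits in memory
    val rebalanced = inputRDD.repartition(4000)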

Re: Executor Lost error

2016-10-04 Thread Aditya
Got any solution for this? On Tuesday 04 October 2016 05:37 AM, Punit Naik wrote: Hi All, I am trying to run a program on a large dataset (~1 TB). I have already tested the code on a small amount of data and it works fine. But what I noticed is that the job fails if the size of the input is large. It…

Re: access spark thrift server from another spark session

2016-10-04 Thread Herman Yu
Yes, I did set spark.sql.hive.thriftServer.singleSession to true in the spark-defaults.conf of both Spark sessions. After starting the 2nd Spark session, I manually set hive.server2.thrift.port to the Spark thrift port started within the 1st Spark session; the temporary table is still not visible.

DataFrame API: how to partition by a "virtual" column, or by a nested column?

2016-10-04 Thread Samy Dindane
Hi, I have the following schema:

    root
     |- timestamp
     |- date
     |- year
     |- month
     |- day
     |- some_column
     |- some_other_column

I'd like to achieve either of these: 1) Use the timestamp field to partition by year, month and day. This looks weird though, as Spark wouldn't magically know how to lo…
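A sketch of option (1), deriving the partition columns from the timestamp and writing with partitionBy; it assumes "timestamp" is a timestamp-typed column, and the output path is illustrative:

    import org.apache.spark.sql.functions.{col, year, month, dayofmonth}

    val partitioned = df
      .withColumn("year", year(col("timestamp")))
      .withColumn("month", month(col("timestamp")))
      .withColumn("day", dayofmonth(col("timestamp")))

    partitioned.write
      .partitionBy("year", "month", "day")
      .parquet("/output/path")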

NotSerializableException in DStream.transform

2016-10-04 Thread Andrew A
Hi All! I'm using Spark 1.6.1 and I'm trying to transform my DStream as follows: myStream.transform { rdd => val sqlContext = SQLContext.getOrCreate(rdd.sparkContext) import sqlContext.implicits._ val j = rdd.toDS() j.map { case a => Some(...) case _ =…
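A hedged sketch of the usual fix for this pattern: define the case class at the top level so the generated encoder does not capture an enclosing, non-serializable instance (Record and the transformation are illustrative; myStream is assumed to be a DStream[String]):

    import org.apache.spark.sql.SQLContext

    case class Record(value: String) // top level, not nested inside another class

    val transformed = myStream.transform { rdd =>
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._
      val ds = rdd.map(Record(_)).toDS()
      ds.map(r => r.value.toUpperCase).rdd // illustrative transformation
    }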

Re: When will the next version of spark be released?

2016-10-04 Thread Matthias Niehoff
2.0.1 just passed the vote and should be available within this week.

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-04 Thread Matthias Niehoff
Hi, sorry for the late reply; a public holiday in Germany. Yes, it's using a unique group id which no other job or consumer group is using. I have increased the session.timeout to 1 minute and set max.poll.rate to 1000. The processing takes ~1 second.
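For reference, a sketch of the settings being discussed, for the 0.10 direct stream (broker, group id and topic are illustrative; ssc is an existing StreamingContext, and "max.poll.records" is an assumption for what "max.poll.rate" refers to):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "my-unique-group",   // unique per job, as described above
      "session.timeout.ms" -> "60000",   // the 1 minute mentioned above
      "max.poll.records" -> "1000"       // assumed meaning of "max.poll.rate"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))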

When will the next version of spark be released?

2016-10-04 Thread Aseem Bansal
Hi, I looked at the Maven Central releases and guessed that Spark has something like a two-month release cycle, or sometimes even monthly. But the release of Spark 2.0.0 was in July, so maybe that is wrong. When will the next version be released, or is it more on an ad-hoc basis? Asking as there are some fi…