Re: Spark 2.3.2 : No of active tasks vastly exceeds total no of executor cores

2018-10-24 Thread Shing Hing Man
User List - [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs. I recently upgraded to Spark 2.3.1. I have ha... Shing. On Monday, 22 October 2018, 19:56:32 GMT+1, Shing Hing Man wrote: In my log, I...

Re: Spark 2.3.2 : No of active tasks vastly exceeds total no of executor cores

2018-10-22 Thread Shing Hing Man
.../AsyncEventQueue.scala#L154 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala#L172 On Fri, Oct 19, 2018 at 4:42 PM Shing Hing Man wrote: Hi, I have just upgraded my application to Spark 2.3.2 from 2.2.1. When I run my Spark application in Yarn...
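The links above point at AsyncEventQueue, which drops listener events once its bounded queue fills up; dropped events are one way the UI's active-task count can drift away from the real number of running tasks. A minimal sketch of giving the listener bus more headroom (the capacity value is purely illustrative, not a recommendation from this thread):

import org.apache.spark.SparkConf

// Each AsyncEventQueue in Spark 2.3.x holds at most
// spark.scheduler.listenerbus.eventqueue.capacity events (default 10000);
// once the queue is full, further events are dropped.
val conf = new SparkConf()
  .set("spark.scheduler.listenerbus.eventqueue.capacity", "100000")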

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-20 Thread Shing Hing Man
I have the same problem when I upgraded my application from Spark 2.2.1 to Spark 2.3.2 and ran in Yarn client mode. I also noticed that in my Spark driver, org.apache.spark.status.TaskDataWrapper could take up more than 2G of memory. Shing On Tuesday, 16 October 2018, 17:34:02 GMT+1, Patr...
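In Spark 2.3.x the live UI data (including the TaskDataWrapper entries mentioned above) is kept in an in-memory store on the driver, bounded by the retained-job/stage/task settings. The thread reports the job limit not being respected, so the sketch below only shows where those knobs live; the values are arbitrary.

import org.apache.spark.SparkConf

// Illustrative limits on how much UI state the driver retains in 2.3.x.
// spark.ui.retainedTasks caps how many tasks per stage are remembered,
// which is what the TaskDataWrapper instances correspond to.
val conf = new SparkConf()
  .set("spark.ui.retainedJobs", "100")
  .set("spark.ui.retainedStages", "100")
  .set("spark.ui.retainedTasks", "10000")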

Spark 2.1.2 Spark Streaming checkpoint interval not respected

2017-11-18 Thread Shing Hing Man
Hi, In the following example using mapWithState, I set the checkpoint interval to 1 minute. From the log, Spark still writes to the checkpoint directory every second. It would be appreciated if someone could point out what I have done wrong. object MapWithStateDemo { def main(args: Array[String]) {...
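A minimal, self-contained variant of the kind of job described above is sketched below. The socket source, batch interval and checkpoint directory are assumptions; the point is that the checkpoint interval for a mapWithState stream is requested on the resulting DStream, and whether Spark honours it for the internal state RDDs is exactly what this thread is questioning.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, State, StateSpec, StreamingContext}

object MapWithStateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapWithStateDemo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/mapWithState-checkpoint")   // hypothetical checkpoint directory

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" ")).map((_, 1))

    // Running count per word, kept in Spark's state store.
    val spec = StateSpec.function((word: String, one: Option[Int], state: State[Int]) => {
      val sum = one.getOrElse(0) + state.getOption().getOrElse(0)
      state.update(sum)
      (word, sum)
    })

    val counts = words.mapWithState(spec)
    counts.checkpoint(Minutes(1))   // requested checkpoint interval for the stateful stream
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}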

Spark 1.6.2 Concurrent append to a HDFS folder with different partition key

2016-09-24 Thread Shing Hing Man
I am trying to prototype using a single SQLContext instance and use it to append DataFrames, partitioned by a field, to the same HDFS folder from multiple threads. (Each thread will work with a DataFrame having a different partition column value.) I get the exception: 16/09/24 16:45:12 ERROR [ForkJoinP...
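One workaround often suggested for the 1.6 line is to avoid having several threads call partitionBy on the same base path at all, and instead have each thread write straight into its own partition sub-directory, so that concurrent jobs never share a _temporary directory. A sketch, with a hypothetical base path and partition column:

import org.apache.spark.sql.{DataFrame, SaveMode}

// Append one thread's DataFrame directly into its partition directory
// (e.g. basePath/date=2016-09-24) instead of partitionBy on a shared path.
def appendForPartition(df: DataFrame, basePath: String, dateValue: String): Unit = {
  df.drop("date")                      // the value is already encoded in the path
    .write
    .mode(SaveMode.Append)
    .parquet(s"$basePath/date=$dateValue")
}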

Re: How to read avro in SparkR

2015-06-14 Thread Shing Hing Man
...if this is it, but could you please try "com.databricks.spark.avro" instead of just "avro". Thanks, Burak. On Jun 13, 2015 9:55 AM, "Shing Hing Man" wrote: Hi, I am trying to read an avro file in SparkR (in Spark 1.4.0). I started R using the following. matmsh@gauss:...

How to read avro in SparkR

2015-06-13 Thread Shing Hing Man
Hi, I am trying to read an avro file in SparkR (in Spark 1.4.0). I started R using the following. matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0 Inside the R shell, when I issue the following, > read.df(sqlContext, "file:///home/matmsh/myfile.avro","avro") I get the follow...

Re: How to skip corrupted avro files

2015-05-05 Thread Shing Hing Man
...code for HadoopReliableRDD in the PR into your own code and use it, without having to wait for the issue to get resolved. On Sun, May 3, 2015 at 12:57 PM, Shing Hing Man wrote: Hi, I am using Spark 1.3.1 to read a directory of about 2000 avro files. The avro files are from a third party and a...

How to skip corrupted avro files

2015-05-03 Thread Shing Hing Man
Hi, I am using Spark 1.3.1 to read a directory of about 2000 avro files. The avro files are from a third party and a few of them are corrupted. val path = "{my directory of avro files}" val sparkConf = new SparkConf().setAppName("avroDemo").setMaster("local") val sc = new SparkContext(sparkCo...
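Until something like the HadoopReliableRDD change mentioned in the reply above is available, one hedged workaround is to open each file separately and drop the ones whose schema cannot even be read; corruption that only surfaces later, during an action, would still slip through. The spark-avro format name and path handling below are assumptions.

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{DataFrame, SQLContext}
import scala.util.Try

// Load every .avro file under `dir` on its own, keep only those that open
// cleanly, and union the survivors into a single DataFrame.
def loadReadableAvro(sqlContext: SQLContext, dir: String): Option[DataFrame] = {
  val fs = FileSystem.get(sqlContext.sparkContext.hadoopConfiguration)
  val files = fs.listStatus(new Path(dir)).map(_.getPath.toString).filter(_.endsWith(".avro"))
  val readable = files.flatMap(f => Try(sqlContext.load(f, "com.databricks.spark.avro")).toOption)
  readable.reduceOption(_ unionAll _)
}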

How to create data frame from an avro file in Spark 1.3.0

2015-03-14 Thread Shing Hing Man
In spark-avro 0.1, the method AvroContext.avroFile returns a SchemaRDD, which is deprecated in Spark 1.3.0. package com.databricks.spark import org.apache.spark.sql.{SQLContext, SchemaRDD} package object avro { /** * Adds a method, `avroFile`, to SQLContext that allows reading data stor...
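Under Spark 1.3.0 a DataFrame (rather than the deprecated SchemaRDD) can be obtained from spark-avro through the generic data source API; the path below is hypothetical.

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// The generic load(path, source) call goes through the data source API and
// returns a DataFrame in Spark 1.3.
val sc = new SparkContext("local", "avroToDataFrame")
val sqlContext = new SQLContext(sc)
val df = sqlContext.load("/path/to/myfile.avro", "com.databricks.spark.avro")
df.printSchema()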

How to query Spark master for cluster status ?

2015-01-15 Thread Shing Hing Man
Hi, I am using Spark 1.2. The Spark master UI has a status page. Is there a web service on the Spark master that returns the status of the cluster in JSON? Alternatively, what is the best way to determine if a cluster is up? Thanks in advance for your assistance! Shing
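For a standalone master (an assumption; the thread does not say which cluster manager is in use), the master web UI also serves the cluster state, workers, and running applications as JSON at the /json path on the UI port. A minimal sketch with a hypothetical hostname and the default port 8080:

import scala.io.Source

// Fetch the standalone master's status (workers, applications, state) as JSON.
val status = Source.fromURL("http://spark-master-host:8080/json").mkString
println(status)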

Re: ScalaReflectionException when using saveAsParquetFile in sbt

2015-01-11 Thread Shing Hing Man
I have the same exception when I run the following example from the Spark SQL Programming Guide (Spark 1.2.0 documentation)...
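ScalaReflectionException from Spark SQL under sbt's run task is often attributed to sbt running the application inside its own classloader; a commonly suggested workaround (an assumption here, not confirmed by this thread) is to fork a separate JVM in build.sbt:

// build.sbt: run the application in a forked JVM rather than inside sbt's
// classloader, a frequent workaround for Spark SQL reflection errors.
fork := true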

Re: Spark 1.0.2 Can GroupByTest example be run in Eclipse without change

2014-09-07 Thread Shing Hing Man
After looking at the source code of SparkConf.scala, I found the following solution. Just set the following Java system property: -Dspark.master=local Shing. On Monday, 1 September 2014, 22:09, Shing Hing Man wrote: Hi, I have noticed that the GroupByTest example in https...
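The property can equally be supplied as a VM argument in the Eclipse run configuration, or set programmatically before the example builds its SparkConf; new SparkConf() loads spark.* system properties by default, so either route works. A one-line sketch:

// Picked up by `new SparkConf()`, letting GroupByTest run inside Eclipse
// without spark-submit.
System.setProperty("spark.master", "local")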

Spark-cassandra-connector 1.0.0-rc5: java.io.NotSerializableException

2014-09-05 Thread Shing Hing Man
Hi, My version of Spark is 1.0.2. I am trying to use Spark-cassandra-connector to execute an update CQL statement inside a CassandraConnector(conf).withSessionDo block: CassandraConnector(conf).withSessionDo { session => { myRdd.foreach { case (ip, value...
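The usual cause of NotSerializableException in this shape of code is closing over the live Cassandra Session (or other driver-side objects) inside rdd.foreach. One hedged alternative is to ship only the serializable CassandraConnector and open a session on the executors, once per partition; the keyspace, table and schema below are hypothetical.

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD

// CassandraConnector itself is serializable, so it can be captured by the
// closure; the Session is only created on the executors, per partition.
def updateCounters(conf: SparkConf, myRdd: RDD[(String, Long)]): Unit = {
  val connector = CassandraConnector(conf)
  myRdd.foreachPartition { rows =>
    connector.withSessionDo { session =>
      rows.foreach { case (ip, value) =>
        session.execute(s"UPDATE demo.counters SET value = $value WHERE ip = '$ip'")
      }
    }
  }
}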

Spark 1.0.2 Can GroupByTest example be run in Eclipse without change

2014-09-01 Thread Shing Hing Man
Hi, I have noticed that the GroupByTest example in https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala has been changed to be run using spark-submit. Previously, I set "local" as the first command-line parameter, and this enabled me t...