Hi,
Recently upgraded from 2.1.1 to 2.2.0, and my Streaming job seems to have
broken. The submitted application is unable to connect to the cluster, even
though everything is up and running.
Below is my stack trace.
Spark Master: spark://192.168.10.207:7077
Job Arguments:
-appName orange_watch -directory /u01/watch/stream/
Hi all,
I am running Spark 2.0 on Mesos 1.1 and am trying to split my job across
several nodes. I set the number of executors using the formula
(spark.cores.max / spark.executor.cores). The behavior I saw was that Spark
will try to fill up one Mesos node with as many executors as it can, then i
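For illustration, a minimal sketch of the configuration being described; the core and memory numbers and the Mesos master URL below are invented, and only spark.cores.max / spark.executor.cores come from the message:

import org.apache.spark.sql.SparkSession

// Hypothetical sizing: 24 total cores with 4 cores per executor allows at most
// 6 executors. With the coarse-grained Mesos scheduler, several of them can
// still land on the same node if that node offers enough resources.
val spark = SparkSession.builder()
  .appName("mesos-executor-spread")                 // hypothetical app name
  .master("mesos://zk://mesos-master:2181/mesos")   // hypothetical master URL
  .config("spark.cores.max", "24")
  .config("spark.executor.cores", "4")
  .config("spark.executor.memory", "4g")
  .getOrCreate()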
Dear all,
I have a HiveThriftServer2 server running, and most of our Spark SQL queries
go there for execution. From the YARN GUI, I can see the application ID and
the attempt ID of the Thrift server. But with the REST API described on the
page (https://spark.apache.org/docs/latest/monitoring.html#re
I've written a SparkListener to record metrics for validation (it's a bit out
of date). Are you just looking to have graphing/alerting set up on the
Spark metrics?
On Tue, Dec 5, 2017 at 1:53 PM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:
> You can also get the metrics from the Spark app
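A minimal sketch of the kind of metrics-recording SparkListener mentioned above; which metrics are logged and where they go (stdout here) are assumptions for illustration:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Logs a couple of task-level metrics as each task completes. A real listener
// would forward these to whatever graphing/alerting backend is in use.
class TaskMetricsListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.taskId} " +
        s"runTimeMs=${m.executorRunTime} recordsRead=${m.inputMetrics.recordsRead}")
    }
  }
}

// Register right after the SparkContext is created:
// sc.addSparkListener(new TaskMetricsListener)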
@Richard I don't see any error in the executor log but let me run again to
make sure.
@Gerard Thanks much! But would your answer about .collect() change depending
on running the spark app in client vs cluster mode?
Thanks!
On Tue, Dec 5, 2017 at 1:54 PM, Gerard Maas wrote:
> The general answer t
This GitBook explains Spark components in detail:
'Mastering Apache Spark 2'
https://www.gitbook.com/book/jaceklaskowski/mastering-apache-spark/details
2017-12-04 12:48 GMT+09:00 Manuel Sopena Ballesteros <
manuel...@garvan.org.au>:
> Dear Spark community,
>
>
>
> Is there any resource (book
The general answer to your initial question is that "it depends". If the
operation in the rdd.foreach() closure can be parallelized, then you don't
need to collect first. If it needs some local context (e.g. a socket
connection), then you need to do rdd.collect first to bring the data
locally, whic
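To make the two options concrete, a short sketch in the spirit of the answer above; `dstream` is assumed to be the stream from the original question, and the socket endpoint and the use of foreachPartition are illustrative assumptions:

import java.io.PrintWriter
import java.net.Socket

dstream.foreachRDD { rdd =>
  // Two alternatives inside foreachRDD (pick one):

  // Option 1: the work parallelizes -- run it on the executors, no collect().
  // Each partition opens (and closes) its own connection.
  rdd.foreachPartition { records =>
    val socket = new Socket("sink-host", 9999)   // hypothetical endpoint
    val out = new PrintWriter(socket.getOutputStream, true)
    records.foreach(r => out.println(r.toString))
    out.close()
    socket.close()
  }

  // Option 2: the side effect needs driver-local context (one shared
  // connection, a local file, ...) -- bring the data to the driver first.
  val local = rdd.collect()
  local.foreach(r => println(r))                 // placeholder for the local action
}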
You can also get the metrics from the Spark application events log file.
See https://www.slideshare.net/JayeshThakrar/apache-bigdata2017sparkprofiling
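For completeness, the application events log referred to above has to be enabled; a minimal sketch, where the application name and log directory are only examples:

import org.apache.spark.sql.SparkSession

// Write the application event log so metrics can be mined from it afterwards.
// The directory is an example and must be writable by the application.
val spark = SparkSession.builder()
  .appName("event-log-example")                        // example app name
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "hdfs:///spark-logs")  // example location
  .getOrCreate()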
From: "Qiao, Richard"
Date: Monday, December 4, 2017 at 6:09 PM
To: Nick Dimiduk , "user@spark.apache.org"
Subject: Re: Access to Applications
In the 2nd case, is there any producer error thrown in the executor's log?
Best Regards
Richard
From: kant kodali
Date: Tuesday, December 5, 2017 at 4:38 PM
To: "Qiao, Richard"
Cc: "user @spark"
Subject: Re: Do I need to do .collect inside forEachRDD
It reads from Kafka and outputs to Kafka, so I
On Tue, Dec 5, 2017 at 12:43 PM, bsikander wrote:
> 2) If I use context.addSparkListener, I can customize the listener but then
> I miss the onApplicationStart event. Also, I don't know Spark's logic for
> changing the application state from WAITING -> RUNNING.
I'm not sure I follow you her
It reads from Kafka and outputs to Kafka, so I check the output from Kafka.
On Tue, Dec 5, 2017 at 1:26 PM, Qiao, Richard
wrote:
> Where do you check the output result for both cases?
>
> Sent from my iPhone
>
> > On Dec 5, 2017, at 15:36, kant kodali wrote:
> >
> > Hi All,
> >
> > I have a simple
Where do you check the output result for both cases?
Sent from my iPhone
> On Dec 5, 2017, at 15:36, kant kodali wrote:
>
> Hi All,
>
> I have a simple stateless transformation using DStreams (stuck with the old
> API for one of the applications). The pseudo code is roughly like this
>
> dstream
Thank you for the reply.
I am not a Spark expert but I was reading through the code and I thought
that the state was changed from SUBMITTED to RUNNING only after executors
(CoarseGrainedExecutorBackend) were registered.
https://github.com/apache/spark/commit/015f7ef503d5544f79512b626749a1f0c48
SparkLauncher operates at a different layer than Spark applications.
It doesn't know about executors or the driver or anything, just whether
the Spark application was started or not. So it doesn't work for your
case.
The best option is to install a SparkListener and
monitor events. But t
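A minimal sketch of such a lifecycle listener, purely to illustrate the callbacks involved; the println output is an assumption:

import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd,
  SparkListenerApplicationStart, SparkListenerExecutorAdded}

// Tracks coarse application lifecycle events. Deciding that "enough executors
// have registered" means RUNNING is application-level logic, not Spark's.
class LifecycleListener extends SparkListener {
  override def onApplicationStart(start: SparkListenerApplicationStart): Unit =
    println(s"application started: ${start.appName} at ${start.time}")

  override def onExecutorAdded(added: SparkListenerExecutorAdded): Unit =
    println(s"executor registered: ${added.executorId} on ${added.executorInfo.executorHost}")

  override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit =
    println(s"application ended at ${end.time}")
}

// Register via sc.addSparkListener(new LifecycleListener), or through the
// spark.extraListeners property so onApplicationStart is not missed.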
Hi All,
I have a simple stateless transformation using DStreams (stuck with the old
API for one of the applications). The pseudo code is roughly like this:
dstream.map(...).reduce(...).foreachRDD(rdd -> {
    rdd.collect().forEach(...); // Is this necessary? It executes fine, but is a bit slow
});
I understand
Hi, All.
Today, Apache Spark has started using Apache ORC 1.4 as a `native` ORC
implementation.
SPARK-20728 Make OrcFileFormat configurable between `sql/hive` and
`sql/core`.
-
https://github.com/apache/spark/commit/326f1d6728a7734c228d8bfaa69442a1c7b92e9b
Thank you so much for all your support for
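As an illustration of the switch that SPARK-20728 introduces, a minimal sketch, assuming an active SparkSession `spark` (e.g. in spark-shell) and an example output path:

// "native" selects the new sql/core reader backed by Apache ORC 1.4;
// "hive" keeps the previous sql/hive implementation.
spark.conf.set("spark.sql.orc.impl", "native")

spark.range(10).write.mode("overwrite").orc("/tmp/orc_native_test")  // example path
spark.read.orc("/tmp/orc_native_test").show()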
Try with `SparkSession.builder().enableHiveSupport` ?
On Tue, Dec 5, 2017 at 3:22 PM, 163 wrote:
> Hi,
> How can I persist a database/table created in a Spark application?
>
> object TestPersistentDB {
> def main(args:Array[String]): Unit = {
> val spark = SparkSession
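Following the enableHiveSupport suggestion above, a minimal sketch; the application name, warehouse path, and table names are examples only:

import org.apache.spark.sql.SparkSession

// With Hive support enabled, databases and tables created with CREATE DATABASE /
// saveAsTable are recorded in the Hive metastore and survive the application.
val spark = SparkSession.builder()
  .appName("persistent-tables")                              // example name
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse") // example path
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS testdb")
spark.range(100).write.mode("overwrite").saveAsTable("testdb.numbers")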
Hi,
I believe Spark writes datetime fields as INT96. What are the implications
of https://issues.apache.org/jira/browse/SPARK-10364 (Support Parquet
logical type TIMESTAMP_MILLIS), which is part of 2.2.0?
I am having issues reading Spark-generated Parquet dates using Apache
Drill (Drill suppo
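A minimal sketch of the round trip in question; the path is an example, and the configuration key shown is my reading of what SPARK-10364 added, so it should be checked against the 2.2.0 documentation before relying on it:

import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-timestamps").getOrCreate()
import spark.implicits._

// Assumed flag from SPARK-10364: when enabled, timestamps are written as
// INT64 TIMESTAMP_MILLIS instead of the legacy INT96 encoding. Verify the
// exact key and default in the 2.2.0 configuration before depending on it.
spark.conf.set("spark.sql.parquet.int64AsTimestampMillis", "true")

Seq(Timestamp.valueOf("2017-12-05 13:53:00")).toDF("event_time")
  .write.mode("overwrite").parquet("/tmp/ts_millis_test")   // example path

spark.read.parquet("/tmp/ts_millis_test").printSchema()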
When you pick a book, make sure it covers the version of Spark you want to
deploy. There are a lot of books out there that focus heavily on Spark 1.x. Spark
2.x generalizes the DataFrame API, introduces Tungsten, etc. Not all of it may be
relevant to a pure "sys admin" learning path, but it is good to know