Excerpts from Prem Sure's message of 2018-07-04 19:39:29 +0530:
> Hoping the below would help in clearing some of this up.
> Executors don't have control to share data among themselves, except for
> sharing accumulators via the driver's support.
> It's all based on data locality or remote nature; tasks/stages are defined
> accordingly, which may result in a shuffle.
Hi,
I am writing a Structured Streaming application where I process records
after some validation (let's say, not null).
I want to keep a counter of invalid records in the long-running streaming
application while the other, valid records get processed.
How can I achieve this?
My first thought was using a LongAccumulator.
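A minimal sketch of the LongAccumulator idea, assuming a hypothetical socket
source and a "non-null / non-blank" validation rule (the source, names, and rule
are placeholders, not the original job):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("invalid-counter").getOrCreate()
import spark.implicits._

// The accumulator is registered on the driver; tasks on executors only add to it.
val invalidCount = spark.sparkContext.longAccumulator("invalidRecords")

// Hypothetical source standing in for the real input stream.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
  .as[String]

// Count null/blank records as invalid and keep only the valid ones for the sink.
val valid = lines.map { l =>
  if (l == null || l.trim.isEmpty) invalidCount.add(1)
  l
}.filter(l => l != null && l.trim.nonEmpty)

val query = valid.writeStream.format("console").outputMode("append").start()
// invalidCount.value is readable on the driver, e.g. from a StreamingQueryListener
// after each micro-batch.

One caveat: accumulator updates made inside transformations are not de-duplicated
when tasks are retried, so the counter should be treated as approximate.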
Hi Amiya/Jürgen,
Did you get any lead on this?
I want to process records after some validation.
Correct records should go to sink1 and incorrect records should go to sink2.
How can I achieve this in a single stream?
Regards,
Chandan
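One way to get both sinks from a single input stream (Spark 2.4+): use
foreachBatch, which exposes each micro-batch as a plain DataFrame so it can be
written twice. The source, validation rule, and output paths below are
placeholders:

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("split-sinks").getOrCreate()
import spark.implicits._

// Hypothetical streaming source with a single string "value" column.
val input = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

val query = input.writeStream.foreachBatch { (batch: DataFrame, batchId: Long) =>
  val valid   = batch.filter($"value".isNotNull && $"value" =!= "")
  val invalid = batch.filter($"value".isNull || $"value" === "")

  valid.write.mode("append").parquet("/tmp/sink1")    // sink1: correct records
  invalid.write.mode("append").parquet("/tmp/sink2")  // sink2: incorrect records
}.start()

query.awaitTermination()

On versions before 2.4, an alternative is to start two separate streaming queries
over the same source DataFrame, each with its own filter and checkpoint location.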
On Wed, Jun 13, 2018 at 2:30 PM Amiya Mishra wrote:
> Hi Jürgen,
>
Hi, I'm running Spark on YARN. My code is very simple. I want to kill one
executor when "data.repartition(10)" is executed. How can I do this in an
easy way?
import org.apache.hadoop.io.{BytesWritable, NullWritable}

val data = sc.sequenceFile[NullWritable, BytesWritable](inputPath)
  .map { case (key, value) =>
    // Data is the sender's own helper for deserializing the payload.
    Data.fromBytes(value)
  }
val process = data.repartition(10)
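One low-tech way to do this for testing (a sketch, not an official API): run an
action after the repartition and have the first attempt of one chosen partition
call System.exit, which brings down that executor's JVM so YARN re-launches the
container and Spark re-runs the lost tasks:

import org.apache.spark.TaskContext

// Using the "process" RDD from the snippet above.
process.mapPartitionsWithIndex { (idx, iter) =>
  // Only the first attempt of partition 0 kills its executor; retries proceed normally.
  if (idx == 0 && TaskContext.get.attemptNumber == 0) {
    System.exit(1)  // kills this executor's JVM, not just the task; test clusters only
  }
  iter
}.count()

Alternatively, SparkContext.killExecutors(Seq("<executorId>")) can kill a specific
executor by the ID shown in the Spark UI, though it is marked as a developer API
and the killed executor may not be replaced unless dynamic allocation is enabled.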
Try .pipe(<your .py script>) on the RDD.
Thanks,
Prem
On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri wrote:
> Can someone please suggest? Thanks.
>
> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri wrote:
>
>> Hello Dear Spark User / Dev,
>>
>> I would like to pass a Python user-defined function to a Spark job developed
>> using Scala, and the return value of that function would be returned to the
>> DF / Dataset API.
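A minimal sketch of the RDD.pipe route Prem suggests: the Scala job streams each
record to an external Python script over stdin/stdout and reads the script's
stdout back as strings. The script name, input column, and parsing below are
assumptions:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("pipe-python-udf").getOrCreate()
import spark.implicits._

// Hypothetical input: one value per row that the Python "UDF" should transform.
val input = Seq("1", "2", "3").toDF("value")

// Ship the script to every executor, e.g. with spark-submit --files udf.py.
// udf.py reads lines from stdin and prints one transformed result per line.
val piped = input.as[String].rdd.pipe("python udf.py")

// Parse the script's stdout back into a typed Dataset / DataFrame.
val result = piped.map(_.toInt).toDF("udf_result")
result.show()

This avoids embedding a Python interpreter in the JVM, at the cost of serializing
records as text across the process boundary.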
Can someone please suggest? Thanks.
On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri wrote:
> Hello Dear Spark User / Dev,
>
> I would like to pass a Python user-defined function to a Spark job developed
> using Scala, and the return value of that function would be returned to the
> DF / Dataset API.
>
> Can so
Hi Aakash,
For clarification, are you running this in YARN client mode or standalone?
How much total YARN memory is available?
From my experience on a bigger cluster, I found the following incremental
settings useful (CDH 5.9, YARN client), so you can scale yours accordingly:
[1] 576 GB total YARN memory: --num-executors 24
Can you share which API your jobs use: just core RDDs, or SQL, or
DStreams, etc.?
Refer to the recommendations at
https://spark.apache.org/docs/2.3.0/configuration.html for detailed
configuration options.
Thanks,
Prem
On Wed, Jul 4, 2018 at 12:34 PM, Aakash Basu wrote:
> I do not want to change executor/driver cores/memory on the fly in a
> single Spark job; all I want is to make them cluster-specific.
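For illustration, the same kind of settings can be fixed per cluster in code
before the session starts; the numbers below are placeholders to scale, not
recommendations:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("tuned-job")
  .config("spark.executor.instances", "24")      // --num-executors 24
  .config("spark.executor.cores", "5")
  .config("spark.executor.memory", "20g")
  .config("spark.executor.memoryOverhead", "2g")
  .getOrCreate()

Note that in YARN client mode the driver JVM is already running by the time this
code executes, so spark.driver.memory has to be set on the spark-submit command
line or in spark-defaults.conf instead.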
Hoping the below would help in clearing some of this up.
Executors don't have control to share data among themselves, except for
sharing accumulators via the driver's support.
It's all based on data locality or remote nature; tasks/stages are defined
accordingly, which may result in a shuffle.
On Wed, Jul 4, 2018 at 1
Hello,
I have a question on Spark's data flow. If I understand correctly, all
received data is sent from the executor to the driver of the application
prior to task creation.
Then the task embedding the data transits from the driver to the executor
in order to be processed.
As executor cannot e
I do not want to change executor/driver cores/memory on the fly in a single
Spark job; all I want is to make them cluster-specific. So, I want to have
a formula with which, depending on the size of the cluster, I can work out
the driver and executor values before submitting those details in
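A sketch of the kind of cluster-specific formula being asked for, using the
common heuristic of roughly 5 cores per executor and leaving one core and one GB
per node for the OS and Hadoop daemons; the node count and sizes are made-up
inputs to be replaced with the real cluster specs:

// Hypothetical cluster: replace with real values.
val nodes        = 6
val coresPerNode = 16
val memPerNodeGb = 64

// Leave 1 core and 1 GB per node for the OS / YARN node manager.
val usableCoresPerNode = coresPerNode - 1
val usableMemPerNodeGb = memPerNodeGb - 1

// Heuristic: ~5 cores per executor for reasonable HDFS throughput.
val executorCores    = 5
val executorsPerNode = usableCoresPerNode / executorCores
val numExecutors     = executorsPerNode * nodes - 1   // reserve one slot for the YARN AM

// Split each node's memory across its executors, then take ~10% off for memory overhead.
val memPerExecutorGb = usableMemPerNodeGb / executorsPerNode
val executorMemoryGb = (memPerExecutorGb * 0.9).toInt

println(s"--num-executors $numExecutors --executor-cores $executorCores " +
  s"--executor-memory ${executorMemoryGb}g")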