Excerpts from Prem Sure's message of 2018-07-04 19:39:29 +0530:
> Hoping the below would help in clearing some of this up.
> Executors don't have control to share data among themselves, except for
> sharing accumulators via the driver's support.
> It's all based on data locality or remote nature; tasks/stages are defined
> accordingly, which may result in a shuffle.
Hi,
I am writing a Structured Streaming application where I process records
after some validation (let's say, not null).
I want to keep a counter of invalid records in the long-running streaming
application while the other, valid records get processed.
How can I achieve this?
My first thought was using a LongAccumulator.
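A minimal sketch of the LongAccumulator idea, assuming a hypothetical socket
source and a "non-null / non-blank" validation rule (the source, names, and rule
are placeholders, not the original job):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("invalid-counter").getOrCreate()
import spark.implicits._

// The accumulator is registered on the driver; tasks on executors only add to it.
val invalidCount = spark.sparkContext.longAccumulator("invalidRecords")

// Hypothetical source standing in for the real input stream.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
  .as[String]

// Count null/blank records as invalid and keep only the valid ones for the sink.
val valid = lines.map { l =>
  if (l == null || l.trim.isEmpty) invalidCount.add(1)
  l
}.filter(l => l != null && l.trim.nonEmpty)

val query = valid.writeStream.format("console").outputMode("append").start()
// invalidCount.value is readable on the driver, e.g. from a StreamingQueryListener
// after each micro-batch.

One caveat: accumulator updates made inside transformations are not de-duplicated
when tasks are retried, so the counter should be treated as approximate.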
Hi Amiya/Jürgen,
Did you get any lead on this?
I want to process records after some validation.
Correct records should go to sink1 and incorrect records should go to sink2.
How can I achieve this in a single stream?
Regards,
Chandan
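One way to get both sinks from a single input stream (Spark 2.4+): use
foreachBatch, which exposes each micro-batch as a plain DataFrame so it can be
written twice. The source, validation rule, and output paths below are
placeholders:

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("split-sinks").getOrCreate()
import spark.implicits._

// Hypothetical streaming source with a single string "value" column.
val input = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

val query = input.writeStream.foreachBatch { (batch: DataFrame, batchId: Long) =>
  val valid   = batch.filter($"value".isNotNull && $"value" =!= "")
  val invalid = batch.filter($"value".isNull || $"value" === "")

  valid.write.mode("append").parquet("/tmp/sink1")    // sink1: correct records
  invalid.write.mode("append").parquet("/tmp/sink2")  // sink2: incorrect records
}.start()

query.awaitTermination()

On versions before 2.4, an alternative is to start two separate streaming queries
over the same source DataFrame, each with its own filter and checkpoint location.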
On Wed, Jun 13, 2018 at 2:30 PM Amiya Mishra wrote:
> Hi Jürgen,
>
Hi, I'm running Spark on YARN. My code is very simple. I want to kill one
executor when "data.repartition(10)" is executed. How can I do this in an
easy way?
import org.apache.hadoop.io.{BytesWritable, NullWritable}

val data = sc.sequenceFile[NullWritable, BytesWritable](inputPath)
  .map { case (key, value) =>
    // Data is the sender's own helper for deserializing the payload.
    Data.fromBytes(value)
  }
val process = data.repartition(10)
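One low-tech way to do this for testing (a sketch, not an official API): run an
action after the repartition and have the first attempt of one chosen partition
call System.exit, which brings down that executor's JVM so YARN re-launches the
container and Spark re-runs the lost tasks:

import org.apache.spark.TaskContext

// Using the "process" RDD from the snippet above.
process.mapPartitionsWithIndex { (idx, iter) =>
  // Only the first attempt of partition 0 kills its executor; retries proceed normally.
  if (idx == 0 && TaskContext.get.attemptNumber == 0) {
    System.exit(1)  // kills this executor's JVM, not just the task; test clusters only
  }
  iter
}.count()

Alternatively, SparkContext.killExecutors(Seq("<executorId>")) can kill a specific
executor by the ID shown in the Spark UI, though it is marked as a developer API
and the killed executor may not be replaced unless dynamic allocation is enabled.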
Try .pipe(<your .py script>) on the RDD.
Thanks,
Prem
On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri wrote:
> Can someone please suggest? Thanks.
>
> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri wrote:
>
>> Hello Dear Spark User / Dev,
>>
>> I would like to pass a Python user-defined function to a Spark job developed
>> using Scala, and the return value of that function would be returned to the
>> DF / Dataset API.
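A minimal sketch of the RDD.pipe route Prem suggests: the Scala job streams each
record to an external Python script over stdin/stdout and reads the script's
stdout back as strings. The script name, input column, and parsing below are
assumptions:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("pipe-python-udf").getOrCreate()
import spark.implicits._

// Hypothetical input: one value per row that the Python "UDF" should transform.
val input = Seq("1", "2", "3").toDF("value")

// Ship the script to every executor, e.g. with spark-submit --files udf.py.
// udf.py reads lines from stdin and prints one transformed result per line.
val piped = input.as[String].rdd.pipe("python udf.py")

// Parse the script's stdout back into a typed Dataset / DataFrame.
val result = piped.map(_.toInt).toDF("udf_result")
result.show()

This avoids embedding a Python interpreter in the JVM, at the cost of serializing
records as text across the process boundary.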
Can someone please suggest? Thanks.
On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri wrote:
> Hello Dear Spark User / Dev,
>
> I would like to pass a Python user-defined function to a Spark job developed
> using Scala, and the return value of that function would be returned to the
> DF / Dataset API.
>
> Can so
Hi Aakash,
For clarification, are you running this in YARN client mode or standalone?
How much total YARN memory is available?
From my experience on a bigger cluster, I found the following incremental
settings useful (CDH 5.9, YARN client), so you can scale yours accordingly:
[1] 576 GB total YARN memory: --num-executors 24
Can you share which API your jobs use: just core RDDs, or SQL, or
DStreams, etc.?
Refer to the recommendations at
https://spark.apache.org/docs/2.3.0/configuration.html for detailed
configuration options.
Thanks,
Prem
On Wed, Jul 4, 2018 at 12:34 PM, Aakash Basu wrote:
> I do not want to change executor/driver cores/memory on the fly in a
> single Spark job; all I want is to make them cluster-specific.
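For illustration, the same kind of settings can be fixed per cluster in code
before the session starts; the numbers below are placeholders to scale, not
recommendations:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("tuned-job")
  .config("spark.executor.instances", "24")      // --num-executors 24
  .config("spark.executor.cores", "5")
  .config("spark.executor.memory", "20g")
  .config("spark.executor.memoryOverhead", "2g")
  .getOrCreate()

Note that in YARN client mode the driver JVM is already running by the time this
code executes, so spark.driver.memory has to be set on the spark-submit command
line or in spark-defaults.conf instead.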
Hoping the below would help in clearing some of this up.
Executors don't have control to share data among themselves, except for
sharing accumulators via the driver's support.
It's all based on data locality or remote nature; tasks/stages are defined
accordingly, which may result in a shuffle.
On Wed, Jul 4, 2018 at 1
Hello,
I have a question on Spark's data flow. If I understand correctly, all
received data is sent from the executor to the driver of the application
prior to task creation.
Then the task embedding the data transits from the driver to the executor
in order to be processed.
As executor cannot e
I do not want to change executor/driver cores/memory on the fly in a single
Spark job; all I want is to make them cluster-specific. So, I want to have
a formula with which, depending on the size of the cluster, I can work out
the driver and executor values before submitting those details in
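A sketch of the kind of cluster-specific formula being asked for, using the
common heuristic of roughly 5 cores per executor and leaving one core and one GB
per node for the OS and Hadoop daemons; the node count and sizes are made-up
inputs to be replaced with the real cluster specs:

// Hypothetical cluster: replace with real values.
val nodes        = 6
val coresPerNode = 16
val memPerNodeGb = 64

// Leave 1 core and 1 GB per node for the OS / YARN node manager.
val usableCoresPerNode = coresPerNode - 1
val usableMemPerNodeGb = memPerNodeGb - 1

// Heuristic: ~5 cores per executor for reasonable HDFS throughput.
val executorCores    = 5
val executorsPerNode = usableCoresPerNode / executorCores
val numExecutors     = executorsPerNode * nodes - 1   // reserve one slot for the YARN AM

// Split each node's memory across its executors, then take ~10% off for memory overhead.
val memPerExecutorGb = usableMemPerNodeGb / executorsPerNode
val executorMemoryGb = (memPerExecutorGb * 0.9).toInt

println(s"--num-executors $numExecutors --executor-cores $executorCores " +
  s"--executor-memory ${executorMemoryGb}g")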