Hi Stephan,
I figured it out. The problem was that the date/time was different on all
3 nodes. ZooKeeper thought that it hadn’t heard from the other nodes for
longer than the allowed period and dropped them, causing the other two
task managers in the cluster to fail. I synchronized the time on all three
nodes …
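For anyone who hits the same thing, the relevant ZooKeeper knobs look
roughly like this (values are illustrative, not our exact config; keeping
the clocks aligned with NTP is the actual fix):

    # zoo.cfg excerpt (illustrative values)
    tickTime=2000    # base time unit in ms; client session timeouts are
                     # negotiated between 2*tickTime and 20*tickTime
    initLimit=10     # ticks a follower may take to connect and sync
    syncLimit=5      # ticks a follower may lag before it is dropped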
Hi Stephan,
I’m using DataStream.writeAsText(String path, WriteMode writemode) for my
sink. The data is written to disk and there’s plenty of space available.
I looked deeper into the logs and found that the jobs on 174 and 175 are
not actually stuck; they’re moving extremely slowly. This …
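For context, the sink is wired up roughly like this (a minimal sketch; the
source, host, port, and path here are placeholders, not the actual job):

    // Minimal sketch of a writeAsText sink (illustrative names and paths).
    import org.apache.flink.core.fs.FileSystem.WriteMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class WriteAsTextExample {
      public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // Each parallel sink subtask writes its own part file under this path.
        env.socketTextStream("localhost", 9999)
           .writeAsText("/tmp/flink-out", WriteMode.OVERWRITE);
        env.execute("write-as-text-example");
      }
    }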
Hi Ali!
I see, so the tasks on 192.168.200.174 and 192.168.200.175 apparently make
no progress, and do not even recognize the end-of-stream point.
I expect that the streams on 192.168.200.174 and 192.168.200.175 are
back-pressured to a stand-still. Since no network is involved, the reason
for the backpressure …
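To illustrate the effect (a hedged sketch, not the job in question; the
sleep stands in for whatever is actually slow), one slow operator throttles
the entire chained pipeline:

    // Sketch: a slow sink back-pressures the whole local pipeline.
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    public class BackpressureDemo {
      public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        env.generateSequence(0, Long.MAX_VALUE)
           .map(x -> 2 * x)
           .addSink(new SinkFunction<Long>() {
             @Override
             public void invoke(Long value) throws Exception {
               Thread.sleep(100); // artificial slowness; upstream stalls with it
             }
           });
        env.execute("backpressure-demo");
      }
    }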
Hi Stephan,
I got a request to share the image with someone and I assume it was you.
You should be able to see it now. This seems to be the main issue I have
at this time. I've tried running the job on the cluster with a parallelism
of 16, 24, 36, and even went up to 48. I see all the parallel pipelines …
Hi Ali!
Seems like the Google Doc has restricted access; it tells me I have no
permission to view it...
Stephan
On Wed, Dec 9, 2015 at 8:49 PM, Kashmar, Ali wrote:
> Hi Stephan,
>
> Here’s a link to the screenshot I tried to attach earlier:
>
> https://drive.google.com/open?id=0B0_jTR8-IvUcMEdjWGFmYXJYS28
Hi Stephan,
Here’s a link to the screenshot I tried to attach earlier:
https://drive.google.com/open?id=0B0_jTR8-IvUcMEdjWGFmYXJYS28
It looks to me like the distribution is fairly skewed across the nodes,
even though they’re executing the same pipeline.
Thanks,
Ali
On 2015-12-09, 12:36 PM, "Stephan Ewen" wrote:
Hi!
The parallel socket source looks good.
I think you forgot to attach the screenshot, or the mailing list dropped
the attachment...
Not sure if I can diagnose that without more details. The sources all do
the same. Assuming that the server distributes the data evenly across all
connected sockets …
Hi Stephan,
That was my original understanding, until I realized that I was not using
a parallel socket source. I had a custom source that extended
SourceFunction, which always runs with parallelism = 1. I looked through
the API and found the ParallelSourceFunction interface, so I implemented
that …
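For reference, the shape of such a source (my rough sketch of the
approach, not the actual code; host and port are placeholders):

    // Sketch: a socket source that can run with parallelism > 1.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.Socket;
    import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

    public class ParallelSocketSource extends RichParallelSourceFunction<String> {
      private final String host;
      private final int port;
      private volatile boolean running = true;

      public ParallelSocketSource(String host, int port) {
        this.host = host;
        this.port = port;
      }

      @Override
      public void run(SourceContext<String> ctx) throws Exception {
        // Every parallel subtask opens its own connection; the server is
        // assumed to spread data across all connected sockets.
        try (Socket socket = new Socket(host, port);
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(socket.getInputStream()))) {
          String line;
          while (running && (line = in.readLine()) != null) {
            ctx.collect(line);
          }
        }
      }

      @Override
      public void cancel() {
        running = false;
      }
    }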
Hi Ali!
In the case you have, the sequence of source-map-filter ... forms a
pipeline.
You mentioned that you set the parallelism to 16, so there should be 16
pipelines. These pipelines should be completely independent.
Looking at the way the scheduler is implemented, independent pipelines
should …
There is no shuffle operation in my flow. Mine actually looks like this:
Source: Custom Source -> Flat Map -> (Filter -> Flat Map -> Map -> Map ->
Map, Filter)
Maybe it’s treating this whole flow as one pipeline and assigning it to a
slot. What I really wanted was to have the custom source I built …
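Roughly, in code (a hedged reconstruction with placeholder logic; the
ParallelSocketSource is the sketch from earlier in the thread, and without
any keyBy/rebalance these operators chain into one local pipeline per
parallel slice):

    // Hedged reconstruction of the described flow; all bodies are trivial
    // placeholders.
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class FlowSketch {
      public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(16);

        DataStream<String> parsed = env
            .addSource(new ParallelSocketSource("localhost", 9999))
            .flatMap(new FlatMapFunction<String, String>() {
              @Override
              public void flatMap(String line, Collector<String> out) {
                for (String part : line.split(",")) out.collect(part);
              }
            });

        // Branch 1: Filter -> Flat Map -> Map -> Map -> Map
        parsed.filter(s -> !s.isEmpty())
              .flatMap(new FlatMapFunction<String, String>() {
                @Override
                public void flatMap(String s, Collector<String> out) {
                  out.collect(s.trim());
                }
              })
              .map(s -> s.toLowerCase())
              .map(s -> s + "!")
              .map(s -> "[" + s + "]")
              .print();

        // Branch 2: Filter
        parsed.filter(s -> s.isEmpty()).print();

        env.execute("flow-sketch");
      }
    }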
If I'm not mistaken, the scheduler already has a preference to spread
independent pipelines out across the cluster. At least it uses a queue of
instances from which it pops the first element when allocating a new slot.
This instance is then appended to the queue again if it still has some
resources …
Slots are like "resource groups" which execute entire pipelines. They
frequently have more than one operator.
What you can try as a workaround is decrease the number of slots per
machine to cause the operators to be spread across more machines.
If this is a crucial issue for your use case, it should …
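Concretely, that workaround is one line of flink-conf.yaml (numbers here
are illustrative):

    # flink-conf.yaml excerpt: with 3 task managers and a job parallelism
    # of 16, lowering this from 16 to 6 leaves too few slots on any single
    # machine, so the 16 pipelines must spread across at least 3 machines.
    taskmanager.numberOfTaskSlots: 6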
> On 01 Dec 2015, at 15:26, Kashmar, Ali wrote:
>
> Is there a way to make a task cluster-parallelizable? I.e. Make sure the
> parallel instances of the task are distributed across the cluster. When I
> run my flink job with a parallelism of 16, all the parallel tasks are
> assigned to the first task manager.
Is there a way to make a task cluster-parallelizable? I.e. Make sure the
parallel instances of the task are distributed across the cluster. When I
run my flink job with a parallelism of 16, all the parallel tasks are
assigned to the first task manager.
- Ali
On 2015-11-30, 2:18 PM, "Ufuk Celebi" wrote:
> On 30 Nov 2015, at 17:47, Kashmar, Ali wrote:
> Do the parallel instances of each task get distributed across the cluster or
> is it possible that they all run on the same node?
Yes, slots are requested from all nodes of the cluster. But keep in mind that
multiple tasks (forming a local pipeline) …
Hello,
I’m trying to wrap my head around task parallelism in a Flink cluster. Let’s
say I have a cluster of 3 nodes, each node offering 16 task slots, so in total
I’d have 48 slots for processing. Do the parallel instances of each task get
distributed across the cluster or is it possible that they all run on the
same node?
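(Working the arithmetic: 3 nodes × 16 slots = 48 slots in total. Under the
default slot-sharing behavior, one slot can hold one parallel slice of the
entire pipeline, so a job parallelism of up to 48 is schedulable, and 48
would occupy every slot.)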