I am not sure at this point that the delay is caused by Flink. I would
rather suspect that it has something to do with an external system. Maybe
you could try profiling the job submission so that we see more clearly where
the time is spent. Other than that, there might be some options for the GCS
filesystem that are worth tuning.
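If you want to go down the profiling route, a handful of thread dumps of the
JobManager taken while a submission is in flight would already tell us a lot.
A rough sketch, assuming the JVM runs as PID 1 in the pod and jcmd is
available in the image:

  # take a few thread dumps of the JobManager while the submission is running
  for i in 1 2 3 4 5; do
    kubectl exec <jobmanager-pod> -- jcmd 1 Thread.print > thread-dump-$i.txt
    sleep 2
  done

If the same stack shows up in most of the dumps, that is usually where the
time goes.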
Yes, we can try the same in 1.11. Meanwhile, is there any network- or
thread-related config that we can tweak for this?
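For context, these are the kinds of settings we were considering (names taken
from the Flink configuration docs; whether any of them matter for the
submission path is just our guess, and the values are only examples):

  # flink-conf.yaml
  blob.fetch.num-concurrent: 50
  akka.ask.timeout: 60 s
  web.timeout: 60000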
On Fri, Sep 4, 2020 at 12:48 PM Till Rohrmann wrote:
From the log snippet it is hard to tell. Flink is not only interacting with
GCS but also with ZooKeeper to store a pointer to the serialized JobGraph.
This can also take some time. Then of course, there could be an issue with
the GCS filesystem implementation you are using. The throughput you get
through that implementation could differ from what gsutil achieves.
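One way to narrow it down would be to turn up logging for the blob and
ZooKeeper code paths. A sketch in log4j 1.x properties syntax (which 1.9
ships with); the GCS connector package below is only an example and depends
on which implementation you have on the classpath:

  log4j.logger.org.apache.flink.runtime.blob=DEBUG
  log4j.logger.org.apache.flink.runtime.zookeeper=DEBUG
  # adjust to the GCS connector you actually use, e.g. the Hadoop one:
  log4j.logger.com.google.cloud.hadoop.fs.gcs=DEBUG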
Yes, I will check that, but do you have any pointers on why Flink is taking
more time than a gsutil upload?
On Thu, Sep 3, 2020 at 10:14 PM Till Rohrmann wrote:
Hmm then it probably rules GCS out. What about ZooKeeper? Have you
experienced slow response times from your ZooKeeper cluster?
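A quick way to get a feel for the latencies, assuming the four-letter-word
commands are enabled on your ZooKeeper servers (on newer versions they have
to be whitelisted explicitly):

  # mntr reports zk_avg_latency / zk_max_latency in milliseconds
  echo mntr | nc <zookeeper-host> 2181 | grep latency
  echo stat | nc <zookeeper-host> 2181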
Cheers,
Till
On Thu, Sep 3, 2020 at 6:23 PM Prakhar Mathur wrote:
We tried uploading the same blob from the Job Manager k8s pod directly to GCS
using gsutil and it took 2 seconds. The upload speed was 166.8 MiB/s.
Thanks.
On Wed, Sep 2, 2020 at 6:14 PM Till Rohrmann wrote:
The logs don't look suspicious. Could you maybe check what the write
bandwidth to your GCS bucket is from the machine you are running Flink on?
It should be enough to generate a 200 MB file and write it to GCS. Thanks a
lot for your help in debugging this matter.
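Something along these lines would do for the test; the bucket name is a
placeholder, and ideally you would run it from the JobManager pod against the
same bucket the HA storage directory points to:

  # create a 200 MB test file and time the upload
  dd if=/dev/urandom of=/tmp/blob-test bs=1M count=200
  time gsutil cp /tmp/blob-test gs://<your-bucket>/blob-test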
Cheers,
Till
On Wed, Sep 2, 2020, Prakhar Mathur wrote:
Hi,
Thanks for the response. Yes, we are running Flink in HA mode. We checked,
and we are not hitting any GCS quota limits. Please find the logs below; you
can see that copying the blob started at 11:50:39,455 and the JobGraph
submission completed at 11:50:46,400, i.e. roughly 7 seconds for a ~200 MB
blob.
2020-09-01 11:50:37,061 DEBUG org.a...
Hi Prakhar,
have you enabled HA for your cluster? If yes, then Flink will try to store
the job graph to the configured high-availability.storageDir in order to be
able to recover it. If this operation takes long, then either the filesystem
is slow or storing the pointer in ZooKeeper is.
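For reference, the HA-related settings I mean look roughly like this in
flink-conf.yaml (the ZooKeeper hosts and the bucket are placeholders):

  high-availability: zookeeper
  high-availability.zookeeper.quorum: <zk-host-1>:2181,<zk-host-2>:2181
  # the serialized JobGraph is written here; only a pointer goes into ZooKeeper
  high-availability.storageDir: gs://<your-bucket>/flink/ha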
Hi,
We are currently running Flink 1.9.0. We see a delay of around 20 seconds
when starting a job on a session Flink cluster. We start the job using
Flink's monitoring REST API, with our jar already uploaded to the Job
Manager. The jar file size is around 200 MB. We are using the memory state
backend.
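Concretely, the submission looks roughly like this; the jar id and entry
class below are placeholders for what we get back from /jars/upload and our
main class:

  # jar was uploaded earlier via POST /jars/upload
  curl -X POST "http://<jobmanager>:8081/jars/<jar-id>/run?entry-class=<main-class>"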