Hey Gwenhael,

The network buffers are recycled automatically after a job terminates. If this does not happen, it would be quite a major bug.
To help debug this:

- Which version of Flink are you using?
- Does the job fail immediately after submission, or later during execution?
- Is the following correct: the batch job that eventually fails because of missing network buffers runs without problems if you submit it to a fresh cluster with the same memory configuration?

The network buffers are recycled after the task managers report the task as finished. If you immediately submit the next batch, there is a slight chance that the buffers have not been recycled yet. As a possible temporary workaround, could you try waiting for a short amount of time before submitting the next batch?

I think we should also be able to run the job without splitting it up after increasing the network memory configuration. Did you already try this? A rough sketch of both ideas is at the bottom of this mail, below your quoted message.

Best,

Ufuk

On Thu, Aug 17, 2017 at 10:38 AM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com> wrote:
> Hello,
>
> We're hitting a limit with the numberOfBuffers.
>
> In a quite complex job we do a lot of operations, with a lot of operators,
> on a lot of folders (datehours).
>
> In order to split the job into smaller "batches" (to limit the necessary
> "numberOfBuffers") I've done a loop over the batches (handling the datehours
> 3 by 3); for each batch I create a new env and then call the execute() method.
>
> However it looks like there is no cleanup: after a while, if the number of
> batches is too big, there is an error saying that the numberOfBuffers isn't
> high enough. It kind of looks like some leak. Is there a way to clean them
> up?
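For illustration, here is a minimal sketch of the submission loop you describe, assuming the DataSet API; the paths, datehour values and the 10-second pause are made up and only show the idea. The Thread.sleep between execute() calls is the temporary workaround mentioned above, giving the task managers time to report the finished tasks so their network buffers are recycled before the next submission.

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    import java.util.Arrays;
    import java.util.List;

    public class BatchedSubmission {

        public static void main(String[] args) throws Exception {
            // Hypothetical datehour folders, processed 3 per submission to keep
            // the per-job numberOfBuffers requirement small.
            List<List<String>> batches = Arrays.asList(
                    Arrays.asList("2017081700", "2017081701", "2017081702"),
                    Arrays.asList("2017081703", "2017081704", "2017081705"));

            for (List<String> batch : batches) {
                // Fresh environment per batch, as in the original mail.
                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

                for (String dateHour : batch) {
                    DataSet<String> lines = env.readTextFile("hdfs:///data/" + dateHour);
                    lines.filter(line -> !line.isEmpty())
                         .writeAsText("hdfs:///out/" + dateHour);
                }

                env.execute("batch " + batch);

                // Temporary workaround: wait a bit so the finished tasks'
                // network buffers are recycled before the next submission.
                Thread.sleep(10_000L);
            }
        }
    }

As for the alternative without splitting: increasing the network buffer memory in flink-conf.yaml (taskmanager.network.numberOfBuffers in older versions, or the taskmanager.network.memory.* options in more recent ones; the exact key depends on your Flink version) should let the whole job run in a single submission.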