As Fabian suggested, YARN is a good way to go for isolation (it actually
isolates more strongly than a single JVM can, which is very nice).

Here are some additional things you can do:

  - For isolation between parallel tasks (within a job), start your YARN
job such that each TaskManager has one slot, and start many TaskManagers.
That is a bit less efficient (but not much) than fewer TaskManagers with
more slots. (*)

  - If you need to isolate successor tasks in a job against predecessor
tasks, you can select "batch" execution mode. By default, the system uses
"pipelined" execution mode. In a MapReduce-style program, this means that
mappers and reducers run concurrently. With "batch" mode, reducers run
only after all mappers have finished.
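For the first point, starting a session with one slot per TaskManager could look roughly like this (flag names assumed from the 0.9-era yarn-session.sh and flink-conf.yaml; check `-h` output for your version — treat this as a sketch, not a definitive invocation):

```shell
# Start a YARN session with many small TaskManagers, one slot each:
#   -n   number of YARN containers (= TaskManagers)
#   -s   processing slots per TaskManager
#   -tm  memory per TaskManager container (MB)
./bin/yarn-session.sh -n 20 -s 1 -tm 1024

# Equivalently, the slot count can be pinned in flink-conf.yaml:
#   taskmanager.numberOfTaskSlots: 1
```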

Greetings,
Stephan


(*) The reason why multiple slots in one TaskManager are more efficient is
that TaskManagers multiplex multiple data exchanges of a shuffle through a
single TCP connection, reducing per-exchange overhead and usually
increasing throughput.
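The "batch" execution mode from the second point is selected per program on the ExecutionConfig. A minimal sketch against the 0.9-era DataSet API (class and method names as in org.apache.flink.api.common; adjust for your version):

```java
import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchModeExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // With ExecutionMode.BATCH, data exchanges are fully materialized
        // before the consuming side starts, i.e. reducers run only after
        // all mappers have finished (instead of pipelining them).
        env.getConfig().setExecutionMode(ExecutionMode.BATCH);

        // ... build and execute the program as usual ...
    }
}
```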



On Thu, Jul 30, 2015 at 12:10 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi,
>
> it is currently not possible to isolate tasks that consume a lot of JVM
> heap memory and schedule them to a specific slot (or TaskManager).
> If you operate in a YARN setup, you can isolate different jobs from each
> other by starting a new YARN session for each job, but tasks within the
> same job cannot be isolated from each other right now.
>
> Cheers, Fabian
>
> 2015-07-30 4:02 GMT+02:00 wangzhijiang999 <wangzhijiang...@aliyun.com>:
>
>> As far as I know, Flink uses a thread model in the TaskManager: one
>> TaskManager process may run many different operator threads, and these
>> threads compete for the memory of the process. I know that Flink has a
>> MemoryManager component in each TaskManager, which controls the
>> LocalBufferPool of the InputGate and ResultPartition for each task. But
>> if a UDF consumes a lot of memory, it uses JVM heap memory, so it cannot
>> be controlled by Flink. If I use Flink as a shared platform, some users
>> will consume a lot of memory in their UDFs, and that may affect other
>> threads in the process, in particular through OOM errors. I know there
>> are shared slot and isolated slot properties, but they only constrain
>> how tasks are scheduled within one TaskManager. Can I schedule a task on
>> a separate TaskManager if it consumes a lot of memory and I do not want
>> it to affect other tasks? Or are there any suggestions for this issue
>> with the thread model? As far as I know, Spark also uses a thread model,
>> but Hadoop 2 uses a process model.
>>
>>
>> Thank you for any suggestions in advance!
>>
>
>
