Hi!

(1) A mapper is created once per parallel task, not once per record. So if you create a program that runs a map() transformation with a parallelism of n, you will have n mapper instances in the cluster, and map() is called on each instance once for every record that task receives. Some instances may be on the same TaskManager, if the TaskManager has multiple slots.
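
To make point (1) concrete, here is a minimal sketch that only simulates the behaviour in plain Java. It does not use Flink: `SimpleMapFunction`, `LengthMapper` and `runWithParallelism` are hypothetical names made up for this illustration, and the round-robin distribution of records is an assumption, not how Flink necessarily routes data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Flink's MapFunction, so the sketch runs without Flink.
interface SimpleMapFunction<I, O> {
    O map(I value);
}

class MapperLifecycleSketch {

    // Counts how many mapper instances have been constructed.
    static final AtomicInteger INSTANCES = new AtomicInteger();

    static class LengthMapper implements SimpleMapFunction<String, Integer> {
        LengthMapper() {
            INSTANCES.incrementAndGet();
        }

        @Override
        public Integer map(String value) {
            return value.length();
        }
    }

    // Simulates a map() step: one mapper instance per parallel task,
    // and each instance is reused for every record sent to its task.
    static int runWithParallelism(int parallelism, List<String> records) {
        INSTANCES.set(0);
        List<LengthMapper> tasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            tasks.add(new LengthMapper());
        }
        for (int i = 0; i < records.size(); i++) {
            // Assumed round-robin routing, for illustration only.
            tasks.get(i % parallelism).map(records.get(i));
        }
        return INSTANCES.get();
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("record-" + i);
        }
        // 1000 records, parallelism 4: prints "mapper instances: 4"
        System.out.println("mapper instances: " + runWithParallelism(4, records));
    }
}
```

The number of instances depends only on the parallelism, not on the number of records.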

(2) I would really like that. But it means Java has to deal with both managed and unmanaged memory at the same time, which is quite a heavy addition. C# has some form of support for that.

BTW: Where did you originally post these questions? I have not seen them before...

On Fri, Aug 14, 2015 at 5:43 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Any insight about these 2 questions..?
> On 12 Aug 2015 17:38, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>
>> This is something I've never understood in depth: isn't a mapper created
>> for each record? If it's created only once per task manager then it's not so
>> different from mapPartition... what am I missing here?
>>
>> And then a more philosophical question: all big data frameworks somehow need
>> to manage memory very efficiently (Flink even reserves a fraction of the
>> entire memory in order to have control over it). Wouldn't it be simpler if
>> Java finally released some APIs (even marked as unsafe, it doesn't change
>> that much) to allow for full control of the memory? It would make a lot of
>> sense for all big data platforms (at least for non-UDF code...).
>>
>> Best,
>> Flavio
>> On 12 Aug 2015 12:44, "Timo Walther" <twal...@apache.org> wrote:
>>
>>> Hello Michael,
>>>
>>> every time you code a Java program you should avoid object creation if
>>> you want an efficient program, because every created object needs to be
>>> garbage collected later (which slows down your program performance).
>>> You can have small Pojos, just try to avoid the call "new" in your
>>> functions:
>>>
>>> Instead of:
>>>
>>> class Mapper implements MapFunction<String,Pojo> {
>>>     public Pojo map(String s) {
>>>         Pojo p = new Pojo();
>>>         p.f = s;
>>>         return p;
>>>     }
>>> }
>>>
>>> do:
>>>
>>> class Mapper implements MapFunction<String,Pojo> {
>>>     private Pojo p = new Pojo();
>>>     public Pojo map(String s) {
>>>         p.f = s;
>>>         return p;
>>>     }
>>> }
>>>
>>> Then an object is only created once per Mapper and not per record.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Timo
>>>
>>> On 12.08.2015 11:53, Michael Huelfenhaus wrote:
>>>
>>>> Hello
>>>>
>>>> I have a question about the programming of user defined functions: is
>>>> it still the case, like in the old Stratosphere times, that object
>>>> creation should be avoided at all costs? Because in some of the examples
>>>> Tuples and other objects are now created before returning them.
>>>>
>>>> I am going to have a streaming plan with at least 6 steps and I am going
>>>> to use Pojos. Performance-wise, is it a big improvement to define one big
>>>> Pojo that can be used by all the steps, or is it better to have smaller
>>>> ones to send less data but create more objects?
>>>>
>>>> Thanks
>>>> Michael