Hi Stephan, thanks for the reply! Now it's clearer. If I understood correctly, map and mapPartition are the same iff I have only one slot per task manager, right?
I was convinced I had posted those questions in this thread as the 3rd or 4th message. Didn't I?

On 14 Aug 2015 17:57, "Stephan Ewen" <se...@apache.org> wrote:
> Hi!
>
> (1) A mapper is created once per parallel task. So if you create a program
> that runs a map() transformation with a parallelism of n, you will have n
> mapper instances in the cluster. Some may be on the same TaskManager, if
> the TaskManager has multiple slots.
>
> (2) I would really like that. But it means Java has to deal with both
> managed and unmanaged memory at the same time, which is quite a heavy
> addition. C# has some form of support for that.
>
> BTW: Where did you originally post these questions? I have not seen them
> before...
>
> On Fri, Aug 14, 2015 at 5:43 PM, Flavio Pompermaier <pomperma...@okkam.it>
> wrote:
>
>> Any insight about these 2 questions..?
>> On 12 Aug 2015 17:38, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>>
>>> This is something I've never understood in depth: isn't a mapper created
>>> for each record? If it's created only once per task manager then it's not
>>> so different from mapPartition. What am I missing here?
>>>
>>> And then a more philosophical question: all big data frameworks need to
>>> manage memory very efficiently (Flink even reserves a fraction of the
>>> entire memory in order to have control over it). Wouldn't it be simpler
>>> if Java finally released some APIs (even marked as unsafe, it doesn't
>>> change that much) to allow full control of the memory? It would make a
>>> lot of sense for all big data platforms (at least for non-UDF code...).
>>>
>>> Best,
>>> Flavio
>>> On 12 Aug 2015 12:44, "Timo Walther" <twal...@apache.org> wrote:
>>>
>>>> Hello Michael,
>>>>
>>>> every time you code a Java program you should avoid object creation if
>>>> you want an efficient program, because every created object needs to be
>>>> garbage collected later (which slows down your program performance).
>>>> You can have small Pojos, just try to avoid the call "new" in your
>>>> functions:
>>>>
>>>> Instead of:
>>>>
>>>> class Mapper implements MapFunction<String,Pojo> {
>>>>     public Pojo map(String s) {
>>>>         Pojo p = new Pojo();  // allocated for every record
>>>>         p.f = s;
>>>>         return p;
>>>>     }
>>>> }
>>>>
>>>> do:
>>>>
>>>> class Mapper implements MapFunction<String,Pojo> {
>>>>     private Pojo p = new Pojo();  // allocated once per Mapper instance
>>>>     public Pojo map(String s) {
>>>>         p.f = s;
>>>>         return p;
>>>>     }
>>>> }
>>>>
>>>> Then an object is only created once per Mapper and not per record.
>>>>
>>>> Hope this helps.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>> On 12.08.2015 11:53, Michael Huelfenhaus wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> I have a question about the programming of user defined functions: is
>>>>> it still the case, like in the old Stratosphere times, that object
>>>>> creation should be avoided at all cost? Because in some of the examples
>>>>> there are now Tuples and other objects created before returning them.
>>>>>
>>>>> I am going to have a streaming plan with at least 6 steps and I am
>>>>> going to use Pojos. Is it, performance-wise, a big improvement to
>>>>> define one big Pojo that can be used by all the steps, or is it better
>>>>> to have smaller ones to send less data but create more objects?
>>>>>
>>>>> Thanks
>>>>> Michael
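
For anyone reading this thread later, here is a minimal plain-Java sketch of the object-reuse idea quoted above. It has no Flink dependency: the MapFunction interface below is a hypothetical stand-in declared locally, and the names ObjectReuseSketch, AllocatingMapper, ReusingMapper, runAndBuffer and distinctInstances are made up for illustration. It only shows how many objects each variant allocates, plus the one caveat that comes with reuse: a consumer that keeps references to the returned objects sees every one of them overwritten by the last record. Whether that matters in a real job depends on how the runtime hands records downstream, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class ObjectReuseSketch {

    /** Hypothetical stand-in for a one-record-in, one-record-out UDF interface. */
    interface MapFunction<I, O> {
        O map(I value);
    }

    /** Small mutable POJO, as in the thread. */
    static class Pojo {
        String f;
    }

    /** Allocates a fresh Pojo for every record. */
    static class AllocatingMapper implements MapFunction<String, Pojo> {
        @Override
        public Pojo map(String s) {
            Pojo p = new Pojo();
            p.f = s;
            return p;
        }
    }

    /** Allocates one Pojo per mapper instance and overwrites it for every record. */
    static class ReusingMapper implements MapFunction<String, Pojo> {
        private final Pojo p = new Pojo();

        @Override
        public Pojo map(String s) {
            p.f = s;
            return p;
        }
    }

    /** Runs the mapper over the input and keeps every returned reference. */
    static List<Pojo> runAndBuffer(MapFunction<String, Pojo> mapper, List<String> input) {
        List<Pojo> out = new ArrayList<>();
        for (String s : input) {
            out.add(mapper.map(s));
        }
        return out;
    }

    /** Counts distinct object identities (not equals()) in the list. */
    static int distinctInstances(List<Pojo> pojos) {
        Map<Pojo, Boolean> seen = new IdentityHashMap<>();
        for (Pojo p : pojos) {
            seen.put(p, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");

        List<Pojo> fresh = runAndBuffer(new AllocatingMapper(), input);
        List<Pojo> reused = runAndBuffer(new ReusingMapper(), input);

        System.out.println(distinctInstances(fresh));   // prints 3
        System.out.println(distinctInstances(reused));  // prints 1
        // Caveat: every buffered reference points at the same object,
        // so the first element now holds the last record's value.
        System.out.println(reused.get(0).f);            // prints c
    }
}
```

So the reusing variant saves allocations, but it is only safe when each returned object is fully consumed (serialized, copied, or otherwise finished with) before the next call to map().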