Any insights on these two questions?

On 12 Aug 2015 17:38, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
> This is something I've never understood in depth: isn't a mapper created
> for each record? If it's created only once per task manager, then it's not
> so different from mapPartition. What am I missing here?
>
> And then a more philosophical question: all big data frameworks need to
> manage memory very efficiently somehow (Flink even reserves a fraction of
> the entire memory in order to have control over it). Wouldn't it be
> simpler if Java finally released some APIs (even marked as unsafe, it
> doesn't change that much) to allow full control of the memory? It would
> make a lot of sense for all big data platforms (at least for non-UDF
> code...).
>
> Best,
> Flavio
> On 12 Aug 2015 12:44, "Timo Walther" <twal...@apache.org> wrote:
>
>> Hello Michael,
>>
>> Every time you code a Java program you should avoid object creation if
>> you want an efficient program, because every created object needs to be
>> garbage collected later (which slows down your program).
>> You can have small Pojos, just try to avoid calling "new" in your
>> functions:
>>
>> Instead of:
>>
>> class Mapper implements MapFunction<String,Pojo> {
>>   public Pojo map(String s) {
>>     Pojo p = new Pojo();  // allocated for every record
>>     p.f = s;
>>     return p;
>>   }
>> }
>>
>> do:
>>
>> class Mapper implements MapFunction<String,Pojo> {
>>   private Pojo p = new Pojo();  // allocated once per Mapper instance
>>   public Pojo map(String s) {
>>     p.f = s;
>>     return p;
>>   }
>> }
>>
>> Then an object is only created once per Mapper and not per record.
>>
>> Hope this helps.
>>
>> Regards,
>> Timo
>>
>> On 12.08.2015 11:53, Michael Huelfenhaus wrote:
>>
>>> Hello
>>>
>>> I have a question about the programming of user-defined functions: is
>>> it still the case, like in the old Stratosphere times, that object
>>> creation should be avoided at all costs? Because in some of the
>>> examples there are now Tuples and other objects created before
>>> returning them.
>>>
>>> I am going to have a streaming plan with at least 6 steps, and I am
>>> going to use Pojos.
>>> Performance-wise, is it a big improvement to define one big Pojo that
>>> can be used by all the steps, or is it better to have smaller ones,
>>> sending less data but creating more objects?
>>>
>>> Thanks
>>> Michael
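
For reference, here is a minimal, self-contained sketch of the two mapper variants from Timo's mail. It does not use Flink at all: SimpleMapFunction is a hypothetical stand-in for MapFunction, and collectFields is a made-up helper that plays the role of a caller holding on to the returned objects. It assumes one mapper instance handles many records, which is what Timo's "once per Mapper" implies. It also shows the price of the reuse pattern: every call returns the same instance, so anything that keeps references to earlier results sees only the latest value.

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectReuseSketch {
    // Hypothetical stand-in for Flink's MapFunction, so the sketch runs without Flink.
    interface SimpleMapFunction<I, O> {
        O map(I in);
    }

    static class Pojo {
        String f;
    }

    // Variant 1: allocates a new Pojo for every record.
    static class AllocatingMapper implements SimpleMapFunction<String, Pojo> {
        public Pojo map(String s) {
            Pojo p = new Pojo();
            p.f = s;
            return p;
        }
    }

    // Variant 2: allocates one Pojo per mapper instance and overwrites it per record.
    static class ReusingMapper implements SimpleMapFunction<String, Pojo> {
        private final Pojo p = new Pojo();

        public Pojo map(String s) {
            p.f = s;
            return p;
        }
    }

    // Hypothetical helper: maps all inputs first, keeps the returned references,
    // and only afterwards reads the field of each held object.
    static List<String> collectFields(SimpleMapFunction<String, Pojo> mapper, String... inputs) {
        List<Pojo> held = new ArrayList<>();
        for (String s : inputs) {
            held.add(mapper.map(s));
        }
        List<String> fields = new ArrayList<>();
        for (Pojo p : held) {
            fields.add(p.f);
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(collectFields(new AllocatingMapper(), "a", "b", "c"));
        // prints [c, c, c] because all three references point to the same Pojo
        System.out.println(collectFields(new ReusingMapper(), "a", "b", "c"));
    }
}
```

So the reusing variant saves allocations, but it is only safe when each result is fully consumed (or copied/serialized) before the next call to map.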