Re: Udf Performance and Object Creation

Flavio Pompermaier Fri, 14 Aug 2015 08:45:01 -0700

Any insight into these two questions?
On 12 Aug 2015 17:38, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:


> This is something I've never understood in depth: isn't a mapper created
> for each record? If it's created only once per task manager, then it's not
> so different from mapPartition. What am I missing here?
>
> And then a more philosophical question: all big data frameworks somehow need
> to manage memory very efficiently (Flink even has to reserve a fraction of
> the entire memory in order to have control over it). Wouldn't it be simpler
> if Java finally released some APIs (even marked as unsafe, it doesn't change
> that much) to allow full control of the memory? That would make a lot of
> sense for all big data platforms (at least for non-UDF code...).
>
> Best,
> Flavio
> On 12 Aug 2015 12:44, "Timo Walther" <twal...@apache.org> wrote:
>
>> Hello Michael,
>>
>> Every time you write a Java program, you should avoid object creation if
>> you want it to be efficient, because every created object needs to be
>> garbage collected later (which slows down your program).
>> You can have small POJOs; just try to avoid calling "new" in your
>> functions:
>>
>> Instead of:
>>
>> class Mapper implements MapFunction<String,Pojo> {
>> // Allocates a new Pojo for every record.
>> public Pojo map(String s) {
>>     Pojo p = new Pojo();
>>     p.f = s;
>>     return p;
>> }
>> }
>>
>> do:
>>
>> class Mapper implements MapFunction<String,Pojo> {
>> // Allocated once per Mapper instance and overwritten for every record.
>> private Pojo p = new Pojo();
>> public Pojo map(String s) {
>>     p.f = s;
>>     return p;
>> }
>> }
>>
>> Then an object is only created once per Mapper and not per record.
>>
>> Hope this helps.
>>
>> Regards,
>> Timo
>>
>>
>>
>> On 12.08.2015 11:53, Michael Huelfenhaus wrote:
>>
>>> Hello
>>>
>>> I have a question about programming user-defined functions: is it still
>>> the case, as in the old Stratosphere times, that object creation should
>>> be avoided at all costs? In some of the examples, Tuples and other
>>> objects are now created before being returned.
>>>
>>> I am going to have a streaming plan with at least 6 steps, and I am going
>>> to use POJOs. Performance-wise, is it a big improvement to define one big
>>> POJO that can be used by all the steps, or is it better to have smaller
>>> ones, sending less data but creating more objects?
>>>
>>> Thanks
>>> Michael
>>>
>>
>>
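
The difference between the two Mapper variants quoted above can be checked with a small standalone sketch. It does not use Flink: the `MapFunction` interface below is a hypothetical single-method stand-in, and `ObjectReuseSketch`, `AllocatingMapper`, `ReusingMapper`, `distinctOutputs`, and `bufferedFields` are names invented for illustration. The sketch applies one mapper instance to several records and counts how many distinct objects come out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;

public class ObjectReuseSketch {

    // Hypothetical stand-in for Flink's MapFunction, so the sketch runs without Flink.
    interface MapFunction<I, O> {
        O map(I value);
    }

    static class Pojo {
        String f;
    }

    // Allocates a new Pojo for every record.
    static class AllocatingMapper implements MapFunction<String, Pojo> {
        @Override
        public Pojo map(String s) {
            Pojo p = new Pojo();
            p.f = s;
            return p;
        }
    }

    // Allocates one Pojo per mapper instance and overwrites it for every record.
    static class ReusingMapper implements MapFunction<String, Pojo> {
        private final Pojo p = new Pojo();

        @Override
        public Pojo map(String s) {
            p.f = s;
            return p;
        }
    }

    // Applies a single mapper instance to all records and counts
    // the distinct output objects by identity (not by equals()).
    static int distinctOutputs(MapFunction<String, Pojo> mapper, List<String> records) {
        IdentityHashMap<Pojo, Boolean> seen = new IdentityHashMap<>();
        for (String r : records) {
            seen.put(mapper.map(r), Boolean.TRUE);
        }
        return seen.size();
    }

    // Buffers every output first and reads the fields afterwards,
    // which exposes aliasing when the mapper reuses its output object.
    static List<String> bufferedFields(MapFunction<String, Pojo> mapper, List<String> records) {
        List<Pojo> buffered = new ArrayList<>();
        for (String r : records) {
            buffered.add(mapper.map(r));
        }
        List<String> fields = new ArrayList<>();
        for (Pojo p : buffered) {
            fields.add(p.f);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(distinctOutputs(new AllocatingMapper(), records)); // 3
        System.out.println(distinctOutputs(new ReusingMapper(), records));    // 1
        System.out.println(bufferedFields(new AllocatingMapper(), records));  // [a, b, c]
        System.out.println(bufferedFields(new ReusingMapper(), records));     // [c, c, c]
    }
}
```

The last line shows the trade-off: a reused object is only safe if each result is consumed or copied before the next call to `map`. Whether that holds in a real Flink job depends on how the runtime and downstream operators handle the returned object, which this sketch does not model.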
