Hi Stefan,

Sorry for jumping into the discussion.

I’ve seen a blog post [1] by data Artisans which says that object reuse does 
not have much influence on data streams.

>  For Flink’s DataStream API, this setting does in fact not even result in 
> reusing of objects, but only in avoiding additional object copying on the 
> way, which happens by default as an additional safety net for users.

So I’m a bit confused here. Could you please give more details about object 
reuse in data streams? Thanks a lot!
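To make the copying-as-safety-net point concrete, here is a plain-Java sketch of my own (not actual Flink code; the `Record` type is hypothetical) of the bug that per-record copying guards against: a downstream function that buffers an incoming record by reference sees it change when the runtime reuses the object for the next element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable record type, standing in for a reused stream element.
class Record {
    long value;
    Record(long value) { this.value = value; }
}

public class ReuseDemo {
    public static void main(String[] args) {
        List<Record> buffered = new ArrayList<>();
        Record reused = new Record(0);   // one object, reused for every element
        for (long v = 1; v <= 3; v++) {
            reused.value = v;            // the "next element" overwrites it
            buffered.add(reused);        // the operator keeps only a reference
        }
        // Every buffered entry now shows the last value written (3),
        // which is why Flink copies records by default unless reuse is enabled.
        for (Record r : buffered) {
            System.out.println(r.value);
        }
    }
}
```

With copying (the default), each operator would receive its own instance and the buffered values would stay 1, 2, 3.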

[1] 
https://data-artisans.com/blog/curious-case-broken-benchmark-revisiting-apache-flink-vs-databricks-runtime
 

Best,
Paul Lam


> On Sep 20, 2018, at 23:52, Stefan Richter <s.rich...@data-artisans.com> wrote:
> 
> Oh yes exactly, enable is right.
> 
>> On 20.09.2018 at 17:48, Hequn Cheng <chenghe...@gmail.com> wrote:
>> 
>> Hi Stefan,
>> 
>> Do you mean enable object reuse?
>> If you want to reduce latency between chained operators, you can also try to 
>> disable object-reuse:
>> 
>> On Thu, Sep 20, 2018 at 10:37 PM Stefan Richter <s.rich...@data-artisans.com> wrote:
>> Sorry, I forgot the link for reference [1], which is 
>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/datastream_api.html#controlling-latency
>> 
>>> On 20.09.2018 at 16:36, Stefan Richter <s.rich...@data-artisans.com> wrote:
>>> 
>>> Hi,
>>> 
>>> you don't provide very much information for this question, e.g. what and 
>>> how exactly you measure, and whether this is a local or distributed 
>>> setting. I assume that it is distributed and that the cause of your 
>>> observation is the buffer timeout, i.e. the maximum time that Flink waits 
>>> before sending a buffer with just one element, which is 100 ms by default. 
>>> You can decrease this value to some extent, at the cost of a potential 
>>> loss in throughput, but I think even values around 5-10 ms are ok-ish. 
>>> See [1] for more details. If you want to reduce latency between chained 
>>> operators, you can also try to disable object-reuse:
>>> 
>>> StreamExecutionEnvironment env = 
>>> StreamExecutionEnvironment.getExecutionEnvironment();
>>> env.getConfig().enableObjectReuse();
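>>> For completeness, the buffer timeout mentioned above is configured on the 
>>> same execution environment; this is just a sketch with an example value of 
>>> 5 ms (see [1] for the throughput trade-off):
>>> 
>>> ```java
>>> StreamExecutionEnvironment env =
>>>     StreamExecutionEnvironment.getExecutionEnvironment();
>>> // Flush outgoing buffers after at most 5 ms instead of the default 100 ms,
>>> // trading some throughput for lower latency between operators.
>>> env.setBufferTimeout(5);
>>> ```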
>>> 
>>> Best,
>>> Stefan
>>> 
>>>> On 20.09.2018 at 16:03, James Yu <cyu...@gmail.com> wrote:
>>>> 
>>>> The previous email seems unable to display embedded images, so let me 
>>>> post the links instead.
>>>> Hi,
>>>> 
>>>> My team and I tried to measure the total time spent on our Flink job and 
>>>> found that Flink takes 40 ms ~ 100 ms to proceed from one operator to 
>>>> another. I wonder how we can reduce this transition time.
>>>> 
>>>> The following DAG represents our job:
>>>> https://drive.google.com/open?id=1wNR8po-SooAfYtCxU3qUDm4-hm16tclV 
>>>> 
>>>> 
>>>> and here is the screenshot of our log:
>>>> https://drive.google.com/open?id=14PTViZMkhagxeNjOb4R8u3BEr7Ym1xBi 
>>>> 
>>>> at 19:37:04.564, the job is leaving "Source: Custom Source -> Flat Map"
>>>> at 19:37:04.605, the job is entering "Co-Flat Map"
>>>> at 19:37:04.605, the job is leaving "Co-Flat Map"
>>>> at 19:37:04.705, the job is entering "Co-Flat Map -> ...."
>>>> at 19:37:04.708, the job is leaving "Co-Flat Map -> ..."
>>>> 
>>>> Both "Co-Flat Map" operators finish almost instantly, while most of the 
>>>> execution time is spent on the transitions. Any ideas?
>>>> 
>>>> 
>>>> This is a UTF-8 formatted mail
>>>> -----------------------------------------------
>>>> James C.-C.Yu
>>>> +886988713275
>>>> +8615618429976
>>>> 
>>> 
>> 
> 
