Re: [DISCUSS] Allowed Lateness in Flink

Kostas Kloudas Thu, 07 Jul 2016 00:14:45 -0700

Hi,

In the effort to move the discussion to the mailing list, rather than the doc, 
there was a comment in the doc:

“It seems this proposal marries the allowed lateness of events and the 
discarding of window state. In most use cases this should be sufficient, but 
there are instances where having independent control of these may be useful.

For instance, you may have a job that computes some aggregate, like a sum. You 
may want to keep the window state around for a while, but not too long. Yet you 
may want to continue processing late events after you discarded the window 
state. It is possible that your stream sinks can make use of this data. For 
instance, they may be writing to a data store that returns an error if a row 
already exists, which allow the sink to read the existing row and update it 
with the new data."

To which I would like to reply:

If I understand your use-case correctly, I believe that the proposed binding of 
the allowed lateness to the state purging does not impose any problem. The 
lateness specifies the upper time bound, after which the state will be 
discarded. Between the start of a window and its (end + allowedLateness) you 
can write custom triggers that fire, purge the state, or do nothing. Given 
this, I suppose that, at the most extreme case, you can specify an allowed 
lateness of Long.MaxValue and do the purging of the state "manually". By doing 
this, you remove the safeguard of letting the system purge the state at some 
point in time, and you can do your own custom state management that fits your 
needs.

Thanks, 
Kostas

> On Jul 6, 2016, at 5:43 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
> 
> @Vishnu Funny you should ask that because I have a design doc lying around.
> I'll open a new mail thread to not hijack this one.
> 
> On Wed, 6 Jul 2016 at 17:17 Vishnu Viswanath <vishnu.viswanat...@gmail.com>
> wrote:
> 
>> Hi,
>> 
>> I was going through the suggested improvements in window, and I have
>> few questions/suggestion on improvement regarding the Evictor.
>> 
>> 1) I am having a use case where I have to create a custom Evictor that will
>> evict elements from the window based on the value (e.g., if I have elements
>> are of case class Item(id: Int, type:String) then evict elements that has
>> type="a"). I believe this is not currently possible.
>> 2) this is somewhat related to 1) where there should be an option to evict
>> elements from anywhere in the window. not only from the beginning of the
>> window. (e.g., apply the delta function to all elements and remove all
>> those don't pass. I checked the code and evict method just returns the
>> number of elements to be removed and processTriggerResult just skips those
>> many elements from the beginning.
>> 3) Add an option to enables the user to decide if the eviction should
>> happen before the apply function or after the apply function. Currently it
>> is before the apply function, but I have a use case where I need to first
>> apply the function and evict afterward.
>> 
>> I am doing these for a POC so I think I can modify the flink code base to
>> make these changes and build, but I would appreciate any suggestion on
>> whether these are viable changes or will there any performance issue if
>> these are done. Also any pointer on where to start(e.g, do I create a new
>> class similar to EvictingWindowOperator that extends WindowOperator?)
>> 
>> Thanks and Regards,
>> Vishnu Viswanath,
>> 
>> On Wed, Jul 6, 2016 at 9:39 AM, Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>> 
>>> I did:
>>> 
>>> 
>> https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3ccanmxww0abttjjg9ewdxrugxkjm7jscbenmvrzohpt2qo3pq...@mail.gmail.com%3e
>>> ;-)
>>> 
>>> On Wed, 6 Jul 2016 at 15:31 Ufuk Celebi <u...@apache.org> wrote:
>>> 
>>>> On Wed, Jul 6, 2016 at 3:19 PM, Aljoscha Krettek <aljos...@apache.org>
>>>> wrote:
>>>>> In the future, it might be good to to discussions directly on the ML
>>> and
>>>>> then change the document accordingly. This way everyone can follow
>> the
>>>>> discussion on the ML. I also feel that Google Doc comments often
>> don't
>>>> give
>>>>> enough space for expressing more complex opinions.
>>>> 
>>>> I agree! Would you mind raising this point as a separate discussion on
>>> dev@
>>>> ?
>>>> 
>>> 
>>

Re: [DISCUSS] Allowed Lateness in Flink

Reply via email to