Hi,

Let's assume the following case.
- a stream processor that uses the Processor API
- context.schedule(1000) is called in the init()
- the processor reads only one topic that has one partition
- using custom timestamp extractor, but that timestamp is just a wall 
clock time


Image the following events:
1., for 10 seconds I send in 5 messages / second
2., does not send any messages for 3 seconds
3., starts the 5 messages / second again

I see that punctuate() is not called during the 3 seconds when I do not 
send any messages. This is ok according to the documentation, because 
there is not any new messages to trigger the punctuate() call. When the 
first few messages arrives after a restart the sending (point 3. above) I 
see the following sequence of method calls:

1., process() on the 1st message
2., punctuate() is called 3 times
3., process() on the 2nd message
4., process() on each following message

What I would expect instead is that punctuate() is called first and then 
process() is called on the messages, because the first message's timestamp 
is already 3 seconds older then the last punctuate() was called, so the 
first message belongs after the 3 punctuate() calls.

Please let me know if this is a bug or intentional, in this case what is 
the reason for processing one message before punctuate() is called?


Thanks,
Peter

Péter Sinóros-Szabó
Software Engineer

Ustream, an IBM Company
Andrassy ut 39, H-1061 Budapest
Mobile: +36203693050
Email: peter.sinoros-sz...@hu.ibm.com

Reply via email to