Hi Stephan,

To verify if S3 is making teh pipeline stall, I have replaced the S3 sink
with HDFS and kept minimum pause between checkpoints to 5minutes, still I
see the same issue with checkpoints getting failed.

If I keep the  pause time to 20 seconds, all checkpoints are completed ,
however there is a hit in overall throughput.




Regards,
Vinay Patil

On Fri, Feb 24, 2017 at 10:09 PM, Stephan Ewen [via Apache Flink User
Mailing List archive.] <ml-node+s2336050n11891...@n4.nabble.com> wrote:

> Flink's state backends currently do a good number of "make sure this
> exists" operations on the file systems. Through Hadoop's S3 filesystem,
> that translates to S3 bucket list operations, where there is a limit in how
> many operation may happen per time interval. After that, S3 blocks.
>
> It seems that operations that are totally cheap on HDFS are hellishly
> expensive (and limited) on S3. It may be that you are affected by that.
>
> We are gradually trying to improve the behavior there and be more S3 aware.
>
> Both 1.3-SNAPSHOT and 1.2-SNAPSHOT already contain improvements there.
>
> Best,
> Stephan
>
>
> On Fri, Feb 24, 2017 at 4:42 PM, vinay patil <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=11891&i=0>> wrote:
>
>> Hi Stephan,
>>
>> So do you mean that S3 is causing the stall , as I have mentioned in my
>> previous mail, I could not see any progress for 16minutes as checkpoints
>> were getting failed continuously.
>>
>> On Feb 24, 2017 8:30 PM, "Stephan Ewen [via Apache Flink User Mailing
>> List archive.]" <[hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=11887&i=0>> wrote:
>>
>>> Hi Vinay!
>>>
>>> True, the operator state (like Kafka) is currently not asynchronously
>>> checkpointed.
>>>
>>> While it is rather small state, we have seen before that on S3 it can
>>> cause trouble, because S3 frequently stalls uploads of even data amounts as
>>> low as kilobytes due to its throttling policies.
>>>
>>> That would be a super important fix to add!
>>>
>>> Best,
>>> Stephan
>>>
>>>
>>> On Fri, Feb 24, 2017 at 2:58 PM, vinay patil <[hidden email]
>>> <http:///user/SendEmail.jtp?type=node&node=11885&i=0>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have attached a snapshot for reference:
>>>> As you can see all the 3 checkpointins failed , for checkpoint ID 2 and
>>>> 3 it
>>>> is stuck at the Kafka source after 50%
>>>> (The data sent till now by Kafka source 1 is 65GB and sent by source 2
>>>> is
>>>> 15GB )
>>>>
>>>> Within 10minutes 15M records were processed, and for the next 16minutes
>>>> the
>>>> pipeline is stuck , I don't see any progress beyond 15M because of
>>>> checkpoints getting failed consistently.
>>>>
>>>> <http://apache-flink-user-mailing-list-archive.2336050.n4.na
>>>> bble.com/file/n11882/Checkpointing_Failed.png>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://apache-flink-user-maili
>>>> ng-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-
>>>> RocksDB-as-statebackend-tp11752p11882.html
>>>> Sent from the Apache Flink User Mailing List archive. mailing list
>>>> archive at Nabble.com.
>>>>
>>>
>>>
>>>
>>> ------------------------------
>>> If you reply to this email, your message will be added to the discussion
>>> below:
>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nab
>>> ble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p11885.html
>>> To start a new topic under Apache Flink User Mailing List archive.,
>>> email [hidden email]
>>> <http:///user/SendEmail.jtp?type=node&node=11887&i=1>
>>> To unsubscribe from Apache Flink User Mailing List archive., click here.
>>> NAML
>>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>>
>>
>> ------------------------------
>> View this message in context: Re: Checkpointing with RocksDB as
>> statebackend
>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p11887.html>
>> Sent from the Apache Flink User Mailing List archive. mailing list
>> archive
>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/>
>> at Nabble.com.
>>
>
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-
> Checkpointing-with-RocksDB-as-statebackend-tp11752p11891.html
> To start a new topic under Apache Flink User Mailing List archive., email
> ml-node+s2336050n1...@n4.nabble.com
> To unsubscribe from Apache Flink User Mailing List archive., click here
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=1&code=dmluYXkxOC5wYXRpbEBnbWFpbC5jb218MXwxODExMDE2NjAx>
> .
> NAML
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p11901.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.

Reply via email to