HI Stephan,

Just to avoid the confusion here, I am using S3 sink for writing the data,
and using HDFS for storing checkpoints.
There are 2 core nodes (HDFS) and two task nodes on EMR

I replaced s3 sink with HDFS for writing data in my last test.

Let's say the checkpoint interval is 5 minutes, now within 5minutes of run
the state size grows to 30GB ,  after checkpointing the 30GB state that is
maintained in rocksDB has to be copied to HDFS, right ?  is this causing
the pipeline to stall ?


Regards,
Vinay Patil

On Sat, Feb 25, 2017 at 12:22 AM, Vinay Patil <vinay18.pa...@gmail.com>
wrote:

> Hi Stephan,
>
> To verify if S3 is making teh pipeline stall, I have replaced the S3 sink
> with HDFS and kept minimum pause between checkpoints to 5minutes, still I
> see the same issue with checkpoints getting failed.
>
> If I keep the  pause time to 20 seconds, all checkpoints are completed ,
> however there is a hit in overall throughput.
>
>
>
>
> Regards,
> Vinay Patil
>
> On Fri, Feb 24, 2017 at 10:09 PM, Stephan Ewen [via Apache Flink User
> Mailing List archive.] <ml-node+s2336050n11891...@n4.nabble.com> wrote:
>
>> Flink's state backends currently do a good number of "make sure this
>> exists" operations on the file systems. Through Hadoop's S3 filesystem,
>> that translates to S3 bucket list operations, where there is a limit in how
>> many operation may happen per time interval. After that, S3 blocks.
>>
>> It seems that operations that are totally cheap on HDFS are hellishly
>> expensive (and limited) on S3. It may be that you are affected by that.
>>
>> We are gradually trying to improve the behavior there and be more S3
>> aware.
>>
>> Both 1.3-SNAPSHOT and 1.2-SNAPSHOT already contain improvements there.
>>
>> Best,
>> Stephan
>>
>>
>> On Fri, Feb 24, 2017 at 4:42 PM, vinay patil <[hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=11891&i=0>> wrote:
>>
>>> Hi Stephan,
>>>
>>> So do you mean that S3 is causing the stall , as I have mentioned in my
>>> previous mail, I could not see any progress for 16minutes as checkpoints
>>> were getting failed continuously.
>>>
>>> On Feb 24, 2017 8:30 PM, "Stephan Ewen [via Apache Flink User Mailing
>>> List archive.]" <[hidden email]
>>> <http:///user/SendEmail.jtp?type=node&node=11887&i=0>> wrote:
>>>
>>>> Hi Vinay!
>>>>
>>>> True, the operator state (like Kafka) is currently not asynchronously
>>>> checkpointed.
>>>>
>>>> While it is rather small state, we have seen before that on S3 it can
>>>> cause trouble, because S3 frequently stalls uploads of even data amounts as
>>>> low as kilobytes due to its throttling policies.
>>>>
>>>> That would be a super important fix to add!
>>>>
>>>> Best,
>>>> Stephan
>>>>
>>>>
>>>> On Fri, Feb 24, 2017 at 2:58 PM, vinay patil <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=11885&i=0>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have attached a snapshot for reference:
>>>>> As you can see all the 3 checkpointins failed , for checkpoint ID 2
>>>>> and 3 it
>>>>> is stuck at the Kafka source after 50%
>>>>> (The data sent till now by Kafka source 1 is 65GB and sent by source 2
>>>>> is
>>>>> 15GB )
>>>>>
>>>>> Within 10minutes 15M records were processed, and for the next
>>>>> 16minutes the
>>>>> pipeline is stuck , I don't see any progress beyond 15M because of
>>>>> checkpoints getting failed consistently.
>>>>>
>>>>> <http://apache-flink-user-mailing-list-archive.2336050.n4.na
>>>>> bble.com/file/n11882/Checkpointing_Failed.png>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context: http://apache-flink-user-maili
>>>>> ng-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-
>>>>> RocksDB-as-statebackend-tp11752p11882.html
>>>>> Sent from the Apache Flink User Mailing List archive. mailing list
>>>>> archive at Nabble.com.
>>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>> If you reply to this email, your message will be added to the
>>>> discussion below:
>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nab
>>>> ble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp1175
>>>> 2p11885.html
>>>> To start a new topic under Apache Flink User Mailing List archive.,
>>>> email [hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=11887&i=1>
>>>> To unsubscribe from Apache Flink User Mailing List archive., click here
>>>> .
>>>> NAML
>>>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>>>
>>>
>>> ------------------------------
>>> View this message in context: Re: Checkpointing with RocksDB as
>>> statebackend
>>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p11887.html>
>>> Sent from the Apache Flink User Mailing List archive. mailing list
>>> archive
>>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/>
>>> at Nabble.com.
>>>
>>
>>
>>
>> ------------------------------
>> If you reply to this email, your message will be added to the discussion
>> below:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.
>> nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-
>> tp11752p11891.html
>> To start a new topic under Apache Flink User Mailing List archive., email
>> ml-node+s2336050n1...@n4.nabble.com
>> To unsubscribe from Apache Flink User Mailing List archive., click here
>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=1&code=dmluYXkxOC5wYXRpbEBnbWFpbC5jb218MXwxODExMDE2NjAx>
>> .
>> NAML
>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>
>
>




--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p11913.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.

Reply via email to