Hi, I am using this consumer for processing records from DynamoDb Streams , few questions on this :
1. How does checkpointing works with Dstreams, since this class is extending FlinkKinesisConsumer, I am assuming it will start from the last successful checkpoint in case of failure, right ? 2. Currently when I kill the pipeline and start again it reads all the data from the start of the stream, is there any configuration to avoid this (apart from ConsumerConfigConstants.InitialPosition.LATEST), similar to group-id in Kafka. 3. As these DynamoDB Streams are separated by shards what is the recommended parallelism to be set for the source , should it be one to one mapping , for example if there are 3 shards , then parallelism should be 3 ? Regards, Vinay Patil On Wed, Aug 1, 2018 at 3:42 PM Ying Xu [via Apache Flink Mailing List archive.] <ml+s1008284n23597...@n3.nabble.com> wrote: > Thank you so much Fabian! > > Will update status in the JIRA. > > - > Ying > > On Tue, Jul 31, 2018 at 1:37 AM, Fabian Hueske <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=0>> wrote: > > > Done! > > > > Thank you :-) > > > > 2018-07-31 6:41 GMT+02:00 Ying Xu <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=1>>: > > > > > Thanks Fabian and Thomas. > > > > > > Please assign FLINK-4582 to the following username: > > > *yxu-apache > > > < > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=yxu-apache > > >* > > > > > > If needed I can get a ICLA or CCLA whichever is proper. > > > > > > *Ying Xu* > > > Software Engineer > > > 510.368.1252 <+15103681252> > > > [image: Lyft] <http://www.lyft.com/> > > > > > > On Mon, Jul 30, 2018 at 8:31 PM, Thomas Weise <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=2>> wrote: > > > > > > > The user is yxu-lyft, Ying had commented on that JIRA as well. > > > > > > > > https://issues.apache.org/jira/browse/FLINK-4582 > > > > > > > > > > > > On Mon, Jul 30, 2018 at 1:25 AM Fabian Hueske <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=3>> > > wrote: > > > > > > > > > Hi Ying, > > > > > > > > > > Thanks for considering to contribute the connector! > > > > > > > > > > In general, you don't need special permissions to contribute to > > Flink. > > > > > Anybody can open Jiras and PRs. > > > > > You only need to be assigned to the Contributor role in Jira to be > > able > > > > to > > > > > assign an issue to you. > > > > > I can give you these permissions if you tell me your Jira user. > > > > > > > > > > It would also be good if you could submit a CLA [1] if you plan to > > > > > contribute a larger feature. > > > > > > > > > > Thanks, Fabian > > > > > > > > > > [1] https://www.apache.org/licenses/#clas > > > > > > > > > > > > > > > 2018-07-30 10:07 GMT+02:00 Ying Xu <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=4>>: > > > > > > > > > > > Hello Flink dev: > > > > > > > > > > > > We have implemented the prototype design and the initial PoC > worked > > > > > pretty > > > > > > well. Currently, we plan to move ahead with this design in our > > > > internal > > > > > > production system. > > > > > > > > > > > > We are thinking of contributing this connector back to the flink > > > > > community > > > > > > sometime soon. May I request to be granted with a contributor > > role? > > > > > > > > > > > > Many thanks in advance. > > > > > > > > > > > > *Ying Xu* > > > > > > Software Engineer > > > > > > 510.368.1252 <+15103681252> > > > > > > [image: Lyft] <http://www.lyft.com/> > > > > > > > > > > > > On Fri, Jul 6, 2018 at 6:23 PM, Ying Xu <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=5>> wrote: > > > > > > > > > > > > > Hi Gordon: > > > > > > > > > > > > > > Cool. Thanks for the thumb-up! > > > > > > > > > > > > > > We will include some test cases around the behavior of > > re-sharding. > > > > If > > > > > > > needed we can double check the behavior with AWS, and see if > > > > additional > > > > > > > changes are needed. Will keep you posted. > > > > > > > > > > > > > > - > > > > > > > Ying > > > > > > > > > > > > > > On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai < > > > > > [hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=6> > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > >> Hi Ying, > > > > > > >> > > > > > > >> Sorry for the late reply here. > > > > > > >> > > > > > > >> From the looks of the AmazonDynamoDBStreamsClient, yes it > seems > > > like > > > > > > this > > > > > > >> should simply work. > > > > > > >> > > > > > > >> Regarding the resharding behaviour I mentioned in the JIRA: > > > > > > >> I'm not sure if this is really a difference in behaviour. > > > > Internally, > > > > > if > > > > > > >> DynamoDB streams is actually just working on Kinesis Streams, > > then > > > > the > > > > > > >> resharding primitives should be similar. > > > > > > >> The shard discovery logic of the Flink Kinesis Consumer > assumes > > > that > > > > > > >> splitting / merging shards will result in new shards of > > > increasing, > > > > > > >> consecutive shard ids. As long as this is also the behaviour > for > > > > > > DynamoDB > > > > > > >> resharding, then we should be fine. > > > > > > >> > > > > > > >> Feel free to start with the implementation for this, I think > > > > > design-wise > > > > > > >> we're good to go. And thanks for working on this! > > > > > > >> > > > > > > >> Cheers, > > > > > > >> Gordon > > > > > > >> > > > > > > >> On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=7>> wrote: > > > > > > >> > > > > > > >> > HI Gordon: > > > > > > >> > > > > > > > >> > We are starting to implement some of the primitives along > this > > > > path. > > > > > > >> Please > > > > > > >> > let us know if you have any suggestions. > > > > > > >> > > > > > > > >> > Thanks! > > > > > > >> > > > > > > > >> > On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=8>> > > wrote: > > > > > > >> > > > > > > > >> > > Hi Gordon: > > > > > > >> > > > > > > > > >> > > Really appreciate the reply. > > > > > > >> > > > > > > > > >> > > Yes our plan is to build the connector on top of the > > > > > > >> > FlinkKinesisConsumer. > > > > > > >> > > At the high level, FlinkKinesisConsumer mainly interacts > > with > > > > > > Kinesis > > > > > > >> > > through the AmazonKinesis client, more specifically > through > > > the > > > > > > >> following > > > > > > >> > > three function calls: > > > > > > >> > > > > > > > > >> > > - describeStream > > > > > > >> > > - getRecords > > > > > > >> > > - getShardIterator > > > > > > >> > > > > > > > > >> > > Given that the low-level DynamoDB client > > > > > > (AmazonDynamoDBStreamsClient) > > > > > > >> > > has already implemented similar calls, it is possible to > use > > > > that > > > > > > >> client > > > > > > >> > to > > > > > > >> > > interact with the dynamoDB streams, and adapt the results > > from > > > > the > > > > > > >> > dynamoDB > > > > > > >> > > streams model to the kinesis model. > > > > > > >> > > > > > > > > >> > > It appears this is exactly what the > > > > AmazonDynamoDBStreamsAdapterCl > > > > > > >> ient > > > > > > >> > > < > > > > > > >> > > https://github.com/awslabs/dynamodb-streams-kinesis-adapter/ > > > > > > >> blob/master/src/main/java/com/amazonaws/services/dynamodbv2/ > > > > > > >> streamsadapter/AmazonDynamoDBStreamsAdapterClient.java > > > > > > >> > > > > > > > > >> > > does. The adaptor client implements the AmazonKinesis > client > > > > > > >> interface, > > > > > > >> > > and is officially supported by AWS. Hence it is possible > to > > > > > replace > > > > > > >> the > > > > > > >> > > internal Kinesis client inside FlinkKinesisConsumer with > > this > > > > > > adapter > > > > > > >> > > client when interacting with dynamoDB streams. The new > > object > > > > can > > > > > > be > > > > > > >> a > > > > > > >> > > subclass of FlinkKinesisConsumer with a new name e.g, > > > > > > >> > FlinkDynamoStreamCon > > > > > > >> > > sumer. > > > > > > >> > > > > > > > > >> > > At best this could simply work. But we would like to hear > if > > > > there > > > > > > are > > > > > > >> > > other situations to take care of. In particular, I am > > > wondering > > > > > > >> what's > > > > > > >> > the *"resharding > > > > > > >> > > behavior"* mentioned in FLINK-4582. > > > > > > >> > > > > > > > > >> > > Thanks a lot! > > > > > > >> > > > > > > > > >> > > - > > > > > > >> > > Ying > > > > > > >> > > > > > > > > >> > > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai < > > > > > > >> > [hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=9> > > > > > > >> > > > wrote: > > > > > > >> > > > > > > > > >> > >> Hi! > > > > > > >> > >> > > > > > > >> > >> I think it would be definitely nice to have this > feature. > > > > > > >> > >> > > > > > > >> > >> No actual previous work has been made on this issue, but > > > AFAIK, > > > > > we > > > > > > >> > should > > > > > > >> > >> be able to build this on top of the > FlinkKinesisConsumer. > > > > > > >> > >> Whether this should live within the Kinesis connector > > module > > > or > > > > > an > > > > > > >> > >> independent module of its own is still TBD. > > > > > > >> > >> If you want, I would be happy to look at any concrete > > design > > > > > > >> proposals > > > > > > >> > you > > > > > > >> > >> have for this before you start the actual development > > > efforts. > > > > > > >> > >> > > > > > > >> > >> Cheers, > > > > > > >> > >> Gordon > > > > > > >> > >> > > > > > > >> > >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=10>> > > > wrote: > > > > > > >> > >> > > > > > > >> > >> > Thanks Fabian for the suggestion. > > > > > > >> > >> > > > > > > > >> > >> > *Ying Xu* > > > > > > >> > >> > Software Engineer > > > > > > >> > >> > 510.368.1252 <+15103681252> > > > > > > >> > >> > [image: Lyft] <http://www.lyft.com/> > > > > > > >> > >> > > > > > > > >> > >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske < > > > > > > [hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=11>> > > > > > > >> > >> wrote: > > > > > > >> > >> > > > > > > > >> > >> > > Hi Ying, > > > > > > >> > >> > > > > > > > > >> > >> > > I'm not aware of any effort for this issue. > > > > > > >> > >> > > You could check with the assigned contributor in > Jira > > if > > > > > there > > > > > > is > > > > > > >> > some > > > > > > >> > >> > > previous work. > > > > > > >> > >> > > > > > > > > >> > >> > > Best, Fabian > > > > > > >> > >> > > > > > > > > >> > >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=23597&i=12>>: > > > > > > >> > >> > > > > > > > > >> > >> > > > Hello Flink dev: > > > > > > >> > >> > > > > > > > > > >> > >> > > > We have a number of use cases which involves > pulling > > > data > > > > > > from > > > > > > >> > >> DynamoDB > > > > > > >> > >> > > > streams into Flink. > > > > > > >> > >> > > > > > > > > > >> > >> > > > Given that this issue is tracked by Flink-4582 > > > > > > >> > >> > > > <https://issues.apache.org/jira/browse/FLINK-4582>. > > > we > > > > > would > > > > > > >> like > > > > > > >> > >> to > > > > > > >> > >> > > check > > > > > > >> > >> > > > if any prior work has been completed by the > > community. > > > > We > > > > > > are > > > > > > >> > also > > > > > > >> > >> > very > > > > > > >> > >> > > > interested in contributing to this effort. > > Currently, > > > we > > > > > > have > > > > > > >> a > > > > > > >> > >> > > high-level > > > > > > >> > >> > > > proposal which is based on extending the existing > > > > > > >> > >> FlinkKinesisConsumer > > > > > > >> > >> > > and > > > > > > >> > >> > > > making it work with DynamoDB streams (via > integrating > > > > with > > > > > > the > > > > > > >> > >> > > > AmazonDynamoDBStreams API). > > > > > > >> > >> > > > > > > > > > >> > >> > > > Any suggestion is welcome. Thank you very much. > > > > > > >> > >> > > > > > > > > > >> > >> > > > > > > > > > >> > >> > > > - > > > > > > >> > >> > > > Ying > > > > > > >> > >> > > > > > > > > > >> > >> > > > > > > > > >> > >> > > > > > > > >> > >> > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------ > If you reply to this email, your message will be added to the discussion > below: > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Consuming-data-from-dynamoDB-streams-to-flink-tp22963p23597.html > To start a new topic under Apache Flink Mailing List archive., email > ml+s1008284n1...@n3.nabble.com > To unsubscribe from Apache Flink Mailing List archive., click here > <http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=1&code=dmluYXkxOC5wYXRpbEBnbWFpbC5jb218MXwxODExMDE2NjAx> > . > NAML > <http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml> >