Re: Consuming data from dynamoDB streams to flink

Ying Xu Mon, 30 Jul 2018 01:08:35 -0700

Hello Flink dev:

We have implemented the prototype design and the initial PoC worked pretty
well.  Currently, we plan to move ahead with this design in our internal
production system.


We are thinking of contributing this connector back to the flink community
sometime soon.  May I request to be granted with a contributor role?

Many thanks in advance.

*Ying Xu*
Software Engineer
510.368.1252 <+15103681252>
[image: Lyft] <http://www.lyft.com/>

On Fri, Jul 6, 2018 at 6:23 PM, Ying Xu <[email protected]> wrote:

> Hi Gordon:
>
> Cool. Thanks for the thumb-up!
>
> We will include some test cases around the behavior of re-sharding. If
> needed we can double check the behavior with AWS, and see if additional
> changes are needed.  Will keep you posted.
>
> -
> Ying
>
> On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai <[email protected]>
> wrote:
>
>> Hi Ying,
>>
>> Sorry for the late reply here.
>>
>> From the looks of the AmazonDynamoDBStreamsClient, yes it seems like this
>> should simply work.
>>
>> Regarding the resharding behaviour I mentioned in the JIRA:
>> I'm not sure if this is really a difference in behaviour. Internally, if
>> DynamoDB streams is actually just working on Kinesis Streams, then the
>> resharding primitives should be similar.
>> The shard discovery logic of the Flink Kinesis Consumer assumes that
>> splitting / merging shards will result in new shards of increasing,
>> consecutive shard ids. As long as this is also the behaviour for DynamoDB
>> resharding, then we should be fine.
>>
>> Feel free to start with the implementation for this, I think design-wise
>> we're good to go. And thanks for working on this!
>>
>> Cheers,
>> Gordon
>>
>> On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <[email protected]> wrote:
>>
>> > HI Gordon:
>> >
>> > We are starting to implement some of the primitives along this path.
>> Please
>> > let us know if you have any suggestions.
>> >
>> > Thanks!
>> >
>> > On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <[email protected]> wrote:
>> >
>> > > Hi Gordon:
>> > >
>> > > Really appreciate the reply.
>> > >
>> > > Yes our plan is to build the connector on top of the
>> > FlinkKinesisConsumer.
>> > > At the high level, FlinkKinesisConsumer mainly interacts with Kinesis
>> > > through the AmazonKinesis client, more specifically through the
>> following
>> > > three function calls:
>> > >
>> > >    - describeStream
>> > >    - getRecords
>> > >    - getShardIterator
>> > >
>> > > Given that the low-level DynamoDB client (AmazonDynamoDBStreamsClient)
>> > > has already implemented similar calls, it is possible to use that
>> client
>> > to
>> > > interact with the dynamoDB streams, and adapt the results from the
>> > dynamoDB
>> > > streams model to the kinesis model.
>> > >
>> > > It appears this is exactly what the AmazonDynamoDBStreamsAdapterCl
>> ient
>> > > <
>> > https://github.com/awslabs/dynamodb-streams-kinesis-adapter/
>> blob/master/src/main/java/com/amazonaws/services/dynamodbv2/
>> streamsadapter/AmazonDynamoDBStreamsAdapterClient.java
>> > >
>> > > does. The adaptor client implements the AmazonKinesis client
>> interface,
>> > > and is officially supported by AWS.  Hence it is possible to replace
>> the
>> > > internal Kinesis client inside FlinkKinesisConsumer with this adapter
>> > > client when interacting with dynamoDB streams.  The new object can be
>> a
>> > > subclass of FlinkKinesisConsumer with a new name e.g,
>> > FlinkDynamoStreamCon
>> > > sumer.
>> > >
>> > > At best this could simply work. But we would like to hear if there are
>> > > other situations to take care of.  In particular, I am wondering
>> what's
>> > the *"resharding
>> > > behavior"* mentioned in FLINK-4582.
>> > >
>> > > Thanks a lot!
>> > >
>> > > -
>> > > Ying
>> > >
>> > > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai <
>> > [email protected]
>> > > > wrote:
>> > >
>> > >> Hi!
>> > >>
>> > >> I think it would be definitely nice to have this feature.
>> > >>
>> > >> No actual previous work has been made on this issue, but AFAIK, we
>> > should
>> > >> be able to build this on top of the FlinkKinesisConsumer.
>> > >> Whether this should live within the Kinesis connector module or an
>> > >> independent module of its own is still TBD.
>> > >> If you want, I would be happy to look at any concrete design
>> proposals
>> > you
>> > >> have for this before you start the actual development efforts.
>> > >>
>> > >> Cheers,
>> > >> Gordon
>> > >>
>> > >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <[email protected]> wrote:
>> > >>
>> > >> > Thanks Fabian for the suggestion.
>> > >> >
>> > >> > *Ying Xu*
>> > >> > Software Engineer
>> > >> > 510.368.1252 <+15103681252>
>> > >> > [image: Lyft] <http://www.lyft.com/>
>> > >> >
>> > >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske <[email protected]>
>> > >> wrote:
>> > >> >
>> > >> > > Hi Ying,
>> > >> > >
>> > >> > > I'm not aware of any effort for this issue.
>> > >> > > You could check with the assigned contributor in Jira if there is
>> > some
>> > >> > > previous work.
>> > >> > >
>> > >> > > Best, Fabian
>> > >> > >
>> > >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <[email protected]>:
>> > >> > >
>> > >> > > > Hello Flink dev:
>> > >> > > >
>> > >> > > > We have a number of use cases which involves pulling data from
>> > >> DynamoDB
>> > >> > > > streams into Flink.
>> > >> > > >
>> > >> > > > Given that this issue is tracked by Flink-4582
>> > >> > > > <https://issues.apache.org/jira/browse/FLINK-4582>. we would
>> like
>> > >> to
>> > >> > > check
>> > >> > > > if any prior work has been completed by the community.   We are
>> > also
>> > >> > very
>> > >> > > > interested in contributing to this effort.  Currently, we have
>> a
>> > >> > > high-level
>> > >> > > > proposal which is based on extending the existing
>> > >> FlinkKinesisConsumer
>> > >> > > and
>> > >> > > > making it work with DynamoDB streams (via integrating with the
>> > >> > > > AmazonDynamoDBStreams API).
>> > >> > > >
>> > >> > > > Any suggestion is welcome. Thank you very much.
>> > >> > > >
>> > >> > > >
>> > >> > > > -
>> > >> > > > Ying
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: Consuming data from dynamoDB streams to flink

Reply via email to