Re: [DISCUSS] FLIP-307: Flink connector Redshift

Samrat Deb Wed, 21 Jun 2023 01:58:10 -0700

Hi Martijn,

Thank you for sharing your thoughts on the matter.
I understand that you don't have a strong opinion on whether to support
exactly-once processing from the beginning or at a later stage.
For the initial implementation, I will go ahead with at-least-once semantics.


> The only consideration that I could think of is that if you start with
> at-least-once, you could consider using the Async API, but I don't think
> the Async API yet supports exactly-once.

Noted. It is a valid consideration: starting with at-least-once keeps the
sink compatible with the Async Sink API from the beginning.
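
To make the at-least-once trade-off concrete: under at-least-once, a batch
that is re-sent after a failure can reach Redshift twice. The Java sketch
below is an in-memory model only (no JDBC, no Flink); the class and method
names are hypothetical, and it assumes the target table has a primary key
column `id`. It contrasts an append-only INSERT with the staging-table
pattern from the AWS upsert documentation cited further down this thread,
where target rows matching the staged keys are deleted before the staged
batch is inserted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model: what happens to a target table when the same
 * batch is delivered twice, once with append-only INSERT and once with the
 * staging-table "delete matching keys, then insert" upsert.
 */
class ReplaySketch {
    /** A row keyed by an assumed primary key column `id`. */
    record Row(int id, String value) {}

    /** Append-only target: every delivery adds rows, so a replay duplicates them. */
    static List<Row> plainInsert(List<Row> target, List<Row> batch) {
        List<Row> result = new ArrayList<>(target);
        result.addAll(batch);
        return result;
    }

    /**
     * Staging-table upsert: delete target rows whose key is in the staged batch,
     * then insert the whole batch. Re-applying the same batch leaves the table
     * unchanged, so the replay is idempotent.
     */
    static Map<Integer, Row> stagedUpsert(Map<Integer, Row> target, List<Row> batch) {
        Map<Integer, Row> result = new LinkedHashMap<>(target);
        for (Row r : batch) {
            result.remove(r.id()); // models: DELETE FROM target USING stage WHERE target.id = stage.id
        }
        for (Row r : batch) {
            result.put(r.id(), r); // models: INSERT INTO target SELECT * FROM stage
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> batch = List.of(new Row(1, "a"), new Row(2, "b"));

        // Simulate a failure after the write that causes the same batch to be re-sent.
        List<Row> appended = plainInsert(plainInsert(List.of(), batch), batch);
        Map<Integer, Row> merged = stagedUpsert(stagedUpsert(Map.of(), batch), batch);

        System.out.println("plain INSERT rows after replay: " + appended.size());   // 4
        System.out.println("staged upsert rows after replay: " + merged.size());    // 2
    }
}
```

With the plain INSERT the replay leaves four rows; with the staged
delete-then-insert it leaves two. This is why the MERGE/staging-table route
discussed below is a plausible later path to effectively exactly-once results
for keyed tables, while an at-least-once first version may produce duplicates
on recovery.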

Bests,
Samrat


On Mon, Jun 19, 2023 at 5:28 PM Martijn Visser <[email protected]> wrote:

> Hi Samrat,
>
> I have no strong opinion on whether to support exactly-once from the start
> or potentially later. The only consideration that I could think of is that
> if you start with at-least-once, you could consider using the Async API,
> but I don't think the Async API yet supports exactly-once.
>
> Thanks,
>
> Martijn
>
> On Fri, Jun 9, 2023 at 7:22 PM Jing Ge <[email protected]> wrote:
>
> > Hi Samrat,
> >
> > The FLIP looks good, thanks!
> >
> > Best regards,
> > Jing
> >
> >
> > On Tue, Jun 6, 2023 at 8:16 PM Samrat Deb <[email protected]> wrote:
> >
> > > Hi Jing,
> > >
> > > > I would suggest adding that information into the FLIP.
> > >
> > > Updated now. Please review the new version of the FLIP whenever you have time.
> > >
> > > > +1 Looking forward to your PR :-)
> > > I will request your review once I'm ready with the PR :-)
> > >
> > > Bests,
> > > Samrat
> > >
> > > On Tue, Jun 6, 2023 at 11:43 PM Samrat Deb <[email protected]> wrote:
> > >
> > > > Hi Martijn,
> > > >
> > > > > If I understand this correctly, the Redshift sink would not be able to
> > > > > support exactly-once, is that correct?
> > > >
> > > > As I delve deeper into Redshift's capabilities, I have discovered that
> > > > it does support "merge into" operations [1], with examples in [2].
> > > > This opens up the possibility of implementing exactly-once semantics in
> > > > the connector.
> > > > However, I believe it would be prudent to start with a more focused scope
> > > > for the initial phase of implementation and defer exactly-once support
> > > > to subsequent iterations.
> > > >
> > > > Before finalizing the approach, I would greatly appreciate your thoughts
> > > > and suggestions on this matter.
> > > > Should we prioritize the initial implementation without exactly-once
> > > > support, or would you advise incorporating it right from the start?
> > > > Your insights and experience would be immensely valuable in making this
> > > > decision.
> > > >
> > > >
> > > > [1]
> > > > https://docs.aws.amazon.com/redshift/latest/dg/t_updating-inserting-using-staging-tables-.html
> > > > [2] https://docs.aws.amazon.com/redshift/latest/dg/merge-examples.html
> > > >
> > > > Bests,
> > > > Samrat
> > > >
> > > > On Mon, Jun 5, 2023 at 7:09 PM Jing Ge <[email protected]> wrote:
> > > >
> > > >> Hi Samrat,
> > > >>
> > > >> Thanks for the feedback. I would suggest adding that information into
> > > >> the FLIP.
> > > >>
> > > >> +1 Looking forward to your PR :-)
> > > >>
> > > >> Best regards,
> > > >> Jing
> > > >>
> > > >> On Sat, Jun 3, 2023 at 9:19 PM Samrat Deb <[email protected]> wrote:
> > > >>
> > > >> > Hi Jing Ge,
> > > >> >
> > > >> > >>> Do you already have any prototype? I'd like to join the reviews.
> > > >> > The prototype is in progress. I will raise a dedicated PR for review
> > > >> > soon and notify this thread as well.
> > > >> >
> > > >> > >>> Will the Redshift connector provide additional features beyond
> > > >> > >>> the mediator/wrapper of the jdbc connector?
> > > >> >
> > > >> > Here are the additional features that the Flink connector for AWS
> > > >> > Redshift can provide on top of using JDBC:
> > > >> >
> > > >> > 1. Integration with AWS Redshift Workload Management (WLM): AWS Redshift
> > > >> > allows you to configure WLM [1] to manage query prioritization and
> > > >> > resource allocation. The Flink connector for Redshift will be agnostic
> > > >> > to the configured WLM and use it to scale the sink in and out. This
> > > >> > means the connector can leverage Redshift's WLM capabilities to
> > > >> > optimize query execution and allocate resources efficiently based on
> > > >> > your defined workload priorities.
> > > >> >
> > > >> > 2. Abstraction of AWS Redshift Quotas and Limits: AWS Redshift imposes
> > > >> > certain quotas and limits [2] on aspects such as the number of
> > > >> > clusters, concurrent connections, and queries per second. The Flink
> > > >> > connector for Redshift will provide an abstraction layer so that users
> > > >> > can work with Redshift without having to worry about these specific
> > > >> > limits. The connector will manage connections and queries within the
> > > >> > defined quotas and limits, abstracting away the complexity and
> > > >> > ensuring compliance with Redshift's restrictions.
> > > >> >
> > > >> > These features aim to simplify the integration of Flink with AWS
> > > >> > Redshift, providing optimized resource utilization and transparent
> > > >> > handling of Redshift-specific limitations.
> > > >> >
> > > >> > Bests,
> > > >> > Samrat
> > > >> >
> > > >> > [1]
> > > >> > https://docs.aws.amazon.com/redshift/latest/dg/cm-c-implementing-workload-management.html
> > > >> > [2]
> > > >> > https://docs.aws.amazon.com/redshift/latest/mgmt/amazon-redshift-limits.html
> > > >> >
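
One way the quota handling in item 2 above could work is to bound how many
statements the sink runs against the cluster at once. This is a minimal Java
sketch under stated assumptions, not the connector's design: the class name
and the per-sink cap `maxConcurrentStatements` are hypothetical, and no real
Redshift or Flink setting is implied. A fair `java.util.concurrent.Semaphore`
hands out at most that many permits, so surplus writer threads block instead
of opening more connections.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: cap concurrent Redshift statements issued by a sink at
 * a configured quota so extra writer threads wait rather than exceed it.
 */
class QuotaLimiterSketch {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();

    QuotaLimiterSketch(int maxConcurrentStatements) {
        this.permits = new Semaphore(maxConcurrentStatements, true); // fair: FIFO waiters
    }

    /** Runs `statement` only while holding a permit; blocks if the quota is exhausted. */
    void execute(Runnable statement) throws InterruptedException {
        permits.acquire();
        try {
            int now = inFlight.incrementAndGet();
            maxObserved.accumulateAndGet(now, Math::max); // record peak concurrency
            statement.run(); // placeholder for a JDBC or COPY call in a real sink
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    int maxObservedConcurrency() {
        return maxObserved.get();
    }

    public static void main(String[] args) throws InterruptedException {
        QuotaLimiterSketch limiter = new QuotaLimiterSketch(3); // assumed quota of 3
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 50; i++) {
            pool.submit(() -> {
                limiter.execute(() -> sleepQuietly(5)); // simulated statement latency
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("max concurrent statements: " + limiter.maxObservedConcurrency());
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The cap of 3 is an arbitrary example; with 10 threads competing, the printed
maximum never exceeds it. A real connector would also have to account for
other clients sharing the same cluster-wide connection limit, which this
sketch ignores.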
> > > >> > On Sat, Jun 3, 2023 at 11:40 PM Samrat Deb <[email protected]> wrote:
> > > >> >
> > > >> > > Hi Ahmed,
> > > >> > >
> > > >> > > >>> Please let me know if you need any collaboration regarding
> > > >> > > >>> integration with AWS connectors credential providers or
> > > >> > > >>> regarding FLIP-171; I would be more than happy to assist.
> > > >> > >
> > > >> > > Sure, I will reach out in case I need a hand.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Fri, Jun 2, 2023 at 6:12 PM Jing Ge <[email protected]> wrote:
> > > >> > >
> > > >> > >> Hi Samrat,
> > > >> > >>
> > > >> > >> Excited to see your proposal. Supporting data warehouses is one of the
> > > >> > >> major tracks for Flink. Thanks for driving it! Happy to see that we
> > > >> > >> reached consensus to prioritize the Sink over Source in the previous
> > > >> > >> discussion. Do you already have any prototype? I'd like to join the
> > > >> > >> reviews.
> > > >> > >>
> > > >> > >> Just out of curiosity, speaking of JDBC mode, according to the FLIP,
> > > >> > >> it should be doable to directly use the jdbc connector with Redshift,
> > > >> > >> if I am not mistaken. Will the Redshift connector provide additional
> > > >> > >> features beyond the mediator/wrapper of the jdbc connector?
> > > >> > >>
> > > >> > >> Best regards,
> > > >> > >> Jing
> > > >> > >>
> > > >> > >> On Thu, Jun 1, 2023 at 8:22 PM Ahmed Hamdy <[email protected]> wrote:
> > > >> > >>
> > > >> > >> > Hi Samrat
> > > >> > >> >
> > > >> > >> > Thanks for putting up this FLIP. I agree regarding the importance of
> > > >> > >> > the use case.
> > > >> > >> > Please let me know if you need any collaboration regarding
> > > >> > >> > integration with AWS connectors credential providers or regarding
> > > >> > >> > FLIP-171; I would be more than happy to assist.
> > > >> > >> > I also like Leonard's proposal of starting with DataStreamSink and
> > > >> > >> > TableSink. It would be great to have some milestones delivered as
> > > >> > >> > soon as they are ready.
> > > >> > >> > Best regards,
> > > >> > >> > Ahmed Hamdy
> > > >> > >> >
> > > >> > >> >
> > > >> > >> > On Wed, 31 May 2023 at 11:15, Samrat Deb <[email protected]> wrote:
> > > >> > >> >
> > > >> > >> > > Hi Liu Ron,
> > > >> > >> > >
> > > >> > >> > > > 1. Regarding `read.mode` and `write.mode`, you say the FLIP
> > > >> > >> > > > provides two modes, jdbc and `unload or copy` respectively. What
> > > >> > >> > > > is the default value for `read.mode` and `write.mode`?
> > > >> > >> > >
> > > >> > >> > > I have made the configuration options `read.mode` and `write.mode`
> > > >> > >> > > mandatory for "flink-connector-redshift" in the FLIP [1]. The
> > > >> > >> > > rationale behind this decision is to empower users who are familiar
> > > >> > >> > > with their Redshift setup and have specific expectations for the
> > > >> > >> > > sink. By making these configurations mandatory, users have more
> > > >> > >> > > control and flexibility in configuring the connector to meet their
> > > >> > >> > > requirements.
> > > >> > >> > >
> > > >> > >> > > However, I am open to feedback on whether it would be beneficial to
> > > >> > >> > > make these options non-mandatory and give them default values. If
> > > >> > >> > > you believe there are advantages to having default values, or have
> > > >> > >> > > any other suggestions, please share your thoughts. Your feedback is
> > > >> > >> > > highly appreciated.
> > > >> > >> > >
> > > >> > >> > > > 2. For Source, does it support both batch read and streaming read?
> > > >> > >> > >
> > > >> > >> > > Redshift currently does not provide native support for streaming
> > > >> > >> > > reads, although it does support streaming ingestion [2]. As part of
> > > >> > >> > > the plan, I intend to conduct a proof of concept and benchmarking to
> > > >> > >> > > explore implementing streaming reads using the Flink JDBC
> > > >> > >> > > connector, as Redshift is JDBC compatible.
> > > >> > >> > > However, in the initial phase of implementation, the focus will
> > > >> > >> > > primarily be on supporting batch reads rather than streaming reads.
> > > >> > >> > > This approach will allow us to deliver a robust and reliable
> > > >> > >> > > solution for batch processing in phase 2 of the implementation.
> > > >> > >> > >
> > > >> > >> > > [1]
> > > >> > >> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
> > > >> > >> > > [2]
> > > >> > >> > > https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html
> > > >> > >> > >
> > > >> > >> > > Bests,
> > > >> > >> > > Samrat
> > > >> > >> > >
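
As a rough illustration of what "mandatory with no default" means for
`read.mode` and `write.mode`, here is a plain-Java sketch. The option keys
come from this thread; the accepted values (jdbc/unload for reads, jdbc/copy
for writes) are an assumption based on Ron's summary, and all class and
method names are hypothetical. A real table connector factory would more
likely declare these as Flink `ConfigOption`s returned from
`requiredOptions()`.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: validate two mandatory options with no default values.
 * Accepted values are assumed, not taken from the FLIP's final API.
 */
class ModeOptionsSketch {
    enum ReadMode { JDBC, UNLOAD }
    enum WriteMode { JDBC, COPY }

    static ReadMode readMode(Map<String, String> options) {
        return ReadMode.valueOf(required(options, "read.mode"));
    }

    static WriteMode writeMode(Map<String, String> options) {
        return WriteMode.valueOf(required(options, "write.mode"));
    }

    /** No default is applied: a missing or blank key fails fast. */
    private static String required(Map<String, String> options, String key) {
        String value = options.get(key);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("Missing required option '" + key + "'");
        }
        return value.trim().toUpperCase(Locale.ROOT); // "copy" -> "COPY"
    }

    public static void main(String[] args) {
        Map<String, String> options = Map.of("read.mode", "unload", "write.mode", "copy");
        System.out.println(readMode(options) + " / " + writeMode(options)); // UNLOAD / COPY

        try {
            writeMode(Map.of("read.mode", "jdbc"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Missing required option 'write.mode'
        }
    }
}
```

Failing fast on a missing key is the behaviour the mandatory-option design
implies; switching to defaults would only mean having `required(...)` fall
back to a chosen value instead of throwing.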
> > > >> > >> > > On Wed, May 31, 2023 at 8:03 AM liu ron <[email protected]> wrote:
> > > >> > >> > >
> > > >> > >> > > > Hi, Samrat
> > > >> > >> > > >
> > > >> > >> > > > Thanks for driving this FLIP. It looks like flink-connector-redshift
> > > >> > >> > > > would be very useful for Flink. I have two questions:
> > > >> > >> > > > 1. Regarding `read.mode` and `write.mode`, you say the FLIP
> > > >> > >> > > > provides two modes, jdbc and `unload or copy` respectively. What
> > > >> > >> > > > is the default value for `read.mode` and `write.mode`?
> > > >> > >> > > > 2. For Source, does it support both batch read and streaming read?
> > > >> > >> > > >
> > > >> > >> > > >
> > > >> > >> > > > Best,
> > > >> > >> > > > Ron
> > > >> > >> > > >
> > > >> > >> > > > On Tue, May 30, 2023 at 5:15 PM Samrat Deb <[email protected]> wrote:
> > > >> > >> > > >
> > > >> > >> > > > > [1]
> > > >> > >> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
> > > >> > >> > > > >
> > > >> > >> > > > > [note] Missed the trailing link in the previous mail.
> > > >> > >> > > > >
> > > >> > >> > > > >
> > > >> > >> > > > >
> > > >> > >> > > > > On Tue, May 30, 2023 at 2:43 PM Samrat Deb <[email protected]> wrote:
> > > >> > >> > > > >
> > > >> > >> > > > > > Hi Leonard,
> > > >> > >> > > > > >
> > > >> > >> > > > > > > and I’m glad to help review the design as well as the code
> > > >> > >> > > > > > > review.
> > > >> > >> > > > > > Thank you so much. It would be really great and helpful for
> > > >> > >> > > > > > bringing flink-connector-redshift to Flink users :)
> > > >> > >> > > > > >
> > > >> > >> > > > > > I have divided the implementation into 3 phases in the `Scope`
> > > >> > >> > > > > > section [1].
> > > >> > >> > > > > > The 1st phase is to
> > > >> > >> > > > > >
> > > >> > >> > > > > >    - Integrate with the Flink Sink API (*FLIP-171*
> > > >> > >> > > > > >    <https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink>)
> > > >> > >> > > > > >
> > > >> > >> > > > > >
> > > >> > >> > > > > > > About the implementation phases, how about prioritizing support
> > > >> > >> > > > > > > for the Datastream Sink API and TableSink API in the first phase?
> > > >> > >> > > > > > I completely agree with prioritizing support for the Datastream
> > > >> > >> > > > > > Sink API and TableSink API in the first phase.
> > > >> > >> > > > > > I will update the FLIP [1] as you have suggested.
> > > >> > >> > > > > >
> > > >> > >> > > > > > > It seems that the primary use cases for the Redshift connector
> > > >> > >> > > > > > > are acting as a sink for data processed by Flink.
> > > >> > >> > > > > > Yes, most of the requests and requirements for the Redshift
> > > >> > >> > > > > > connector are for a sink for data processed by Flink.
> > > >> > >> > > > > >
> > > >> > >> > > > > > Bests,
> > > >> > >> > > > > > Samrat
> > > >> > >> > > > > >
> > > >> > >> > > > > > On Tue, May 30, 2023 at 12:35 PM Leonard Xu <[email protected]> wrote:
> > > >> > >> > > > > >
> > > >> > >> > > > > >> Thanks @Samrat for bringing up this discussion.
> > > >> > >> > > > > >>
> > > >> > >> > > > > >> It makes sense to me to introduce an AWS Redshift connector for
> > > >> > >> > > > > >> Apache Flink, and I’m glad to help review the design as well as
> > > >> > >> > > > > >> the code.
> > > >> > >> > > > > >>
> > > >> > >> > > > > >> About the implementation phases, how about prioritizing support
> > > >> > >> > > > > >> for the Datastream Sink API and TableSink API in the first phase?
> > > >> > >> > > > > >> It seems that the primary use cases for the Redshift connector
> > > >> > >> > > > > >> are acting as a sink for data processed by Flink.
> > > >> > >> > > > > >>
> > > >> > >> > > > > >> Best,
> > > >> > >> > > > > >> Leonard
> > > >> > >> > > > > >>
> > > >> > >> > > > > >>
> > > >> > >> > > > > >> > On May 29, 2023, at 12:51 PM, Samrat Deb <[email protected]> wrote:
> > > >> > >> > > > > >> >
> > > >> > >> > > > > >> > Hello all,
> > > >> > >> > > > > >> >
> > > >> > >> > > > > >> > Context:
> > > >> > >> > > > > >> > Amazon Redshift [1] is a fully managed, petabyte-scale data
> > > >> > >> > > > > >> > warehouse service in the cloud. It allows analyzing data without
> > > >> > >> > > > > >> > all of the configuration of a provisioned data warehouse.
> > > >> > >> > > > > >> > Resources are automatically provisioned, and data warehouse
> > > >> > >> > > > > >> > capacity is intelligently scaled to deliver fast performance for
> > > >> > >> > > > > >> > even the most demanding and unpredictable workloads.
> > > >> > >> > > > > >> > Redshift is a widely used data warehouse solution in the current
> > > >> > >> > > > > >> > market.
> > > >> > >> > > > > >> >
> > > >> > >> > > > > >> > Building a Flink Redshift connector will allow Flink users to use
> > > >> > >> > > > > >> > Redshift directly as a source and a sink. It will help Flink
> > > >> > >> > > > > >> > expand its scope to Redshift as a new connector in the ecosystem.
> > > >> > >> > > > > >> >
> > > >> > >> > > > > >> > I would like to start a discussion on FLIP-307: Flink connector
> > > >> > >> > > > > >> > Redshift [2].
> > > >> > >> > > > > >> > Looking forward to comments, feedback and suggestions from the
> > > >> > >> > > > > >> > community on the proposal.
> > > >> > >> > > > > >> >
> > > >> > >> > > > > >> > [1] https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html
> > > >> > >> > > > > >> > [2]
> > > >> > >> > > > > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
> > > >> > >> > > > > >> >
> > > >> > >> > > > > >> >
> > > >> > >> > > > > >> >
> > > >> > >> > > > > >> > Bests,
> > > >> > >> > > > > >> > Samrat
