Re: [DISCUSS] FLIP-307: Flink connector Redshift

Samrat Deb Sat, 03 Jun 2023 12:19:11 -0700

Hi Jing Ge,

>>> Do you already have any prototype? I'd like to join the reviews.
The prototype is in progress. I will raise a dedicated PR for review soon
and will also notify this thread.


>>> Will the Redshift connector provide additional features
beyond the mediator/wrapper of the jdbc connector?

Here are the additional features that the Flink connector for AWS Redshift
can provide on top of using JDBC:

1. Integration with AWS Redshift Workload Management (WLM): AWS Redshift
allows you to configure WLM[1] to manage query prioritization and resource
allocation. The Flink connector for Redshift will not depend on how WLM is
configured; it will run under whatever WLM setup is in place and rely on it
when the sink scales in and out. This
means that the connector can leverage the WLM capabilities of Redshift to
optimize the execution of queries and allocate resources efficiently based
on your defined workload priorities.

2. Abstraction of AWS Redshift Quotas and Limits: AWS Redshift imposes
certain quotas and limits[2] on various aspects such as the number of
clusters, concurrent connections, queries per second, etc. The Flink
connector for Redshift will provide an abstraction layer for users,
allowing them to work with Redshift without having to worry about these
specific limits. The connector will handle the management of connections
and queries within the defined quotas and limits, abstracting away the
complexity and ensuring compliance with Redshift's restrictions.

These features aim to simplify the integration of Flink with AWS Redshift,
providing optimized resource utilization and transparent handling of
Redshift-specific limitations.
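
To make point 2 a bit more concrete, here is a minimal sketch in Java of how
a sink could keep its concurrent Redshift connections under a fixed cap. The
class name `ConnectionBudget`, its methods, and the limit and timeout values
are hypothetical; none of this is part of the FLIP or of any Flink API. It
only illustrates the idea using `java.util.concurrent.Semaphore`.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: caps how many callers may hold a connection slot at once. */
final class ConnectionBudget {
    private final Semaphore permits;

    ConnectionBudget(int maxConcurrent) {
        if (maxConcurrent <= 0) {
            throw new IllegalArgumentException("maxConcurrent must be positive");
        }
        // Fair semaphore: waiting callers get slots in arrival order.
        this.permits = new Semaphore(maxConcurrent, true);
    }

    /** Runs work while holding one slot; fails if no slot frees up in time. */
    <T> T run(Supplier<T> work, long timeoutMillis) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a slot", e);
        }
        if (!acquired) {
            throw new IllegalStateException(
                    "no connection slot free within " + timeoutMillis + " ms");
        }
        try {
            return work.get();
        } finally {
            // Always hand the slot back, even if the work throws.
            permits.release();
        }
    }

    /** Number of slots currently free. */
    int available() {
        return permits.availablePermits();
    }
}
```

A writer would wrap each JDBC or COPY call in `budget.run(...)`, with the cap
chosen below the connection limit that applies to the target cluster.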

Bests,
Samrat

[1]
https://docs.aws.amazon.com/redshift/latest/dg/cm-c-implementing-workload-management.html
[2]
https://docs.aws.amazon.com/redshift/latest/mgmt/amazon-redshift-limits.html

On Sat, Jun 3, 2023 at 11:40 PM Samrat Deb <[email protected]> wrote:

> Hi Ahmed,
>
> >>> please let me know If you need any collaboration regarding integration
> with
> AWS connectors credential providers or regarding FLIP-171 I would be more
> than happy to assist.
>
> Sure, I will reach out in case I need a hand.
>
>
>
> On Fri, Jun 2, 2023 at 6:12 PM Jing Ge <[email protected]> wrote:
>
>> Hi Samrat,
>>
>> Excited to see your proposal. Supporting data warehouses is one of the
>> major tracks for Flink. Thanks for driving it! Happy to see that we
>> reached
>> consensus to prioritize the Sink over Source in the previous discussion.
>> Do
>> you already have any prototype? I'd like to join the reviews.
>>
>> Just out of curiosity, speaking of JDBC mode, according to the FLIP, it
>> should be doable to directly use the jdbc connector with Redshift, if I am
>> not mistaken. Will the Redshift connector provide additional features
>> beyond the mediator/wrapper of the jdbc connector?
>>
>> Best regards,
>> Jing
>>
>> On Thu, Jun 1, 2023 at 8:22 PM Ahmed Hamdy <[email protected]> wrote:
>>
>> > Hi Samrat
>> >
>> > Thanks for putting up this FLIP. I agree regarding the importance of the
>> > use case.
>> > please let me know If you need any collaboration regarding integration
>> with
>> > AWS connectors credential providers or regarding FLIP-171 I would be
>> more
>> > than happy to assist.
>> > I also like Leonard's proposal for starting with DataStreamSink and
>> > TableSink, It would be great to have some milestones delivered as soon
>> as
>> > ready.
>> > best regards
>> > Ahmed Hamdy
>> >
>> >
>> > On Wed, 31 May 2023 at 11:15, Samrat Deb <[email protected]> wrote:
>> >
>> > > Hi Liu Ron,
>> > >
>> > > > 1. Regarding the  `read.mode` and `write.mode`, you say here
>> provides
>> > two
>> > > modes, respectively, jdbc and `unload or copy`, What is the default
>> value
>> > > for `read.mode` and `write.mode`?
>> > >
>> > > I have made an effort to make the configuration options `read.mode`
>> and
>> > > `write.mode` mandatory for the "flink-connector-redshift" according to
>> > > FLIP[1]. The rationale behind this decision is to empower users who
>> are
>> > > familiar with their Redshift setup and have specific expectations for
>> the
>> > > sink. By making these configurations mandatory, users can have more
>> > control
>> > > and flexibility in configuring the connector to meet their
>> requirements.
>> > >
>> > > However, I am open to receiving feedback on whether it would be
>> > beneficial
>> > > to make the configuration options non-mandatory and set default values
>> > for
>> > > them. If you believe there are advantages to having default values or
>> any
>> > > other suggestions, please share your thoughts. Your feedback is highly
>> > > appreciated.
>> > >
>> > > >  2. For Source, does it both support batch read and streaming read?
>> > >
>> > > Redshift currently does not provide native support for streaming
>> reads,
>> > > although it does support streaming writes[2]. As part of the plan, I
>> > intend
>> > > to conduct a proof of concept and benchmarking to explore the
>> > possibilities
>> > > of implementing streaming reads using the Flink JDBC connector, as
>> > Redshift
>> > > is JDBC compatible.
>> > > However, it is important to note that, in the initial phase of
>> > > implementation, the focus will primarily be on supporting batch reads
>> > > rather than streaming reads. This approach will allow us to deliver a
>> > > robust and reliable solution for batch processing in phase 2 of the
>> > > implementation.
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>> > > [2]
>> > >
>> > >
>> >
>> https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html
>> > >
>> > > Bests,
>> > > Samrat
>> > >
>> > > On Wed, May 31, 2023 at 8:03 AM liu ron <[email protected]> wrote:
>> > >
>> > > > Hi, Samrat
>> > > >
>> > > > Thanks for driving this FLIP. It looks like supporting
>> > > > flink-connector-redshift is very useful to Flink. I have two
>> questions:
>> > > > 1. Regarding the  `read.mode` and `write.mode`, you say here
>> provides
>> > two
>> > > > modes, respectively, jdbc and `unload or copy`, What is the default
>> > value
>> > > > for `read.mode` and `write.mode`?
>> > > > 2. For Source, does it both support batch read and streaming read?
>> > > >
>> > > >
>> > > > Best,
>> > > > Ron
>> > > >
>> > > > On Tue, May 30, 2023 at 17:15, Samrat Deb <[email protected]> wrote:
>> > > >
>> > > > > [1]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>> > > > >
>> > > > > [note] Missed the trailing link for previous mail
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, May 30, 2023 at 2:43 PM Samrat Deb <[email protected]
>> >
>> > > > wrote:
>> > > > >
>> > > > > > Hi Leonard,
>> > > > > >
>> > > > > > > and I’m glad to help review the design as well as the code
>> > review.
>> > > > > > Thank you so much. It would be really great and helpful to bring
>> > > > > > flink-connector-redshift for flink users :) .
>> > > > > >
>> > > > > > I have divided the implementation into 3 phases in the `Scope`
>> > > > > > section[1]. The 1st phase is to
>> > > > > >
>> > > > > >    - Integrate with Flink Sink API (*FLIP-171*
>> > > > > >    <
>> > > > >
>> > >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
>> > > > >
>> > > > > >    )
>> > > > > >
>> > > > > >
>> > > > > > > About the implementation phases, How about prioritizing
>> support
>> > for
>> > > > the
>> > > > > > Datastream Sink API and TableSink API in the first phase?
>> > > > > > I completely agree with prioritizing support for the Datastream
>> > > > > > Sink API and TableSink API in the first phase.
>> > > > > > I will update the FLIP[1] as you have suggested.
>> > > > > >
>> > > > > > > It seems that the primary use cases for the Redshift connector
>> > are
>> > > > > > acting as a sink for processed data by Flink.
>> > > > > > Yes, the majority of requests for the Redshift connector are for
>> > > > > > a sink for data processed by Flink.
>> > > > > >
>> > > > > > Bests,
>> > > > > > Samrat
>> > > > > >
>> > > > > > On Tue, May 30, 2023 at 12:35 PM Leonard Xu <[email protected]>
>> > > wrote:
>> > > > > >
>> > > > > >> Thanks @Samrat for bringing this discussion.
>> > > > > >>
>> > > > > >> It makes sense to me to introduce AWS Redshift connector for
>> > Apache
>> > > > > >> Flink, and I’m glad to help review the design as well as the
>> code
>> > > > > review.
>> > > > > >>
>> > > > > >> About the implementation phases, How about prioritizing support
>> > for
>> > > > the
>> > > > > >> Datastream Sink API and TableSink API in the first phase? It
>> seems
>> > > > that
>> > > > > the
>> > > > > >> primary use cases for the Redshift connector are acting as a
>> sink
>> > > for
>> > > > > >> processed data by Flink.
>> > > > > >>
>> > > > > >> Best,
>> > > > > >> Leonard
>> > > > > >>
>> > > > > >>
>> > > > > >> > On May 29, 2023, at 12:51 PM, Samrat Deb <
>> [email protected]
>> > >
>> > > > > wrote:
>> > > > > >> >
>> > > > > >> > Hello all ,
>> > > > > >> >
>> > > > > >> > Context:
>> > > > > >> > Amazon Redshift [1] is a fully managed, petabyte-scale data
>> > > > warehouse
>> > > > > >> > service in the cloud. It allows analyzing data without all of
>> > the
>> > > > > >> > configurations of a provisioned data warehouse. Resources are
>> > > > > >> automatically
>> > > > > >> > provisioned and data warehouse capacity is intelligently
>> scaled
>> > to
>> > > > > >> deliver
>> > > > > >> > fast performance for even the most demanding and
>> unpredictable
>> > > > > >> workloads.
>> > > > > >> > Redshift is one of the widely used warehouse solutions in the
>> > > > current
>> > > > > >> > market.
>> > > > > >> >
>> > > > > >> > Building flink connector redshift will allow flink users to
>> have
>> > > > > source
>> > > > > >> and
>> > > > > >> > sink directly to redshift. It will help flink to expand the
>> > scope
>> > > to
>> > > > > >> > redshift as a new connector in the ecosystem.
>> > > > > >> >
>> > > > > >> > I would like to start a discussion on the FLIP-307: Flink
>> > > connector
>> > > > > >> > redshift [2].
>> > > > > >> > Looking forward to comments, feedback and suggestions from
>> the
>> > > > > >> community
>> > > > > >> > on the proposal.
>> > > > > >> >
>> > > > > >> > [1]
>> > https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html
>> > > > > >> > [2]
>> > > > > >> >
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>> > > > > >> >
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > Bests,
>> > > > > >> > Samrat
>> > > > > >>
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>
