[DISCUSS] FLIP-228: Support Within between events in CEP Pattern
Hi everyone, The Pattern#within interface in CEP defines the maximum time interval in which a matching pattern has to be completed in order to be considered valid; this interval corresponds to the maximum time gap between the first and the last event. An interval representing the maximum time gap between consecutive events is also needed in scenarios such as purchasing a good within a maximum of 5 minutes after browsing it. I would like to start a discussion about FLIP-228[1], which proposes a within-between-events notion in Pattern to define the maximum time interval in which a partial match is considered valid; this interval represents the maximum time gap between events of a partially matching Pattern. Hence we propose the Pattern#partialWithin interface to define this maximum time interval. Please take a look at the FLIP page [1] for more details, and see the rough usage sketch at the end of this mail. Any feedback about FLIP-228 would be appreciated! [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-228%3A+Support+Within+between+events+in+CEP+Pattern Best regards, Nicholas Jiang
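For illustration, a minimal sketch of how the proposed interface could be used for the browse-then-purchase scenario above. Pattern#partialWithin is the interface proposed in this FLIP and does not exist in released Flink versions; the Event type and its getAction() accessor are assumed here purely for the example:

import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

// Event is a user-defined POJO with a getAction() accessor (assumed for illustration).
Pattern<Event, ?> pattern =
        Pattern.<Event>begin("browse")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event event) {
                        return "browse".equals(event.getAction());
                    }
                })
                .followedBy("purchase")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event event) {
                        return "purchase".equals(event.getAction());
                    }
                })
                // Proposed in this FLIP: the purchase must occur within 5 minutes of the
                // browse event, i.e. the gap between the two events rather than the gap
                // between the first and last event of the whole match.
                .partialWithin(Time.minutes(5));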
[VOTE] FLIP-228: Support Within between events in CEP Pattern
Hi everyone, Thanks for the feedback on FLIP-228: Support Within between events in CEP Pattern [1] in the discussion thread [2]. I'd like to start a VOTE thread for FLIP-228. The vote will be open for at least 72 hours unless there is an objection or not enough votes. Regards, Nicholas Jiang [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-228%3A+Support+Within+between+events+in+CEP+Pattern [2] https://lists.apache.org/thread/p60ctx213f8921rgklk5f0b6xfrs8ksz
[RESULT][VOTE] FLIP-228: Support Within between events in CEP Pattern
Hi dev, FLIP-228: Support Within between events in CEP Pattern [1] has been accepted. There are 3 binding votes and 5 non-binding votes [2]: - Martijn Visser (binding) - Dian Fu (binding) - Xingbo Huang (binding) - md peng (non-binding) - Rui Fan (non-binding) - Utopia (non-binding) - yue ma (non-binding) - Jing Ge (non-binding) None against. Thanks again to everyone who contributed to this FLIP. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-228%3A+Support+Within+between+events+in+CEP+Pattern [2] https://lists.apache.org/thread/4t7kwrb9t4fp0kmqvjrth798z25c0r97 Regards, Nicholas Jiang
Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink
Awesome! Big +1. Regards, Nicholas Jiang On 2023/12/07 03:24:59 Leonard Xu wrote: > Dear Flink devs, > > As you may have heard, we at Alibaba (Ververica) are planning to donate CDC > Connectors for the Apache Flink project[1] to the Apache Flink community. > > CDC Connectors for Apache Flink comprise a collection of source connectors > designed specifically for Apache Flink. These connectors[2] enable the > ingestion of changes from various databases using Change Data Capture (CDC), > most of these CDC connectors are powered by Debezium[3]. They support both > the DataStream API and the Table/SQL API, facilitating the reading of > database snapshots and continuous reading of transaction logs with > exactly-once processing, even in the event of failures. > > > Additionally, in the latest version 3.0, we have introduced many long-awaited > features. Starting from CDC version 3.0, we've built a Streaming ELT > Framework available for streaming data integration. This framework allows > users to write their data synchronization logic in a simple YAML file, which > will automatically be translated into a Flink DataStreaming job. It > emphasizes optimizing the task submission process and offers advanced > functionalities such as whole database synchronization, merging sharded > tables, and schema evolution[4]. > > > I believe this initiative is a perfect match for both sides. For the Flink > community, it presents an opportunity to enhance Flink's competitive > advantage in streaming data integration, promoting the healthy growth and > prosperity of the Apache Flink ecosystem. For the CDC Connectors project, > becoming a sub-project of Apache Flink means being part of a neutral > open-source community, which can attract a more diverse pool of contributors. > > Please note that the aforementioned points represent only some of our > motivations and vision for this donation. Specific future operations need to > be further discussed in this thread. For example, the sub-project name after > the donation; we hope to name it Flink-CDC aiming to streaming data > intergration through Apache Flink, following the naming convention of > Flink-ML; And this project is managed by a total of 8 maintainers, including > 3 Flink PMC members and 1 Flink Committer. The remaining 4 maintainers are > also highly active contributors to the Flink community, donating this project > to the Flink community implies that their permissions might be reduced. > Therefore, we may need to bring up this topic for further discussion within > the Flink PMC. Additionally, we need to discuss how to migrate existing users > and documents. We have a user group of nearly 10,000 people and a > multi-version documentation site need to migrate. We also need to plan for > the migration of CI/CD processes and other specifics. > > > While there are many intricate details that require implementation, we are > committed to progressing and finalizing this donation process. > > > Despite being Flink’s most active ecological project (as evaluated by GitHub > metrics), it also boasts a significant user base. However, I believe it's > essential to commence discussions on future operations only after the > community reaches a consensus on whether they desire this donation. > > > Really looking forward to hear what you think! 
> > > Best, > Leonard (on behalf of the Flink CDC Connectors project maintainers) > > [1] https://github.com/ververica/flink-cdc-connectors > [2] > https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-connectors.html > [3] https://debezium.io > [4] > https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-pipeline.html
[DISCUSS]FLIP-150: Introduce Hybrid Source
Hi devs, I'd like to start a new FLIP to introduce the Hybrid Source. The hybrid source is a source that contains a list of concrete sources. It reads from each contained source in the defined order, switching from source A to the next source B when source A finishes. In practice, many Flink jobs need to read data from multiple sources in sequential order. Change Data Capture (CDC) and machine learning feature backfill are two concrete scenarios of this consumption pattern. Users may have to either run two different Flink jobs or add some hacks to the SourceFunction to address such use cases. To support the above scenarios smoothly, a Flink job needs to first read the historical data from HDFS and then switch to Kafka for real-time records. The hybrid source has several benefits from the user's perspective: - Switching among multiple sources is easy, based on the switchable source implementations of different connectors. - Automatic switching is supported for user-defined switchable sources that constitute the hybrid source. - There is a complete and effective mechanism to support smooth source migration between historical and real-time data. Therefore, in this discussion, we propose to introduce a “Hybrid Source” API built on top of the new Source API (FLIP-27) to help users smoothly switch sources. For more detail, please refer to the FLIP design doc[1]; a rough usage sketch follows at the end of this mail. I'm looking forward to your feedback. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source Best, Nicholas Jiang
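For illustration, a rough sketch of how such a hybrid source could be assembled from FLIP-27 style file and Kafka sources. The class and builder names (HybridSource, FileSource, KafkaSource), as well as the HDFS path, broker address and topic, are illustrative assumptions here rather than the final interfaces of the FLIP:

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;

// Bounded historical source: read the backfill data from HDFS first.
FileSource<String> historicalSource =
        FileSource.forRecordStreamFormat(new TextLineInputFormat(), new Path("hdfs:///data/history"))
                .build();

// Unbounded real-time source: continue from Kafka once the files are exhausted.
KafkaSource<String> realTimeSource =
        KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("events")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

// The hybrid source reads the contained sources in the defined order and switches
// from the file source to the Kafka source when the file source finishes.
HybridSource<String> hybridSource =
        HybridSource.builder(historicalSource)
                .addSource(realTimeSource)
                .build();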
Re: [VOTE] Apache Flink Kubernetes Operator Release 0.1.0, release candidate #3
+1 (non-binding) 1.Verified maven and helm chart versions for built from source 2.Verified helm chart points to correct docker image and deploys it by default 3.Verified helm installation and basic/checkpointing, stateful examples with upgrades and manual savepoints 4.Verified online documents including Quick Start etc. Best, Nicholas Jiang On 2022/03/31 08:53:40 Yang Wang wrote: > +1 (non-binding) > > Verified via the following steps: > > * Verify checksums and GPG signatures > * Verify that the source distributions do not contain any binaries > * Build source distribution successfully > * Verify all the POM version is 0.1.0 > > * License check, the jars bundled in docker image and maven artifacts have > correct NOTICE and licenses > > # Functionality verification > * Install flink-kubernetes-operator via helm > - helm repo add flink-kubernetes-operator-0.1.0-rc3 > https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-0.1.0-rc3 > - helm install flink-kubernetes-operator > flink-kubernetes-operator-0.1.0-rc3/flink-kubernetes-operator > > * Apply a new FlinkDeployment CR with HA and ingress enabled, Flink webUI > normal > - kubectl apply -f > https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-0.1/e2e-tests/data/cr.yaml > > * Upgrade FlinkDeployment, new job parallelism takes effect and recover > from latest checkpoint > - kubectl patch flinkdep flink-example-statemachine --type merge > --patch '{"spec":{"job": {"parallelism": 1 } } }' > > * Verify manual savepoint trigger > - kubectl patch flinkdep flink-example-statemachine --type merge > --patch '{"spec":{"job": {"savepointTriggerNonce": 1 } } }' > > * Suspend a FlinkDeployment > - kubectl patch flinkdep flink-example-statemachine --type merge > --patch '{"spec":{"job": {"state": "suspended" } } }' > > > Best, > Yang > > Márton Balassi 于2022年3月31日周四 01:01写道: > > > +1 (binding) > > > > Verified the following: > > > >- shasums > >- gpg signatures > >- source does not contain any binaries > >- built from source > >- deployed via helm after adding the distribution webserver endpoint as > >a helm registry > >- all relevant files have license headers > > > > > > On Wed, Mar 30, 2022 at 4:39 PM Gyula Fóra wrote: > > > > > Hi everyone, > > > > > > Please review and vote on the release candidate #3 for the version 0.1.0 > > of > > > Apache Flink Kubernetes Operator, > > > as follows: > > > [ ] +1, Approve the release > > > [ ] -1, Do not approve the release (please provide specific comments) > > > > > > **Release Overview** > > > > > > As an overview, the release consists of the following: > > > a) Kubernetes Operator canonical source distribution (including the > > > Dockerfile), to be deployed to the release repository at dist.apache.org > > > b) Kubernetes Operator Helm Chart to be deployed to the release > > repository > > > at dist.apache.org > > > c) Maven artifacts to be deployed to the Maven Central Repository > > > d) Docker image to be pushed to dockerhub > > > > > > **Staging Areas to Review** > > > > > > The staging areas containing the above mentioned artifacts are as > > follows, > > > for your review: > > > * All artifacts for a,b) can be found in the corresponding dev repository > > > at dist.apache.org [1] > > > * All artifacts for c) can be found at the Apache Nexus Repository [2] > > > * The docker image is staged on github [7] > > > > > > All artifacts are signed with the key > > > 0B4A34ADDFFA2BB54EB720B221F06303B87DAFF1 [3] > > > > > > Other links for your review: > > > * JIRA release 
notes [4] > > > * source code tag "release-0.1.0-rc3" [5] > > > * PR to update the website Downloads page to include Kubernetes Operator > > > links [6] > > > > > > **Vote Duration** > > > > > > The voting time will run for at least 72 hours. > > > It is adopted by majority approval, with at least 3 PMC affirmative > > votes. > > > > > > **Note on Verification** > > > > > > You can follow the basic verification guide here > > > < > > > > > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release > > > > > > > . > > > Note that you don't need to verify everything yourself, but please make > > > note of what you have tested together with your +- vote. > > > > > > Thanks, > > > Gyula > > > > > > [1] > > > > > > > > https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-0.1.0-rc3/ > > > [2] > > > https://repository.apache.org/content/repositories/orgapacheflink-1492/ > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > > [4] > > > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351499 > > > [5] > > > > > https://github.com/apache/flink-kubernetes-operator/tree/release-0.1.0-rc3 > > > [6] https://github.com/apache/flink-web/pull/519 > > > [7] ghcr.io/apache/flink-kubernetes-operator:2c166e3 > > > > > >
Re: [DISCUSS] Make Kubernetes Operator config "dynamic" and consider merging with flinkConfiguration
Thanks Gyula for bringing up this topic! I also prefer Proposal 2, which merges *flinkConfiguration* and *operatorConfiguration*, as it is easier for end Flink users to understand. IMO, from an end-user perspective, both *flinkConfiguration* and *operatorConfiguration* are configuration related to the Flink deployment or job, so there is no need to distinguish them and make users configure them separately. Best, Nicholas Jiang On 2022/04/01 18:25:14 Gyula Fóra wrote: > Hi Devs! > > *Background*: > With more and more features and options added to the flink kubernetes > operator it would make sense to not expose everything as first class > options in the deployment/jobspec (same as we do for flink configuration > currently). > > Furthermore it would be beneficial if users could control reconciliation > specific settings like timeouts, reschedule delays etc on a per deployment > basis. > > > *Proposal 1*The more conservative proposal would be to add a new > *operatorConfiguration* field to the deployment spec that the operator > would use during the controller loop (merged with the default operator > config). This makes the operator very extensible with new options and would > also allow overrides to the default operator config on a per deployment > basis. > > > *Proposal 2*I would actually go one step further and propose that we should > merge *flinkConfiguration* and *operatorConfiguration* -as whether > something affects the flink job submission/job or the operator behaviour > does not really make a difference to the end user. For users the operator > is part of flink so having a multiple configuration maps could simply cause > confusion. > We could simply prefix all operator related configs with > `kubernetes.operator` to ensure that we do not accidentally conflict with > flink native config options. > If we go this route I would even go as far as to naming it simply > *configuration* for sake of simplicity. > > I personally would go with proposal 2 to make this as simple as possible > for the users. > > Please let me know what you think! > Gyula >
Re: [DISCUSS] FLIP-220: Temporal State
Hi David. Thanks for the proposal of Temporal State. I'm working on a benchmark of the CEPOperator. I have a question: how much performance improvement can Temporal State bring to the CEPOperator, and is there an expected performance number? Best, Nicholas Jiang On 2022/04/11 12:54:15 David Anderson wrote: > Greetings, Flink developers. > > I would like to open up a discussion of a proposal [1] to add a new kind of > state to Flink. > > The goal here is to optimize a fairly common pattern, which is using > > MapState<Long, List<Event>> > > to store lists of events associated with timestamps. This pattern is used > internally in quite a few operators that implement sorting and joins, and > it also shows up in user code, for example, when implementing custom > windowing in a KeyedProcessFunction. > > Nico Kruber, Seth Wiesman, and I have implemented a POC that achieves a > more than 2x improvement in throughput when performing these operations on > RocksDB by better leveraging the capabilities of the RocksDB state backend. > > See FLIP-220 [1] for details. > > Best, > David > > [1] https://cwiki.apache.org/confluence/x/Xo_FD >
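For context, a minimal sketch of the MapState<Long, List<Event>> access pattern described in the quoted proposal, as it typically appears in user code today; the Event and Result types are assumed placeholders, event-time timestamps are assumed, and the timer handling is omitted. FLIP-220 aims to make this kind of timestamped buffering cheaper, especially on RocksDB:

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Event and Result are assumed user-defined types (illustration only).
public class BufferByTimestampFunction extends KeyedProcessFunction<String, Event, Result> {

    // The access pattern FLIP-220 targets: events buffered per timestamp.
    private transient MapState<Long, List<Event>> eventsByTimestamp;

    @Override
    public void open(Configuration parameters) {
        eventsByTimestamp =
                getRuntimeContext()
                        .getMapState(
                                new MapStateDescriptor<>(
                                        "eventsByTimestamp",
                                        Types.LONG,
                                        Types.LIST(Types.POJO(Event.class))));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Result> out) throws Exception {
        long ts = ctx.timestamp();
        List<Event> bucket = eventsByTimestamp.get(ts);
        if (bucket == null) {
            bucket = new ArrayList<>();
        }
        bucket.add(event);
        eventsByTimestamp.put(ts, bucket);
        // An event-time timer would later iterate the buffered timestamps in order,
        // emit results and clean up the state; omitted here for brevity.
        ctx.timerService().registerEventTimeTimer(ts);
    }
}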
Re: [DISCUSS] FLIP-91: Support SQL Client Gateway
Hi Shengkai. Thanks for driving the proposal of the SQL Client Gateway. I have some knowledge of Kyuubi and have some questions about the design: 1. Do the public interfaces of GatewayService refer to any existing service? If they refer to HiveService, does GatewayService need interfaces like getQueryId etc.? 2. What's the behavior of the SQL Client Gateway working on Yarn or K8s? Does the SQL Client Gateway support application or session mode on Yarn? 3. Is there any event trigger in the operation state machine? 4. What's the return schema for the public interfaces of GatewayService? For example, what is the return value schema of the getTable interface? 5. How does the user get the operation log? Thanks, Nicholas Jiang On 2022/04/21 06:42:30 Shengkai Fang wrote: > Hi, Flink developers. > > I want to start a discussion about the FLIP-91: Support Flink SQL > Gateway[1]. Flink SQL Gateway is a service that allows users to submit and > manage their jobs in the online environment with the pluggable endpoints. > The reason why we introduce the Gateway with pluggable endpoints is that > many users have their preferences. For example, the HiveServer2 users > prefer to use the gateway with HiveServer2-style API, which has numerous > tools. However, some filnk-native users may prefer to use the REST API. > Therefore, we propose the SQL Gateway with pluggable endpoint. > > In the FLIP, we also propose the REST endpoint, which has the similar > APIs compared to the gateway in the ververica/flink-sql-gateway[2]. At the > last, we discuss how to use the SQL Client to submit the statement to the > Gateway with the REST API. > > I am glad that you can give some feedback about FLIP-91. > > Best, > Shengkai > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway > [2] https://github.com/ververica/flink-sql-gateway >
Re: [DISCUSS] FLIP-223: Support HiveServer2 Endpoint
Hi Shengkai. Thanks for driving the proposal of HiveServer2 Endpoint support. Regarding the "GatewayService API Change", I don't think the motivation of supporting the HiveServer2 endpoint requires a change to the GatewayService API; in other words, integrating the Hive ecosystem should not require changing the service interface. If you do confirm that the GatewayService interface has to change, IMO that change should be discussed in FLIP-91, because the public interfaces are defined in FLIP-91. In addition, how to support different Hive versions and how to guarantee compatibility is not mentioned in the design. What's the expected compatibility behavior? Finally, for the public interfaces, could you please provide their full definitions, including input parameters and the corresponding return value schemas? Thanks, Nicholas Jiang On 2022/04/21 06:45:13 Shengkai Fang wrote: > Hi, Flink developers. > > I want to start a discussion about the FLIP-223: Support HiveServer2 > Endpoint[1]. The Endpoint will implement the thrift interface exposed by > the HiveServer2, and users' BI, CLI and other tools based on the > HiveServer2 can also be seamlessly migrated to the Flink SQL Gateway. After > the FLIP finishes, the users can have almost the same experience in the > Flink SQL Gateway with the HiveServer2 endpoint as in the HiveServer2. > > > I am glad that you can give some feedback about FLIP-223. > > Best, > Shengkai > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-223+Support+HiveServer2+Endpoint >
Re: [VOTE] Apache Flink Table Store 0.1.0, release candidate #2
Hi everyone, +1 for the release (non-binding). - Built and compiled source codes [PASSED] - Went through quick start guide [PASSED] - Checked README.md [PASSED] - Checked that use the table store jar to build query table application [PASSED] Best regards, Nicholas Jiang On 2022/04/29 02:24:09 Jingsong Li wrote: > Hi everyone, > > Please review and vote on the release candidate #2 for the version 0.1.0 of > Apache Flink Table Store, as follows: > > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > **Release Overview** > > As an overview, the release consists of the following: > a) Table Store canonical source distribution, to be deployed to the > release repository at dist.apache.org > b) Maven artifacts to be deployed to the Maven Central Repository > > **Staging Areas to Review** > > The staging areas containing the above mentioned artifacts are as follows, > for your review: > * All artifacts for a) and b) can be found in the corresponding dev > repository at dist.apache.org [2] > * All artifacts for c) can be found at the Apache Nexus Repository [3] > * Pre Bundled Binaries Jar can work fine with quick start [4][5] > > All artifacts are signed with the key > 2C2B6A653B07086B65E4369F7C76245E0A318150 [6] > > Other links for your review: > * JIRA release notes [7] > * source code tag "release-0.1.0-rc2" [8] > * PR to update the website Downloads page to include Table Store > links [9] > > **Vote Duration** > > The voting time will run for at least 72 hours. > It is adopted by majority approval, with at least 3 PMC affirmative votes. > > Best, > Jingsong Lee > > [1] > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Table+Store+Release > [2] https://dist.apache.org/repos/dist/dev/flink/flink-table-store-0.1.0-rc2/ > [3] https://repository.apache.org/content/repositories/orgapacheflink-1502/ > [4] > https://repository.apache.org/content/repositories/orgapacheflink-1502/org/apache/flink/flink-table-store-dist/0.1.0/flink-table-store-dist-0.1.0.jar > [5] > https://nightlies.apache.org/flink/flink-table-store-docs-release-0.1/docs/try-table-store/quick-start/ > [6] https://dist.apache.org/repos/dist/release/flink/KEYS > [7] > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351234 > [8] https://github.com/apache/flink-table-store/tree/release-0.1.0-rc2 > [9] https://github.com/apache/flink-web/pull/531 >
Re: [DISCUSS] FLIP-91: Support SQL Client Gateway
Hi Shengkai, I have another concern about the submission of batch job. Does the Flink SQL gateway support to submit batch job? In Kyuubi, BatchProcessBuilder is used to submit batch job. What about the Flink SQL gateway? Best regards, Nicholas Jiang On 2022/04/24 03:28:36 Shengkai Fang wrote: > Hi. Jiang. > > Thanks for your feedback! > > > Do the public interfaces of GatewayService refer to any service? > > We will only expose one GatewayService implementation. We will put the > interface into the common package and the developer who wants to implement > a new endpoint can just rely on the interface package rather than the > implementation. > > > What's the behavior of SQL Client Gateway working on Yarn or K8S? Does > the SQL Client Gateway support application or session mode on Yarn? > > I think we can support SQL Client Gateway to submit the jobs in > application/sesison mode. > > > Is there any event trigger in the operation state machine? > > Yes. I have already updated the content and add more details about the > state machine. During the revise, I found that I mix up the two concepts: > job submission and job execution. In fact, we only control the submission > mode at the gateway layer. Therefore, we don't need to mapping the > JobStatus here. If the user expects that the synchronization behavior is to > wait for the completion of the job execution before allowing the next > statement to be executed, then the Operation lifecycle should also contains > the job's execution, which means users should set `table.dml-sync`. > > > What's the return schema for the public interfaces of GatewayService? > Like getTable interface, what's the return value schema? > > The API of the GatewayService return the java objects and the endpoint can > organize the objects with expected schema. The return results is also list > the section ComponetAPI#GatewayService#API. The return type of the > GatewayService#getTable is `ContextResolvedTable`. > > > How does the user get the operation log? > > The OperationManager will register the LogAppender before the Operation > execution. The Log Appender will hijack the logger and also write the log > that related to the Operation to another files. When users wants to fetch > the Operation log, the GatewayService will read the content in the file and > return. > > Best, > Shengkai > > > > > Nicholas Jiang 于2022年4月22日周五 16:21写道: > > > Hi Shengkai. > > > > Thanks for driving the proposal of SQL Client Gateway. I have some > > knowledge of Kyuubi and have some questions about the design: > > > > 1.Do the public interfaces of GatewayService refer to any service? If > > referring to HiveService, does GatewayService need interfaces like > > getQueryId etc. > > > > 2.What's the behavior of SQL Client Gateway working on Yarn or K8S? Does > > the SQL Client Gateway support application or session mode on Yarn? > > > > 3.Is there any event trigger in the operation state machine? > > > > 4.What's the return schema for the public interfaces of GatewayService? > > Like getTable interface, what's the return value schema? > > > > 5.How does the user get the operation log? > > > > Thanks, > > Nicholas Jiang > > > > On 2022/04/21 06:42:30 Shengkai Fang wrote: > > > Hi, Flink developers. > > > > > > I want to start a discussion about the FLIP-91: Support Flink SQL > > > Gateway[1]. Flink SQL Gateway is a service that allows users to submit > > and > > > manage their jobs in the online environment with the pluggable endpoints. 
> > > The reason why we introduce the Gateway with pluggable endpoints is that > > > many users have their preferences. For example, the HiveServer2 users > > > prefer to use the gateway with HiveServer2-style API, which has numerous > > > tools. However, some filnk-native users may prefer to use the REST API. > > > Therefore, we propose the SQL Gateway with pluggable endpoint. > > > > > > In the FLIP, we also propose the REST endpoint, which has the similar > > > APIs compared to the gateway in the ververica/flink-sql-gateway[2]. At > > the > > > last, we discuss how to use the SQL Client to submit the statement to the > > > Gateway with the REST API. > > > > > > I am glad that you can give some feedback about FLIP-91. > > > > > > Best, > > > Shengkai > > > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway > > > [2] https://github.com/ververica/flink-sql-gateway > > > > > >
Re: [ANNOUNCE] New Flink PMC member: Yang Wang
Congrats Yang! Best regards, Nicholas Jiang On 2022/05/05 11:18:10 Xintong Song wrote: > Hi all, > > I'm very happy to announce that Yang Wang has joined the Flink PMC! > > Yang has been consistently contributing to our community, by contributing > codes, participating in discussions, mentoring new contributors, answering > questions on mailing lists, and giving talks on Flink at > various conferences and events. He is one of the main contributors and > maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink > Kubernetes Operator. > > Congratulations and welcome, Yang! > > Thank you~ > > Xintong Song (On behalf of the Apache Flink PMC) >
Re: [VOTE] FLIP-91: Support SQL Gateway
+1 (non-binding) Best, Nicholas Jiang On 2022/05/20 02:38:39 Shengkai Fang wrote: > Hi, everyone. > > Thanks for your feedback for FLIP-91: Support SQL Gateway[1] on the > discussion thread[2]. I'd like to start a vote for it. The vote will be > open for at least 72 hours unless there is an objection or not enough votes. > > Best, > Shengkai > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Gateway > [2]https://lists.apache.org/thread/gr7soo29z884r1scnz77r2hwr2xmd9b0 >
Re: [DISCUSS] FLIP-228: Support Within between events in CEP Pattern
Hi Dian, Thanks for your feedback about supporting the within-between-events feature. I have updated the FLIP to introduce the 'Pattern#within(WithinType withinType, Time windowTime)' interface. Regarding your comments, I have the following thoughts: - Regarding the API, the name `partialWithin` sounds a little weird. Is it possible to find a name which is more intuitive? I have introduced the 'Pattern#within(WithinType withinType, Time windowTime)' interface to cover the interval that corresponds to the maximum time gap between events. With this interface, end users can define the maximum time interval either between the first and last event or between two consecutive events of the matching pattern. The existing within(windowTime) can simply delegate to Pattern#within(WithinType withinType, Time windowTime) with the FIRST_AND_LAST within type, so there is no incompatibility for within(windowTime). From the user's perspective, the interval semantics of the introduced interface are clear and user-friendly, so there is no need to explain the semantics of the few corner cases mentioned above. (A rough usage sketch is attached at the end of this mail.) - Besides, this FLIP only describes how the newly introduced API will be used, however, it lacks details about how you will implement it. I have updated the 'Proposed Changes' section and explained the concrete implementation from the perspective of NFA compilation and execution. The main change is to construct a mapping between the name of each computation state and its window time in the compilation phase. In the running phase, the latest timestamp of the current computation state needs to be maintained when computing the next computation state; this timestamp is used to check whether the computation state has timed out. If there are other questions, any feedback is welcome. Regards, Nicholas Jiang On 2022/05/07 09:16:55 Dian Fu wrote: > Hi Nicholas, > > Thanks a lot for bringing up this discussion. If I recall it correctly, > this feature has been requested many times by the users and is among one of > the most requested features in CEP. So big +1 to this feature overall. > > Regarding the API, the name `partialWithin` sounds a little weird. Is it > possible to find a name which is more intuitive? Other possible solutions: > - Reuse the existing `Pattern.within` method and change its semantic to the > maximum time interval between patterns. Currently `Pattern.within` is used > to define the maximum time interval between the first event and the last > event. However, the Pattern object represents only one node in a pattern > sequence and so it doesn't make much sense to define the maximum time > interval between the first event and the last event on the Pattern object, > e.g. we could move it to `PatternStreamBuilder`. However, if we choose > this option, we'd better consider how to keep backward compatibility. > - Introduce a series of methods when appending a new pattern to the > existing one, e.g. `Pattern.followedBy(Pattern group, Time > timeInterval)`. As timeInterval is a property between patterns and so it > makes sense to define this property when appending a new pattern. However, > the drawback is that we need to introduce a series of methods instead of > only one method. > > We need also to make the semantic clear in a few corner cases, e.g. > - What's the semantic of `A.followedBy(B).times(3).partialWithin(1 min)`? > Doesn't it mean that all three B events should occur in 1 minute or only > the first B event should occur in 1 minute? 
> - What's the semantic of > `A.followedBy(GroupPattern.begin("B").followedBy("C")).partialWithin(1 > min)``? Doesn't it mean that B and C should occur after A in 1 minute? > > Besides, this FLIP only describes how the newly introduced API will be > used, however, it lacks details about how you will implement it. It doesn't > need to be very detailed, however, you should describe the basic ideas > behind it, e.g. how will you support A.notFollowedBy(B).partialWithin(1 > min)? It could make sure that you have considered it thoroughly and also > makes others confident that this feature could be implemented in a clean > way. > > Regards, > Dian > > > > On Fri, May 6, 2022 at 7:32 PM yue ma wrote: > > > hi Nicholas, > > > > Thanks for bringing this discussion, we also think it's a useful feature. > > Some fine-grained timeout pattern matching can be implemented in CEP which > > makes Flink CEP more powerful > > > > Nicholas 于2022年5月5日周四 14:28写道: > > > > > Hi everyone, > > > > > > > > > > > > > > > Pattern#withIn interface in CEP defines the maximum time interval in > > which > > >
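For illustration, a short sketch of the two variants of the interface discussed above. The within-type value names (BEFORE_AND_AFTER for the gap between consecutive events, FIRST_AND_LAST for the whole match) are as currently drafted in the FLIP and still under discussion in this thread; the Event type is assumed:

import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.WithinType; // name and package as drafted in the FLIP
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

// A two-node pattern purely to show the two within variants; real conditions are elided.
Pattern<Event, ?> base =
        Pattern.<Event>begin("browse")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event event) {
                        return true; // real condition elided in this sketch
                    }
                })
                .followedBy("purchase")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event event) {
                        return true; // real condition elided in this sketch
                    }
                });

// Whole-match window: equivalent to the existing within(windowTime), which now
// delegates to the FIRST_AND_LAST within type.
base.within(WithinType.FIRST_AND_LAST, Time.minutes(30));

// Newly proposed semantics: maximum gap between two consecutive matched events.
base.within(WithinType.BEFORE_AND_AFTER, Time.minutes(5));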
Re: [DISCUSS] FLIP-228: Support Within between events in CEP Pattern
Hi Yue, Thanks for providing the benefit of this feature. After this feature is merged, you are welcome to try this feature in business scenarios. Regards, Nicholas Jiang On 2022/05/06 11:31:48 yue ma wrote: > hi Nicholas, > > Thanks for bringing this discussion, we also think it's a useful feature. > Some fine-grained timeout pattern matching can be implemented in CEP which > makes Flink CEP more powerful > > Nicholas 于2022年5月5日周四 14:28写道: > > > Hi everyone, > > > > > > > > > > Pattern#withIn interface in CEP defines the maximum time interval in which > > a matching pattern has to be completed in order to be considered valid, > > which interval corresponds to the maximum time gap between first and the > > last event. The interval representing the maximum time gap between events > > is required to define in the scenario like purchasing good within a maximum > > of 5 minutes after browsing. > > > > > > > > > > I would like to start a discussion about FLIP-228[1], in which within > > between events is proposed in Pattern to support the definition of the > > maximum time interval in which a completed partial matching pattern is > > considered valid, which interval represents the maximum time gap between > > events for partial matching Pattern. > > > > > > > > > > Hence we propose the Pattern#partialWithin interface to define the maximum > > time interval in which a completed partial matching pattern is considered > > valid. Please take a look at the FLIP page [1] to get more details. Any > > feedback about the FLIP-228 would be appreciated! > > > > > > > > > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-228%3A+Support+Within+between+events+in+CEP+Pattern > > > > > > > > > > Best regards, > > > > Nicholas Jiang >
Re: [DISCUSS] FLIP-228: Support Within between events in CEP Pattern
Hi Martijn, Sorry for the late reply. This feature is only supported in DataStream and is not supported in MATCH_RECOGNIZE, because the SQL syntax of MATCH_RECOGNIZE does not contain the semantics of this feature and supporting it would require modifying the SQL syntax. Support on top of MATCH_RECOGNIZE is better suited to a new FLIP. Regards, Nicholas Jiang On 2022/05/25 11:36:33 Martijn Visser wrote: > Hi Nicholas, > > Thanks for creating the FLIP, I can imagine that there will be many use > cases who can be created using this new feature. > > The FLIP doesn't mention anything with regards to SQL, could this feature > also be supported when using MATCH_RECOGNIZE? > > Best regards, > > Martijn > https://twitter.com/MartijnVisser82 > https://github.com/MartijnVisser > > > On Sat, 7 May 2022 at 11:17, Dian Fu wrote: > > > Hi Nicholas, > > > > Thanks a lot for bringing up this discussion. If I recall it correctly, > > this feature has been requested many times by the users and is among one of > > the most requested features in CEP. So big +1 to this feature overall. > > > > Regarding the API, the name `partialWithin` sounds a little weird. Is it > > possible to find a name which is more intuitive? Other possible solutions: > > - Reuse the existing `Pattern.within` method and change its semantic to the > > maximum time interval between patterns. Currently `Pattern.within` is used > > to define the maximum time interval between the first event and the last > > event. However, the Pattern object represents only one node in a pattern > > sequence and so it doesn't make much sense to define the maximum time > > interval between the first event and the last event on the Pattern object, > > e.g. we could move it to `PatternStreamBuilder`. However, if we choose > > this option, we'd better consider how to keep backward compatibility. > > - Introduce a series of methods when appending a new pattern to the > > existing one, e.g. `Pattern.followedBy(Pattern group, Time > > timeInterval)`. As timeInterval is a property between patterns and so it > > makes sense to define this property when appending a new pattern. However, > > the drawback is that we need to introduce a series of methods instead of > > only one method. > > > > We need also to make the semantic clear in a few corner cases, e.g. > > - What's the semantic of `A.followedBy(B).times(3).partialWithin(1 min)`? > > Doesn't it mean that all three B events should occur in 1 minute or only > > the first B event should occur in 1 minute? > > - What's the semantic of > > `A.followedBy(GroupPattern.begin("B").followedBy("C")).partialWithin(1 > > min)``? Doesn't it mean that B and C should occur after A in 1 minute? > > > > Besides, this FLIP only describes how the newly introduced API will be > > used, however, it lacks details about how you will implement it. It doesn't > > need to be very detailed, however, you should describe the basic ideas > > behind it, e.g. how will you support A.notFollowedBy(B).partialWithin(1 > > min)? It could make sure that you have considered it thoroughly and also > > makes others confident that this feature could be implemented in a clean > > way. > > > > Regards, > > Dian > > > > > > > > On Fri, May 6, 2022 at 7:32 PM yue ma wrote: > > > hi Nicholas, > > > > > > Thanks for bringing this discussion, we also think it's a useful feature. 
> > > Some fine-grained timeout pattern matching can be implemented in CEP > > which > > > makes Flink CEP more powerful > > > > > > Nicholas 于2022年5月5日周四 14:28写道: > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > > Pattern#withIn interface in CEP defines the maximum time interval in > > > which > > > > a matching pattern has to be completed in order to be considered valid, > > > > which interval corresponds to the maximum time gap between first and > > the > > > > last event. The interval representing the maximum time gap between > > events > > > > is required to define in the scenario like purchasing good within a > > > maximum > > > > of 5 minutes after browsing. > > > > > > > > > > > > > > > > > > > > I would like to start a discussion about FLIP-228[1], in which within > > > > between events is proposed in Pattern to support the definition of the > > >
Re: [VOTE] Apache Flink Kubernetes Operator Release 1.0.0, release candidate #3
Hi Yang! +1 (not-binding) Verified the following successfully: - Build from source, build container from source - Run the examples for application, session and session job deployments successfully and without any errors. Regards, Nicholas Jiang On 2022/05/31 06:26:02 Yang Wang wrote: > Hi everyone, > > Please review and vote on the release candidate #3 for the version 1.0.0 of > Apache Flink Kubernetes Operator, > as follows: > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > **Release Overview** > > As an overview, the release consists of the following: > a) Kubernetes Operator canonical source distribution (including the > Dockerfile), to be deployed to the release repository at dist.apache.org > b) Kubernetes Operator Helm Chart to be deployed to the release repository > at dist.apache.org > c) Maven artifacts to be deployed to the Maven Central Repository > d) Docker image to be pushed to dockerhub > > **Staging Areas to Review** > > The staging areas containing the above mentioned artifacts are as follows, > for your review: > * All artifacts for a,b) can be found in the corresponding dev repository > at dist.apache.org [1] > * All artifacts for c) can be found at the Apache Nexus Repository [2] > * The docker image for d) is staged on github [7] > > All artifacts are signed with the key > 2FF2977BBBFFDF283C6FE7C6A301006F3591EE2C [3] > > Other links for your review: > * JIRA release notes [4] > * source code tag "release-1.0.0-rc3" [5] > * PR to update the website Downloads page to include Kubernetes Operator > links [6] > > **Vote Duration** > > The voting time will run for at least 72 hours. > It is adopted by majority approval, with at least 3 PMC affirmative votes. > > **Note on Verification** > > You can follow the basic verification guide here[8]. > Note that you don't need to verify everything yourself, but please make > note of what you have tested together with your +- vote. > > Thanks, > Yang > > [1] > https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.0.0-rc3/ > [2] https://repository.apache.org/content/repositories/orgapacheflink-1505/ > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > [4] > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351500 > [5] > https://github.com/apache/flink-kubernetes-operator/tree/release-1.0.0-rc3 > [6] https://github.com/apache/flink-web/pull/542 > [7] ghcr.io/apache/flink-kubernetes-operator:52b50c2 > [8] > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release >
Re: [DISCUSS] FLIP-228: Support Within between events in CEP Pattern
Hi Dian, Thanks for your feedback about the Public Interface update for supporting the within-between-events feature. I have left comments on the above points: - Regarding the pattern API, should we also introduce APIs such as Pattern.times(int from, int to, Time windowTime) to indicate the time interval between events matched in the loop? IMO, we do not need to introduce the mentioned APIs to indicate the time interval between events. Taking Pattern.times(int from, int to, Time windowTime) as an example, the user can instead use Pattern.times(int from, int to).within(BEFORE_AND_AFTER, windowTime) to indicate the time interval between two consecutive events. - Regarding the naming of the classes, does it make sense to rename `WithinType` to `InternalType` or `WindowType`? For the enum values inside it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not intuitive for me. The candidates that come to my mind: - `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS` IMO, the `WithinType` naming directly describes the situation the time interval applies to. In addition, the enum values of `WithinType` could be updated to `PREVIOUS_AND_NEXT` and `FIRST_AND_LAST`, which directly indicate a time interval between the PREVIOUS and NEXT event and between the FIRST and LAST event respectively. With `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` it is not clear which event is relative to the FIRST or PREVIOUS event. Best, Nicholas Jiang On 2022/06/06 07:48:22 Dian Fu wrote: > Hi Nicholas, > > Thanks a lot for the update. > > Regarding the pattern API, should we also introduce APIs such as > Pattern.times(int from, int to, Time windowTime) to indicate the time > interval between events matched in the loop? > > Regarding the naming of the classes, does it make sense to rename > `WithinType` to `InternalType` or `WindowType`? For the enum values inside > it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not > intuitive for me. The candidates that come to my mind: > - `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` > - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS` > > Regards, > Dian > > On Tue, May 31, 2022 at 2:56 PM Nicholas Jiang > wrote: > > > Hi Martijn, > > > > Sorry for later reply. This feature is only supported in DataStream and > > doesn't be supported in MATCH_RECOGNIZE because the SQL syntax of > > MATCH_RECOGNIZE does not contain the semantics of this feature, which > > requires modification of the SQL syntax. The support above MATCH_RECOGNIZE > > is suitable for new FLIP to discuss. > > > > Regards, > > Nicholas Jiang > > > > On 2022/05/25 11:36:33 Martijn Visser wrote: > > > Hi Nicholas, > > > > > > Thanks for creating the FLIP, I can imagine that there will be many use > > > cases who can be created using this new feature. > > > > > > The FLIP doesn't mention anything with regards to SQL, could this feature > > > also be supported when using MATCH_RECOGNIZE? > > > > > > Best regards, > > > > > > Martijn > > > https://twitter.com/MartijnVisser82 > > > https://github.com/MartijnVisser > > > > > > > > > On Sat, 7 May 2022 at 11:17, Dian Fu wrote: > > > > > > > Hi Nicholas, > > > > > > > > Thanks a lot for bringing up this discussion. If I recall it correctly, > > > > this feature has been requested many times by the users and is among > one of > > > > the most requested features in CEP. So big +1 to this feature overall. > > > > > > > > Regarding the API, the name `partialWithin` sounds a little weird. 
Is > > it > > > > possible to find a name which is more intuitive? Other possible > > solutions: > > > > - Reuse the existing `Pattern.within` method and change its semantic > > to the > > > > maximum time interval between patterns. Currently `Pattern.within` is > > used > > > > to define the maximum time interval between the first event and the > > last > > > > event. However, the Pattern object represents only one node in a > > pattern > > > > sequence and so it doesn't make much sense to define the maximum time > > > > interval between the first event and the last event on the Pattern > > object, > > > > e.g. we could move it to `PatternStreamBuilder`. However, if we choose > > > > this option, we'd better consider how to keep backward compatibility. > > > > - Introduce a series of methods when appending a new pattern to the > > > > existing one, e.g. `Pattern.followedBy(Pattern gr
Re: MATCH_RECOGNIZE And Semantics
Hi Atri, Martijn, MATCH_RECOGNIZE will support Batch mode in Flink 1.16. The pull request[1] which supports MATCH_RECOGNIZE in Batch mode will be merged to the master branch today, so you can try it out on the master branch. Best, Nicholas Jiang [1] https://github.com/apache/flink/pull/18408 On 2022/06/08 08:30:17 Atri Sharma wrote: > Ah, thanks. I should have been clearer -- I meant MATCH_RECOGNIZE for > batch mode. > > Thanks for clarifying! > > On Wed, Jun 8, 2022 at 1:58 PM Martijn Visser > wrote: > > > > Hi Atri, > > > > Everything around MATCH_RECOGNIZE is documented [1]. Support in Batch mode > > for MATCH_RECOGNIZE is planned for 1.16. > > > > Best regards, > > > > Martijn > > > > [1] > > https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/sql/queries/match_recognize/ > > > > On Wed, 8 Jun 2022 at 10:24, Atri Sharma wrote: > > > > > Hello, > > > > > > At my day job, we have run into a requirement to support MATCH_RECOGNIZE. > > > > > > I wanted to check if this is already a part of the roadmap, and if > > > anybody is working on it. > > > > > > If not, I am willing to work on this, with the community's guidance. > > > > > > -- > > > Regards, > > > > > > Atri > > > Apache Concerted > > > > > -- > Regards, > > Atri > Apache Concerted >
Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.0.0 released
Congrats! Thanks Yang for driving the 1.0.0 release, and thanks to all contribution. Best, Nicholas Jiang On 2022/06/08 10:54:37 Lijie Wang wrote: > Congrats! Thanks Yang for driving the release, and thanks to all > contributors! > > Best, > Lijie > > John Gerassimou 于2022年6月6日周一 22:38写道: > > > Thank you for all your efforts! > > > > Thanks > > John > > > > On Sun, Jun 5, 2022 at 10:33 PM Aitozi wrote: > > > >> Thanks Yang and Nice to see it happen. > >> > >> Best, > >> Aitozi. > >> > >> Yang Wang 于2022年6月5日周日 16:14写道: > >> > >>> The Apache Flink community is very happy to announce the release of > >>> Apache Flink Kubernetes Operator 1.0.0. > >>> > >>> The Flink Kubernetes Operator allows users to manage their Apache Flink > >>> applications and their lifecycle through native k8s tooling like kubectl. > >>> This is the first production ready release and brings numerous > >>> improvements and new features to almost every aspect of the operator. > >>> > >>> Please check out the release blog post for an overview of the release: > >>> > >>> https://flink.apache.org/news/2022/06/05/release-kubernetes-operator-1.0.0.html > >>> > >>> The release is available for download at: > >>> https://flink.apache.org/downloads.html > >>> > >>> Maven artifacts for Flink Kubernetes Operator can be found at: > >>> > >>> https://search.maven.org/artifact/org.apache.flink/flink-kubernetes-operator > >>> > >>> Official Docker image for Flink Kubernetes Operator applications can be > >>> found at: > >>> https://hub.docker.com/r/apache/flink-kubernetes-operator > >>> > >>> The full release notes are available in Jira: > >>> > >>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351500 > >>> > >>> We would like to thank all contributors of the Apache Flink community > >>> who made this release possible! > >>> > >>> Regards, > >>> Gyula & Yang > >>> > >> >
Re: [DISCUSS] FLIP-228: Support Within between events in CEP Pattern
Hi Dian, Regarding the indication of the time interval between events matched in the loop: I have updated the FLIP and introduced a series of times interfaces that specify how often the pattern can occur, together with an interval corresponding to the maximum time gap between the previous and the next event for each occurrence. within(withinType, windowTime) configures the same matching window for every occurrence, whereas times(int times, windowTime) can configure a separate time interval corresponding to the maximum gap between the previous and the next event for each occurrence, which fully covers the time interval between events matched in the loop or times case. (A rough usage sketch is attached at the end of this mail.) Best, Nicholas Jiang On 2022/06/08 08:11:58 Nicholas Jiang wrote: > Hi Dian, > > Thanks for your feedback about the Public Interface update for supporting the > within between events feature. I have left the comments for above points: > > - Regarding the pattern API, should we also introduce APIs such as > Pattern.times(int from, int to, Time windowTime) to indicate the time > interval between events matched in the loop? > > IMO, we could not introduce the mentioned APIs for indication of the time > interval between events. For example Pattern.times(int from, int to, Time > windowTime), the user can use Pattern.times(int from, int > to).within(BEFORE_AND_AFTER, windowTime) to indicate the time interval > between the before and after event. > > - Regarding the naming of the classes, does it make sense to rename > `WithinType` to `InternalType` or `WindowType`? For the enum values inside > it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not > intuitive for me. The candidates that come to my mind: - `RELATIVE_TO_FIRST` > and `RELATIVE_TO_PREVIOUS` - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS` > > IMO, the `WithinType` naming could directly the situation for the time > interval. In addtion. the enum values of the `WithinType` could update to > `PREVIOUS_AND_NEXT` and `FIRST_AND_LAST` which directly indicate the time > interval within the PREVIOUS and NEXT event and within the FIRST and LAST > event. `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` are not clear to > understand which event is relative to FIRST or PREVIOUS event. > > Best, > Nicholas Jiang > > On 2022/06/06 07:48:22 Dian Fu wrote: > > Hi Nicholas, > > > > Thanks a lot for the update. > > > > Regarding the pattern API, should we also introduce APIs such as > > Pattern.times(int from, int to, Time windowTime) to indicate the time > > interval between events matched in the loop? > > > > Regarding the naming of the classes, does it make sense to rename > > `WithinType` to `InternalType` or `WindowType`? For the enum values inside > > it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not > > intuitive for me. The candidates that come to my mind: > > - `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` > > - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS` > > > > Regards, > > Dian > > > > On Tue, May 31, 2022 at 2:56 PM Nicholas Jiang > > wrote: > > > > > Hi Martijn, > > > > > > Sorry for later reply. This feature is only supported in DataStream and > > > doesn't be supported in MATCH_RECOGNIZE because the SQL syntax of > > > MATCH_RECOGNIZE does not contain the semantics of this feature, which > > > requires modification of the SQL syntax. The support above MATCH_RECOGNIZE > > > is suitable for new FLIP to discuss. 
> > > > > > Regards, > > > Nicholas Jiang > > > > > > On 2022/05/25 11:36:33 Martijn Visser wrote: > > > > Hi Nicholas, > > > > > > > > Thanks for creating the FLIP, I can imagine that there will be many use > > > > cases who can be created using this new feature. > > > > > > > > The FLIP doesn't mention anything with regards to SQL, could this > > > > feature > > > > also be supported when using MATCH_RECOGNIZE? > > > > > > > > Best regards, > > > > > > > > Martijn > > > > https://twitter.com/MartijnVisser82 > > > > https://github.com/MartijnVisser > > > > > > > > > > > > On Sat, 7 May 2022 at 11:17, Dian Fu wrote: > > > > > > > > > Hi Nicholas, > > > > > > > > > > Thanks a lot for bringing up this discussion. If I recall it > > > > > correctly, > > > > > this feature has been requested many times by the users and is among > > > one of > > > > > the most requested
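For illustration, a rough sketch of the looping-pattern variant described above. The times(int, Time) signature follows the current FLIP draft rather than a released API, and the Event type is assumed:

import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

// Event is an assumed user-defined type; the real condition is elided for brevity.
Pattern<Event, ?> loop =
        Pattern.<Event>begin("start")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event event) {
                        return true; // real condition elided in this sketch
                    }
                })
                // Proposed in the FLIP draft: the pattern must match exactly 3 times, and
                // each matched event must occur within 10 seconds of the previous one.
                // A range variant, times(from, to, windowTime), is drafted as well.
                .times(3, Time.seconds(10));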
Re: [VOTE] FLIP-223: Support HiveServer2 Endpoint
+1 (non-binding) Best, Nicholas Jiang On 2022/06/07 05:31:21 Shengkai Fang wrote: > Hi, everyone. > > Thanks for all feedback for FLIP-223: Support HiveServer2 Endpoint[1] on > the discussion thread[2]. I'd like to start a vote for it. The vote will be > open for at least 72 hours unless there is an objection or not enough votes. > > Best, > Shengkai > > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint > [2] https://lists.apache.org/thread/9r1j7ho2m8zbqy3tl7vvj9gnocggwr6x >
Re: [DISCUSS] FLIP-228: Support Within between events in CEP Pattern
Hi Dian, Guys, Thanks for your suggestion for the `PREVIOUS_AND_CURRENT`. I have updated the naming of WithinType value in the FLIP. If there are no other questions, I will start the VOTE thread next Monday. Regards, Nicholas Jiang On 2022/06/10 10:02:46 Dian Fu wrote: > Hi Nicholas, > > Regarding the naming of `WithinType`, I'm OK with it. For > `PREVIOUS_AND_NEXT`, I guess `PREVIOUS_AND_CURRENT` makes more sense. > What's your thought? > > Regards, > Dian > > On Thu, Jun 9, 2022 at 10:09 AM Nicholas Jiang > wrote: > > > Hi Dian, > > > > About the indication of the time interval between events matched in the > > loop. I have updated the FLIP and introduced a series of times interface to > > specify that this pattern can occur the specified times and interval > > corresponds to the maximum time gap between previous and next event for > > each times. > > > > The within(withinType, windowTime) is used to configure the same time of > > the matching window for each times, but the times(int times, windowTimes) > > can configure the different time interval corresponds to the maximum time > > gap between previous and next event for each times, which is fully > > considered for time interval between events matched in the loop or times > > case. > > > > Best, > > Nicholas Jiang > > > > On 2022/06/08 08:11:58 Nicholas Jiang wrote: > > > Hi Dian, > > > > > > Thanks for your feedback about the Public Interface update for > > supporting the within between events feature. I have left the comments for > > above points: > > > > > > - Regarding the pattern API, should we also introduce APIs such as > > Pattern.times(int from, int to, Time windowTime) to indicate the time > > interval between events matched in the loop? > > > > > > IMO, we could not introduce the mentioned APIs for indication of the > > time interval between events. For example Pattern.times(int from, int to, > > Time windowTime), the user can use Pattern.times(int from, int > > to).within(BEFORE_AND_AFTER, windowTime) to indicate the time interval > > between the before and after event. > > > > > > - Regarding the naming of the classes, does it make sense to rename > > `WithinType` to `InternalType` or `WindowType`? For the enum values inside > > it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not > > intuitive for me. The candidates that come to my mind: - > > `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` - `WHOLE_MATCH` and > > `RELATIVE_TO_PREVIOUS` > > > > > > IMO, the `WithinType` naming could directly the situation for the time > > interval. In addtion. the enum values of the `WithinType` could update to > > `PREVIOUS_AND_NEXT` and `FIRST_AND_LAST` which directly indicate the time > > interval within the PREVIOUS and NEXT event and within the FIRST and LAST > > event. `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` are not clear to > > understand which event is relative to FIRST or PREVIOUS event. > > > > > > Best, > > > Nicholas Jiang > > > > > > On 2022/06/06 07:48:22 Dian Fu wrote: > > > > Hi Nicholas, > > > > > > > > Thanks a lot for the update. > > > > > > > > Regarding the pattern API, should we also introduce APIs such as > > > > Pattern.times(int from, int to, Time windowTime) to indicate the time > > > > interval between events matched in the loop? > > > > > > > > Regarding the naming of the classes, does it make sense to rename > > > > `WithinType` to `InternalType` or `WindowType`? 
For the enum values > > inside > > > > it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not > > > > intuitive for me. The candidates that come to my mind: > > > > - `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` > > > > - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS` > > > > > > > > Regards, > > > > Dian > > > > > > > > On Tue, May 31, 2022 at 2:56 PM Nicholas Jiang < > > nicholasji...@apache.org> > > > > wrote: > > > > > > > > > Hi Martijn, > > > > > > > > > > Sorry for later reply. This feature is only supported in DataStream > > and > > > > > doesn't be supported in MATCH_RECOGNIZE because the SQL syntax of > > > > > MATCH_RECOGNIZE does not contain the semantics of this feature, which > > > > > requires modification of the SQL syntax. Th
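To make the API shape under discussion concrete, here is a rough Java sketch of how the relative-to-previous interval could be expressed with the WithinType values agreed above. The within(WithinType, Time) overload, the import location of WithinType, and the Event class with getType() are illustrative assumptions based on this thread and the FLIP draft, not a finalized API.

```java
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.WithinType;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WithinBetweenEventsExample {

    /** Hypothetical user event type used only for this illustration. */
    public static class Event {
        private final String type;

        public Event(String type) {
            this.type = type;
        }

        public String getType() {
            return type;
        }
    }

    public static Pattern<Event, Event> buildPattern() {
        return Pattern.<Event>begin("browse")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event event) {
                        return "browse".equals(event.getType());
                    }
                })
                .followedBy("buy")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event event) {
                        return "buy".equals(event.getType());
                    }
                })
                // Maximum gap between the previously matched event and the
                // current one, i.e. "buy within 5 minutes after browse".
                .within(WithinType.PREVIOUS_AND_CURRENT, Time.minutes(5));
    }
}
```

With this shape, the existing single-argument within(Time) keeps its first-to-last meaning (FIRST_AND_LAST), while PREVIOUS_AND_CURRENT only constrains the gap between adjacent matched events, which is the distinction discussed in this thread.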
Re: [ANNOUNCE] New Apache Flink PMC Member - Jingsong Lee
Congratulations, Jingsong! Regards, Nicholas Jiang On 2022/06/13 07:01:11 Qingsheng Ren wrote: > Congratulations, Jingsong! > > Best, > Qingsheng > > > On Jun 13, 2022, at 14:58, Becket Qin wrote: > > > > Hi all, > > > > I'm very happy to announce that Jingsong Lee has joined the Flink PMC! > > > > Jingsong became a Flink committer in Feb 2020 and has been continuously > > contributing to the project since then, mainly in Flink SQL. He has been > > quite active in the mailing list, fixing bugs, helping verifying releases, > > reviewing patches and FLIPs. Jingsong is also devoted to pushing Flink SQL > > to new use cases. He spent a lot of time in implementing the Flink > > connectors for Apache Iceberg. Jingsong is also the primary driver behind > > the effort of flink-table-store, which aims to provide a stream-batch > > unified storage for Flink dynamic tables. > > > > Congratulations and welcome, Jingsong! > > > > Cheers, > > > > Jiangjie (Becket) Qin > > (On behalf of the Apache Flink PMC) > >
Re: [ANNOUNCE] New Apache Flink Committer - Ingo Bürk
Congratulations, Ingo! Best, Nicholas Jiang
Re: [ANNOUNCE] New Apache Flink Committer - Matthias Pohl
Congratulations, Matthias! Best, Nicholas Jiang
Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)
Hi Martijn, IMO, in this FLIP we only need to introduce the general design at the Table API/SQL level. As for the design details, you can create a new FLIP. And do we need to take the support for Batch mode into account if you expand the MATCH_RECOGNIZE function? Do you have any comments on the dynamic rule engine design? The core of this FLIP is the multiple-rule and dynamic rule changing mechanism. Best, Nicholas Jiang
Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)
Hi Konstantin, Thanks for your feedback. The point about adding a timestamp to each rule that determines the start time from which the rule takes effect makes sense to me. At present the rule takes effect from the current time by default, so there is no timestamp field representing the start time from which the rule applies. Regarding the renaming, your suggestion looks good to me and introduces no new concept. But does this introduce a Rule concept, or reuse the Pattern concept through the DynamicPattern renaming? Best, Nicholas Jiang On 2021/12/13 07:45:04 Konstantin Knauf wrote: > Thanks, Yufeng, for starting this discussion. I think this will be a very > popular feature. I've seen a lot of users asking for this in the past. So, > generally big +1. > > I think we should have a rough idea on how to expose this feature in the > other APIs. > > Two ideas: > > 1. In order to make this more deterministic in case of reprocessing and > out-of-orderness, I am wondering if we can add a timestamp to each rule > that determines the start time from which the rule should be in effect. > This can be an event or a processing time depending on the characteristics > of the pipeline. The timestamp would default to Long.MIN_TIMESTAMP if not > provided, which means effectively immediately. This could also be a follow > up, if you think it will make the implementation too complicated initially. > > 2. I am wondering, if we should name Rule->DynamicPatternHolder or so and > CEP.rule-> CEP.dynamicPatterns instead (other classes correspondingly)? > Rule is quite ambiguous and DynamicPattern seems more descriptive to me. > > On Mon, Dec 13, 2021 at 4:30 AM Nicholas Jiang > wrote: > > > Hi Martijn, > > > >IMO, in this FLIP, we only need to introduce the general design of the > > Table API/SQL level. As for the design details, you can create a new FLIP. > > And do we need to take into account the support for Batch mode if you > > expand the MATCH_RECOGNIZE function? About the dynamic rule engine design, > > do you have any comments? This core of the FLIP is about the multiple rule > > and dynamic rule changing mechanism. > > > > Best, > > Nicholas Jiang > > > > > -- > > Konstantin Knauf > > https://twitter.com/snntrable > > https://github.com/knaufk >
Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)
Hi Konstantin, About the renaming for the Rule, I mean that the difference between the Rule and Pattern is that the Rule not only contains the Pattern, but also how to match the Pattern, and how to process after matched. If renaming DynamicPattern, I'm concerned that this name couldn't represent how to match the Pattern, and how to process after matched but the Rule could explain this. Therefore I prefer to rename the Rule not the DynamicPattern. Best, Nicholas Jiang On 2021/12/13 08:23:23 Konstantin Knauf wrote: > Hi Nicholas, > > I am not sure I understand your question about renaming. I think the most > important member of the current Rule class is the Pattern, the KeySelector > and PatternProcessFunction are more auxiliary if you will. That's why, I > think, it would be ok to rename Rule to DynamicPatternHolder although it > contains more than just a Pattern. > > Cheers, > > Konstantin > > On Mon, Dec 13, 2021 at 9:16 AM Nicholas Jiang > wrote: > > > Hi Konstantin, > > > >Thanks for your feedback. The point that add a timestamp to each rule > > that determines the start time from which the rule makes sense to me. At > > present, The timestamp is current time at default, so no timestamp field > > represents the start time from which the rule. And about the renaming rule, > > your suggestion looks good to me and no any new concept introduces. But > > does this introduce Rule concept or reuse the Pattern concept for the > > DynamicPattern renaming? > > > > Best, > > Nicholas Jiang > > > > On 2021/12/13 07:45:04 Konstantin Knauf wrote: > > > Thanks, Yufeng, for starting this discussion. I think this will be a very > > > popular feature. I've seen a lot of users asking for this in the past. > > So, > > > generally big +1. > > > > > > I think we should have a rough idea on how to expose this feature in the > > > other APIs. > > > > > > Two ideas: > > > > > > 1. In order to make this more deterministic in case of reprocessing and > > > out-of-orderness, I am wondering if we can add a timestamp to each rule > > > that determines the start time from which the rule should be in effect. > > > This can be an event or a processing time depending on the > > characteristics > > > of the pipeline. The timestamp would default to Long.MIN_TIMESTAMP if not > > > provided, which means effectively immediately. This could also be a > > follow > > > up, if you think it will make the implementation too complicated > > initially. > > > > > > 2. I am wondering, if we should name Rule->DynamicPatternHolder or so and > > > CEP.rule-> CEP.dynamicPatterns instead (other classes correspondingly)? > > > Rule is quite ambiguous and DynamicPattern seems more descriptive to me. > > > > > > On Mon, Dec 13, 2021 at 4:30 AM Nicholas Jiang > > > > > wrote: > > > > > > > Hi Martijn, > > > > > > > >IMO, in this FLIP, we only need to introduce the general design of > > the > > > > Table API/SQL level. As for the design details, you can create a new > > FLIP. > > > > And do we need to take into account the support for Batch mode if you > > > > expand the MATCH_RECOGNIZE function? About the dynamic rule engine > > design, > > > > do you have any comments? This core of the FLIP is about the multiple > > rule > > > > and dynamic rule changing mechanism. 
> > > > > > > > Best, > > > > Nicholas Jiang > > > > > > > > > > > > > -- > > > > > > Konstantin Knauf > > > > > > https://twitter.com/snntrable > > > > > > https://github.com/knaufk > > > > > > > > -- > > Konstantin Knauf > > https://twitter.com/snntrable > > https://github.com/knaufk >
Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)
Hi Dian, Thanks for your feedback on the FLIP. About the question on `getLatestRules`: IMO this doesn't need to be renamed to `getRuleChanges`, because the method is used to get the full set of the latest rules once they have been updated. About the CEP.rule method, renaming it to CEP.dynamicPattern would be confusing for users. A dynamic pattern only creates the PatternStream, not the DataStream. Conceptually, a dynamic pattern is still a pattern and does not contain the PatternProcessFunction. If CEP.rule were renamed to CEP.dynamicPattern, the return value of the method could not include the PatternProcessFunction and would only return the PatternStream. I think the difference between the Rule and the Pattern is that the Rule contains the PatternProcessFunction, while the Pattern or DynamicPattern doesn't. Best, Nicholas Jiang
Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)
Hi Konstantin, Thanks for your detailed explanation of the DynamicPattern[Holder] renaming. I have another idea for this renaming: what about renaming the Rule to PatternProcessor? CEP stands for complex event processing, so the name PatternProcessor corresponds to the concept of CEP. A PatternProcessor contains the specific pattern and how to process the pattern, which also carries the dynamic meaning. What's more, the CEP.rule() method could be renamed to CEP.patternProcess(). WDYT? Best, Nicholas Jiang On 2021/12/14 07:32:46 Konstantin Knauf wrote: > Hi Nicholas, > > I understand that a Rule contains more than the Pattern. Still, I favor > DynamicPattern[Holder] over Rule, because the term "Rule" does not exist in > Flink's CEP implementation so far and "dynamic" seems to be the important > bit here. > > Cheers, > > Konstantin > > On Tue, Dec 14, 2021 at 4:46 AM Nicholas Jiang > wrote: > > > Hi DianFu, > > > > Thanks for your feedback of the FLIP. > > > > About the mentioned question for the `getLatestRules`, IMO, this > > doesn't need to rename into `getRuleChanges` because this method is used > > for getting the total amount of the latest rules which has been updated > > once. > > > > About the CEP.rule method, the CEP.dynamicPattern renaming is > > confusing for users. The dynamic pattern only creates the PatternStream not > > the DataStream. From the concept, a dynamic pattern is also a pattern, not > > contains the PatternProcessFunction. If renaming the CEP.rule into > > CEP.dynamicPattern, the return value of the method couldn't include the > > PatternProcessFunction, only returns the PatternStream. I think the > > difference between the Rule and the Pattern is that Rule contains the > > PatternProcessFunction, but the Pattern or DynamicPattern doesn't contain > > the function. > > > > Best > > Nicholas Jiang > > > > > -- > > Konstantin Knauf > > https://twitter.com/snntrable > > https://github.com/knaufk >
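Since the thread keeps returning to what a Rule/PatternProcessor would actually bundle, here is a rough Java sketch of that abstraction as described above: the pattern itself, how the stream is keyed, and how matches are processed. The method names, including getId and getTimestamp (which picks up the start-time suggestion earlier in the thread), are illustrative assumptions rather than the final FLIP-200 interface.

```java
import java.io.Serializable;

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.cep.pattern.Pattern;

/**
 * Illustrative sketch only: what a "PatternProcessor" could bundle, per the
 * discussion above. Not the final FLIP-200 API.
 *
 * @param <IN>  type of the input events
 * @param <OUT> type of the produced results
 */
public interface PatternProcessor<IN, OUT> extends Serializable {

    /** Identifier so that a dynamically updated processor can replace an old one. */
    String getId();

    /** Event or processing time from which this processor should take effect. */
    long getTimestamp();

    /** How the input stream is partitioned before matching. */
    KeySelector<IN, ?> getKeySelector();

    /** The pattern to match against the keyed stream. */
    Pattern<IN, ?> getPattern();

    /** How fully matched event sequences are turned into results. */
    PatternProcessFunction<IN, OUT> getPatternProcessFunction();
}
```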
Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)
Hi Martijin, Thanks for your feedback about my email reply. I left comments for all the points you mentioned: -- Since this FLIP addresses the most watched Jira ticket for Flink (https://issues.apache.org/jira/browse/FLINK-7129), I'm assuming this will be used a lot. Does this also mean that you would like to take ownership of the CEP library as a whole? About the Jira ticket for Flink (https://issues.apache.org/jira/browse/FLINK-7129), if Dawid Wysakowicz has no time to work on this ticket, the ticket could be assigned to Yunfeng Zhou or me. We could drive the ticket and push the implementation of this FLIP. -- If we want to support multiple rule and dynamic rule changing for use cases in domains like risk controlling or fraud, I do think we need to have a good look at eventual consistency. What do we do in situations where the Operator Coordinator can't access the database? I could imagine that it makes sense to make it configurable how often the database will be queried for new or updated rules or how many retries the Operator Coordinator will take before failing the job. In the baseline implemenation, how often the database will be queried for new is configurable for the rule(pattern processor) discoverer, only this point doesn't mention in this FLIP. -- A similar concern is what I have if for whatever reason the different taskmanagers can't get the latest rules, so some taskmanagers might run on the latest rule changes while some might use older versions. Those type of issues can be quite hard to debug. Do we want to introduce the config option to fail a job in case a taskmanager doesn't get the latest rules? Good point. If for whatever reason the different taskmanagers can't get the latest rule, the Operator Coordinator could send a heartbeat to all taskmanagers with the latest rules and check the heartbeat response from all the taskmanagers whether the latest rules of the taskmanager is equal to these of the Operator Coordinator. -- Do we need to take certain guarantees (like at least once) in account for this setup and/or document these? What happens in the situation where the cluster crashes and has to recover from a savepoint of let's say 3 hours ago, but the rules in the database have changed 2 hours ago. That means for the events that are processed again after 2 hours, the output can be different because the rules have changed. The certain guarantees is needed in the whole mechanism. We would like to document the behavior for all failover and crash cases to tell users with the process logic. For the failover in this FLIP, we will add the detailed explanation to show the process logic for failover and crash cases. -- In my previous job, we've created a similar system like this. The differences there were that we didn't use the jobmanager to send the results to the taskmanagers, but the taskmanagers queried the database in periodic intervals. Each taskmanager retrieved all the rules that were applicable and cached them. Is this also something that you considered? We have consided about the solution mentioned above. In this solution, I have some questions about how to guarantee the consistency of the rule between each TaskManager. By having a coodinator in the JobManager to centrally manage the latest rules, the latest rules of all TaskManagers are consistent with those of the JobManager, so as to avoid the inconsistencies that may be encountered in the above solution. Can you introduce how this solution guarantees the consistency of the rules? 
About setting a timestamp for when a rule should be active: in the latest FLIP, the PatternProcessor interface adds a `getTimestamp` method to support either event time or processing time. In summary, the current design is that the JobManager tells all TaskManagers the latest rules through the OperatorCoordinator, and initiates a heartbeat to check whether the latest rules on each TaskManager are consistent. We will describe how to deal with the failover scenario in more detail in the FLIP. Best, Nicholas Jiang On 2021/12/14 13:06:16 Martijn Visser wrote: > Hi all, > > > IMO, in this FLIP, we only need to introduce the general design of the > Table API/SQL level. As for the design details, you can create a new FLIP. > > Do you think that the current section on Table/SQL API support is > sufficient as a general design? > > > And do we need to take into account the support for Batch mode if you > expand the MATCH_RECOGNIZE function? > > Yes, I do think so since adding support for Batch mode for MATCH_RECOGNIZE > is envisioned for Flink 1.15, but is also in danger of not making it. > Relevant ticket number is https://issues.apache.org/jira/browse/FLINK-24865 > > With regards to the content of the FLIP, I have a couple of questions or > concerns: > > * Since this FLIP addresse
Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)
Hi Yue, Thanks for your feedback of the FLIP. I have addressed your questions and made a corresponding explanation as follows: -- About Pattern Updating. If we use PatternProcessoerDiscoverer to update the rules, will it increase the load of JM? For example, if the user wants the updated rule to take effect immediately,, which means that we need to set a shorter check interval or there is another scenario when users rarely update the pattern, will the PatternProcessoerDiscoverer be in most of the time Do useless checks ? Will a lazy update mode could be used, which the pattern only be updated when triggered by the user, and do nothing at other times? PatternProcessoerDiscoverer is a user-defined interface to discover the PatternProcessor updates. Periodically checking the PatternProcessor in the database is a implementation of the PatternProcessoerDiscoverer interface, which is that periodically querys all the PatternProcessor table in certain interval. This implementation indeeds has the useless checks, and could directly integrates the changelog of the table. In addition, in addition to the implementation of periodically checking the database, there are other implementations such as the PatternProcessor that provides Restful services to receive updates. -- I still have some confusion about how Key Generating Opertator and CepOperator (Pattern Matching & Processing Operator) work together. If there are N PatternProcessors, will the Key Generating Opertator generate N keyedStreams, and then N CepOperator would process each Key separately ? Or every CepOperator Task would process all patterns, if so, does the key type in each PatternProcessor need to be the same? Firstly the Pattern Matching & Processing Operator is not the CepOperator at present, because CepOperator mechanism is based on the NFAState. Secondly if there are N PatternProcessors, the Key Generating Opertator combines all the keyedStreams with keyBy() operation, thus the Pattern Matching & Processing Operator would process all the patterns. In other words, the KeySelector of the PatternProcessor is used for the Key Generating Opertator, and the Pattern and PatternProceessFunction of the PatternProcessor are used for the Pattern Matching & Processing Operator. Lastly the key type in each PatternProcessor is the same, regarded as Object type. -- Maybe need to pay attention to it when implementing it .If some Pattern has been removed or updated, will the partially matched results in StateBackend would be clean up or We rely on state ttl to clean up these expired states. If certain Pattern has been removed or updated, the partially matched results in StateBackend would be clean up until the next checkpoint. The partially matched result doesn't depend on the state ttl of the StateBackend. 4. Will the PatternProcessorManager keep all the active PatternProcessor in memory? We have also Support Multiple Rule and Dynamic Rule Changing. But we are facing such a problem, some users’ usage scenarios are that they want to have their own pattern for each user_id, which means that there could be thousands of patterns, which would make the performance of Pattern Matching very poor. We are also trying to solve this problem. The PatternProcessorManager keeps all the active PatternProcessor in memory. 
For scenarios that they want to have their own pattern for each user_id, IMO, is it possible to reduce the fine-grained pattern of PatternProcessor to solve the performance problem of the Pattern Matching, for example, a pattern corresponds to a group of users? The scenarios mentioned above need to be solved by case by case. Best, Nicholas Jiang On 2021/12/17 11:43:10 yue ma wrote: > Glad to see the Community's progress in Flink CEP. After reading this Flip, > I have few questions, would you please take a look ? > > 1. About Pattern Updating. If we use PatternProcessoerDiscoverer to update > the rules, will it increase the load of JM? For example, if the user wants > the updated rule to take effect immediately,, which means that we need to > set a shorter check interval or there is another scenario when users > rarely update the pattern, will the PatternProcessoerDiscoverer be in most > of the time Do useless checks ? Will a lazy update mode could be used, > which the pattern only be updated when triggered by the user, and do > nothing at other times ? > > 2. I still have some confusion about how Key Generating Opertator and > CepOperator (Pattern Matching & Processing Operator) work together. If > there are N PatternProcessors, will the Key Generating Opertator generate N > keyedStreams, and then N CepOperator would process each Key separately ? Or > every CepOperator Task would process all patterns, if so, does the key type > in each PatternProcessor need to be the same ? > > 3. Maybe need to pay atten
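To make the periodic-checking variant described above more concrete, here is a small sketch of what a polling PatternProcessor discoverer could look like: it queries an external store at a fixed interval and pushes the latest full set of PatternProcessors to a listener (for example the coordinator). The class shape, the Supplier/Consumer callbacks, and the PatternProcessor type (the same hypothetical interface sketched earlier) are illustrative assumptions, not the FLIP-200 API.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Illustrative sketch of a periodic PatternProcessor discoverer: polls an
 * external store (e.g. a database query) at a fixed interval and hands the
 * latest full set of processors to a listener. Not the FLIP-200 API.
 */
public class PeriodicPatternProcessorDiscoverer<IN> implements AutoCloseable {

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    public PeriodicPatternProcessorDiscoverer(
            Supplier<List<PatternProcessor<IN, ?>>> store,
            Consumer<List<PatternProcessor<IN, ?>>> listener,
            long intervalMillis) {
        // Useless checks happen when nothing changed; a changelog- or
        // notification-based discoverer would avoid them, as noted above.
        executor.scheduleWithFixedDelay(
                () -> listener.accept(store.get()), 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```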
Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)
Hi Konstantin, Martijn Thanks for the detailed feedback in the discussion. What I still have left to answer/reply to: -- Martijn: Just to be sure, this indeed would mean that if for whatever reason the heartbeat timeout, it would crash the job, right? IMO, if for whatever reason the heartbeat timeout, it couldn't check the PatternProcessor consistency between the OperatorCoordinator and the subtasks so that the job would be crashed. -- Konstantin: What I was concerned about is that we basically let users run a UserFunction in the OperatorCoordinator, which it does not seem to have been designed for. In general, we have reached an agreement on the design of this FLIP, but there are some concerns on the OperatorCoordinator, about whether basically let users run a UserFunction in the OperatorCoordinator is designed for OperatorCoordinator. We would like to invite Becket Qin who is the author of OperatorCoordinator to help us to answer this concern. Best, Nicholas Jiang On 2021/12/20 10:07:14 Martijn Visser wrote: > Hi all, > > Really like the discussion on this topic moving forward. I really think > this feature will be much appreciated by the Flink users. What I still have > left to answer/reply to: > > -- Good point. If for whatever reason the different taskmanagers can't get > the latest rule, the Operator Coordinator could send a heartbeat to all > taskmanagers with the latest rules and check the heartbeat response from > all the taskmanagers whether the latest rules of the taskmanager is equal > to these of the Operator Coordinator. > > Just to be sure, this indeed would mean that if for whatever reason the > heartbeat timeout, it would crash the job, right? > > -- We have consided about the solution mentioned above. In this solution, I > have some questions about how to guarantee the consistency of the rule > between each TaskManager. By having a coodinator in the JobManager to > centrally manage the latest rules, the latest rules of all TaskManagers are > consistent with those of the JobManager, so as to avoid the inconsistencies > that may be encountered in the above solution. Can you introduce how this > solution guarantees the consistency of the rules? > > The consistency that we could guarantee was based on how often each > TaskManager would do a refresh and how often we would accept a refresh to > fail. We set the refresh time to a relatively short one (30 seconds) and > maximum failures to 3. That meant that we could guarantee that rules would > be updated in < 2 minutes or else the job would crash. That was sufficient > for our use cases. This also really depends on how big your cluster is. I > can imagine that if you have a large scale cluster that you want to run, > you don't want to DDOS the backend system where you have your rules stored. > > -- In summary, the current design is that JobManager tells all TaskManagers > the latest rules through OperatorCoodinator, and will initiate a heartbeat > to check whether the latest rules on each TaskManager are consistent. We > will describe how to deal with the Failover scenario in more detail on FLIP. > > Thanks for that. I think having the JobManager tell the TaskManagers the > applicable rules would indeed end up being the best design. > > -- about the concerns around consistency raised by Martijn: I think a lot > of those can be mitigated by using an event time timestamp from which the > rules take effect. The reprocessing scenario, for example, is covered by > this. 
If a pattern processor should become active as soon as possible, > there will still be inconsistencies between Taskmanagers, but "as soon as > possible" is vague anyway, which is why I think that's ok. > > I think an event timestamp is indeed a really important one. We also used > that in my previous role, with the ruleActivationTimestamp compared to > eventtime (well, actually we used Kafka logAppend time because > eventtime wasn't always properly set so we used that time to overwrite the > eventtime from the event itself). > > Best regards, > > Martijn > > On Mon, 20 Dec 2021 at 09:08, Konstantin Knauf wrote: > > > Hi Nicholas, Hi Junfeng, > > > > about the concerns around consistency raised by Martijn: I think a lot of > > those can be mitigated by using an event time timestamp from which the > > rules take effect. The reprocessing scenario, for example, is covered by > > this. If a pattern processor should become active as soon as possible, > > there will still be inconsistencies between Taskmanagers, but "as soon as > > possible" is vague anyway, which is why I think that's ok. > > > > about naming: The naming with "PatternProcessor" sounds good
Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)
Hi David, Thanks for your feedback of the FLIP. I addressed the comments above and share the thoughts about the question mentioned: *About how this will work with the OperatorCoordinator for re-processing of the historical data using the OperatorCoordinator?* OperatorCoordinator will checkpoint the full amount of PatternProcessor data. For the reprocessing of historical data, you can read the PatternProcessor snapshots saved by this checkpoint from a certain historical checkpoint, and then recreate the historical data through these PatternProcessor snapshots. About the side-input (custom source / operator + broadcast), Becket has given the explanation for the OperatorCoordinator V.S. side-input / broadcast stream. You could share your thoughts about this. Best, Nicholas Jiang On 2021/12/21 08:24:53 David Morávek wrote: > Hi Yunfeng, > > thanks for drafting this FLIP, this will be a great addition into the CEP > toolbox! > > Apart from running user code in JM, which want to avoid in general, I'd > have one more another concern about using the OperatorCoordinator and that > is re-processing of the historical data. Any thoughts about how this will > work with the OC? > > I have a slight feeling that a side-input (custom source / operator + > broadcast) would a better fit for this case. This would simplify the > consistency concerns (watermarks + pushback) and the re-processing of > historical data. > > Best, > D. > > > On Tue, Dec 21, 2021 at 6:47 AM Nicholas Jiang > wrote: > > > Hi Konstantin, Martijn > > > > Thanks for the detailed feedback in the discussion. What I still have left > > to answer/reply to: > > > > -- Martijn: Just to be sure, this indeed would mean that if for whatever > > reason the heartbeat timeout, it would crash the job, right? > > > > IMO, if for whatever reason the heartbeat timeout, it couldn't check the > > PatternProcessor consistency between the OperatorCoordinator and the > > subtasks so that the job would be crashed. > > > > -- Konstantin: What I was concerned about is that we basically let users > > run a UserFunction in the OperatorCoordinator, which it does not seem to > > have been designed for. > > > > In general, we have reached an agreement on the design of this FLIP, but > > there are some concerns on the OperatorCoordinator, about whether basically > > let users run a UserFunction in the OperatorCoordinator is designed for > > OperatorCoordinator. We would like to invite Becket Qin who is the author > > of OperatorCoordinator to help us to answer this concern. > > > > Best, > > Nicholas Jiang > > > > > > On 2021/12/20 10:07:14 Martijn Visser wrote: > > > Hi all, > > > > > > Really like the discussion on this topic moving forward. I really think > > > this feature will be much appreciated by the Flink users. What I still > > have > > > left to answer/reply to: > > > > > > -- Good point. If for whatever reason the different taskmanagers can't > > get > > > the latest rule, the Operator Coordinator could send a heartbeat to all > > > taskmanagers with the latest rules and check the heartbeat response from > > > all the taskmanagers whether the latest rules of the taskmanager is equal > > > to these of the Operator Coordinator. > > > > > > Just to be sure, this indeed would mean that if for whatever reason the > > > heartbeat timeout, it would crash the job, right? > > > > > > -- We have consided about the solution mentioned above. 
In this > > solution, I > > > have some questions about how to guarantee the consistency of the rule > > > between each TaskManager. By having a coodinator in the JobManager to > > > centrally manage the latest rules, the latest rules of all TaskManagers > > are > > > consistent with those of the JobManager, so as to avoid the > > inconsistencies > > > that may be encountered in the above solution. Can you introduce how this > > > solution guarantees the consistency of the rules? > > > > > > The consistency that we could guarantee was based on how often each > > > TaskManager would do a refresh and how often we would accept a refresh to > > > fail. We set the refresh time to a relatively short one (30 seconds) and > > > maximum failures to 3. That meant that we could guarantee that rules > > would > > > be updated in < 2 minutes or else the job would crash. That was > > sufficient > > > for our use cases. This also really depends on how big your cluster is. I > > > can imagine
Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)
Hi Konstantin, Becket, Martijn, Thanks for sharing your feedback. What other concerns do you have about OperatorCoodinator? If an agreement is reached on OperatorCoodinator, I will start the voting thread. Best, Nicholas Jiang On 2021/12/22 03:19:58 Becket Qin wrote: > Hi Konstantin, > > Thanks for sharing your thoughts. Please see the reply inline below. > > Thanks, > > Jiangjie (Becket) Qin > > On Tue, Dec 21, 2021 at 7:14 PM Konstantin Knauf wrote: > > > Hi Becket, Hi Nicholas, > > > > Thanks for joining the discussion. > > > > 1 ) Personally, I would argue that we should only run user code in the > > Jobmanager/Jobmaster if we can not avoid it. It seems wrong to me to > > encourage users to e.g. run a webserver on the Jobmanager, or continuously > > read patterns from a Kafka Topic on the Jobmanager, but both of these I see > > happening with the current design. We've had lots of issues with > > classloading leaks and other stability issues on the Jobmanager and making > > this more complicated, if there is another way, seems unnecessary. > > > I think the key question here is what primitive does Flink provide to > facilitate the user implementation of their own control logic / control > plane? It looks that previously, Flink assumes that all the user logic is > just data processing logic without any control / coordination requirements. > However, it turns out that a decent control plane abstraction is required > in association with the data processing logic in many cases, including > Source / Sink and other user defined operators in general. The fact that we > ended up with adding the SplitEnumerator and GlobalCommitter are just two > examples of the demand of such coordination among user defined logics. > There are other cases that we see in ecosystem projects, such as > deep-learning-on-flink[1]. Now we see this again in CEP. > > Such control plane primitives are critical to the extensibility of a > project. If we look at other projects, exposing such control plane logic is > quite common. For example, Hadoop ended up with exposing YARN as a public > API to the users, which is extremely popular. Kafka consumers exposed the > consumer group rebalance logic to the users via ConsumerPartitionAssigner, > which is also a control plane primitive. > > To me it is more important to think about how we can improve the stability > of such a control plane mechanism, instead of simply saying no to the users. > > > > > > 2) In addition, I suspect that, over time we will have to implement all the > > functionality that regular sources already provide around consistency > > (watermarks, checkpointing) for the PatternProcessorCoordinator, too. > > > I think OperatorCoordinator should have a generic communication mechanism > for all the operators, not specific to Source. We should probably have an > AbstractOperatorCoordinator help dealing with the communication layer, and > leave the state maintenance and event handling logic to the user code. > > > > 3) I understand that running on the Jobmanager is easier if you want to > > launch a REST server directly. Here my question would be: does this really > > need to be solved inside of Flink or couldn't you start a webserver next to > > Flink? If we start using the Jobmanager as a REST server users will expect > > that e.g. it is highly available and can be load balanced and we quickly > > need to think about aspects that we never wanted to think about in the > > context of a Flink Jobmanager. 
> > > > I think the REST API is just for receiving commands targeting a running > Flink job. If the job fails, the REST API would be useless. > > > > So, can you elaborate a bit more, why a side-input/broadcast stream is > > > > a) more difficult > > b) has vague semantics (To me semantics of a stream-stream seem clearer > > when it comes to out-of-orderness, late data, reprocessing or batch > > execution mode.) > > > I do agree that having the user defined control logic defined in the JM > increases the chance of instability. In that case, we may think of other > solutions and I am fully open to that. But the side-input / broadcast > stream seems more like a bandaid instead of a carefully designed control > plane mechanism. > > A decent control plane requires two-way communication, so information can > be reported / collected from the entity being controlled, and the > coordinator / controller can send decisions or commands to the entities > accordingly, just like our TM / JM communication. IIUC, this is not > achievable with the existing side-input / broadcast stream as both of them > a
Re: [DISCUSS]FLIP-150: Introduce Hybrid Source
Hi Thomas, Thanks for your detailed reply about the design of Hybrid Source. I would reply to the questions you mentioned as follows: 1. Has there been related activity since the FLIP was created? Yes, the FLIP has an initial version of the Hybrid Source implementation. Please refer to the repository: g...@github.com:wuchong/flink-hackathon.git . 2. The actual FLIP-27 source reader would need to signal to the "HybridSourceReader" (HSR) that they are done and then the HSR would send the switch event to the coordinator? The communication in Hybrid Source should be between "HybridSourceReader" and "HybridSplitEnumerator". After a "SourceReader" in the "HybridSourceReader" finishes reading the source splits assigned by the split enumerator and switches to another "SourceReader", the "HybridSourceReader" sends a signal to the "HybridSplitEnumerator" to notify it that the "SourceReader" has finished reading. The "HybridSplitEnumerator" then receives the "SourceReaderSwitchedEvent" signal and switches to another "SplitEnumerator". 3. The actual split type that flows between enumerator and reader would be "HybridSourceSplit" and it would wrap the specific split (in the example either HDFS or Kafka)? "HybridSourceSplit" wraps the source split types of the "SourceSplit" created by the "SplitEnumerator" inside the "HybridSplitEnumerator". 4. "HybridSplitEnumerator" still needs a way to extract them from the actual enumerator. That could be done by the enumerator checkpoint state mapping function looking at the current split assignments, which would not require modification of existing enumerators? IMO, the above way to extract them from the actual enumerator could be defined in the state conversion function, which is invoked in the "HybridSplitEnumerator" when switching from one "SwitchableSplitEnumerator" to another, because the "SwitchableSplitEnumerator" interface can return the end state of the actual enumerator. I hope this answers your questions about Hybrid Source. If you still have questions, let's keep discussing. Thanks for your attention. Best, Nicholas Jiang -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
Re: [DISCUSS]FLIP-150: Introduce Hybrid Source
Hi Kezhu, Thanks for your detailed points about the Hybrid Source. I follow your points and give a corresponding explanation for each as follows: 1. Would it be possible to use this feature to switch/chain multiple homogeneous sources? "HybridSource" supports switching/chaining multiple homogeneous sources, each of which has its own implementation of "SwitchableSource" and "SwitchableSplitEnumerator". "HybridSource" does not restrict whether the composed sources are homogeneous. From the user's perspective, the user only adds the "SwitchableSource"s to the "HybridSource" and leaves the smooth migration to "HybridSource". 2. Is "setStartState" actually a reposition operation for the next source to start at job runtime? IMO, "setStartState" is used to determine the initial position of the new source for smooth migration, not a reposition operation. More importantly, the "State" mentioned here refers to the start and end positions of reading a source. 3. Should this conversion be an implementation detail of the next source, rather than a converter function? The state conversion is of course an implementation detail and is included in the switching mechanism, which should provide users with a conversion interface; that interface is the converter function. What's more, when users have already implemented a "SwitchableSource" and added it to the Hybrid Source, they don't need to implement another "SwitchableSource" for a different conversion. From the user's perspective, users can define different converter functions and create the "SwitchableSource" for addition to the "HybridSource", with no need to implement a Source per converter function. 4. No configurable start-position: in this situation the combination of the above three points is a no-op, and "HybridSource" is a chain of sources with pre-configured start positions? Indeed there is no configurable start-position yet, and this configuration could be added to the feature. Users could use the "SwitchableSplitEnumerator#setStartState" interface or configuration parameters to configure the start-position. 5. I wonder whether an end-position is a must and how it could be useful for end users in a generic-enough source? The "getEndState" interface is used for the smooth migration scenario and may return a null value if it is not needed. In the Hybrid Source mechanism, this interface is required for switching between the composed sources; otherwise there is no way to get the end-position of the upstream source. In summary, Hybrid Source needs to be able to set the start position and get the end position of each source, otherwise there is no point in building a Hybrid Source. 6. Is it possible for the converter function to do blocking operations? How does it respond to checkpoint requests when switching split enumerators across sources? Does the end-position or start-position need to be stored in checkpoint state? The converter function simply converts the state of the upstream source to the state of the downstream source; it should not perform blocking operations. The way to respond to checkpoint requests when switching split enumerators across sources is to send the corresponding "SourceEvent" to the coordinator. The end-position and start-position don't need to be stored in checkpoint state; only the "getEndState" interface needs to be implemented for the end-position. Best, Nicholas Jiang -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
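To make the start/end-position answers above concrete, here is a rough Java sketch of the enumerator-side contract implied by this thread: a SwitchableSplitEnumerator that can accept a start position when the source is switched to and expose an end position when it is switched away from. The generic parameters and method names are illustrative assumptions based on the FLIP-150 discussion, not a finalized interface.

```java
import org.apache.flink.api.connector.source.SourceSplit;
import org.apache.flink.api.connector.source.SplitEnumerator;

/**
 * Illustrative sketch of the switching contract discussed above; not a
 * finalized FLIP-150 interface.
 *
 * @param <SplitT>      split type of the underlying source
 * @param <CheckpointT> enumerator checkpoint type of the underlying source
 * @param <StartT>      start position set when this source is switched to
 * @param <EndT>        end position handed to the converter function when switching away
 */
public interface SwitchableSplitEnumerator<SplitT extends SourceSplit, CheckpointT, StartT, EndT>
        extends SplitEnumerator<SplitT, CheckpointT> {

    /**
     * Sets the initial reading position, typically derived by a converter
     * function from the previous source's end state.
     */
    void setStartState(StartT startState);

    /**
     * Returns the end position of this source once it has finished reading,
     * so that a converter function can map it to the next source's start
     * position. May return null if the next source does not need it.
     */
    EndT getEndState();
}
```

Under this sketch, a converter function is simply a mapping from the upstream source's EndT to the downstream source's StartT, which matches the non-blocking conversion described in the answer to question 6 above.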
Re: [DISCUSS] Flink-RocketMQ connector
Hi Heng Du, Thanks for starting the discussion. From the user's perspective, the integration between RocketMQ and Flink is user-friendly. I agree with your point about polishing the existing implementation and adding new features, such as table connector support. But before that, you need to provide RocketMQ's Source and Sink (legacy and FLIP-27 based), TableSource and TableSink. This integration involves cooperation between the Flink and RocketMQ communities, so for this cooperation you could coordinate with Flink community developers such as PMC members. Best, Nicholas Jiang -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
Re: [ANNOUNCE] New Apache Flink Committer - Rui Li
Congrats, Rui! Best, Nicholas Jiang -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
Re: [DISCUSS]FLIP-150: Introduce Hybrid Source
Hi Thomas, Sorry for the late reply about your POC. I have reviewed the base abstract implementation in your pull request: https://github.com/apache/flink/pull/15924. IMO, for the switching mechanism, this level of abstraction is not concise enough, which doesn't make connector contribution easier. In theory, it is necessary to introduce a set of interfaces to support the switching mechanism. The SwitchableSource and SwitchableSplitEnumerator interfaces are needed for connector extensibility. In other words, the whole switching process in the above-mentioned PR is different from the one described in FLIP-150. In that implementation, the switch of source reading is executed after receiving the SwitchSourceEvent, which could be before the SourceReaderFinishEvent is sent. This timeline of the source reading switch could be discussed here. @Stephan @Becket, if you are available, please help review the abstract implementation and compare it with the interfaces mentioned in FLIP-150. Thanks, Nicholas Jiang -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
Re: [ANNOUNCE] New PMC member: Xintong Song
Congratulations, Xintong! -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
Re: [ANNOUNCE] New PMC member: Arvid Heise
Congratulations, Arvid! -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
Re: [VOTE] Apache Flink Table Store 0.2.0, release candidate #2
Hi all! +1 for the release (non-binding). I've verified the jar with SQL client and listed the check items as follows: * Compiled the sources and built the source distribution - PASSED * Ran through Quick Start Guide - PASSED * Checked Spark 2.3.4&3.3.0 reader and catalog with table store jar - PASSED * Checked all NOTICE files - PASSED Regards, Nicholas Jiang On 2022/08/17 10:16:54 Jingsong Li wrote: > Hi everyone, > > Please review and vote on the release candidate #2 for the version 0.2.0 of > Apache Flink Table Store, as follows: > > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > **Release Overview** > > As an overview, the release consists of the following: > a) Table Store canonical source distribution to be deployed to the > release repository at dist.apache.org > b) Table Store binary convenience releases to be deployed to the > release repository at dist.apache.org > c) Maven artifacts to be deployed to the Maven Central Repository > > **Staging Areas to Review** > > The staging areas containing the above mentioned artifacts are as follows, > for your review: > * All artifacts for a) and b) can be found in the corresponding dev > repository at dist.apache.org [2] > * All artifacts for c) can be found at the Apache Nexus Repository [3] > > All artifacts are signed with the key > 2C2B6A653B07086B65E4369F7C76245E0A318150 [4] > > Other links for your review: > * JIRA release notes [5] > * source code tag "release-0.2.0-rc2" [6] > * PR to update the website Downloads page to include Table Store links [7] > > **Vote Duration** > > The voting time will run for at least 72 hours. > It is adopted by majority approval, with at least 3 PMC affirmative votes. > > Best, > Jingsong Lee > > [1] > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Table+Store+Release > [2] https://dist.apache.org/repos/dist/dev/flink/flink-table-store-0.2.0-rc2/ > [3] https://repository.apache.org/content/repositories/orgapacheflink-1523/ > [4] https://dist.apache.org/repos/dist/release/flink/KEYS > [5] > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351570 > [6] https://github.com/apache/flink-table-store/tree/release-0.2.0-rc2 > [7] https://github.com/apache/flink-web/pull/562 >
Re: [VOTE] Apache Flink Table Store 0.2.0, release candidate #3
Hi all! +1 for the release (non-binding). I've verified the jar with SQL client and listed the check items as follows: * Compiled the sources and built the source distribution - PASSED * Ran through Quick Start Guide - PASSED * Checked Spark 2.3.4&3.3.0 reader and catalog with table store jar - PASSED * Checked all NOTICE files - PASSED * Checked the website updates - PASSED Regards, Nicholas Jiang On 2022/08/24 07:46:00 Jingsong Li wrote: > Hi everyone, > > Please review and vote on the release candidate #3 for the version 0.2.0 of > Apache Flink Table Store, as follows: > > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > **Release Overview** > > As an overview, the release consists of the following: > a) Table Store canonical source distribution to be deployed to the > release repository at dist.apache.org > b) Table Store binary convenience releases to be deployed to the > release repository at dist.apache.org > c) Maven artifacts to be deployed to the Maven Central Repository > > **Staging Areas to Review** > > The staging areas containing the above mentioned artifacts are as follows, > for your review: > * All artifacts for a) and b) can be found in the corresponding dev > repository at dist.apache.org [2] > * All artifacts for c) can be found at the Apache Nexus Repository [3] > > All artifacts are signed with the key > 2C2B6A653B07086B65E4369F7C76245E0A318150 [4] > > Other links for your review: > * JIRA release notes [5] > * source code tag "release-0.2.0-rc3" [6] > * PR to update the website Downloads page to include Table Store links [7] > > **Vote Duration** > > The voting time will run for at least 72 hours. > It is adopted by majority approval, with at least 3 PMC affirmative votes. > > Best, > Jingsong Lee > > [1] > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Table+Store+Release > [2] https://dist.apache.org/repos/dist/dev/flink/flink-table-store-0.2.0-rc3/ > [3] https://repository.apache.org/content/repositories/orgapacheflink-1526/ > [4] https://dist.apache.org/repos/dist/release/flink/KEYS > [5] > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351570 > [6] https://github.com/apache/flink-table-store/tree/release-0.2.0-rc3 > [7] https://github.com/apache/flink-web/pull/562 >
[jira] [Created] (FLINK-14956) MemoryMappedBoundedData Compressed Buffer Slicer
Nicholas Jiang created FLINK-14956: -- Summary: MemoryMappedBoundedData Compressed Buffer Slicer Key: FLINK-14956 URL: https://issues.apache.org/jira/browse/FLINK-14956 Project: Flink Issue Type: Improvement Components: Runtime / Network Reporter: Nicholas Jiang Attachments: Compress-Read.png, Compress-Write.png MemoryMappedBoundedData, implementation of BoundedData simply through ByteBuffers backed by memory, uses compressed buffer slicer to improve I/O performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-19646) StreamExecutionEnvironment support new Source interface based on FLIP-27
Nicholas Jiang created FLINK-19646: -- Summary: StreamExecutionEnvironment support new Source interface based on FLIP-27 Key: FLINK-19646 URL: https://issues.apache.org/jira/browse/FLINK-19646 Project: Flink Issue Type: Improvement Components: API / Python Reporter: Nicholas Jiang StreamExecutionEnvironment currently supports the new Source interface based on FLIP-27 in Java, but this is not yet supported in Python. The PyFlink StreamExecutionEnvironment should add methods related to the new Source interface, such as from_source. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-19984) Add TypeSerializerTestCoverageTest to check whether tests based on SerializerTestBase and TypeSerializerUpgradeTestBase
Nicholas Jiang created FLINK-19984: -- Summary: Add TypeSerializerTestCoverageTest to check whether tests based on SerializerTestBase and TypeSerializerUpgradeTestBase Key: FLINK-19984 URL: https://issues.apache.org/jira/browse/FLINK-19984 Project: Flink Issue Type: Improvement Reporter: Nicholas Jiang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-31050) Supports IN predicate in ORC format of table store
Nicholas Jiang created FLINK-31050: -- Summary: Supports IN predicate in ORC format of table store Key: FLINK-31050 URL: https://issues.apache.org/jira/browse/FLINK-31050 Project: Flink Issue Type: Improvement Components: Table Store Affects Versions: table-store-0.4.0 Reporter: Nicholas Jiang Supports IN predicate push down in ORC format of table store. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31076) Supports filter predicate in Parquet format of table store
Nicholas Jiang created FLINK-31076: -- Summary: Supports filter predicate in Parquet format of table store Key: FLINK-31076 URL: https://issues.apache.org/jira/browse/FLINK-31076 Project: Flink Issue Type: Improvement Components: Table Store Reporter: Nicholas Jiang Fix For: table-store-0.4.0 Parquet is the main file format of the table store, but it doesn't support filter predicates yet. Filter predicates should also be supported in the Parquet format, not only in the ORC format. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31190) Supports Spark call procedure command on Table Store
Nicholas Jiang created FLINK-31190: -- Summary: Supports Spark call procedure command on Table Store Key: FLINK-31190 URL: https://issues.apache.org/jira/browse/FLINK-31190 Project: Flink Issue Type: New Feature Components: Table Store Affects Versions: table-store-0.4.0 Reporter: Nicholas Jiang Fix For: table-store-0.4.0 At present Hudi and Iceberg supports the Spark call procedure command to execute the table service action etc. Flink Table Store could also support Spark call procedure command to run compaction etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31194) Introduces savepoint mechanism of Table Store
Nicholas Jiang created FLINK-31194: -- Summary: Introduces savepoint mechanism of Table Store Key: FLINK-31194 URL: https://issues.apache.org/jira/browse/FLINK-31194 Project: Flink Issue Type: New Feature Components: Table Store Affects Versions: table-store-0.4.0 Reporter: Nicholas Jiang Fix For: table-store-0.4.0 Disaster recovery is mission critical for any software. Especially when it comes to data systems, the impact can be very serious, leading to delayed or even wrong business decisions at times. Flink Table Store could introduce a savepoint mechanism to assist users in recovering data from a previous state. As the name suggests, a "savepoint" saves the table as of the snapshot, so that it lets you restore the table to this savepoint at a later point in time if need be. Care is taken to ensure the cleaner will not clean up any files that are savepointed. Along similar lines, a savepoint cannot be triggered on a snapshot that has already been cleaned up. In simpler terms, this is synonymous with taking a backup, except that we don't make a new copy of the table, but just save the state of the table elegantly so that we can restore it later when needed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31232) Parquet format supports MULTISET type for Table Store
Nicholas Jiang created FLINK-31232: -- Summary: Parquet format supports MULTISET type for Table Store Key: FLINK-31232 URL: https://issues.apache.org/jira/browse/FLINK-31232 Project: Flink Issue Type: Improvement Components: Table Store Affects Versions: table-store-0.4.0 Reporter: Nicholas Jiang Parquet format should support MULTISET type for Table Store. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31250) ParquetSchemaConverter supports MULTISET type for parquet format
Nicholas Jiang created FLINK-31250: -- Summary: ParquetSchemaConverter supports MULTISET type for parquet format Key: FLINK-31250 URL: https://issues.apache.org/jira/browse/FLINK-31250 Project: Flink Issue Type: Improvement Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Affects Versions: 1.18.0 Reporter: Nicholas Jiang Fix For: 1.18.0 ParquetSchemaConverter supports ARRAY, MAP and ROW type, doesn't support MULTISET type. ParquetSchemaConverter should support MULTISET type for parquet format. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-27097) Document custom validator implementations
Nicholas Jiang created FLINK-27097: -- Summary: Document custom validator implementations Key: FLINK-27097 URL: https://issues.apache.org/jira/browse/FLINK-27097 Project: Flink Issue Type: Sub-task Reporter: Nicholas Jiang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-27392) CEP Pattern supports definition of the maximum time gap between events
Nicholas Jiang created FLINK-27392: -- Summary: CEP Pattern supports definition of the maximum time gap between events Key: FLINK-27392 URL: https://issues.apache.org/jira/browse/FLINK-27392 Project: Flink Issue Type: Improvement Components: Library / CEP Affects Versions: 1.16.0 Reporter: Nicholas Jiang Fix For: 1.16.0 At present, Pattern#within defines the maximum time interval in which a matching pattern has to be completed in order to be considered valid. The interval corresponds to the maximum time gap between the first and the last event. A maximum time gap between events is needed in certain scenarios, for example, purchasing a good within a maximum of 5 minutes after browsing. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (FLINK-27401) CEP supports AggregateFunction in IterativeCondition
Nicholas Jiang created FLINK-27401: -- Summary: CEP supports AggregateFunction in IterativeCondition Key: FLINK-27401 URL: https://issues.apache.org/jira/browse/FLINK-27401 Project: Flink Issue Type: Improvement Components: Library / CEP Affects Versions: 1.16.0 Reporter: Nicholas Jiang Fix For: 1.16.0 IterativeCondition only exposes the filter interface. For an aggregation operation inside the condition, since the condition may be called multiple times during NFA processing, a single event may cause the custom aggregation state in the condition to be updated multiple times, for example when filtering goods with a total of more than 1000 orders in the past 10 minutes. An AggregateFunction is introduced in IterativeCondition to initialize and maintain the aggregation state. -- This message was sent by Atlassian Jira (v8.20.7#820007)
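For context, this is roughly how such an aggregation has to be written today with the existing IterativeCondition: the total is recomputed from the already matched events on every invocation, which is what the proposed AggregateFunction support would avoid by maintaining the state incrementally. The Event class, its getAmount() accessor, and the "orders" pattern name are hypothetical and used only for illustration.

```java
import org.apache.flink.cep.pattern.conditions.IterativeCondition;

/**
 * Roughly how a "total order amount greater than 1000" check looks today:
 * the aggregate is recomputed from the matched events on every call.
 * Event and getAmount() are hypothetical names for illustration.
 */
public class TotalAmountCondition extends IterativeCondition<Event> {

    @Override
    public boolean filter(Event value, Context<Event> ctx) throws Exception {
        double total = value.getAmount();
        // Re-iterate all events matched so far under the "orders" pattern.
        for (Event previous : ctx.getEventsForPattern("orders")) {
            total += previous.getAmount();
        }
        return total > 1000;
    }
}
```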
[jira] [Created] (FLINK-21045) Support 'load module' and 'unload module' SQL syntax
Nicholas Jiang created FLINK-21045: -- Summary: Support 'load module' and 'unload module' SQL syntax Key: FLINK-21045 URL: https://issues.apache.org/jira/browse/FLINK-21045 Project: Flink Issue Type: Improvement Components: Table SQL / Planner Affects Versions: 1.13.0 Reporter: Nicholas Jiang At present, Flink SQL doesn't support the 'load module' and 'unload module' SQL syntax. It is necessary for use cases where users load and unload user-defined modules through the Table API or the SQL client. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-21849) 'SHOW MODULES' tests for CliClientITCase lack the default module test case
Nicholas Jiang created FLINK-21849: -- Summary: 'SHOW MODULES' tests for CliClientITCase lack the default module test case Key: FLINK-21849 URL: https://issues.apache.org/jira/browse/FLINK-21849 Project: Flink Issue Type: Improvement Components: Table SQL / Client Affects Versions: 1.13.0 Reporter: Nicholas Jiang Fix For: 1.13.0 Currently the 'SHOW MODULES' command tests for CliClientITCase lack a test case for the default module, which is the core module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
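A sketch of the missing scenario: in a fresh session with no LOAD/UNLOAD statements executed, 'SHOW MODULES' should list only the built-in core module (the output rendering in the comment is approximate):
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ShowDefaultModules {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().build());

        // With no modules loaded or unloaded, only the default core module is listed.
        tableEnv.executeSql("SHOW MODULES").print();
        // expected (approximate rendering):
        // | module name |
        // |        core |
    }
}
{code}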
[jira] [Created] (FLINK-22793) HybridSource Table Implementation
Nicholas Jiang created FLINK-22793: -- Summary: HybridSource Table Implementation Key: FLINK-22793 URL: https://issues.apache.org/jira/browse/FLINK-22793 Project: Flink Issue Type: Sub-task Reporter: Nicholas Jiang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24336) PyFlink
Nicholas Jiang created FLINK-24336: -- Summary: PyFlink Key: FLINK-24336 URL: https://issues.apache.org/jira/browse/FLINK-24336 Project: Flink Issue Type: Bug Reporter: Nicholas Jiang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-28560) Support Spark 3.3 profile for SparkSource
Nicholas Jiang created FLINK-28560: -- Summary: Support Spark 3.3 profile for SparkSource Key: FLINK-28560 URL: https://issues.apache.org/jira/browse/FLINK-28560 Project: Flink Issue Type: Improvement Components: Table Store Affects Versions: table-store-0.2.0 Reporter: Nicholas Jiang Fix For: table-store-0.2.0 The flink-table-store-spark module supports Spark 3.0~3.2 profiles, while Spark has already published the 3.3.0 version. A Spark 3.3 profile can be introduced for SparkSource to follow the release versions of Spark 3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28578) Upgrade Spark version of flink-table-store-spark to 3.1.3, 3.2.2 or 3.3.0 or later
Nicholas Jiang created FLINK-28578: -- Summary: Upgrade Spark version of flink-table-store-spark to 3.1.3, 3.2.2 or 3.3.0 or later Key: FLINK-28578 URL: https://issues.apache.org/jira/browse/FLINK-28578 Project: Flink Issue Type: Improvement Components: Table Store Affects Versions: table-store-0.2.0 Reporter: Nicholas Jiang Fix For: table-store-0.2.0 CVE-2022-33891: Apache Spark shell command injection vulnerability via Spark UI. Upgrade to a supported Apache Spark maintenance release: 3.1.3, 3.2.2, 3.3.0 or later. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28705) Update copyright year to 2014-2022 in NOTICE files
Nicholas Jiang created FLINK-28705: -- Summary: Update copyright year to 2014-2022 in NOTICE files Key: FLINK-28705 URL: https://issues.apache.org/jira/browse/FLINK-28705 Project: Flink Issue Type: Improvement Components: Table Store Affects Versions: table-store-0.3.0 Reporter: Nicholas Jiang Fix For: table-store-0.3.0 Copyright year of the NOTICE file in Flink Table Store should be '2014-2022'. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28840) Introduce Roadmap document of Flink Table Store
Nicholas Jiang created FLINK-28840: -- Summary: Introduce Roadmap document of Flink Table Store Key: FLINK-28840 URL: https://issues.apache.org/jira/browse/FLINK-28840 Project: Flink Issue Type: Improvement Reporter: Nicholas Jiang Fix For: table-store-0.3.0 The Flink Table Store subproject needs its own roadmap document to present an overview of the general direction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28903) flink-table-store-hive-catalog could not shade hive-shims-0.23
Nicholas Jiang created FLINK-28903: -- Summary: flink-table-store-hive-catalog could not shade hive-shims-0.23 Key: FLINK-28903 URL: https://issues.apache.org/jira/browse/FLINK-28903 Project: Flink Issue Type: Bug Components: Table Store Affects Versions: table-store-0.3.0 Reporter: Nicholas Jiang Fix For: table-store-0.3.0 flink-table-store-hive-catalog could not shade hive-shims-0.23 because the artifactSet doesn't include hive-shims-0.23 and minimizeJar is set to true. The exception is as follows:
{code:java}
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.flink.table.store.shaded.org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.flink.table.store.shaded.org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1708) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.shaded.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:83) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.shaded.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.shaded.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.shaded.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:97) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.hive.HiveCatalog.createClient(HiveCatalog.java:380) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.hive.HiveCatalog.(HiveCatalog.java:80) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.hive.HiveCatalogFactory.create(HiveCatalogFactory.java:51) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.file.catalog.CatalogFactory.createCatalog(CatalogFactory.java:93) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.connector.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:62) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.connector.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:57) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.connector.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:31) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.factories.FactoryUtil.createCatalog(FactoryUtil.java:428) ~[flink-table-api-java-uber-1.15.1.jar:1.15.1]
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.createCatalog(TableEnvironmentImpl.java:1356) ~[flink-table-api-java-uber-1.15.1.jar:1.15.1]
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:) ~[flink-table-api-java-uber-1.15.1.jar:1.15.1]
    at org.apache.flink.table.client.gateway.local.LocalExecutor.lambda$executeOperation$3(LocalExecutor.java:209) ~[flink-sql-client-1.15.1.jar:1.15.1]
    at org.apache.flink.table.client.gateway.context.ExecutionContext.wrapClassLoader(ExecutionContext.java:88) ~[flink-sql-client-1.15.1.jar:1.15.1]
    at org.apache.flink.table.client.gateway.local.LocalExecutor.executeOperation(LocalExecutor.java:209) ~[flink-sql-client-1.15.1.jar:1.15.1]
    ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_181]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_181]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_181]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_181]
    at org.apache.flink.table.store.shaded.org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1706) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.shaded.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:83) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.shaded.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133) ~[flink-table-store-dist-0.3-SNAPSHOT.jar:0.3-SNAPSHOT]
    at org.apache.flink.table.store.shaded.org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.get
{code}
[jira] [Created] (FLINK-28930) Add "What is Flink Table Store?" link to flink website
Nicholas Jiang created FLINK-28930: -- Summary: Add "What is Flink Table Store?" link to flink website Key: FLINK-28930 URL: https://issues.apache.org/jira/browse/FLINK-28930 Project: Flink Issue Type: Improvement Components: Project Website, Table Store Reporter: Nicholas Jiang Similar to the statefun and ml projects, we should also add a "What is Flink Table Store?" link to the menu, pointing to the doc site. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29756) Support materialized column to improve query performance for complex types
Nicholas Jiang created FLINK-29756: -- Summary: Support materialized column to improve query performance for complex types Key: FLINK-29756 URL: https://issues.apache.org/jira/browse/FLINK-29756 Project: Flink Issue Type: New Feature Components: Table Store Affects Versions: table-store-0.3.0 Reporter: Nicholas Jiang Fix For: table-store-0.3.0 In the world of data warehouses, it is very common to use one or more columns from a complex type such as a map, or to put many subfields into it. These operations can greatly affect query performance because:
# These operations cause very wasteful IO. For example, if we have a field of type Map that contains dozens of subfields, we need to read the entire column when reading this column, and Spark will traverse the entire map to get the value of the target key.
# Vectorized reads cannot be used when reading nested type columns.
# Filter pushdown cannot be used when reading nested columns.
It is necessary to introduce the materialized column feature in Flink Table Store, which transparently solves the above problems for arbitrary columnar storage (not just Parquet). -- This message was sent by Atlassian Jira (v8.20.10#820010)
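As a purely hypothetical DDL sketch for the feature above, assuming a TableEnvironment tableEnv; the MATERIALIZED keyword and expression syntax are invented for illustration and the actual Table Store design would define the real syntax:
{code:java}
// Hypothetical DDL: a frequently accessed map entry is persisted as an ordinary
// top-level column at write time, so reads of shop_id can benefit from vectorized
// reading and filter pushdown instead of scanning the whole map column.
tableEnv.executeSql(
        "CREATE TABLE orders ("
                + "  order_id BIGINT,"
                + "  props MAP<STRING, STRING>,"
                + "  shop_id STRING MATERIALIZED AS props['shop_id']"
                + ")");
{code}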
[jira] [Created] (FLINK-17610) Align the behavior of result of internal map state to return empty iterator
Nicholas Jiang created FLINK-17610: -- Summary: Align the behavior of result of internal map state to return empty iterator Key: FLINK-17610 URL: https://issues.apache.org/jira/browse/FLINK-17610 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Affects Versions: 1.11.0 Reporter: Nicholas Jiang The results of the internal map state behave differently across state backends. For HeapMapState, #entries(), #keys(), #values(), and #iterator() all return null. However, for RocksDBMapState, #entries() returns null, while #keys(), #values() and #iterator() return an empty iterator. UserFacingMapState already aligns these behaviors to an empty iterator for users, therefore it would be better to also align the internal behaviors to return an empty iterator instead of null. -- This message was sent by Atlassian Jira (v8.3.4#803005)
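A minimal sketch of the proposed alignment, using an illustrative helper rather than the actual HeapMapState/RocksDBMapState code: the internal map state would funnel its results through a null-safe conversion so that callers always receive an empty iterator instead of null.
{code:java}
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

// Illustrative helper only: the real change would live inside the state backends.
public final class NullSafeStateIterators {

    private NullSafeStateIterators() {}

    // Returns an empty iterable instead of null when the backing map has no entries.
    public static <K, V> Iterable<Map.Entry<K, V>> entriesOrEmpty(Map<K, V> backingMap) {
        return backingMap == null ? Collections.emptySet() : backingMap.entrySet();
    }

    // Returns an empty iterator instead of null, matching the user-facing MapState contract.
    public static <K, V> Iterator<Map.Entry<K, V>> iteratorOrEmpty(Map<K, V> backingMap) {
        return backingMap == null
                ? Collections.emptyIterator()
                : backingMap.entrySet().iterator();
    }
}
{code}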