Re: [PR] [FLINK-35444][pipeline-connector][paimon] Paimon Pipeline Connector support changing column names to lowercase for Hive metastore [flink-cdc]

2024-08-25 Thread via GitHub

kyzheng196 commented on code in PR #3569:
URL: https://github.com/apache/flink-cdc/pull/3569#discussion_r1730277217


##
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/schema/Schema.java:
##
@@ -396,6 +397,9 @@ private void checkColumn(String columnName, DataType type) {
  * @param columnNames columns that form a unique primary key
  */
 public Builder primaryKey(String... columnNames) {
+        for (int i = 0; i < columnNames.length; i++) {
+            columnNames[i] = columnNames[i].toLowerCase();

Review Comment:
   Hi @lvyanquan, thank you for your kind suggestions.
   I've modified PaimonMetadataApplier and PaimonMetadataApplierTest accordingly, and kept the common class unchanged as before. All test cases have passed; I hope this meets your expectations, many thanks!
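
   For readers following along, a minimal sketch of the approach settled on here, using a hypothetical helper (not the actual PR code): fold identifiers to lowercase inside the Paimon-specific metadata applier, so the shared `Schema` class stays untouched for case-sensitive sinks.

   ```java
   import java.util.List;
   import java.util.Locale;
   import java.util.stream.Collectors;

   // Hypothetical sketch, not the actual PR code: the Hive metastore treats
   // identifiers case-insensitively, so the connector-specific applier folds
   // names to lowercase while other connectors keep the original casing.
   final class HiveNameNormalizer {
       static List<String> lowercaseForHiveMetastore(List<String> columnNames) {
           return columnNames.stream()
                   .map(name -> name.toLowerCase(Locale.ROOT))
                   .collect(Collectors.toList());
       }
   }
   ```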



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-36150) tables.exclude is not valid if scan.binlog.newly-added-table.enabled is true.

2024-08-25 Thread Hongshun Wang (Jira)
Hongshun Wang created FLINK-36150:
-

 Summary: tables.exclude is not valid if 
scan.binlog.newly-added-table.enabled is true.
 Key: FLINK-36150
 URL: https://issues.apache.org/jira/browse/FLINK-36150
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Affects Versions: cdc-3.1.1
Reporter: Hongshun Wang
 Fix For: cdc-3.2.0


Currently, `tables.exclude` is provided so users can exclude some tables, because
the tables passed to the DataSource are filtered when MySqlDataSourceFactory
creates the DataSource.
However, when scan.binlog.newly-added-table.enabled is true, new table DDL from
the binlog will be read and won't be filtered by `tables.exclude`.
 
This Jira aims to pass `tables.exclude` to MySqlSourceConfig and use it
when discovering tables from the MySQL database.
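
A minimal sketch of the intended fix, with hypothetical names (`TableDiscoveryFilter`, `discoverTables`); the real change wires the option into MySqlSourceConfig:

{code:java}
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: apply tables.exclude during table discovery as well,
// so tables first seen in binlog DDL are filtered like snapshot tables.
final class TableDiscoveryFilter {
    static List<String> discoverTables(List<String> candidateTables, String tablesExclude) {
        final Predicate<String> excluded;
        if (tablesExclude == null) {
            excluded = t -> false;
        } else {
            Pattern pattern = Pattern.compile(tablesExclude);
            excluded = t -> pattern.matcher(t).matches();
        }
        return candidateTables.stream()
                .filter(t -> !excluded.test(t))
                .collect(Collectors.toList());
    }
}
{code}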
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-36150) tables.exclude is not valid if scan.binlog.newly-added-table.enabled is true.

2024-08-25 Thread Hongshun Wang (Jira)

[ 
https://issues.apache.org/jira/browse/FLINK-36150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876487#comment-17876487
 ] 

Hongshun Wang commented on FLINK-36150:
---

I'd like to do it.

> tables.exclude is not valid if scan.binlog.newly-added-table.enabled is true.
> -
>
> Key: FLINK-36150
> URL: https://issues.apache.org/jira/browse/FLINK-36150
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Affects Versions: cdc-3.1.1
>Reporter: Hongshun Wang
>Priority: Blocker
> Fix For: cdc-3.2.0
>
>
> Currently, `tables.exclude` is provided so users can exclude some tables,
> because the tables passed to the DataSource are filtered when
> MySqlDataSourceFactory creates the DataSource.
> However, when scan.binlog.newly-added-table.enabled is true, new table DDL
> from the binlog will be read and won't be filtered by `tables.exclude`.
>  
> This Jira aims to pass `tables.exclude` to MySqlSourceConfig and use it
> when discovering tables from the MySQL database.
>  
>  





[jira] [Updated] (FLINK-36150) tables.exclude is not valid if scan.binlog.newly-added-table.enabled is true.

2024-08-25 Thread ASF GitHub Bot (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-36150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-36150:
---
Labels: pull-request-available  (was: )

> tables.exclude is not valid if scan.binlog.newly-added-table.enabled is true.
> -
>
> Key: FLINK-36150
> URL: https://issues.apache.org/jira/browse/FLINK-36150
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Affects Versions: cdc-3.1.1
>Reporter: Hongshun Wang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: cdc-3.2.0
>
>
> Currently, `tables.exclude` is provided so users can exclude some tables,
> because the tables passed to the DataSource are filtered when
> MySqlDataSourceFactory creates the DataSource.
> However, when scan.binlog.newly-added-table.enabled is true, new table DDL
> from the binlog will be read and won't be filtered by `tables.exclude`.
>  
> This Jira aims to pass `tables.exclude` to MySqlSourceConfig and use it
> when discovering tables from the MySQL database.
>  
>  





[PR] [FLINK-36128] Promote LENIENT as the default schema change behavior [flink-cdc]

2024-08-25 Thread via GitHub

yuxiqian opened a new pull request, #3574:
URL: https://github.com/apache/flink-cdc/pull/3574

   This closes FLINK-36128.
   
   Currently, the default schema evolution mode "EVOLVE" cannot handle exceptions and might not be able to restore from existing state correctly after failover. Until we can "trigger checkpoints on demand", which is not possible prior to Flink 1.19, making "LENIENT" the default option might be more suitable.
   
   This PR also contains a hotfix patch for PrePartitionOperator broadcasting 
failure.





[jira] [Updated] (FLINK-36128) Promote LENIENT mode as the default schema evolution behavior

2024-08-25 Thread ASF GitHub Bot (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-36128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-36128:
---
Labels: pull-request-available  (was: )

> Promote LENIENT mode as the default schema evolution behavior
> -
>
> Key: FLINK-36128
> URL: https://issues.apache.org/jira/browse/FLINK-36128
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Reporter: yux
>Priority: Blocker
>  Labels: pull-request-available
>
> Currently, the default schema evolution mode "EVOLVE" cannot handle
> exceptions and might not be able to restore from existing state correctly
> after failover. Until we can use the "manually trigger checkpoint" feature
> that was introduced in Flink 1.19, making "LENIENT" the default option might
> be more suitable.





Re: [PR] [FLINK-36128] Promote LENIENT as the default schema change behavior [flink-cdc]

2024-08-25 Thread via GitHub

leonardBang commented on code in PR #3574:
URL: https://github.com/apache/flink-cdc/pull/3574#discussion_r1730301097


##
flink-cdc-cli/src/main/java/org/apache/flink/cdc/cli/parser/YamlPipelineDefinitionParser.java:
##
@@ -172,6 +183,23 @@ private SinkDef toSinkDef(JsonNode sinkNode) {
         Optional.ofNullable(sinkNode.get(EXCLUDE_SCHEMA_EVOLUTION_TYPES))
                 .ifPresent(e -> e.forEach(tag -> excludedSETypes.add(tag.asText())));
 
+        if (includedSETypes.isEmpty()) {
+            // If no schema evolution types are specified, include all schema evolution types by
+            // default.
+            Arrays.stream(SchemaChangeEventTypeFamily.ALL)
+                    .map(SchemaChangeEventType::getTag)
+                    .forEach(includedSETypes::add);
+        }
+
+        if (excludedSETypes.isEmpty()
+                && SchemaChangeBehavior.LENIENT.equals(schemaChangeBehavior)) {
+            // In lenient mode, we exclude DROP_TABLE and TRUNCATE_TABLE by default. This could be
+            // overridden by manually specifying excluded types.
Review Comment:
   minor: we should add the schema behavior documentation ASAP






Re: [PR] [FLINK-36128] Promote LENIENT as the default schema change behavior [flink-cdc]

2024-08-25 Thread via GitHub

yuxiqian commented on code in PR #3574:
URL: https://github.com/apache/flink-cdc/pull/3574#discussion_r1730302627


##
flink-cdc-cli/src/main/java/org/apache/flink/cdc/cli/parser/YamlPipelineDefinitionParser.java:
##
@@ -172,6 +183,23 @@ private SinkDef toSinkDef(JsonNode sinkNode) {
         Optional.ofNullable(sinkNode.get(EXCLUDE_SCHEMA_EVOLUTION_TYPES))
                 .ifPresent(e -> e.forEach(tag -> excludedSETypes.add(tag.asText())));
 
+        if (includedSETypes.isEmpty()) {
+            // If no schema evolution types are specified, include all schema evolution types by
+            // default.
+            Arrays.stream(SchemaChangeEventTypeFamily.ALL)
+                    .map(SchemaChangeEventType::getTag)
+                    .forEach(includedSETypes::add);
+        }
+
+        if (excludedSETypes.isEmpty()
+                && SchemaChangeBehavior.LENIENT.equals(schemaChangeBehavior)) {
+            // In lenient mode, we exclude DROP_TABLE and TRUNCATE_TABLE by default. This could be
+            // overridden by manually specifying excluded types.
Review Comment:
   Tracked with FLINK-36151.






[jira] [Created] (FLINK-36151) Add documentations for Schema Evolution related options

2024-08-25 Thread yux (Jira)
yux created FLINK-36151:
---

 Summary: Add documentations for Schema Evolution related options
 Key: FLINK-36151
 URL: https://issues.apache.org/jira/browse/FLINK-36151
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: yux


CDC documentation should be updated to reflect recent changes to schema change
features, such as the new TRY_EVOLVE and LENIENT modes and the
`include.schema.change` and `exclude.schema.change` options.
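
For illustration, a hedged sketch of how these options might appear in a pipeline YAML (option keys follow this issue's wording plus the pipeline-level `schema.change.behavior` option; verify the exact keys against the merged documentation):

{code:yaml}
# Illustrative only; confirm option names against the released Flink CDC docs.
pipeline:
  schema.change.behavior: lenient
sink:
  type: values
  include.schema.change: [all]
  exclude.schema.change: [drop.table, truncate.table]
{code}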





[jira] [Commented] (FLINK-36145) Change JobSpec.flinkStateSnapshotReference to snapshotReference

2024-08-25 Thread Mate Czagany (Jira)

[ 
https://issues.apache.org/jira/browse/FLINK-36145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876505#comment-17876505
 ] 

Mate Czagany commented on FLINK-36145:
--

I think snapshotReference would work well.

> Change JobSpec.flinkStateSnapshotReference to snapshotReference
> ---
>
> Key: FLINK-36145
> URL: https://issues.apache.org/jira/browse/FLINK-36145
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Blocker
> Fix For: kubernetes-operator-1.10.0
>
>
> To avoid redundant / verbose naming we should change this field name in the 
> spec before it's released:
> JobSpec.flinkStateSnapshotReference -> JobSpec.snapshotReference or 
> JobSpec.stateSnapshotReference





[PR] [FLINK-36145][snapshot] Rename flinkStateSnapshotReference to snapshotReference [flink-kubernetes-operator]

2024-08-25 Thread via GitHub

mateczagany opened a new pull request, #870:
URL: https://github.com/apache/flink-kubernetes-operator/pull/870

   ## What is the purpose of the change
   
   Rename JobSpec.flinkStateSnapshotReference to JobSpec.snapshotReference. This should make the field less verbose and easier for users to memorize.
   
   The flinkStateSnapshotReference field has not been released officially yet, so this change should not affect production users.
   
   ## Brief change log
   
   - Rename in Java, Markdown files and examples
   - Regenerate CRDs
   
   ## Verifying this change
   
   - Unit tests
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., any changes to the `CustomResourceDescriptors`: yes
 - Core observer or reconciler logic that is regularly executed: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
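
   For reference, a hedged sketch of how the renamed field might look in a job spec (the subfields shown are assumptions carried over from the pre-rename FlinkStateSnapshotReference; check the regenerated CRDs for the final shape):

   ```yaml
   # Illustrative only; field layout is an assumption, not the released API.
   apiVersion: flink.apache.org/v1beta1
   kind: FlinkDeployment
   spec:
     job:
       # previously: flinkStateSnapshotReference
       snapshotReference:
         namespace: flink
         name: my-initial-snapshot
   ```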
   





[jira] [Updated] (FLINK-36145) Change JobSpec.flinkStateSnapshotReference to snapshotReference

2024-08-25 Thread ASF GitHub Bot (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-36145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-36145:
---
Labels: pull-request-available  (was: )

> Change JobSpec.flinkStateSnapshotReference to snapshotReference
> ---
>
> Key: FLINK-36145
> URL: https://issues.apache.org/jira/browse/FLINK-36145
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.10.0
>
>
> To avoid redundant / verbose naming we should change this field name in the 
> spec before it's released:
> JobSpec.flinkStateSnapshotReference -> JobSpec.snapshotReference or 
> JobSpec.stateSnapshotReference





Re: [PR] [FLINK-35177] Fix DataGen Connector documentation [flink]

2024-08-25 Thread via GitHub

morozov commented on PR #24692:
URL: https://github.com/apache/flink/pull/24692#issuecomment-2308895535

   @GOODBOY008 done. FWIW, you can reference code blocks right on GitHub. This way, they are more readable and can be navigated to:
   
   
https://github.com/apache/flink/blob/4faf0966766e3734792f80ed66e512aa3033cacd/flink-connectors/flink-connector-datagen/src/main/java/org/apache/flink/connector/datagen/source/DataGeneratorSource.java#L79-L86





[jira] [Commented] (FLINK-36145) Change JobSpec.flinkStateSnapshotReference to snapshotReference

2024-08-25 Thread Gyula Fora (Jira)

[ 
https://issues.apache.org/jira/browse/FLINK-36145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876532#comment-17876532
 ] 

Gyula Fora commented on FLINK-36145:


[~mateczagany] let's hold off on this for a little bit; I would like to ask 1-2
people just to get it right.
On second thought, snapshotReference feels a little off.

Previously we had initialSavepointPath, which was pretty straightforward.

Maybe initialStateReference is more descriptive. Or stateReference if we want
something shorter, but that may be a bit misleading given the behaviour.

[~thw] [~rmetzger] any thoughts?

> Change JobSpec.flinkStateSnapshotReference to snapshotReference
> ---
>
> Key: FLINK-36145
> URL: https://issues.apache.org/jira/browse/FLINK-36145
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.10.0
>
>
> To avoid redundant / verbose naming we should change this field name in the 
> spec before it's released:
> JobSpec.flinkStateSnapshotReference -> JobSpec.snapshotReference or 
> JobSpec.stateSnapshotReference





[jira] [Commented] (FLINK-36145) Change JobSpec.flinkStateSnapshotReference to snapshotReference

2024-08-25 Thread Thomas Weise (Jira)

[ 
https://issues.apache.org/jira/browse/FLINK-36145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876534#comment-17876534
 ] 

Thomas Weise commented on FLINK-36145:
--

`snapshotReference` sounds a bit misleading; `initialStateReference` is better.
I think the "initial" part is actually important so it is clear that the
reference only applies until the next snapshot.

> Change JobSpec.flinkStateSnapshotReference to snapshotReference
> ---
>
> Key: FLINK-36145
> URL: https://issues.apache.org/jira/browse/FLINK-36145
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.10.0
>
>
> To avoid redundant / verbose naming we should change this field name in the 
> spec before it's released:
> JobSpec.flinkStateSnapshotReference -> JobSpec.snapshotReference or 
> JobSpec.stateSnapshotReference





Re: [PR] [FLINK-34555][table] Migrate JoinConditionTypeCoerceRule to java. [flink]

2024-08-25 Thread via GitHub

snuyanzin commented on code in PR #24420:
URL: https://github.com/apache/flink/pull/24420#discussion_r1730414834


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/JoinConditionTypeCoerceRule.java:
##
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.rules.logical;
+
+import org.apache.flink.table.api.TableException;
+import org.apache.flink.table.planner.plan.utils.FlinkRexUtil;
+
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.plan.RelRule;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.SqlTypeUtil;
+import org.apache.calcite.tools.RelBuilder;
+import org.immutables.value.Value;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Planner rule that coerces both sides of the EQUALS (`=`) operator in a Join condition to the
+ * same type, ignoring nullability.
+ *
+ * <p>For most cases, we already did the type coercion during type validation by implicit type
+ * coercion or during the sqlNode-to-relNode conversion; this rule just rechecks to ensure a
+ * strongly uniform equals type, so that during a HashJoin shuffle we can get the same hashcode
+ * for the same value.
+ */
+@Value.Enclosing
+public class JoinConditionTypeCoerceRule
+        extends RelRule<JoinConditionTypeCoerceRule.JoinConditionTypeCoerceRuleConfig> {
+
+    public static final JoinConditionTypeCoerceRule INSTANCE =
+            JoinConditionTypeCoerceRule.JoinConditionTypeCoerceRuleConfig.DEFAULT.toRule();
+
+    private JoinConditionTypeCoerceRule(JoinConditionTypeCoerceRuleConfig config) {
+        super(config);
+    }
+
+    @Override
+    public boolean matches(RelOptRuleCall call) {
+        Join join = call.rel(0);
+        if (join.getCondition().isAlwaysTrue()) {
+            return false;
+        }
+        RelDataTypeFactory typeFactory = call.builder().getTypeFactory();
+        return hasEqualsRefsOfDifferentTypes(typeFactory, join.getCondition());
+    }
+
+    @Override
+    public void onMatch(RelOptRuleCall call) {
+        Join join = call.rel(0);
+        RelBuilder builder = call.builder();
+        RexBuilder rexBuilder = builder.getRexBuilder();
+        RelDataTypeFactory typeFactory = builder.getTypeFactory();
+
+        List<RexNode> joinFilters = RelOptUtil.conjunctions(join.getCondition());
+        List<RexNode> newJoinFilters =
+                joinFilters.stream()
+                        .map(
+                                filter -> {
+                                    if (filter instanceof RexCall) {
+                                        RexCall c = (RexCall) filter;
+                                        if (c.getKind().equals(SqlKind.EQUALS)) {

Review Comment:
   ```suggestion
   if (c.getKind() == SqlKind.EQUALS) {
   ```
   it's better to follow the same approach across the code;
   e.g., below it is compared with `==`, which is preferred for enums
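
   A small generic Java illustration of the point being made (not the PR's code): `==` is the conventional comparison for enum constants and is null-safe on the receiver side.

   ```java
   import org.apache.calcite.sql.SqlKind;

   // Generic illustration of the review suggestion, not the PR's code.
   final class EnumComparison {
       static boolean isEquals(SqlKind kind) {
           // Preferred: identity comparison works for enum constants and never
           // throws if 'kind' is null.
           return kind == SqlKind.EQUALS;
           // Avoid: kind.equals(SqlKind.EQUALS) would NPE when kind is null.
       }
   }
   ```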






Re: [PR] [FLINK-34505][table] Migrate WindowGroupReorderRule to java. [flink]

2024-08-25 Thread via GitHub

snuyanzin commented on PR #24375:
URL: https://github.com/apache/flink/pull/24375#issuecomment-2308961960

   still failing on ci





Re: [PR] [FLINK-35177] Fix DataGen Connector documentation [flink]

2024-08-25 Thread via GitHub

snuyanzin commented on PR #24692:
URL: https://github.com/apache/flink/pull/24692#issuecomment-2308984427

   If we are talking about examples, there is already an existing one in the examples module which is quite close to the one from the description:
   
   https://github.com/apache/flink/blob/4faf0966766e3734792f80ed66e512aa3033cacd/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/datagen/DataGenerator.java#L36-L43
   
   How about having the same approach both in the docs and in this example?
   





Re: [PR] [FLINK-34467] bump flink version to 1.20.0 [flink-connector-kafka]

2024-08-25 Thread via GitHub

HuangZhenQiu commented on PR #111:
URL: 
https://github.com/apache/flink-connector-kafka/pull/111#issuecomment-2309026627

   @AHeise 
   Thanks for the reply. I totally understand the pain points of maintaining compatibility with multiple Flink versions in a connector; in each Flink release, there are always some new experimental interfaces introduced in the API or runtime. Shall we consider the solution from Apache Hudi or Apache Iceberg? Both of them use a separate module per Flink version, and some classes are replicated into different modules as needed.
   https://github.com/apache/hudi/tree/master/hudi-flink-datasource
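
   For illustration, a hedged sketch of the module layout such an approach implies (module names are hypothetical, loosely modeled on Hudi's `hudi-flink-datasource` tree):

   ```
   flink-connector-kafka/
   ├── flink-connector-kafka-common/   # version-agnostic connector code
   ├── flink-connector-kafka-1.18/     # shims compiled against Flink 1.18
   ├── flink-connector-kafka-1.19/     # shims compiled against Flink 1.19
   └── flink-connector-kafka-1.20/     # shims compiled against Flink 1.20
   ```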
   
   
   





[jira] [Created] (FLINK-36152) Traverse through the superclass hierarchy to extract generic type.

2024-08-25 Thread Xinglong Wang (Jira)
Xinglong Wang created FLINK-36152:
-

 Summary: Traverse through the superclass hierarchy to extract 
generic type.
 Key: FLINK-36152
 URL: https://issues.apache.org/jira/browse/FLINK-36152
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.16.1, 2.0.0
Reporter: Xinglong Wang


In our case, there's a `ConcreteLookupFunction extends AbstractLookupFunction`,
and `AbstractLookupFunction extends TableFunction<RowData>`.

However, `Class#getGenericSuperclass` only returns the direct superclass, so it
cannot extract the correct generic type `RowData`.

I can reproduce the exception below:
{code:java}
// 
flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/runtime/stream/sql/FunctionITCase.java

@Test
void testLookupTableFunctionWithoutHintLevel2()
throws ExecutionException, InterruptedException {

testLookupTableFunctionBase(LookupTableWithoutHintLevel2Function.class.getName());
}

// ... ...

public static class LookupTableWithoutHintLevel2Function
extends LookupTableWithoutHintLevel1Function {}{code}
{code:java}
org.apache.flink.table.api.ValidationException: Cannot extract a data type from 
an internal 'org.apache.flink.table.data.RowData' class without further 
information. Please use annotations to define the full logical type.
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:424)
    at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:419)
    at 
org.apache.flink.table.types.extraction.DataTypeExtractor.checkForCommonErrors(DataTypeExtractor.java:425)
    at 
org.apache.flink.table.types.extraction.DataTypeExtractor.extractDataTypeOrError(DataTypeExtractor.java:330)
    at 
org.apache.flink.table.types.extraction.DataTypeExtractor.extractDataTypeOrRawWithTemplate(DataTypeExtractor.java:290)
    ... 53 more
 {code}
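
A minimal sketch of the traversal the title describes, using plain JDK reflection (illustrative only; the planner's actual extraction code in DataTypeExtractor is more involved and also resolves type variables):

{code:java}
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Illustrative sketch: walk up the superclass chain until a parameterized
// superclass is found, instead of inspecting only the direct superclass.
final class GenericSuperclassResolver {
    static Type resolveFirstTypeArgument(Class<?> clazz) {
        for (Class<?> c = clazz; c != null && c != Object.class; c = c.getSuperclass()) {
            Type superType = c.getGenericSuperclass();
            if (superType instanceof ParameterizedType) {
                // e.g. AbstractLookupFunction extends TableFunction<RowData> -> RowData
                return ((ParameterizedType) superType).getActualTypeArguments()[0];
            }
        }
        return null; // no parameterized superclass found in the hierarchy
    }
}
{code}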





[PR] [FLINK-36152] Fix incorrect type extraction in case of cascaded inheritance when DataTypeHint is not configured [flink]

2024-08-25 Thread via GitHub

littleeleventhwolf opened a new pull request, #25251:
URL: https://github.com/apache/flink/pull/25251


   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follow [the 
conventions for tests defined in our code quality 
guide](https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#7-testing).
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   





[jira] [Updated] (FLINK-36152) Traverse through the superclass hierarchy to extract generic type.

2024-08-25 Thread ASF GitHub Bot (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-36152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-36152:
---
Labels: pull-request-available  (was: )

> Traverse through the superclass hierarchy to extract generic type.
> --
>
> Key: FLINK-36152
> URL: https://issues.apache.org/jira/browse/FLINK-36152
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 2.0.0, 1.16.1
>Reporter: Xinglong Wang
>Priority: Major
>  Labels: pull-request-available
>
> In our case, there's a `ConcreteLookupFunction extends
> AbstractLookupFunction`, and `AbstractLookupFunction extends
> TableFunction<RowData>`.
> However, `Class#getGenericSuperclass` only returns the direct superclass, so it
> cannot extract the correct generic type `RowData`.
> I can reproduce the exception below:
> {code:java}
> // 
> flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/runtime/stream/sql/FunctionITCase.java
> @Test
> void testLookupTableFunctionWithoutHintLevel2()
> throws ExecutionException, InterruptedException {
> 
> testLookupTableFunctionBase(LookupTableWithoutHintLevel2Function.class.getName());
> }
> // ... ...
> public static class LookupTableWithoutHintLevel2Function
> extends LookupTableWithoutHintLevel1Function {}{code}
> {code:java}
> org.apache.flink.table.api.ValidationException: Cannot extract a data type 
> from an internal 'org.apache.flink.table.data.RowData' class without further 
> information. Please use annotations to define the full logical type.
>     at 
> org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:424)
>     at 
> org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:419)
>     at 
> org.apache.flink.table.types.extraction.DataTypeExtractor.checkForCommonErrors(DataTypeExtractor.java:425)
>     at 
> org.apache.flink.table.types.extraction.DataTypeExtractor.extractDataTypeOrError(DataTypeExtractor.java:330)
>     at 
> org.apache.flink.table.types.extraction.DataTypeExtractor.extractDataTypeOrRawWithTemplate(DataTypeExtractor.java:290)
>     ... 53 more
>  {code}





Re: [PR] [FLINK-36152] Fix incorrect type extraction in case of cascaded inheritance when DataTypeHint is not configured [flink]

2024-08-25 Thread via GitHub

flinkbot commented on PR #25251:
URL: https://github.com/apache/flink/pull/25251#issuecomment-2309047657

   
   ## CI report:
   
   * 3f1b92e8d262a3a601624ade2b17bf126f553117 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





Re: [PR] [FLINK-35589][cdc-common] Support MemorySize type in FlinkCDC ConfigOptions. [flink-cdc]

2024-08-25 Thread via GitHub

github-actions[bot] commented on PR #3437:
URL: https://github.com/apache/flink-cdc/pull/3437#issuecomment-2309056140

   This pull request has been automatically marked as stale because it has not 
had recent activity for 60 days. It will be closed in 30 days if no further 
activity occurs.





[jira] [Updated] (FLINK-35589) Support MemorySize type in FlinkCDC ConfigOptions

2024-08-25 Thread ASF GitHub Bot (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-35589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35589:
---
Labels: pull-request-available  (was: )

> Support MemorySize type in FlinkCDC ConfigOptions 
> --
>
> Key: FLINK-35589
> URL: https://issues.apache.org/jira/browse/FLINK-35589
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Affects Versions: cdc-3.2.0
>Reporter: LvYanquan
>Priority: Not a Priority
>  Labels: pull-request-available
> Fix For: cdc-3.2.0
>
>
> This allows users to set MemorySize-typed config options, like Flink does.
>  
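
For illustration, a hedged sketch of what such an option could look like, mirroring Flink's own `ConfigOptions` builder (whether the FlinkCDC API exposes the identical `memoryType()` method is an assumption, and the key name is hypothetical):

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.MemorySize;

// Sketch based on Flink's own ConfigOptions API; the FlinkCDC equivalent is
// assumed to mirror this builder shape.
final class ExampleOptions {
    static final ConfigOption<MemorySize> WRITE_BUFFER_SIZE =
            ConfigOptions.key("sink.write-buffer.size") // hypothetical key
                    .memoryType()
                    .defaultValue(MemorySize.parse("64mb"))
                    .withDescription("Illustrative MemorySize-typed option.");
}
{code}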





Re: [PR] [FLINK-34467] bump flink version to 1.20.0 [flink-connector-kafka]

2024-08-25 Thread via GitHub

HuangZhenQiu commented on PR #111:
URL: 
https://github.com/apache/flink-connector-kafka/pull/111#issuecomment-2309083950

   I think we can drop support for Flink 1.17 and Flink 1.18 first, in this PR: https://github.com/apache/flink-connector-kafka/pull/102.





[PR] [FLINK-36151] Add schema evolution related docs [flink-cdc]

2024-08-25 Thread via GitHub

yuxiqian opened a new pull request, #3575:
URL: https://github.com/apache/flink-cdc/pull/3575

   This closes FLINK-36151 by adding missing schema evolution related concepts 
into Flink CDC docs.





[jira] [Updated] (FLINK-36151) Add documentations for Schema Evolution related options

2024-08-25 Thread ASF GitHub Bot (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-36151:
---
Labels: pull-request-available  (was: )

> Add documentations for Schema Evolution related options
> ---
>
> Key: FLINK-36151
> URL: https://issues.apache.org/jira/browse/FLINK-36151
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: yux
>Priority: Major
>  Labels: pull-request-available
>
> CDC Documentations should be updated to reflect recent changes of schema 
> change features, like new TRY_EVOLVE and LENIENT mode, 
> `include.schema.change` and `exclude.schema.change` options.





[jira] [Created] (FLINK-36153) MySQL fails to handle schema change events In Timestamp or Earliest Offset startup mode

2024-08-25 Thread yux (Jira)
yux created FLINK-36153:
---

 Summary: MySQL fails to handle schema change events In Timestamp 
or Earliest Offset startup mode
 Key: FLINK-36153
 URL: https://issues.apache.org/jira/browse/FLINK-36153
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: yux


Currently, if the MySQL source tries to start up from a binlog position where
there are schema changes within that range, the job will fail due to
non-replayable schema change events.
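
For context, a small sketch of the startup modes this issue concerns (illustrative; the timestamp value is a placeholder):

{code:java}
import org.apache.flink.cdc.connectors.mysql.table.StartupOptions;

// Illustrative: timestamp/earliest startup replays a binlog range that may
// contain schema change events recorded before the job learned the schema;
// latest-offset startup does not hit this path.
final class StartupModes {
    static final StartupOptions FROM_TIMESTAMP = StartupOptions.timestamp(1724544000000L);
    static final StartupOptions FROM_EARLIEST = StartupOptions.earliest();
    static final StartupOptions FROM_LATEST = StartupOptions.latest();
}
{code}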





[jira] [Commented] (FLINK-36153) MySQL fails to handle schema change events In Timestamp or Earliest Offset startup mode

2024-08-25 Thread yux (Jira)

[ 
https://issues.apache.org/jira/browse/FLINK-36153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876559#comment-17876559
 ] 

yux commented on FLINK-36153:
-

[~Leonard] Please assign this to me.

> MySQL fails to handle schema change events In Timestamp or Earliest Offset 
> startup mode
> ---
>
> Key: FLINK-36153
> URL: https://issues.apache.org/jira/browse/FLINK-36153
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: yux
>Priority: Major
>
> Currently, if the MySQL source tries to start up from a binlog position where
> there are schema changes within that range, the job will fail due to
> non-replayable schema change events.





[jira] [Updated] (FLINK-36153) MySQL fails to handle schema change events In Timestamp or Earliest Offset startup mode

2024-08-25 Thread yux (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-36153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yux updated FLINK-36153:

Issue Type: Bug  (was: Improvement)

> MySQL fails to handle schema change events In Timestamp or Earliest Offset 
> startup mode
> ---
>
> Key: FLINK-36153
> URL: https://issues.apache.org/jira/browse/FLINK-36153
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Reporter: yux
>Priority: Major
>
> Currently, if the MySQL source tries to start up from a binlog position where
> there are schema changes within that range, the job will fail due to
> non-replayable schema change events.





Re: [PR] [FLINK-36142][table] Remove TestTableSourceWithTime and its subclasses to prepare removing TableEnvironmentInternal#registerTableSourceInternal [flink]

2024-08-25 Thread via GitHub

xuyangzhong commented on PR #25243:
URL: https://github.com/apache/flink/pull/25243#issuecomment-2309129941

   @flinkbot run azure





Re: [PR] [FLINK-35177] Fix DataGen Connector documentation [flink]

2024-08-25 Thread via GitHub

morozov commented on PR #24692:
URL: https://github.com/apache/flink/pull/24692#issuecomment-2309145899

   > how about having the same approach both in docs and in this example?
   
   @snuyanzin what exactly do you propose?





[jira] [Commented] (FLINK-36131) FlinkSQL upgraded from 1.13.1 to 1.18.1 metrics does not display data

2024-08-25 Thread mumu (Jira)

[ 
https://issues.apache.org/jira/browse/FLINK-36131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876564#comment-17876564
 ] 

mumu commented on FLINK-36131:
--

[~hackergin] I am using flink-sql-connector-kafka-3.0.2.jar.

> FlinkSQL upgraded from 1.13.1 to 1.18.1 metrics does not display data
> -
>
> Key: FLINK-36131
> URL: https://issues.apache.org/jira/browse/FLINK-36131
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.18.1
>Reporter: mumu
>Priority: Major
> Attachments: image-2024-08-22-17-33-21-336.png, 
> image-2024-08-23-16-42-27-450.png, 截屏2024-08-24 14.01.51.png
>
>
> After upgrading FlinkSQL from 1.13.1 to 1.18.1, metrics do not display data.
> 0.Source___call[1].KafkaSourceReader.topic.xxxll_agg.partition.1.currentOffset
> I can confirm that there was data available before the upgrade.
> !image-2024-08-22-17-33-21-336.png!





[jira] [Assigned] (FLINK-36100) Support ESCAPE in built-in function LIKE formally

2024-08-25 Thread lincoln lee (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-36100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee reassigned FLINK-36100:
---

Assignee: Dylan He

> Support ESCAPE in built-in function LIKE formally
> -
>
> Key: FLINK-36100
> URL: https://issues.apache.org/jira/browse/FLINK-36100
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: Dylan He
>Assignee: Dylan He
>Priority: Major
>  Labels: pull-request-available
>
> Flink does not formally support ESCAPE in the built-in function LIKE, but in
> some cases we do need it, because '_' and '%' are interpreted as special
> wildcard characters, preventing their use in their literal sense.
> And currently, if we forcefully use ESCAPE characters, we will get unexpected
> results, as in the cases below.
> {code:SQL}
> > SELECT 'TE_ST' LIKE '%E\_S%';
>  FALSE
> > SELECT 'TE_ST' LIKE '%E\_S%' ESCAPE '\';
>  ERROR
> {code}
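
For comparison, the expected behavior once ESCAPE handling works, consistent with the test expectations later added in PR #25225 (default '\' escape character):

{code:SQL}
> SELECT 'TE_ST' LIKE '%E\_S%';
 TRUE
> SELECT 'TE-ST' LIKE '%E\_S%';
 FALSE
{code}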





[jira] [Comment Edited] (FLINK-36141) [mysql]Set the scanNewlyAddedTableEnabled is true, but does not support automatic capture new added table

2024-08-25 Thread zhongyuandong (Jira)

[ 
https://issues.apache.org/jira/browse/FLINK-36141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876218#comment-17876218
 ] 

zhongyuandong edited comment on FLINK-36141 at 8/26/24 2:28 AM:


[~kunni] Is this only supported in flink-cdc 3.3.0? We are using
flink-cdc-source-connectors 3.0.1, and switching to flink-cdc-pipeline-connectors
would be a pretty big adjustment for us. How can we use this feature, and when
will it be available?


was (Author: JIRAUSER306741):
[~kunni] Is this only supported in flink-cdc 3.4.0? We are using
flink-cdc-source-connectors 3.0.1; how can we use this feature, and when will
it be available?

> [mysql]Set the scanNewlyAddedTableEnabled is true, but does not support 
> automatic capture new added table
> -
>
> Key: FLINK-36141
> URL: https://issues.apache.org/jira/browse/FLINK-36141
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: zhongyuandong
>Priority: Critical
> Attachments: image-2024-08-23-11-07-26-384.png, 
> image-2024-08-23-11-07-38-182.png, image-2024-08-23-11-08-31-324.png, 
> image-2024-08-23-11-08-44-156.png
>
>
> 1. Looking at the flink-cdc 3.0.1 Debezium reader stage, tables are specially
> filtered out when scanNewlyAddedTableEnabled is true, which makes new tables
> inaccessible without a restart. Based on the 3.0.1 design, testing with
> scanNewlyAddedTableEnabled = false (manually added tables are lost and cannot
> be recognized after restart operations) shows it can automatically capture
> newly added tables matching regular expressions without restarting. Is this a
> bug or a deliberate design? (Why is it designed this way?)
> 2. flink-cdc 2.3.0 did not have this logic and could capture newly added
> tables. TODO: there is still a very small chance that newly added tables
> cannot be captured. Is this the reason why flink-cdc 2.4.0 and later versions
> no longer support dynamically capturing newly added tables?
> 3. Which version solves this problem? How is it solved?
> !image-2024-08-23-11-08-31-324.png!
> !image-2024-08-23-11-08-44-156.png!
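
For reference, a hedged sketch of the option under discussion on the MySQL source builder (package names follow flink-cdc 3.x; connection details are placeholders, and the cross-version behavior is exactly what this issue questions):

{code:java}
import org.apache.flink.cdc.connectors.mysql.source.MySqlSource;
import org.apache.flink.cdc.debezium.JsonDebeziumDeserializationSchema;

// Illustrative configuration: a regex tableList plus scanNewlyAddedTableEnabled(true)
// is the documented way to pick up tables created after the job started.
final class NewlyAddedTableExample {
    static MySqlSource<String> build() {
        return MySqlSource.<String>builder()
                .hostname("localhost") // placeholder
                .port(3306)
                .databaseList("app_db")
                .tableList("app_db\\..*") // regex that also covers future tables
                .username("user") // placeholder
                .password("secret") // placeholder
                .scanNewlyAddedTableEnabled(true)
                .deserializer(new JsonDebeziumDeserializationSchema())
                .build();
    }
}
{code}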





[jira] [Assigned] (FLINK-36153) MySQL fails to handle schema change events In Timestamp or Earliest Offset startup mode

2024-08-25 Thread Leonard Xu (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-36153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-36153:
--

Assignee: yux

> MySQL fails to handle schema change events In Timestamp or Earliest Offset 
> startup mode
> ---
>
> Key: FLINK-36153
> URL: https://issues.apache.org/jira/browse/FLINK-36153
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Reporter: yux
>Assignee: yux
>Priority: Major
>
> Currently, if the MySQL source tries to start up from a binlog position where
> there are schema changes within that range, the job will fail due to
> non-replayable schema change events.





[jira] [Commented] (FLINK-36005) Specify to get part of the potgresql field

2024-08-25 Thread Jira

[ 
https://issues.apache.org/jira/browse/FLINK-36005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876569#comment-17876569
 ] 

宇宙先生 commented on FLINK-36005:
--

Who is focusing on this issue? I think it is a valuable issue. Thanks.

>  Specify to get part of the potgresql field
> ---
>
> Key: FLINK-36005
> URL: https://issues.apache.org/jira/browse/FLINK-36005
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Affects Versions: cdc-3.1.1
>Reporter: 宇宙先生
>Priority: Critical
> Attachments: image-2024-08-08-14-39-10-566.png
>
>
> When I use Flink CDC to ingest data from PostgreSQL, I can only obtain some
> fields, not all table fields, because of privileges. There are some fields
> that I don't have permission to see, and some that I do, but this runs into
> privilege errors. When I debugged the Flink CDC program, I found that the
> initialization phase requires permissions for the whole table; I guess this
> is an area that can be optimized, please consider it.
> !image-2024-08-08-14-39-10-566.png!
> many thanks.





Re: [PR] [FLINK-36100][table] Support ESCAPE in built-in function LIKE [flink]

2024-08-25 Thread via GitHub

lincoln-lil commented on code in PR #25225:
URL: https://github.com/apache/flink/pull/25225#discussion_r1730585595


##
flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/expressions/ScalarFunctionsTest.scala:
##
@@ -387,55 +387,91 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
 
   @Test
   def testLike(): Unit = {
+    // true
     testAllApis('f0.like("Th_s%"), "f0 LIKE 'Th_s%'", "TRUE")
-
     testAllApis('f0.like("%is a%"), "f0 LIKE '%is a%'", "TRUE")
-
-    testSqlApi("'abcxxxdef' LIKE 'abcx%'", "TRUE")
-    testSqlApi("'abcxxxdef' LIKE '%%def'", "TRUE")
-    testSqlApi("'abcxxxdef' LIKE 'abcxxxdef'", "TRUE")
-    testSqlApi("'abcxxxdef' LIKE '%xdef'", "TRUE")
-    testSqlApi("'abcxxxdef' LIKE 'abc%def%'", "TRUE")
-    testSqlApi("'abcxxxdef' LIKE '%abc%def'", "TRUE")
-    testSqlApi("'abcxxxdef' LIKE '%abc%def%'", "TRUE")
-    testSqlApi("'abcxxxdef' LIKE 'abc%def'", "TRUE")
+    testAllApis("abcxxxdef".like("abcx%"), "'abcxxxdef' LIKE 'abcx%'", "TRUE")
+    testAllApis("abcxxxdef".like("%%def"), "'abcxxxdef' LIKE '%%def'", "TRUE")
+    testAllApis("abcxxxdef".like("abcxxxdef"), "'abcxxxdef' LIKE 'abcxxxdef'", "TRUE")
+    testAllApis("abcxxxdef".like("%xdef"), "'abcxxxdef' LIKE '%xdef'", "TRUE")
+    testAllApis("abcxxxdef".like("abc%def%"), "'abcxxxdef' LIKE 'abc%def%'", "TRUE")
+    testAllApis("abcxxxdef".like("%abc%def"), "'abcxxxdef' LIKE '%abc%def'", "TRUE")
+    testAllApis("abcxxxdef".like("%abc%def%"), "'abcxxxdef' LIKE '%abc%def%'", "TRUE")
+    testAllApis("abcxxxdef".like("abc%def"), "'abcxxxdef' LIKE 'abc%def'", "TRUE")
 
     // false
-    testSqlApi("'abcxxxdef' LIKE 'abdxxxdef'", "FALSE")
-    testSqlApi("'abcxxxdef' LIKE '%xqef'", "FALSE")
-    testSqlApi("'abcxxxdef' LIKE 'abc%qef%'", "FALSE")
-    testSqlApi("'abcxxxdef' LIKE '%abc%qef'", "FALSE")
-    testSqlApi("'abcxxxdef' LIKE '%abc%qef%'", "FALSE")
-    testSqlApi("'abcxxxdef' LIKE 'abc%qef'", "FALSE")
+    testAllApis("abcxxxdef".like("abdxxxdef"), "'abcxxxdef' LIKE 'abdxxxdef'", "FALSE")
+    testAllApis("abcxxxdef".like("%xqef"), "'abcxxxdef' LIKE '%xqef'", "FALSE")
+    testAllApis("abcxxxdef".like("abc%qef%"), "'abcxxxdef' LIKE 'abc%qef%'", "FALSE")
+    testAllApis("abcxxxdef".like("%abc%qef"), "'abcxxxdef' LIKE '%abc%qef'", "FALSE")
+    testAllApis("abcxxxdef".like("%abc%qef%"), "'abcxxxdef' LIKE '%abc%qef%'", "FALSE")
+    testAllApis("abcxxxdef".like("abc%qef"), "'abcxxxdef' LIKE 'abc%qef'", "FALSE")
+
+    // reported in FLINK-36100
+    testAllApis("TE_ST".like("%E_S%"), "'TE_ST' LIKE '%E_S%'", "TRUE")
+    testAllApis("TE-ST".like("%E_S%"), "'TE-ST' LIKE '%E_S%'", "TRUE")
+    testAllApis("TE_ST".like("%E\\_S%"), "'TE_ST' LIKE '%E\\_S%'", "TRUE")
+    testAllApis("TE-ST".like("%E\\_S%"), "'TE-ST' LIKE '%E\\_S%'", "FALSE")
   }
 
   @Test
   def testNotLike(): Unit = {
     testAllApis(!'f0.like("Th_s%"), "f0 NOT LIKE 'Th_s%'", "FALSE")
-
     testAllApis(!'f0.like("%is a%"), "f0 NOT LIKE '%is a%'", "FALSE")
+
+    // reported in FLINK-36100
+    testSqlApi("'TE_ST' NOT LIKE '%E_S%'", "FALSE")
+    testSqlApi("'TE-ST' NOT LIKE '%E_S%'", "FALSE")
+    testSqlApi("'TE_ST' NOT LIKE '%E\\_S%'", "FALSE")
+    testSqlApi("'TE-ST' NOT LIKE '%E\\_S%'", "TRUE")
   }
 
   @Test
   def testLikeWithEscape(): Unit = {

Review Comment:
   Add cases with invalid escape chars



##
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/functions/SqlLikeUtils.java:
##
@@ -108,15 +108,18 @@ static String sqlToRegexLike(String sqlPattern, char escapeChar) {
         final StringBuilder javaPattern = new StringBuilder(len + len);
         for (i = 0; i < len; i++) {
             char c = sqlPattern.charAt(i);
-            if (JAVA_REGEX_SPECIALS.indexOf(c) >= 0) {
-                javaPattern.append('\\');
-            }
             if (c == escapeChar) {
                 if (i == (sqlPattern.length() - 1)) {
                     throw invalidEscapeSequence(sqlPattern, i);
                 }
                 char nextChar = sqlPattern.charAt(i + 1);
-                if ((nextChar == '_') || (nextChar == '%') || (nextChar == escapeChar)) {
+                if ((nextChar == '_') || (nextChar == '%')) {

Review Comment:
   We'd better also add some tests into `FlinkSqlLikeUtilsTest` to cover these 
changes.
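
   A hedged sketch of what such a test could look like (visibility of `sqlToRegexLike` from the test package and the exact regex output are assumptions based on the diff above):

   ```java
   import static org.assertj.core.api.Assertions.assertThat;

   import org.junit.jupiter.api.Test;

   // Sketch only: assumes sqlToRegexLike is accessible from the test package and
   // that '%' maps to ".*" while an unescaped '_' maps to ".".
   class FlinkSqlLikeUtilsEscapeTest {
       @Test
       void escapedUnderscoreBecomesLiteral() {
           assertThat(SqlLikeUtils.sqlToRegexLike("%E\\_S%", '\\')).isEqualTo(".*E_S.*");
       }

       @Test
       void unescapedUnderscoreMatchesAnyChar() {
           assertThat(SqlLikeUtils.sqlToRegexLike("%E_S%", '\\')).isEqualTo(".*E.S.*");
       }
   }
   ```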



##
docs/data/sql_functions.yml:
##
@@ -49,10 +49,10 @@ comparison:
   - sql: value1 NOT BETWEEN [ ASYMMETRIC | SYMMETRIC ] value2 AND value3
    description: By default (or with the ASYMMETRIC keyword), returns TRUE if value1 is less than value2 or greater than value3. With the SYMMETRIC keyword, returns TRUE if value1 is not inclusively between value2 and value3. When either value2 or value3 is NULL, returns TRUE or UNKNOWN. E.g., 12 NOT BETWEEN 15 AND 12 returns TRUE; 12 NOT BETWEEN SYMMETRIC 15 AND 12 returns FALSE; 12 NOT BETWEEN NULL AND 15 returns UNKNOWN; 12 NOT BETWEEN 15 AND NULL returns TRUE; 12 NO

Re: [PR] [FLINK-36100][table] Support ESCAPE in built-in function LIKE [flink]

2024-08-25 Thread via GitHub

lincoln-lil commented on PR #25225:
URL: https://github.com/apache/flink/pull/25225#issuecomment-2309221648

   Also verified the result with escaping characters, all results with 
supported patterns are consistent with mysql:
   
![image](https://github.com/user-attachments/assets/51919cb7-fb44-4ada-a450-4baea85b7ff5)
   





Re: [PR] [FLINK-26943][table] Add the built-in function DATE_ADD [flink]

2024-08-25 Thread via GitHub

dylanhz commented on code in PR #24988:
URL: https://github.com/apache/flink/pull/24988#discussion_r1730611766


##
docs/data/sql_functions.yml:
##
@@ -653,6 +653,16 @@ temporal:
 CURRENT_WATERMARK(ts) IS NULL
 OR ts > CURRENT_WATERMARK(ts)
   ```
+  - sql: DATE_ADD(startDate, numDays)
+    table: startDate.dateAdd(numDays)
+    description: |
+      Returns the date numDays after startDate.
+      If numDays is negative, -numDays are subtracted from startDate.
+
+      `startDate , numDays `
+
+      Returns a `DATE`; returns `NULL` if any argument is `NULL`, the result overflows, or the date string is invalid.

Review Comment:
   It seems both of them are acceptable.
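
   For illustration, the behavior the description specifies (dates computed by hand, not taken from the PR's tests):

   ```SQL
   > SELECT DATE_ADD('2016-08-31', 1);
    2016-09-01
   > SELECT DATE_ADD('2016-08-31', -1);
    2016-08-30
   ```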






[PR] [FLINK-26944][table] Add the built-in function ADD_MONTHS [flink]

2024-08-25 Thread via GitHub

dylanhz opened a new pull request, #25252:
URL: https://github.com/apache/flink/pull/25252

   ## What is the purpose of the change
   
   Add the built-in function ADD_MONTHS.
   Examples:
   
   ```SQL
   > SELECT ADD_MONTHS('2016-08-31', 1);
2016-09-30
   > SELECT ADD_MONTHS('2016-08-31', -6);
2016-02-29 
   ```
   
   ## Brief change log
   
   [FLINK-26944](https://issues.apache.org/jira/browse/FLINK-26944)
   
   
   ## Verifying this change
   
   `TimeFunctionsITCase#addMonthsTestCases`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: yes
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? yes
 - If yes, how is the feature documented? docs
   





[PR] [FLINK-35579] update frocksdb version to v8.10.0 [flink]

2024-08-25 Thread via GitHub

mayuehappy opened a new pull request, #25253:
URL: https://github.com/apache/flink/pull/25253

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follow [the 
conventions for tests defined in our code quality 
guide](https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#7-testing).
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   





[jira] [Updated] (FLINK-35579) Update the FrocksDB version in FLINK

2024-08-25 Thread ASF GitHub Bot (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-35579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35579:
---
Labels: pull-request-available  (was: )

> Update the FrocksDB version in FLINK
> 
>
> Key: FLINK-35579
> URL: https://issues.apache.org/jira/browse/FLINK-35579
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / State Backends
>Affects Versions: 2.0.0
>Reporter: Yue Ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-26944][table] Add the built-in function ADD_MONTHS [flink]

2024-08-25 Thread via GitHub

flinkbot commented on PR #25252:
URL: https://github.com/apache/flink/pull/25252#issuecomment-2309230928

   
   ## CI report:
   
   * 47ce8c16deeb3c373c47d6c99edb8dce50723759 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-36039][autoscaler] Support clean historical event handler records in JDBC event handler [flink-kubernetes-operator]

2024-08-25 Thread via GitHub

1996fanrui merged PR #865:
URL: https://github.com/apache/flink-kubernetes-operator/pull/865


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-36039) Support clean historical event handler records in JDBC event handler

2024-08-25 Thread Rui Fan (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-36039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan resolved FLINK-36039.
-
Fix Version/s: kubernetes-operator-1.10.0
   Resolution: Fixed

Merged to main(1.10.0) via: 6a426b2ff60331b89d67371279f400f8761bf1f3

> Support clean historical event handler records in JDBC event handler
> 
>
> Key: FLINK-36039
> URL: https://issues.apache.org/jira/browse/FLINK-36039
> Project: Flink
>  Issue Type: Improvement
>  Components: Autoscaler
>Reporter: RocMarshal
>Assignee: RocMarshal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.10.0
>
>
> Currently, the autoscaler generates a large amount of historical data for 
> event handlers. As the system runs for a long time, the volume of historical 
> data will continue to grow. It is necessary to support automatic cleanup of 
> data within a fixed period.
> Based on the creation-time timestamp, the following approach might work for 
> cleaning up historical data:
>  * Introduce the parameter {{autoscaler.standalone.jdbc-event-handler.ttl}}
>  ** Type: Duration
>  ** Default value: 90 days
>  ** Setting it to 0 disables the cleanup functionality.
>  * In the {{JdbcAutoScalerEventHandler}} constructor, introduce a scheduled 
> job. Also, add an internal interface method {{close}} for 
> {{AutoScalerEventHandler & JobAutoScaler}} to stop and clean up related 
> logic.
>  * Cleanup logic:
>  ## Query the messages with {{create_time}} less than {{(currentTime - ttl)}} 
> and find the maximum id ({{maxId}}) in this collection.
>  ## Delete 4096 messages at a time from the collection with IDs less than 
> {{maxId}}.
>  ## Wait 10 ms between each deletion until the cleanup is complete.
>  ## Scan for and delete expired data daily.
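
For illustration, a minimal sketch of the cleanup loop described above, 
assuming a JDBC table named {{t_flink_autoscaler_event_handler}} with {{id}} 
and {{create_time}} columns and a MySQL-style {{DELETE ... LIMIT}}; all names 
here are hypothetical, not the operator's actual schema:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;

public class EventHandlerCleaner {

    private static final int BATCH_SIZE = 4096;

    public static void cleanExpiredEvents(Connection conn, Duration ttl) throws Exception {
        // 1. Find the largest id among records older than (currentTime - ttl).
        long maxId;
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT MAX(id) FROM t_flink_autoscaler_event_handler WHERE create_time < ?")) {
            ps.setTimestamp(1, Timestamp.from(Instant.now().minus(ttl)));
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                maxId = rs.getLong(1);
                if (rs.wasNull()) {
                    return; // nothing has expired yet
                }
            }
        }
        // 2. Delete at most BATCH_SIZE rows per statement, sleeping 10 ms
        // between batches so the cleanup does not starve regular traffic.
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM t_flink_autoscaler_event_handler WHERE id <= ? LIMIT ?")) {
            int deleted;
            do {
                ps.setLong(1, maxId);
                ps.setInt(2, BATCH_SIZE);
                deleted = ps.executeUpdate();
                Thread.sleep(10);
            } while (deleted == BATCH_SIZE);
        }
    }
}
{code}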



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-35579] update frocksdb version to v8.10.0 [flink]

2024-08-25 Thread via GitHub

flinkbot commented on PR #25253:
URL: https://github.com/apache/flink/pull/25253#issuecomment-2309236263

   
   ## CI report:
   
   * df3cd8e4998d6289622c426e94c52023e15b75a8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-36117] Implement AsyncKeyedStateBackend for RocksDBKeyedStateBackend and HeapKeyedStateBackend [flink]

2024-08-25 Thread via GitHub

Zakelly commented on code in PR #25233:
URL: https://github.com/apache/flink/pull/25233#discussion_r1730658801


##
flink-runtime/src/main/java/org/apache/flink/runtime/state/v2/ValueStateWrapper.java:
##
@@ -0,0 +1,87 @@
+//
+// Source code recreated from a .class file by IntelliJ IDEA
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.state.v2;
+
+import org.apache.flink.api.common.state.v2.StateFuture;
+import org.apache.flink.api.common.state.v2.ValueState;
+import org.apache.flink.core.state.StateFutureImpl;
+
+import java.io.IOException;
+
+public class ValueStateWrapper<T> implements ValueState<T> {

Review Comment:
   How about moving this under package `o.a.f.r.s.v2.adaptor`?



##
flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java:
##
@@ -475,6 +482,32 @@ public LocalRecoveryConfig getLocalRecoveryConfig() {
         return localRecoveryConfig;
     }
 
+    @Override
+    public void setup(@Nonnull StateRequestHandler stateRequestHandler) {}
+
+    @Nonnull
+    @Override
+    public <N, S extends State, SV> S createState(
+            @Nonnull N defaultNamespace,
+            @Nonnull TypeSerializer<N> namespaceSerializer,
+            @Nonnull org.apache.flink.runtime.state.v2.StateDescriptor<SV> stateDesc)
+            throws Exception {
+        StateDescriptorTransformer transformer = new StateDescriptorTransformer();
+        StateDescriptor stateDescV1 = transformer.getStateDescriptor(stateDesc);
+        State state = getOrCreateKeyedState(namespaceSerializer, stateDescV1);
+        if (stateDescV1.getType() == StateDescriptor.Type.VALUE) {
+            return (S) new ValueStateWrapper((ValueState) state);
+        }
+        throw new UnsupportedOperationException(
+                String.format("Unsupported state type: %s", stateDesc.getType()));
+    }
+
+    @Nonnull
+    @Override
+    public StateExecutor createStateExecutor() {
+        return null;
+    }
+

Review Comment:
   I suggest moving this part to a wrapper/adaptor that converts a 
`KeyedStateBackend` to an `AsyncKeyedStateBackend`. The adaptor should live in 
the `v2.adaptor` package. And we could wrap the `KeyedStateBackend` instance in 
`StreamTaskStateInitializerImpl#streamOperatorStateContext`, WDYT?
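   
   For illustration, a rough sketch of the adaptor shape being suggested; the 
interfaces are simplified stand-ins, not the actual 
`KeyedStateBackend`/`AsyncKeyedStateBackend` APIs:
   
   ```java
   import java.util.concurrent.CompletableFuture;
   
   // Simplified stand-in for a synchronous state backend.
   interface SyncStateBackend {
       <V> V getValue(String stateName) throws Exception;
   }
   
   // Simplified stand-in for the async-facing interface.
   interface AsyncStateBackend {
       <V> CompletableFuture<V> getValueAsync(String stateName);
   }
   
   // Adaptor living in a dedicated package (e.g. v2.adaptor), completing each
   // future immediately with the synchronous result of the wrapped backend.
   final class SyncToAsyncStateBackendAdaptor implements AsyncStateBackend {
       private final SyncStateBackend delegate;
   
       SyncToAsyncStateBackendAdaptor(SyncStateBackend delegate) {
           this.delegate = delegate;
       }
   
       @Override
       public <V> CompletableFuture<V> getValueAsync(String stateName) {
           try {
               return CompletableFuture.completedFuture(delegate.getValue(stateName));
           } catch (Exception e) {
               CompletableFuture<V> failed = new CompletableFuture<>();
               failed.completeExceptionally(e);
               return failed;
           }
       }
   }
   ```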



##
flink-runtime/src/main/java/org/apache/flink/runtime/state/v2/StateDescriptorTransformer.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.state.v2;
+
+public class StateDescriptorTransformer {
+    public org.apache.flink.api.common.state.StateDescriptor getStateDescriptor(

Review Comment:
   I'd suggest a util class and static util function for this.



##
flink-runtime/src/main/java/org/apache/flink/runtime/state/v2/ValueStateWrapper.java:
##
@@ -0,0 +1,87 @@
+//
+// Source code recreated from a .class file by IntelliJ IDEA
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable 

[PR] [FLINK-35850][table] Add the built-in function DATEDIFF [flink]

2024-08-25 Thread via GitHub

dylanhz opened a new pull request, #25254:
URL: https://github.com/apache/flink/pull/25254

   ## What is the purpose of the change
   
   Add the built-in function DATEDIFF.
   Examples:
   
   ```SQL
   > SELECT DATEDIFF('2009-07-31', '2009-07-30');
1
   > SELECT DATEDIFF('2009-07-30', '2009-07-31');
-1
   ```
   
   ## Brief change log
   
   [FLINK-35850](https://issues.apache.org/jira/browse/FLINK-35850)
   
   
   ## Verifying this change
   
   `TimeFunctionsITCase#datediffTestCases`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: yes
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? yes
 - If yes, how is the feature documented? docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-35850) Add DATEDIFF function

2024-08-25 Thread ASF GitHub Bot (Jira)

 [ 
https://issues.apache.org/jira/browse/FLINK-35850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35850:
---
Labels: pull-request-available  (was: )

> Add DATEDIFF function
> -
>
> Key: FLINK-35850
> URL: https://issues.apache.org/jira/browse/FLINK-35850
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Dylan He
>Assignee: Dylan He
>Priority: Major
>  Labels: pull-request-available
>
> Add the DATEDIFF function with the same semantics as in Hive & Spark & MySQL. 
> Like DATE_ADD and DATE_SUB, its time interval unit is fixed to days, in 
> contrast to TIMESTAMPDIFF.
> 
> Returns the number of days from {{startDate}} to {{endDate}}; the time 
> parts of the values are omitted.
> Syntax:
> {code:sql}
> DATEDIFF(endDate, startDate)
> {code}
> Arguments:
>  * {{{}endDate{}}}: A DATE expression.
>  * {{{}startDate{}}}: A DATE expression.
> Returns:
> An INTEGER.
> See also:
>  * 
> [Hive|https://cwiki.apache.org/confluence/display/Hive/Hive+UDFs#HiveUDFs-DateFunctions]
>  * 
> [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
>  * 
> [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/datediff.html]
>  * 
> [MySQL|https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_datediff]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-35850][table] Add the built-in function DATEDIFF [flink]

2024-08-25 Thread via GitHub

flinkbot commented on PR #25254:
URL: https://github.com/apache/flink/pull/25254#issuecomment-2309366807

   
   ## CI report:
   
   * 833d174116d4cf9cff4caa763214abfffa109ee5 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-35859) [flink-cdc] Fix: The assigner is not ready to offer finished split information, this should not be called

2024-08-25 Thread Hongshun Wang (Jira)

[ 
https://issues.apache.org/jira/browse/FLINK-35859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876584#comment-17876584
 ] 

Hongshun Wang commented on FLINK-35859:
---

[~pacinogong] the docs of AssignerStatus already show that only 
INITIAL_ASSIGNING_FINISHED and 
NEWLY_ADDED_ASSIGNING_FINISHED will be transformed to NEWLY_ADDED_ASSIGNING.
If the status is NEWLY_ADDED_ASSIGNING_SNAPSHOT_FINISHED and the job continues 
to read newly added tables, an exception will be thrown in this case (sometimes 
the scenario may be more complex).
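
For illustration only, a minimal sketch of the transition rule described 
above; the enum and method are simplified stand-ins built from the states 
named in this issue, not the actual flink-cdc AssignerStatus code:

{code:java}
public enum AssignerStatus {
    INITIAL_ASSIGNING,
    INITIAL_ASSIGNING_FINISHED,
    NEWLY_ADDED_ASSIGNING,
    NEWLY_ADDED_ASSIGNING_FINISHED,
    NEWLY_ADDED_ASSIGNING_SNAPSHOT_FINISHED;

    // Only the two *_FINISHED states may start a new round of
    // newly-added-table assignment; starting from any other state
    // is the failure scenario described in this issue.
    public AssignerStatus startAssignNewlyAddedTables() {
        if (this == INITIAL_ASSIGNING_FINISHED || this == NEWLY_ADDED_ASSIGNING_FINISHED) {
            return NEWLY_ADDED_ASSIGNING;
        }
        throw new IllegalStateException(
                "Can only start assigning newly added tables from a finished state, but was: " + this);
    }
}
{code}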

> [flink-cdc] Fix: The assigner is not ready to offer finished split 
> information, this should not be called
> -
>
> Key: FLINK-35859
> URL: https://issues.apache.org/jira/browse/FLINK-35859
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Affects Versions: cdc-3.1.1
>Reporter: Hongshun Wang
>Assignee: Hongshun Wang
>Priority: Minor
> Fix For: cdc-3.2.0
>
>
> When using CDC with a newly added table, an error occurs:
> {code:java}
> The assigner is not ready to offer finished split information, this should 
> not be called. {code}
> It's because:
> 1. When the job is stopped and then restarted, the status is 
> NEWLY_ADDED_ASSIGNING_SNAPSHOT_FINISHED.
> 2. Then the Enumerator sends each reader a BinlogSplitUpdateRequestEvent to 
> update the binlog split (see 
> org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator#syncWithReaders).
> 3. The reader suspends its binlog reader and then sends a 
> BinlogSplitMetaRequestEvent to the Enumerator.
> 4. The Enumerator finds that the finished split information is empty, and an 
> error occurs:
> {code:java}
> private void sendBinlogMeta(int subTask, BinlogSplitMetaRequestEvent requestEvent) {
>     // initialize once
>     if (binlogSplitMeta == null) {
>         final List<FinishedSnapshotSplitInfo> finishedSnapshotSplitInfos =
>                 splitAssigner.getFinishedSplitInfos();
>         if (finishedSnapshotSplitInfos.isEmpty()) {
>             LOG.error(
>                     "The assigner offers empty finished split information, this should not happen");
>             throw new FlinkRuntimeException(
>                     "The assigner offers empty finished split information, this should not happen");
>         }
>         binlogSplitMeta =
>                 Lists.partition(
>                         finishedSnapshotSplitInfos, sourceConfig.getSplitMetaGroupSize());
>     }
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-35964][table] Add the built-in function STARTSWITH & ENDSWITH [flink]

2024-08-25 Thread via GitHub

lincoln-lil commented on code in PR #25156:
URL: https://github.com/apache/flink/pull/25156#discussion_r1730692658


##
flink-python/pyflink/table/expression.py:
##
@@ -1028,6 +1028,16 @@ def truncate(self, n: Union[int, 'Expression[int]'] = 0) 
-> 'Expression[T]':
 
 #  string functions 
--
 
+def starts_with(self, start_expr) -> 'Expression':
+"""
+Returns if expr begins with start_expr. If start_expr is empty, the 
result is true.

Review Comment:
   ditto



##
docs/data/sql_functions.yml:
##
@@ -329,6 +329,18 @@ string:
   STRING1.overlay(STRING2, INT1)
   STRING1.overlay(STRING2, INT1, INT2)
 description: Returns a string that replaces INT2 (STRING2's length by 
default) characters of STRING1 with STRING2 from position INT1. E.g., 
'xxxxxtest'.overlay('xxxx', 6) returns "xxxxxxxxx"; 'xxxxxtest'.overlay('xxxx', 
6, 2) returns "xxxxxxxxxst".
+  - sql: STARTSWITH(expr, startExpr)
+table: expr.startsWith(startExpr)
+description: |
+  Returns if expr begins with startExpr. If startExpr is empty, the result 
is true.

Review Comment:
   -> "Returns whether expr starts with startExpr" ?



##
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/internal/BaseExpressions.java:
##
@@ -846,6 +847,19 @@ public OutType truncate() {
 
 // String operations
 
+/**
+ * Returns if {@code expr} begins with {@code startExpr}. If {@code 
startExpr} is empty, the

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-35177] Fix DataGen Connector documentation [flink]

2024-08-25 Thread via GitHub

snuyanzin commented on PR #24692:
URL: https://github.com/apache/flink/pull/24692#issuecomment-2309389579

   the idea is to have the same example in both the docs and the example 
module, where compilation is checked during the CI process


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-36146) NoSuchElement exception from SingleThreadFetcherManager

2024-08-25 Thread Jira

 [ 
https://issues.apache.org/jira/browse/FLINK-36146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kim Gräsman updated FLINK-36146:

Environment: AWS EMR/Yarn  (was: N/A)

> NoSuchElement exception from SingleThreadFetcherManager
> ---
>
> Key: FLINK-36146
> URL: https://issues.apache.org/jira/browse/FLINK-36146
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
> Environment: AWS EMR/Yarn
>Reporter: Kim Gräsman
>Priority: Minor
>
> We're running Flink 1.14.2, but this appears to be an issue still on 
> mainline, so I thought I'd report it.
> When running with high parallelism we've noticed a spurious error triggered 
> by a FileSource reader from S3;
> {code:java}
> 2024-08-19 15:23:07,044 INFO  
> org.apache.flink.connector.base.source.reader.SourceReaderBase [] - Finished 
> reading split(s) [543131]
> 2024-08-19 15:23:07,044 INFO  
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher [] - 
> Finished reading from splits [543131]
> 2024-08-19 15:23:07,044 INFO  
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager [] 
> - Closing splitFetcher 157 because it is idle.
> 2024-08-19 15:23:07,045 INFO  
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher [] - 
> Shutting down split fetcher 157
> 2024-08-19 15:23:07,045 INFO  
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher [] - Split 
> fetcher 157 exited.
> 2024-08-19 15:23:07,048 INFO  
> org.apache.flink.connector.base.source.reader.SourceReaderBase [] - Adding 
> split(s) to reader: [FileSourceSplit: ... [0, 21679984)  hosts=[localhost] 
> ID=201373 position=null]
> 2024-08-19 15:23:07,064 INFO  
> org.apache.flink.connector.base.source.reader.SourceReaderBase [] - Closing 
> Source Reader.
> 2024-08-19 15:23:07,069 WARN  org.apache.flink.runtime.taskmanager.Task       
>              [] - Source: ... -> ... (114/1602)#0 (...) switched from RUNNING 
> to FAILED with failure cause: java.util.NoSuchElementException
>         at 
> java.base/java.util.concurrent.ConcurrentHashMap$ValueIterator.next(ConcurrentHashMap.java:3471)
>         at 
> org.apache.flink.connector.base.source.reader.fetcher.SingleThreadFetcherManager.getRunningFetcher(SingleThreadFetcherManager.java:94)
>         at 
> org.apache.flink.connector.base.source.reader.fetcher.SingleThreadFetcherManager.addSplits(SingleThreadFetcherManager.java:82)
>         at 
> org.apache.flink.connector.base.source.reader.SourceReaderBase.addSplits(SourceReaderBase.java:242)
>         at 
> org.apache.flink.streaming.api.operators.SourceOperator.handleOperatorEvent(SourceOperator.java:428)
>         at 
> org.apache.flink.streaming.runtime.tasks.OperatorEventDispatcherImpl.dispatchEventToHandlers(OperatorEventDispatcherImpl.java:70)
>         at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.dispatchOperatorEvent(RegularOperatorChain.java:83)
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$dispatchOperatorEvent$19(StreamTask.java:1473)
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
>         at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>         at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
>         at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
>         at java.base/java.lang.Thread.run(Thread.java:829) {code}
> I believe this may be caused by a tiny TOCTOU race in 
> {{SingleThreadFetcherManager}}. I'll admit that I don't fully understand what 
> the execution flows through that code look like, but the use of atomic and 
> synchronized indicates that it's used by multiple threads. If that's not the 
> case, this report can be safely ignored.
> The backtrace points to 
> [https://github.com/apache/flink/blob/4faf0966766e3734792f80ed66e512aa3033cacd/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/read
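
For illustration, a minimal sketch of the check-then-act pattern the report 
points at, assuming getRunningFetcher does an isEmpty() check followed by 
iterator().next() on a concurrent map; this is a simplification, not the 
actual Flink source:

{code:java}
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;

class FetcherRegistry<F> {
    private final ConcurrentHashMap<Integer, F> fetchers = new ConcurrentHashMap<>();

    // Racy: another thread may remove the last fetcher between the isEmpty()
    // check and iterator().next(), yielding NoSuchElementException.
    F getRunningFetcherRacy() {
        return fetchers.isEmpty() ? null : fetchers.values().iterator().next();
    }

    // Race-free: probe a single weakly consistent iterator; once hasNext()
    // returns true, next() cannot throw on that same iterator.
    F getRunningFetcherSafe() {
        Iterator<F> it = fetchers.values().iterator();
        return it.hasNext() ? it.next() : null;
    }
}
{code}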

Re: [PR] [FLINK-36148][pipeline-connector][mysql] Add custom parser for CreateTableEvent [flink-cdc]

2024-08-25 Thread via GitHub

lvyanquan commented on code in PR #3570:
URL: https://github.com/apache/flink-cdc/pull/3570#discussion_r1730722065


##
flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-mysql/src/test/java/org/apache/flink/cdc/connectors/mysql/source/MySqlPipelineITCase.java:
##
@@ -501,6 +501,22 @@ public void testSchemaChangeEvents() throws Exception {
 expected.add(
 new DropTableEvent(
 
TableId.tableId(inventoryDatabase.getDatabaseName(), "customers")));
+
+// Test create table DDL
+statement.execute(
+String.format(
+"CREATE TABLE `%s`.`newlyAddedTable1`(id int, id2 
int, primary key(id));",

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-35579] update frocksdb version to v8.10.0 [flink]

2024-08-25 Thread via GitHub

mayuehappy commented on PR #25253:
URL: https://github.com/apache/flink/pull/25253#issuecomment-2309420675

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-35809) Deploying Flink History Server and Flink SQL Gateway via Flink Operator.

2024-08-25 Thread Jira

 [ 
https://issues.apache.org/jira/browse/FLINK-35809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

钟洋洋 closed FLINK-35809.
---
Resolution: Fixed

> Deploying Flink History Server and Flink SQL Gateway via Flink Operator.
> 
>
> Key: FLINK-35809
> URL: https://issues.apache.org/jira/browse/FLINK-35809
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: 钟洋洋
>Priority: Not a Priority
>
> Do we need to support deploying the Flink History Server and Flink SQL 
> Gateway via Flink Operator? I can implement it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-36150] tables.exclude is still valid if scan.binlog.newly-added-table.enabled is true. [flink-cdc]

2024-08-25 Thread via GitHub

ruanhang1993 commented on code in PR #3573:
URL: https://github.com/apache/flink-cdc/pull/3573#discussion_r1730759323


##
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/schema/Selectors.java:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.mysql.schema;
+
+import org.apache.flink.cdc.common.utils.Predicates;
+
+import io.debezium.relational.TableId;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.function.Predicate;
+
+/** Selectors for filtering tables. */
+public class Selectors {

Review Comment:
   Why do we need to add this class?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34467] bump flink version to 1.20.0 [flink-connector-kafka]

2024-08-25 Thread via GitHub

AHeise commented on PR #111:
URL: 
https://github.com/apache/flink-connector-kafka/pull/111#issuecomment-2309462158

   > @AHeise Thanks for the reply. I totally understand the pain points of 
maintain multiple flink version compatibility for a connector. In each Flink 
release, there are always some new experimental interfaces in api or runtime 
introduced. Shall we consider the solution from Apache Hudi or Apache Iceberg? 
Both of them use a separate module for different flink versions. Some classes 
are replicated into different modules as needed. 
https://github.com/apache/hudi/tree/master/hudi-flink-datasource
   
   This sounds like a maintenance nightmare. Having separate branches sounds 
much better to me. Hudi needs to do it because they have their own release 
cycle independent of Flink. But the connector can correlate the release cycle 
to the Flink version, so we don't need to resort to such hackery. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-36150] tables.exclude is still valid if scan.binlog.newly-added-table.enabled is true. [flink-cdc]

2024-08-25 Thread via GitHub

loserwang1024 commented on code in PR #3573:
URL: https://github.com/apache/flink-cdc/pull/3573#discussion_r1730770909


##
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/schema/Selectors.java:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.mysql.schema;
+
+import org.apache.flink.cdc.common.utils.Predicates;
+
+import io.debezium.relational.TableId;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.function.Predicate;
+
+/** Selectors for filtering tables. */
+public class Selectors {

Review Comment:
   Because we need to translate tables.exclude into a filter. We cannot get it 
from dbzMySqlConfig.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-36150] tables.exclude is still valid if scan.binlog.newly-added-table.enabled is true. [flink-cdc]

2024-08-25 Thread via GitHub

loserwang1024 commented on code in PR #3573:
URL: https://github.com/apache/flink-cdc/pull/3573#discussion_r1730777101


##
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/schema/Selectors.java:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.mysql.schema;
+
+import org.apache.flink.cdc.common.utils.Predicates;
+
+import io.debezium.relational.TableId;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.function.Predicate;
+
+/** Selectors for filtering tables. */
+public class Selectors {

Review Comment:
   Because we need to translate tables.exclude into a filter. We cannot get it 
from dbzMySqlConfig.getTableFilters, which needs the whole Debezium config.
   I also wanted to use org.apache.flink.cdc.common.schema.Selectors, but it is 
not a good fit:
   1. Its parameter type is org.apache.flink.cdc.common.event.TableId rather 
than io.debezium.relational.TableId, so I cannot use it directly in the MySQL 
CDC source.
   2. 
org.apache.flink.cdc.common.schema.Selectors.SelectorsBuilder#includeTables 
parses "databaseName.tableName" into a different table identifier than the one 
org.apache.flink.cdc.connectors.mysql.source.utils.TableDiscoveryUtils#listTables
 returns, so that selector cannot filter the tables discovered from MySQL.
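   
   For illustration, a rough sketch of the include/exclude filtering discussed 
here, matching Debezium `TableId`s against regex lists in the spirit of 
`tables` / `tables.exclude`; names and matching format are illustrative, not 
the PR code:
   
   ```java
   import io.debezium.relational.TableId;
   
   import java.util.List;
   import java.util.function.Predicate;
   import java.util.regex.Pattern;
   import java.util.stream.Collectors;
   
   final class TableFilters {
   
       // Keep tables matching any include pattern and no exclude pattern.
       // (An empty include list matches nothing here; real code would
       // special-case it to mean "include everything".)
       static Predicate<TableId> fromLists(List<String> includes, List<String> excludes) {
           return anyMatch(includes).and(anyMatch(excludes).negate());
       }
   
       private static Predicate<TableId> anyMatch(List<String> patterns) {
           List<Pattern> compiled =
                   patterns.stream().map(Pattern::compile).collect(Collectors.toList());
           // For MySQL, TableId#catalog() holds the database name, so patterns
           // are matched against "database.table".
           return tableId -> {
               String qualified = tableId.catalog() + "." + tableId.table();
               return compiled.stream().anyMatch(p -> p.matcher(qualified).matches());
           };
       }
   }
   ```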



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org