[jira] [Created] (FLINK-34504) Avoid the parallelism adjustment when the upstream shuffle type doesn't have keyBy

2024-02-23 Thread Rui Fan (Jira)
Rui Fan created FLINK-34504:
---

 Summary: Avoid the parallelism adjustment when the upstream shuffle 
type doesn't have keyBy
 Key: FLINK-34504
 URL: https://issues.apache.org/jira/browse/FLINK-34504
 Project: Flink
  Issue Type: Improvement
  Components: Autoscaler
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: kubernetes-operator-1.8.0


JobVertexScaler#scale has an optimization: it tries to adjust the parallelism so 
that it divides the number of key groups without a remainder, so that data is 
evenly spread across subtasks.

 

This is only useful when the upstream shuffle type is keyBy. We should skip this 
optimization when the upstream shuffle type is not keyBy.
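
The idea can be sketched as follows. This is a minimal illustration, not the actual JobVertexScaler code: the function name, its parameters, and the choice to search only up to half the number of key groups are assumptions made for the example.

```python
def adjust_parallelism(new_parallelism: int,
                       num_key_groups: int,
                       upstream_is_key_by: bool) -> int:
    """Round the target parallelism up to the next divisor of num_key_groups.

    Hypothetical helper for illustration only.
    """
    if not upstream_is_key_by:
        # Without keyBy, records are not routed by key group,
        # so aligning to the key group count brings no benefit.
        return new_parallelism

    # Assumed search bound: above half the key group count, the only
    # remaining divisor is the key group count itself.
    for p in range(new_parallelism, num_key_groups // 2 + 1):
        if num_key_groups % p == 0:
            return p

    # No divisor found in range: keep the originally computed value.
    return new_parallelism
```

With 128 key groups, a computed parallelism of 7 would be raised to 8 for a keyBy input, and left at 7 otherwise.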



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34505) Migrate WindowGroupReorderRule

2024-02-23 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-34505:
-

 Summary: Migrate WindowGroupReorderRule
 Key: FLINK-34505
 URL: https://issues.apache.org/jira/browse/FLINK-34505
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.20.0
Reporter: Jacky Lau
 Fix For: 1.20.0








[jira] [Created] (FLINK-34506) Do not copy "file://" schemed artifact in standalone application modes

2024-02-23 Thread Ferenc Csaky (Jira)
Ferenc Csaky created FLINK-34506:


 Summary: Do not copy "file://" schemed artifact in standalone 
application modes
 Key: FLINK-34506
 URL: https://issues.apache.org/jira/browse/FLINK-34506
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission
Affects Versions: 1.19.0
Reporter: Ferenc Csaky


In standalone application mode, if an artifact is passed via a path without a 
prefix, the file will be copied to `user.artifacts.base-dir`, although it 
should not be, as it is accessible locally.
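
A minimal sketch of the intended check, assuming the decision can be made from the URI scheme alone. The helper name `needs_copy` and the sample paths are hypothetical and do not come from the Flink code base.

```python
from urllib.parse import urlparse


def needs_copy(artifact: str) -> bool:
    """Return True only if the artifact has to be fetched into the base dir.

    Hypothetical helper: a path without a scheme, or with the "file"
    scheme, is treated as already available on the local filesystem.
    """
    scheme = urlparse(artifact).scheme
    return scheme not in ("", "file")
```

Under this assumption, `/opt/flink/usrlib/job.jar` and `file:///opt/flink/usrlib/job.jar` would be used in place, while a remote artifact such as `https://example.com/job.jar` would still be copied.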





[jira] [Created] (FLINK-34507) JSON functions have wrong operand checker

2024-02-23 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-34507:


 Summary: JSON functions have wrong operand checker
 Key: FLINK-34507
 URL: https://issues.apache.org/jira/browse/FLINK-34507
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.18.1
Reporter: Dawid Wysakowicz


I believe that all JSON functions (`JSON_VALUE`, `JSON_QUERY`, ...) have the wrong 
operand checker.

As far as I can tell, the first argument (the JSON) should be a `STRING` 
argument. That's what all other systems do (some, e.g. Oracle, additionally 
accept CLOB/BLOB).

Via Calcite, we accept the `ANY` type there, which I believe is wrong: 
https://github.com/apache/calcite/blob/c49792f9c72159571f898c5fca1e26cba9870b07/core/src/main/java/org/apache/calcite/sql/fun/SqlJsonValueFunction.java#L61
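
The difference between the two behaviours can be sketched like this. It is a toy model, not Calcite or Flink code: the type names are plain strings, and both `CHARACTER_TYPES` and `accepts_json_operand` are assumptions made for the example.

```python
# Character string types that a strict checker would accept (assumed set).
CHARACTER_TYPES = {"CHAR", "VARCHAR", "STRING"}


def accepts_json_operand(type_name: str, strict: bool = True) -> bool:
    """Decide whether a type is allowed as the first (JSON) argument."""
    if not strict:
        # Permissive behaviour, comparable to accepting ANY:
        # every operand type passes validation.
        return True
    # Strict behaviour: only character string types pass.
    return type_name.upper() in CHARACTER_TYPES
```

With the strict variant, a call such as `JSON_VALUE(<INT column>, ...)` would be rejected at validation time instead of being accepted.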





Re: [DISCUSS]FLIP-420: Add API annotations for RocksDB StateBackend user-facing classes

2024-02-23 Thread Yun Tang
Hi Jinzhong,

Thanks for driving this topic, and +1 for fixing the lack of annotation.

@Yanfei the `ConfigurableRocksDBOptionsFactory` interface was introduced for 
user extension. You can refer to the doc [1], which shows an example of how to 
use this interface.


[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/large_state_tuning/#tuning-rocksdb-memory


Best
Yun Tang

From: Yanfei Lei 
Sent: Thursday, February 22, 2024 15:39
To: dev@flink.apache.org 
Subject: Re: [DISCUSS]FLIP-420: Add API annotations for RocksDB StateBackend 
user-facing classes

Hi Jinzhong,
Thanks for driving this!

1. I'm wondering if `ConfigurableRocksDBOptionsFactory` will be used
by users. Currently it looks like only developers use it in the rocksdb
state backend module, and its only non-testing subclass,
"DefaultConfigurableOptionsFactory", is marked @Deprecated.
2. Regarding @Internal, according to the comments, it is used for
"Annotation to mark methods within stable, public APIs as an internal
developer API." So marking "SingleStateIterator" and
"RocksDBRestoreOperation" as @Internal is acceptable to me.

Best,
Yanfei

On Thu, Jan 25, 2024 at 12:16 PM Jinzhong Li  wrote:
>
> Hi Zakelly,
>
> Thanks for your comments!
>
> 1) I agree that almost no user would use "RocksDBStateUploader" and
> "RocksDBStateDownloader" to do anything. It's fine with me to keep them
> unmarked.
> 2) Regarding "SingleStateIterator", I think it's acceptable to either leave
> it unmarked or mark it as @Internal. My only consideration is that
> SingleStateIterator is an interface with the "public" modifier, so it is
> harmless to annotate it as @Internal.
>
>
>
>
> Hi Hangxiang,
>
> Thanks for the reminder!
>
> It makes sense to mark RocksDBStateBackendFactory as Deprecated.
>
> Best,
> Jinzhong Li
>
>
> On Thu, Jan 25, 2024 at 10:22 AM Hangxiang Yu  wrote:
>
> > Hi Jinzhong.
> > Thanks for driving this!
> > Some suggestions:
> > 1. As RocksDBStateBackend is marked as Deprecated, we should also
> > mark RocksDBStateBackendFactory as Deprecated.
> > 2. Since 1.19 will be frozen on 1.26, let's adjust the target version to
> > 1.20.
> >
> >
> > On Wed, Jan 24, 2024 at 11:50 PM Zakelly Lan 
> > wrote:
> >
> > > Hi Jinzhong,
> > >
> > > Thanks for driving this! +1 for fixing the lack of annotation.
> > >
> > > > I'm wondering if we really need to annotate *RocksDBStateUploader* and
> > > > *RocksDBStateDownloader* with @Internal, as they seem to be ordinary
> > > > classes that do not interact with other modules.
> > > Also, I have reservations about annotating *SingleStateIterator*, but I'd
> > > like to hear others' opinions and won't insist on this.
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Wed, Jan 24, 2024 at 10:26 PM Jinzhong Li 
> > > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > > I’m opening this thread to discuss FLIP-420: Add API annotations for
> > > > > RocksDB StateBackend user-facing classes[1].
> > > >
> > > > > As described in FLINK-18255[2], several user-facing classes in the
> > > > > flink-statebackend-rocksdb module don't have any API annotations, not
> > > > > even @PublicEvolving. This FLIP will add annotations for them to clarify
> > > > > their usage.
> > > >
> > > > Looking forward to hearing from you, thanks!
> > > >
> > > >
> > > > Best regards,
> > > > Jinzhong Li
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-420%3A+Add+API+annotations+for+RocksDB+StateBackend+user-facing+classes
> > > > [2] https://issues.apache.org/jira/browse/FLINK-18255
> > > >
> > >
> >
> >
> > --
> > Best,
> > Hangxiang.
> >


Re: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun

2024-02-23 Thread Yun Tang
Congratulations, Jiabao!

Best
Yun Tang

From: Weihua Hu 
Sent: Thursday, February 22, 2024 17:29
To: dev@flink.apache.org 
Subject: Re: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun

Congratulations, Jiabao!

Best,
Weihua


On Thu, Feb 22, 2024 at 10:34 AM Jingsong Li  wrote:

> Congratulations! Well deserved!
>
> On Wed, Feb 21, 2024 at 4:36 PM Yuepeng Pan  wrote:
> >
> > Congratulations~ :)
> >
> > Best,
> > Yuepeng Pan
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > > At 2024-02-21 09:52:17, "Hongshun Wang"  wrote:
> > >Congratulations, Jiabao :)
> > >Congratulations Jiabao!
> > >
> > >Best,
> > >Hongshun
> > >Best regards,
> > >
> > >Weijie
> > >
> > >On Tue, Feb 20, 2024 at 2:19 PM Runkang He  wrote:
> > >
> > >> Congratulations Jiabao!
> > >>
> > >> Best,
> > >> Runkang He
> > >>
> > > >> On Tue, Feb 20, 2024 at 2:18 PM Jane Chan  wrote:
> > >>
> > >> > Congrats, Jiabao!
> > >> >
> > >> > Best,
> > >> > Jane
> > >> >
> > > >> > On Tue, Feb 20, 2024 at 10:32 AM Paul Lam  wrote:
> > >> >
> > >> > > Congrats, Jiabao!
> > >> > >
> > >> > > Best,
> > >> > > Paul Lam
> > >> > >
> > > >> > > > On Feb 20, 2024, at 10:29, Zakelly Lan  wrote:
> > >> > > >
> > >> > > >> Congrats! Jiabao!
> > >> > >
> > >> > >
> > >> >
> > >>
>


[jira] [Created] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio

2024-02-23 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34508:
-

 Summary: Migrate S3-related ITCases and e2e tests to Minio 
 Key: FLINK-34508
 URL: https://issues.apache.org/jira/browse/FLINK-34508
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


Anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}.





[jira] [Created] (FLINK-34509) Docs: Fix Debezium JSON/AVRO opts in examples

2024-02-23 Thread Lorenzo Affetti (Jira)
Lorenzo Affetti created FLINK-34509:
---

 Summary: Docs: Fix Debezium JSON/AVRO opts in examples 
 Key: FLINK-34509
 URL: https://issues.apache.org/jira/browse/FLINK-34509
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Lorenzo Affetti


Problem found here: 
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/debezium/#how-to-use-debezium-format

Here is the example provided:
{code:java}
CREATE TABLE topic_products (
  -- schema is totally the same to the MySQL "products" table
  id BIGINT,
  name STRING,
  description STRING,
  weight DECIMAL(10, 2)
) WITH (
 'connector' = 'kafka',
 'topic' = 'products_binlog',
 'properties.bootstrap.servers' = 'localhost:9092',
 'properties.group.id' = 'testGroup',
 -- using 'debezium-json' as the format to interpret Debezium JSON messages
 -- please use 'debezium-avro-confluent' if Debezium encodes messages in Avro format
 'format' = 'debezium-json'
) {code}
Actually, `debezium-json` would require `'debezium-json.schema-include' = 
'true'` to work, because Debezium includes the schema by default (see 
https://www.markhneedham.com/blog/2023/01/24/flink-sql-could-not-execute-sql-statement-corrupt-debezium-message/).

On the other hand, `debezium-avro-confluent` would require the URL of the Confluent 
schema registry: `'debezium-avro-confluent.url' = 'http://...:8081'`.

I propose splitting the single example into two, each with the correct default options.
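
To make the two variants concrete, here is a small sketch that renders the `WITH` clause for each format. The option keys and values are the ones quoted in this issue. The helper names are hypothetical, and the registry URL is left as a parameter because the issue does not give a concrete host.

```python
# Connector options shared by both examples (taken from the docs example).
BASE_OPTIONS = {
    "connector": "kafka",
    "topic": "products_binlog",
    "properties.bootstrap.servers": "localhost:9092",
    "properties.group.id": "testGroup",
}


def debezium_json_options() -> dict:
    """Options for Debezium JSON messages that embed the schema."""
    return {**BASE_OPTIONS,
            "format": "debezium-json",
            "debezium-json.schema-include": "true"}


def debezium_avro_options(registry_url: str) -> dict:
    """Options for Debezium Avro messages; registry_url is supplied by the caller."""
    return {**BASE_OPTIONS,
            "format": "debezium-avro-confluent",
            "debezium-avro-confluent.url": registry_url}


def render_with_clause(options: dict) -> str:
    """Render the options as a Flink SQL WITH (...) clause."""
    lines = [f" '{key}' = '{value}'" for key, value in options.items()]
    return "WITH (\n" + ",\n".join(lines) + "\n)"
```

Calling `render_with_clause(debezium_json_options())` yields the clause for the first example. The second example would pass the address of the reader's own schema registry.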





Re: Temporal join on rolling aggregate

2024-02-23 Thread Feng Jin
+1 for supporting this feature. There are currently many limitations to using
time window aggregation, and if we could declare a watermark and time attribute
on the view, it would make it easier for us to use time windows. Similarly,
it would be very useful if the primary key could be declared in the view.

Therefore, I believe we need a FLIP to detail the design of this feature.


Best,
Feng

On Fri, Feb 23, 2024 at 2:39 PM  wrote:

> +1 for supporting defining time attributes on views.
>
> I once encountered the same problem as yours. I did some regular joins and
> lost the time attribute, and hence I could no longer do window operations in
> subsequent logic. I had to output the joined view to Kafka, read from it
> again, and define a watermark on the new source - a cumbersome workaround.
>
> It would be more flexible if we could control time attribute / watermark
> on views, just as if it's some kind of special source.
>
> Thanks,
> Yaming
> On Feb 22, 2024, 7:46 PM +0800, Gyula Fóra wrote:
> > Posting this to dev as well as it potentially has some implications on
> > development effort.
> >
> > What seems to be the problem here is that we cannot control/override
> > Timestamps/Watermarks/Primary key on VIEWs. It's understandable that you
> > cannot create a PRIMARY KEY on the view but I think the temporal join also
> > should not require the PK, should we remove this limitation?
> >
> > The general problem is the inflexibility of the timestamp/watermark
> > handling on query outputs, which makes this again impossible.
> >
> > The workaround here can be to write the rolling aggregate to Kafka, read
> > it back again and join with that. The fact that this workaround is possible
> > actually highlights the need for more flexibility on the query/view side in
> > my opinion.
> >
> > Has anyone else run into this issue and considered the proper solution
> > to the problem? Feels like it must be pretty common :)
> >
> > Cheers,
> > Gyula
> >
> >
> >
> >
> > > > On Wed, Feb 21, 2024 at 10:29 PM Sébastien Chevalley  wrote:
> > > > Hi,
> > > >
> > > > > I have been trying to write a temporal join in SQL on a rolling
> > > > > aggregate view. However, it does not work and throws:
> > > > >
> > > > > org.apache.flink.table.api.ValidationException: Event-Time Temporal
> > > > > Table Join requires both primary key and row time attribute in versioned
> > > > > table, but no row time attribute can be found.
> > > > >
> > > > > It seems that after the aggregation, the table loses the watermark
> > > > > and it's not possible to add one with the SQL API as it's a view.
> > > >
> > > > CREATE TABLE orders (
> > > > order_id INT,
> > > > price DECIMAL(6, 2),
> > > > currency_id INT,
> > > > order_time AS NOW(),
> > > > WATERMARK FOR order_time AS order_time - INTERVAL '2' SECOND
> > > > )
> > > > WITH (
> > > > 'connector' = 'datagen',
> > > > 'rows-per-second' = '10',
> > > > 'fields.order_id.kind' = 'sequence',
> > > > 'fields.order_id.start' = '1',
> > > > 'fields.order_id.end' = '10',
> > > > 'fields.currency_id.min' = '1',
> > > > 'fields.currency_id.max' = '20'
> > > > );
> > > >
> > > > CREATE TABLE currency_rates (
> > > > currency_id INT,
> > > > conversion_rate DECIMAL(4, 3),
> > > > PRIMARY KEY (currency_id) NOT ENFORCED
> > > > )
> > > > WITH (
> > > > 'connector' = 'datagen',
> > > > 'rows-per-second' = '10',
> > > > 'fields.currency_id.min' = '1',
> > > > 'fields.currency_id.max' = '20'
> > > > );
> > > >
> > > > CREATE TEMPORARY VIEW max_rates AS (
> > > > SELECT
> > > > currency_id,
> > > > MAX(conversion_rate) AS max_rate
> > > > FROM currency_rates
> > > > GROUP BY currency_id
> > > > );
> > > >
> > > > CREATE TEMPORARY VIEW temporal_join AS (
> > > > SELECT
> > > > order_id,
> > > > max_rates.max_rate
> > > > FROM orders
> > > >  LEFT JOIN max_rates FOR SYSTEM_TIME AS OF orders.order_time
> > > >  ON orders.currency_id = max_rates.currency_id
> > > > );
> > > >
> > > > SELECT * FROM temporal_join;
> > > >
> > > > > Am I missing something? What would be a good starting point to
> > > > > address this?
> > > >
> > > > Thanks in advance,
> > > > Sébastien Chevalley
>


Re: Flink's treatment to "hadoop" and "yarn" configuration overrides seems unintuitive

2024-02-23 Thread Venkatakrishnan Sowrirajan
Gentle ping on the ^^ question to surface this back up again. Any thoughts?

Regards
Venkata krishnan


On Fri, Feb 16, 2024 at 7:32 PM Venkatakrishnan Sowrirajan 
wrote:

> Hi Flink devs,
>
> Flink supports overriding "hadoop" and "yarn" configuration. As part of
> the override mechanism, users have to prefix `hadoop` configs with
> "flink.hadoop." and the whole prefix is removed, while `yarn` configs
> have to be prefixed with "flink.yarn." but only "flink." is removed,
> not "flink.yarn.".
>
> Following is an example:
>
> 1. "Hadoop" config
>
> Hadoop config key = hadoop.tmp.dir => Flink config =
> flink.hadoop.hadoop.tmp.dir => Hadoop's configuration object would have
> hadoop.tmp.dir.
>
> 2. "YARN" config
>
> YARN config key = yarn.application.classpath => Flink config =
> flink.yarn.yarn.application.classpath => YARN's configuration object
> would have yarn.yarn.application.classpath.
>
> Although this is documented properly, it feels unintuitive and it tripped
> me up; it took quite a while to understand why the above YARN configuration
> override was not working as expected. Is this something that should be
> fixed? The problem with fixing it is that it would be backwards
> incompatible. Therefore, can this be addressed as part of Flink 2.0?
>
> Any thoughts?
>
> Regards
> Venkata krishnan
>
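
The asymmetry described in the quoted message can be reproduced with a short sketch. This only models the behaviour as described in the thread and is not Flink's implementation. The function names are hypothetical.

```python
def hadoop_overrides(flink_conf: dict) -> dict:
    """Keys must start with "flink.hadoop."; the whole prefix is stripped."""
    prefix = "flink.hadoop."
    return {key[len(prefix):]: value
            for key, value in flink_conf.items()
            if key.startswith(prefix)}


def yarn_overrides(flink_conf: dict) -> dict:
    """Keys must start with "flink.yarn."; only "flink." is stripped."""
    match_prefix = "flink.yarn."
    strip_prefix = "flink."
    return {key[len(strip_prefix):]: value
            for key, value in flink_conf.items()
            if key.startswith(match_prefix)}
```

Under this model, `flink.hadoop.hadoop.tmp.dir` ends up as `hadoop.tmp.dir`, whereas `flink.yarn.yarn.application.classpath` ends up as `yarn.yarn.application.classpath`. Setting `yarn.application.classpath` would therefore need the key `flink.yarn.application.classpath`.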