[jira] [Created] (FLINK-30586) Fix calcCodeGen failed if calc with like condition contains double quotation mark

2023-01-06 Thread Yunhong Zheng (Jira)
Yunhong Zheng created FLINK-30586:
-

 Summary: Fix calcCodeGen failed if calc with like condition 
contains double quotation mark
 Key: FLINK-30586
 URL: https://issues.apache.org/jira/browse/FLINK-30586
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.16.0
Reporter: Yunhong Zheng
 Fix For: 1.17.0
 Attachments: code-gen-1.png, code-gen-2.png

If I write a SQL statement like "SELECT * FROM MyTable WHERE b LIKE '%"%'" in
Flink 1.16, where the 'like' condition contains a double quotation mark, code
generation fails because codeGen produces invalid code.
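
A hypothetical sketch of the failure mode (illustrative only, not Flink's
actual generated code): the LIKE pattern is embedded into the generated Java
source as a string literal, so an unescaped double quote terminates the
literal early and the generated class no longer compiles. Escaping the
pattern before embedding it avoids the problem:

{code:language=java}
// Sketch only: names are illustrative, not Flink's real code generator.
public class LikeCodeGenSketch {

    // Broken: embedding the raw pattern %"% yields
    //   String pattern = "%"%";
    // which is invalid Java because the quote ends the literal early.
    static String generateBroken(String likePattern) {
        return "String pattern = \"" + likePattern + "\";";
    }

    // Fixed: escape backslashes and double quotes before embedding.
    static String generateEscaped(String likePattern) {
        String escaped =
                likePattern.replace("\\", "\\\\").replace("\"", "\\\"");
        return "String pattern = \"" + escaped + "\";";
    }

    public static void main(String[] args) {
        System.out.println(generateBroken("%\"%"));  // invalid generated code
        System.out.println(generateEscaped("%\"%")); // valid generated code
    }
}
{code}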

!code-gen-1.png!

 

!code-gen-2.png!

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-280: Introduce a new explain mode to provide SQL advice

2023-01-06 Thread Jingsong Li
Thanks Jane for your feedback.

`EXPLAIN PLAN_ADVICE` looks good to me.

Best,
Jingsong

On Thu, Jan 5, 2023 at 5:20 PM Jane Chan  wrote:
>
> Hi, devs,
>
> After discussing with Godfrey , Lincoln
> , and Jark , I've updated the
> FLIP document[1] and look forward to your opinions and suggestions.
>
> The highlighted differences are as follows.
>
>- *The proposed syntax changes from EXPLAIN ANALYZED_PHYSICAL_PLAN
> to EXPLAIN PLAN_ADVICE *.
>   - The reason for changing the syntax is that the output format and
>   analyzed target are two orthogonal concepts and are better decoupled.
>   On the other hand, users may care about the advice content rather than
>   which plan is analyzed, and thus PHYSICAL should be hidden from users.
>
>
>- *The output format changes from JSON to current tree-style text.
>Introduce ExplainFormat to classify the output format.*
>   - The current output format is a mixture of plain text (AST,
>   Optimized Physical Plan, and Optimized Execution Plan) and JSON 
> (Physical
>   Execution Plan,  via EXPLAIN JSON_EXECUTION_PLAN ), which is not
> structured
>   and categorized. By introducing ExplainFormat, we can better classify 
> the
>   output format and have more flexibility to extend more formats in the
>   future.
>
>
>- *The PlanAnalyzer installation gets rid of SPI.*
>   - PlanAnalyzer should be an internal interface and not be exposed to
>   users. Therefore, the Factory mechanism is unsuitable for this.
>
>
> To Godfrey , Jingsong , and
> Shengkai , Thanks for your comments and questions.
>
> @Jingsong
>
> > Can you give examples of other systems for the syntax?
> > In other systems, is EXPLAIN ANALYZE already PHYSICAL_PLAN?
> >
>
> For other systems like MySQL[2], PostgreSQL[3], Presto[4], and TiDB[5]
>
> EXPLAIN ANALYZE 
> is the mainstream syntax.
>
> However, it represents an actual measurement of the cost, i.e., the
> statement is actually executed, which is unsuitable in this case.
>
>
> `EXPLAIN ANALYZED_PHYSICAL_PLAN ` looks a bit strange, and even
> > stranger that it contains `advice`. The purpose of the FLIP seems to be
> > more about `advice`, so can we just
> > introduce a syntax for `advice`?
>
>
> Good point. After several discussions, the syntax has been updated to
>
> EXPLAIN PLAN_ADVICE 
>
> @Godfrey
>
> Do we really need to expose `PlanAnalyzerFactory` as a public interface?
> > I prefer we only expose ExplainDetail#ANALYZED_PHYSICAL_PLAN and the
> > analyzed result, which is enough for users and consistent with the
> > results of the `explain` method. The classes about the plan analyzer are
> > in the table planner module, which does not contain public API (public
> > interfaces should be defined in the flink-table-api-java module). And
> > PlanAnalyzer depends on RelNode, which is an internal class of the
> > planner and not exposed to users.
>
>
> Good point. After reconsideration, the SPI mechanism is removed from the
> FLIP. PlanAnalyzer should be an internal implementation much similar to a
> RelOptRule, and should not be exposed to users.
>
> @Shengkai
>
> > 1. `PlanAnalyzer#analyze` uses the FlinkRelNode as the input. Could you
> > share some thoughts about the motivation? In my experience, users mainly
> > care about 2 things when they develop their job:
>
> a. Why does their SQL not work? For example, their streaming SQL contains an
> > OVER window but the ORDER key is not ROWTIME. In this case, we may not
> > have a physical or logical node because, during optimization, the
> > planner has already thrown an exception.
> >
>
>  The prerequisite for providing advice is that the optimized physical plan
> can be generated. The planner should throw exceptions if the query contains
> syntax errors or other problems.
>
>
>
> > b. Many users care about whether their state is compatible after upgrading
> > their Flink version. In this case, I think the old execplan and the SQL
> > statement are the user's input.
>
>
> Good point. State compatibility detection is beneficial, but it is better
> decoupled from EXPLAIN PLAN_ADVICE. We could provide a separate mechanism
> for cross-version validation.
>
>
> 2. I am just curious how other people add the rules to the Advisor. When
> > the number of rules increases, should all these rules be added to the Flink
> > codebase?
>
>
> It is much like adding a RelOptRule to a RuleSet. The number of
> analyzers will not grow too fast, so adding them to the Flink
> codebase may not be a concern.
>
>
> 3. How do users configure another advisor?
>
>
>  After reconsideration, I would like to let PlanAdvisor be an internal
> interface, which is different from implementing a custom connector/format.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-280%3A+Introduce+EXPLAIN+PLAN_ADVICE+to+provide+SQL+advice
> [2] https://dev.mysql.com/doc/refman/8.0/en/explain.html#explain-analyze
> [3] https://www.postgresql.org/docs/current/sql-explain.html
> [4] https://p

[jira] [Created] (FLINK-30587) Validate primary key in an append-only table in ddl

2023-01-06 Thread Shammon (Jira)
Shammon created FLINK-30587:
---

 Summary: Validate primary key in an append-only table in ddl
 Key: FLINK-30587
 URL: https://issues.apache.org/jira/browse/FLINK-30587
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Affects Versions: table-store-0.3.0, table-store-0.4.0
Reporter: Shammon


Currently the table store checks the primary key of an append-only table; the 
same check should also be applied to the catalog table in DDL.
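
A minimal sketch of the intended catalog-side check (class and method names
are assumptions for illustration, not the actual Table Store API):

{code:language=java}
import java.util.List;

// Sketch only: reject a primary key on an append-only table at DDL time.
public class AppendOnlyTableValidation {

    static void validatePrimaryKey(List<String> primaryKeys, boolean appendOnly) {
        if (appendOnly && !primaryKeys.isEmpty()) {
            throw new IllegalArgumentException(
                    "An append-only table must not define a primary key, but got: "
                            + primaryKeys);
        }
    }
}
{code}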



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30588) Improvement in truncate code in the new ABFSOutputstream

2023-01-06 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created FLINK-30588:
--

 Summary: Improvement in truncate code in the new ABFSOutputstream 
 Key: FLINK-30588
 URL: https://issues.apache.org/jira/browse/FLINK-30588
 Project: Flink
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30589) Snapshot expiration should be skipped in Table Store dedicated writer jobs

2023-01-06 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-30589:
---

 Summary: Snapshot expiration should be skipped in Table Store 
dedicated writer jobs
 Key: FLINK-30589
 URL: https://issues.apache.org/jira/browse/FLINK-30589
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Affects Versions: table-store-0.3.0, table-store-0.4.0
Reporter: Caizhi Weng
Assignee: Caizhi Weng
 Fix For: table-store-0.3.0, table-store-0.4.0


Currently, Table Store dedicated writer jobs also expire snapshots. This 
may cause conflicts when multiple writer jobs are running.

We should expire snapshots only in the dedicated compact job.
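
A rough sketch of the intended guard (flag and method names are assumptions,
not the actual Table Store code):

{code:language=java}
// Sketch only: expiration runs solely in the dedicated compact job, so
// concurrent writer jobs cannot race on snapshot deletion.
public abstract class SnapshotCommitSketch {

    private final boolean dedicatedCompactJob;

    protected SnapshotCommitSketch(boolean dedicatedCompactJob) {
        this.dedicatedCompactJob = dedicatedCompactJob;
    }

    public void commitSnapshot() {
        writeSnapshot(); // every job still commits its own snapshots
        if (dedicatedCompactJob) {
            expireSnapshots(); // only the compact job performs expiration
        }
    }

    protected abstract void writeSnapshot();

    protected abstract void expireSnapshots();
}
{code}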



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30590) Remove set default value manual for table options

2023-01-06 Thread Shammon (Jira)
Shammon created FLINK-30590:
---

 Summary: Remove set default value manual for table options
 Key: FLINK-30590
 URL: https://issues.apache.org/jira/browse/FLINK-30590
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Affects Versions: table-store-0.4.0
Reporter: Shammon


Remove the manual setting of default values in `CoreOptions.setDefaultValues`, 
which may produce wrong error messages and is no longer needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30591) Stateful functions SDK for Flink DataStream Integration example is not available

2023-01-06 Thread Utopia (Jira)
Utopia created FLINK-30591:
--

 Summary: Stateful functions SDK for Flink DataStream Integration 
example is not available
 Key: FLINK-30591
 URL: https://issues.apache.org/jira/browse/FLINK-30591
 Project: Flink
  Issue Type: Bug
  Components: Stateful Functions
Reporter: Utopia
 Attachments: image-2023-01-06-17-49-04-381.png, 
image-2023-01-06-17-50-21-013.png

 

https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/sdk/flink-datastream/

 

!image-2023-01-06-17-49-04-381.png!

 

The 'apache/flink-statefun' repository doesn't contain the 
'statefun-examples/statefun-flink-datastream-example/src/main/java/org/apache/flink/statefun/examples/datastream/Example.java'
 path in 'master'.  
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30592) The unsupported hive version is not deleted on the hive overview document

2023-01-06 Thread chrismartin (Jira)
chrismartin created FLINK-30592:
---

 Summary: The unsupported hive version is not deleted on the hive 
overview document
 Key: FLINK-30592
 URL: https://issues.apache.org/jira/browse/FLINK-30592
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.16.0, 1.17.0
Reporter: chrismartin


Flink 1.16.0 dropped support for Hive versions 1.*, 2.1.*, and 2.2.*, which 
are no longer supported by the Hive community, but the overview document 
still lists these versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-280: Introduce a new explain mode to provide SQL advice

2023-01-06 Thread Jane Chan
Hi, devs,

Thanks for all the feedback.

Based on the discussion[1], we seem to have a consensus so far, so I would
like to start a vote on FLIP-280[2], which begins on the following Monday
(Jan 9th at 10:00 AM GMT).

If you have any questions or doubts, please do not hesitate to follow up on
this discussion.

Best,
Jane

[1] https://lists.apache.org/thread/5xywxv7g43byoh0jbx1b6qo6gx6wjkcz
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-280%3A+Introduce+EXPLAIN+PLAN_ADVICE+to+provide+SQL+advice

On Fri, Jan 6, 2023 at 4:27 PM Jingsong Li  wrote:

> Thanks Jane for your feedback.
>
> `EXPLAIN PLAN_ADVICE` looks good to me.
>
> Best,
> Jingsong
>
> On Thu, Jan 5, 2023 at 5:20 PM Jane Chan  wrote:
> >
> > Hi, devs,
> >
> > After discussing with Godfrey , Lincoln
> > , and Jark , I've updated the
> > FLIP document[1] and look forward to your opinions and suggestions.
> >
> > The highlighted differences are as follows.
> >
> >- *The proposed syntax changes from EXPLAIN ANALYZED_PHYSICAL_PLAN
> > to EXPLAIN PLAN_ADVICE *.
> >   - The reason for changing the syntax is that the output format and
> >   analyzed target are two orthogonal concepts and are better decoupled.
> >   On the other hand, users may care about the advice content rather than
> >   which plan is analyzed, and thus PHYSICAL should be hidden from users.
> >
> >
> >- *The output format changes from JSON to current tree-style text.
> >Introduce ExplainFormat to classify the output format.*
> >   - The current output format is a mixture of plain text (AST,
> >   Optimized Physical Plan, and Optimized Execution Plan) and JSON
> (Physical
> >   Execution Plan,  via EXPLAIN JSON_EXECUTION_PLAN ), which is not
> > structured
> >   and categorized. By introducing ExplainFormat, we can better
> classify the
> >   output format and have more flexibility to extend more formats in
> the
> >   future.
> >
> >
> >- *The PlanAnalyzer installation gets rid of SPI.*
> >   - PlanAnalyzer should be an internal interface and not be exposed
> to
> >   users. Therefore, the Factory mechanism is unsuitable for this.
> >
> >
> > To Godfrey , Jingsong , and
> > Shengkai , Thanks for your comments and questions.
> >
> > @Jingsong
> >
> > > Can you give examples of other systems for the syntax?
> > > In other systems, is EXPLAIN ANALYZE already PHYSICAL_PLAN?
> > >
> >
> > For other systems like MySQL[2], PostgreSQL[3], Presto[4], and TiDB[5]
> >
> > EXPLAIN ANALYZE 
> > is the mainstream syntax.
> >
> > However, it represents an actual measurement of the cost, i.e., the
> > statement is actually executed, which is unsuitable in this case.
> >
> >
> > `EXPLAIN ANALYZED_PHYSICAL_PLAN ` looks a bit strange, and even
> > > stranger that it contains `advice`. The purpose of the FLIP seems to be
> > > more about `advice`, so can we just
> > > introduce a syntax for `advice`?
> >
> >
> > Good point. After several discussions, the syntax has been updated to
> >
> > EXPLAIN PLAN_ADVICE 
> >
> > @Godfrey
> >
> > Do we really need to expose `PlanAnalyzerFactory` as a public interface?
> > > I prefer we only expose ExplainDetail#ANALYZED_PHYSICAL_PLAN and the
> > > analyzed result, which is enough for users and consistent with the
> > > results of the `explain` method. The classes about the plan analyzer
> > > are in the table planner module, which does not contain public API
> > > (public interfaces should be defined in the flink-table-api-java
> > > module). And PlanAnalyzer depends on RelNode, which is an internal
> > > class of the planner and not exposed to users.
> >
> >
> > Good point. After reconsideration, the SPI mechanism is removed from the
> > FLIP. PlanAnalyzer should be an internal implementation much similar to a
> > RelOptRule, and should not be exposed to users.
> >
> > @Shengkai
> >
> > > 1. `PlanAnalyzer#analyze` uses the FlinkRelNode as the input. Could you
> > > share some thoughts about the motivation? In my experience, users
> mainly
> > > care about 2 things when they develop their job:
> >
> > a. Why does their SQL not work? For example, their streaming SQL contains
> > > an OVER window but the ORDER key is not ROWTIME. In this case, we may
> > > not have a physical or logical node because, during optimization, the
> > > planner has already thrown an exception.
> > >
> >
> >  The prerequisite for providing advice is that the optimized physical plan
> > can be generated. The planner should throw exceptions if the query contains
> > syntax errors or other problems.
> >
> >
> >
> > > b. Many users care about whether their state is compatible after
> upgrading
> > > their Flink version. In this case, I think the old execplan and the SQL
> > > statement are the user's input.
> >
> >
> > Good point. State compatibility detection is beneficial, but it is better
> > decoupled from EXPLAIN PLAN_ADVICE. We could provide a separate mechanism
> > for cross-version validation.
> >
> >

[RESULT][VOTE]FLIP-266: Simplify network memory configurations for TaskManager

2023-01-06 Thread Yuxin Tan
Hi all,

FLIP-266: Simplify network memory configurations for TaskManager[1]
has been accepted. The FLIP was voted in this thread[2].

There are 3 binding votes and 1 non-binding vote, as follows:

Xintong Song (binding)
Zhu Zhu (binding)
JasonLee (non-binding)
Lijie Wang (binding)

There are no disapproving votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-266:+Simplify+network+memory+configurations+for+TaskManager
[2] https://lists.apache.org/thread/flv4w4tn5r8xbhzdqngx8o8o8t0gv3bt

Best,
Yuxin


[jira] [Created] (FLINK-30593) Determine restart time on the fly for Autoscaler

2023-01-06 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-30593:
--

 Summary: Determine restart time on the fly for Autoscaler
 Key: FLINK-30593
 URL: https://issues.apache.org/jira/browse/FLINK-30593
 Project: Flink
  Issue Type: New Feature
  Components: Autoscaler, Kubernetes Operator
Reporter: Gyula Fora


Currently the autoscaler uses a preconfigured restart time for the job. We 
should dynamically adjust this based on the observed restart times of scale 
operations.
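
One possible approach (a sketch under assumed names, not the actual
autoscaler code) is to keep an exponentially weighted moving average of the
observed restart durations and fall back to the preconfigured value until
observations arrive:

{code:language=java}
import java.time.Duration;

// Sketch only: smooth observed restart times so a single slow restart
// does not dominate the estimate.
public class RestartTimeTracker {

    private static final double ALPHA = 0.3; // smoothing factor (assumption)

    private Duration estimate;

    public RestartTimeTracker(Duration preconfiguredRestartTime) {
        this.estimate = preconfiguredRestartTime; // initial fallback
    }

    public void recordObservedRestart(Duration observed) {
        long blendedMillis = Math.round(
                ALPHA * observed.toMillis() + (1 - ALPHA) * estimate.toMillis());
        estimate = Duration.ofMillis(blendedMillis);
    }

    public Duration currentEstimate() {
        return estimate;
    }
}
{code}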



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: CDC from Oracle database reading directly logs - integration with OpenLogReplicator

2023-01-06 Thread Gunnar Morling
Hey Adam, all,

Just came across this thread, still remembering the good conversations
we had around this while I was working on Debezium full-time :)

Personally, I still believe the best way forward with this would be to
add support to the Debezium connector for Oracle so it can ingest
changes from a remote OpenLogReplicator instance via that server
you've built. That way, you don't need to deal with any Kafka
specifics, users would inherit the existing functionality for
backfilling, integration with Debezium Server (i.e. for non-Kafka
scenarios like Apache Pulsar, Kinesis, etc.), and Debezium engine
(which is what Flink CDC is based on). The Debezium connector for
Oracle is already built in a way that it supports multiple stream
ingestion adapters (currently LogMiner and XStream), so adding another
one for OLR would be rather simple. This approach would simplify
things from a Flink (CDC) perspective a lot.

I've just pinged the folks over in the Debezium community on this; it
would be great to see progress on this matter.

Best,

--Gunnar


Am Do., 5. Jan. 2023 um 20:55 Uhr schrieb Adam Leszczyński
:
>
> Thanks Leonard, Jark,
>
> I will just reply on the dev list for this topic, as this is more related to 
> development. Sorry, I sent to 2 lists - I don't want to add more chaos 
> here.
>
> The answer to your question is not straightforward, so I will start from a 
> broader picture.
>
> Maybe first I will describe some assumptions that I chose while 
> designing OpenLogReplicator. The project aims to be minimalistic. It 
> should only contain the code that is necessary to parse Oracle redo 
> logs. Nothing more; it should not be a fully functional replicator. So, the 
> targets are limited to middleware (like Kafka, Flink, some MQ). The number 
> of dependencies is reduced to a minimum.
>
> The second assumption is to make the project stateless wherever possible. 
> The goal is to run on HA (Kubernetes) and store state in Redis (not yet 
> implemented). But generally OpenLogReplicator should not keep the 
> information (if possible) about the position of data confirmed by the 
> receiver. This allows the receiver to choose the way failures are handled 
> (data duplicated on restart, idempotent messages).
>
> The third topic is the initial data load. There is plenty of software 
> available for that. There is absolutely no need to duplicate it in this 
> project. No ETL, selects, etc. My goal is just to track changes.
>
> The fourth assumption is to write the code in C++ so that it is fast and I 
> have full control over memory. The code can fully reuse memory and also work 
> on machines with little memory. This allows easy compilation on Linux, but 
> maybe in the future also on Solaris, AIX, HP-UX, or even Windows (if there 
> is demand for that). I think Java is good for some solutions but not for a 
> binary parser which works heavily with memory and in most cases uses a 
> zero-copy approach.
>
> The amount of data in the output is actually defined by the source database 
> (how much data is logged - the full schema or just the changed columns). I 
> don't care; the user defines what is logged by the db. If just the primary 
> key and changed columns - I can send just the changed data. If someone wants 
> the full schema in every payload - that is fine too. If the schema changes - 
> no problem, I can provide just the DDL commands and process further payloads 
> with the new schema.
>
> The format of the data is actually defined by the receiver. My first choice 
> was JSON. Next the Debezium guys asked me to support Protobuf. OK, I spent 
> a lot of time and extended the architecture to actually make the code 
> modular and allow choosing the format of the payload. The writer module can 
> directly produce JSON or Protobuf payloads. That can be extended to 
> any other format if there is demand for it. The JSON format also allows 
> many formatting options. I generally don't test the Protobuf format code - 
> I would treat it as a prototype because I know nobody who would like to use 
> it. This code was planned for the Debezium request, but so far nobody cares.
>
> For integration with other systems and languages - this is an open case. 
> Actually I am agnostic here. The data that is produced for output is stored 
> in a buffer and can be sent to any target. This is done by the Writer module 
> (you can look at the code), and there are writers for Kafka, ZeroMQ, and 
> even a plain TCP/IP network connection. I don't fully understand the 
> question about how to adapt that better. If I have a specification, I can 
> extend it. Say what you need.
>
> In such a case, when we have a bidirectional connection - unlike with 
> Kafka - the receiver can define the starting position (SCN) of the stream 
> he/she wants to receive.
>
> You can look at the prototype code to see how this communication would 
> look: StreamClient.cpp - but please treat it as a working prototype. This 
> is a client which just connects to OpenLog

Re: [DISCUSS] FLIP-282: Introduce Delete & Update API

2023-01-06 Thread yuxia
Hi, all.

Jingsong raised a really good point, and after an offline discussion with 
Jingsong, we reached an agreement about it.
I post the modifications to the FLIP here:
1: Remove all the modifications in ScanTableSource and DynamicTableSink, 
which were really confusing.
2: Introduce an independent interface named SupportsRowLevelModificationScan 
for sources, with which the table source scan can know whether it serves an 
UPDATE or DELETE statement.
3: Introduce an interface named RowLevelModificationScanContext; a source 
which implements SupportsRowLevelModificationScan can then pass a 
RowLevelModificationScanContext providing the necessary information to the 
sink.
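
To make the shape concrete, here is a rough sketch of the two interfaces as
described above (method names and signatures are my reading of the FLIP and
may differ from the final design):

// Rough sketch only; see FLIP-282 for the authoritative definitions.

// Context created by the source scan and handed to the sink, carrying
// whatever the connector needs to correlate the scan with the write-back.
public interface RowLevelModificationScanContext {
}

// Implemented by a table source whose scan may serve an UPDATE or DELETE.
public interface SupportsRowLevelModificationScan {

    enum RowLevelModificationType {
        UPDATE,
        DELETE
    }

    // Tells the source which kind of statement the scan serves, and lets it
    // return a context that the planner passes on to the sink.
    RowLevelModificationScanContext applyRowLevelModificationScan(
            RowLevelModificationType modificationType,
            RowLevelModificationScanContext previousContext);
}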

I have applied all the modifications to FLIP-282[1].

Now that we seem to have a consensus on this FLIP, I would like to start a 
vote on FLIP-282, beginning on the following Monday
(Jan 9th at 10:00 AM GMT) if there are no other questions. If you have any 
other questions, please reply in this discussion thread.[2]

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235838061
[2] https://lists.apache.org/thread/6h64v7v6gj916pkvmc3ql3vxxccr46r3



Best regards,
Yuxia

----- Original Message -----
From: "Jingsong Li" 
To: "dev" 
Sent: Friday, January 6, 2023 2:08:26 PM
Subject: Re: [DISCUSS] FLIP-282: Introduce Delete & Update API

Thanks yuxia for your explanation.

But what I mean is that this may lead to confusion for implementers
and users. You can use comments to explain it. However, a good
interface can make the mechanism clearer through code design.

So here, I still think an independent SupportsXX interface can make
the behavior more clear.

Best,
Jingsong

On Wed, Jan 4, 2023 at 10:56 AM yuxia  wrote:
>
> Hi, Jingsong, thanks for your comments.
>
> ## About RowLevelDeleteMode
> That's really a good suggestion; I have now updated the FLIP to move 
> RowLevelDeleteMode to a higher level.
>
> ## About scope of addContextParameter
> Sorry for the confusion; I have now updated the FLIP to add more comments 
> for it. The scope of the parameters is limited to the phase in which
> Flink translates the physical RelNode to ExecNode.
> It's possible to see all the other sources and sinks in a topology. As for 
> the order, if there is only one sink, the sink will be the last one to see
> the parameters; the order for the sources is consistent with the order in
> which the table source nodes are translated to ExecNode.
> If there are multiple sinks, as in the case of a StatementSet, a sink may 
> also see the parameters added by the table sources that belong to the 
> statements added earlier.
>
> ## About scope of getScanPurpose
> Yes, all sources will see this method, and it won't bring any compatibility 
> issues, because here we just tell the source scan
> what the scan purpose is without touching any other logic. If sources ignore 
> this method, it just works as before. So I think there's
> no need to add a new interface like SupportsXX.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Jingsong Li" 
> To: "dev" 
> Sent: Tuesday, January 3, 2023 12:12:12 PM
> Subject: Re: [DISCUSS] FLIP-282: Introduce Delete & Update API
>
> Thanks yuxia for the FLIP! It looks really good!
>
> I have three comments:
>
> ## RowLevelDeleteMode
>
> Can RowLevelDeleteMode be a higher level?
> `SupportsRowLevelDelete.RowLevelDeleteMode` is better than
> `SupportsRowLevelDelete.RowLevelDeleteInfo.RowLevelDeleteMode`.
> Same as `RowLevelUpdateMode`.
>
> ## Scope of addContextParameter
>
> I see that some of your comments are for sink, but can you make it
> clearer here? What exactly is its scope? For example, is it possible
> to see all the other sources and sinks in a topo? What is the order of
> seeing?
>
> ## Scope of getScanPurpose
>
> Will all sources see this method? Will there be compatibility issues?
> If sources ignore this method, will this cause strange phenomena?
>
> What I mean is: should another SupportsXX be created here to provide
> delete and update.
>
> Best,
> Jingsong
>
> On Thu, Dec 29, 2022 at 6:23 PM yuxia  wrote:
> >
> > Hi, Lincoln Lee;
> > 1: Yes, it's a typo; thanks for pointing it out. I have fixed it.
> > 2: For stream users, for DELETE they will receive a 
> > TableException("DELETE TABLE is not supported for streaming mode now"); 
> > UPDATE is similar. I have also added these to the FLIP.
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "Lincoln Lee" 
> > To: "dev" 
> > Sent: Wednesday, December 28, 2022 9:50:50 AM
> > Subject: Re: [DISCUSS] FLIP-282: Introduce Delete & Update API
> >
> > Hi yuxia,
> >
> > Thanks for the proposal! I think it'll be very useful for users in batch
> > scenarios to cooperate with external systems.
> >
> > For the FLIP I have two questions:
> > 1. Is the default method 'default ScanPurpose getScanPurpose();'
> > without an implementation in interface ScanContext a typo?
> > 2. For stream users, what exceptions will be received for these 
> > unsupported operations?
> >
> > Best,
> > Lincoln Lee
> >
> >
> > yuxia  on Monday, December 26, 2022 

[jira] [Created] (FLINK-30594) Update Java version because of JDK bug in the operator

2023-01-06 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-30594:
-

 Summary: Update Java version because of JDK bug in the operator
 Key: FLINK-30594
 URL: https://issues.apache.org/jira/browse/FLINK-30594
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Gabor Somogyi


The following JDK bug was found during operator usage: 
https://bugs.openjdk.org/browse/JDK-8221218

It has been resolved in JDK 11.0.18, which should be used in the operator 
base image.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30595) support create table like

2023-01-06 Thread Jun Zhang (Jira)
Jun Zhang created FLINK-30595:
-

 Summary: support create table like
 Key: FLINK-30595
 URL: https://issues.apache.org/jira/browse/FLINK-30595
 Project: Flink
  Issue Type: Improvement
Reporter: Jun Zhang
 Fix For: table-store-0.4.0


support CREATE TABLE LIKE 
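
Presumably this mirrors the CREATE TABLE ... LIKE clause that Flink SQL
already offers; an illustrative example (the table names are made up):

{code:language=java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Sketch only: issue the desired statement through a TableEnvironment.
public class CreateTableLikeExample {

    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // Create a new table reusing the schema of an existing one.
        tEnv.executeSql("CREATE TABLE orders_backup LIKE orders");
    }
}
{code}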



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Allow source readers extending SourceReaderBase to override numRecordsIn report logic

2023-01-06 Thread John Roesler
Thanks for the replies, Dong and Wencong!

That’s a good point about the overhead of the extra method.

Is the desire to actually deprecate that metric in a user-facing way, or just 
to deprecate the private counter mechanism?

It seems like if the desire is to deprecate the existing private counter, we 
can accomplish it by deprecating the current constructor and offering another 
that is documented not to track the metric. This seems better than the config 
option, since the concern is purely about the division of responsibilities 
between the sub- and super-class. 

Another option, which might be better if we wish to keep a uniformly named 
metric, would be to simply make the counter protected. That would be better if 
we’re generally happy with the metric and counter, but a few special source 
connectors need to emit records in other ways. 

And finally, if we really want to get rid of the metric itself, then I agree, a 
config is the way to do it. 
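
For concreteness, a rough sketch of that protected-counter variant (class and 
field names are illustrative, not necessarily what SourceReaderBase looks 
like today):

import org.apache.flink.metrics.Counter;

// Sketch only: exposing the counter lets a special source connector count
// records it emits through paths the base class does not see.
public abstract class SourceReaderBaseSketch<T> {

    // protected instead of private, so subclasses can update it directly
    protected final Counter numRecordsIn;

    protected SourceReaderBaseSketch(Counter numRecordsIn) {
        this.numRecordsIn = numRecordsIn;
    }

    protected void emitRecord(T record) {
        numRecordsIn.inc(); // default accounting stays in the base class
        // ... hand the record to the downstream output ...
    }
}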

Thanks,
John

On Fri, Jan 6, 2023, at 00:55, Dong Lin wrote:
> Hi John and Wencong,
>
> Thanks for the reply!
>
> It is nice that optional-2 can address the problem without affecting the
> existing source connectors as far as functionality is concerned. One
> potential concern with this approach is that it might increase the Flink
> runtime overhead by adding one more virtual function call to the
> per-record runtime call stack.
>
> Since Java's default MaxInlineLevel is 12-18, I believe it is easy for an
> operator chain of 5+ operators to exceed this limit. In this case,
> option-2 would incur one more virtual table lookup to produce each record.
> It is not clear how much this overhead would show up for jobs with a chain
> of lightweight operators. I am currently working on FLINK-30531
>  to reduce runtime
> overhead, which might be related to this discussion.
>
> In comparison to option-2, the option-3 provided in my earlier email would
> not add this extra overhead. I think it might be worthwhile to invest in
> the long-term performance (and simpler runtime infra) and pay for the
> short-term cost of deprecating this metric in SourceOperatorBase. What do
> you think?
>
> Regards,
> Dong
>
>
> On Thu, Jan 5, 2023 at 10:10 PM Wencong Liu  wrote:
>
>> Hi, All
>>
>>
>> Thanks for the reply!
>>
>>
>> I think both John's and Dong's opinions are reasonable. John's suggestion 2
>> is a good implementation.
>> It does not affect the existing source connectors, and it also provides
>> support
>> for custom counters in future implementations.
>>
>>
>> WDYT?
>>
>>
>> Best,
>>
>> Wencong Liu


[jira] [Created] (FLINK-30596) Multiple POST /jars/:jarid/run requests with the same jobId, runs duplicate jobs

2023-01-06 Thread Mohsen Rezaei (Jira)
Mohsen Rezaei created FLINK-30596:
-

 Summary: Multiple POST /jars/:jarid/run requests with the same 
jobId, runs duplicate jobs
 Key: FLINK-30596
 URL: https://issues.apache.org/jira/browse/FLINK-30596
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.15.3, 1.16.0, 1.17.0
Reporter: Mohsen Rezaei
 Fix For: 1.17.0, 1.16.1, 1.15.4


Analysis from [~trohrmann]:

{quote}
The problem is the following: When submitting a job, then the {{Dispatcher}} 
will wait for the termination of a previous {{JobMaster}}. This is done to 
enable the proper cleanup of the job resources. In the initial submission case, 
there is no previous {{JobMaster}} with the same {{jobId}}. The problem is now 
that Flink schedules the 
[{{persistAndRunJob}}|https://github.com/apache/flink/blob/5f924bc84227a3a6c67b44e82c45fe444393f577/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L571]
 action, which runs the newly submitted job, as [an asynchronous 
task|https://github.com/apache/flink/blob/5f924bc84227a3a6c67b44e82c45fe444393f577/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L1312-L1318].
 This is done to ensure that the action is run on the {{Dispatcher}}'s main 
thread since the termination future can be run on a different thread. Due to 
this behaviour, there can be other tasks enqueued in the {{Dispatcher}}'s work 
queue which are executed before. Such a task could be another job submission 
which wouldn't see that there is already a job submitted with the same 
{{jobId}} since [we only do this in 
{{runJob}}|https://github.com/apache/flink/blob/5f924bc84227a3a6c67b44e82c45fe444393f577/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L602]
 which is called by {{persistAndRunJob}}. This is the reason why you don't see 
a duplicate job submission exception for the second job submission. Even worse, 
this will eventually [lead to an invalid 
state|https://github.com/apache/flink/blob/5f924bc84227a3a6c67b44e82c45fe444393f577/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L611-L615]
 and fail the whole cluster entrypoint.
{quote}

The following change to the {{Dispatcher}} seems to fix the issue, but before 
submitting a PR, I wanted to post it here for possible follow-up discussion:

{code:language=java}
private CompletableFuture<Void> waitForTerminatingJob(
        JobID jobId, JobGraph jobGraph, ThrowingConsumer<JobGraph, ?> action) {
    ...
    // Run the action via the main-thread executor only if the termination
    // future is not yet done; otherwise run it immediately, so no other
    // queued task (e.g. a duplicate submission with the same jobId) can
    // sneak in between the termination and the action.
    return FutureUtils.thenAcceptAsyncIfNotDone(
            jobManagerTerminationFuture,
            getMainThreadExecutor(),
            FunctionUtils.uncheckedConsumer(
                    (ignored) -> {
                        jobManagerRunnerTerminationFutures.remove(jobId);
                        action.accept(jobGraph);
                    }));
}
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)