[jira] [Created] (FLINK-19948) CAST(now() as bigint) throws compile exception

2020-11-03 Thread Jark Wu (Jira)
Jark Wu created FLINK-19948:
---

 Summary: CAST(now() as bigint) throws compile exception
 Key: FLINK-19948
 URL: https://issues.apache.org/jira/browse/FLINK-19948
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Jark Wu
 Fix For: 1.12.0


The following test code in {{ScalarOperatorsTest}} fails with a compile 
exception:

{code:scala}
testSqlApi("CAST(NOW() AS BIGINT)", "??")
{code}

{code}
java.lang.RuntimeException: Could not instantiate generated class 'TestFunction$24'

at org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:57)
at org.apache.flink.table.planner.expressions.utils.ExpressionTestBase.evaluateExprs(ExpressionTestBase.scala:143)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
Caused by: org.apache.flink.util.FlinkRuntimeException: org.apache.flink.api.common.InvalidProgramException: Table program cannot be compiled. This is a bug. Please file an issue.
at org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:68)
at org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:78)
at org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:52)
... 25 more
Caused by: org.apache.flink.shaded.guava18.com.google.common.util.concurrent.UncheckedExecutionException: org.apache.flink.api.common.InvalidProgramException: Table program cannot be compiled. This is a bug. Please file an issue.
at org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
at org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
at org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:66)
... 27 more
Caused by: org.apache.flink.api.common.InvalidProgramException: Table program cannot be compiled. This is a bug. Please file an issue.
at org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:81)
at org.apache.flink.table.runtime.generated.CompileUtils.lambda$compile$1(CompileUtils.java:66)
at org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
at org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.get(Local
{code}

[jira] [Created] (FLINK-19949) Unescape CSV format line delimiter character

2020-11-03 Thread Danny Chen (Jira)
Danny Chen created FLINK-19949:
--

 Summary: Unescape CSV format line delimiter character
 Key: FLINK-19949
 URL: https://issues.apache.org/jira/browse/FLINK-19949
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.11.2
Reporter: Danny Chen
 Fix For: 1.12.0


We should unescape the line delimiter characters first, because the DDL may be 
read from a file. In that case, the newline "\n" in the DDL options is 
recognized as two characters (a backslash followed by 'n'), while what the user 
actually wants is the invisible newline character.

{code:sql}
create table t1(
  ...
) with (
  'format' = 'csv',
  'csv.line-delimiter' = '\n'
  ...
)
{code}
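The intended unescaping step can be sketched outside of Flink as follows; {{LineDelimiterUnescaper}} and its supported escape set are illustrative assumptions, not the actual connector code:

```java
// Hypothetical sketch of the proposed unescaping step (names are illustrative,
// not the actual CSV connector code). When the DDL comes from a file, the
// option value "\n" arrives as the two characters '\\' and 'n' and must be
// translated into the single invisible newline character.
public class LineDelimiterUnescaper {
    public static String unescape(String delimiter) {
        switch (delimiter) {
            case "\\n":
                return "\n";
            case "\\r":
                return "\r";
            case "\\r\\n":
                return "\r\n";
            case "\\t":
                return "\t";
            default:
                // Already a literal delimiter, nothing to unescape.
                return delimiter;
        }
    }
}
```

With this in place, a user writing {{'csv.line-delimiter' = '\n'}} in a DDL file would get the real newline character instead of a two-character delimiter.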




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Dependency injection and flink.

2020-11-03 Thread Arvid Heise
I'm answering your user ML mail. Please only ask your question once.

Dev ML is only used to coordinate Flink development, which kinda makes
sense in your case after we verified in your first thread that indeed Flink
should provide such things.

On Tue, Nov 3, 2020 at 3:14 AM santhosh venkat 
wrote:

> Hi,
>
> I'm trying to integrate a dependency injection framework with flink. When I
> searched the user-mailing list, I found the following thread in flink which
> discussed this in the past:
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Dependency-Injection-and-Flink-td18880.html
>
> Since the thread is around ~2 yrs old, I'm creating this request.
>
> 1. How do we expect users to integrate flink with a dependency injection
> framework. Are there any hooks/entry-points that we can use to seamlessly
> integrate a DI-fwk with flink? What does the community recommend for the
> dependency injection integration?
>
> 2. Would it be possible to create the DI objects(say spring objects) at a
> flink-task scope ? Or all these objects(say spring) from a dependency
> injection fwk are expected to be created at a process(JM/TM) level?
>
> Can someone please help answer the above questions and help me understand
> the flink-guarantees better. Any help would be greatly appreciated.
>
> Thanks.
>


-- 

Arvid Heise | Senior Java Developer



Follow us @VervericaData

--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng


[jira] [Created] (FLINK-19950) LookupJoin cannot support views, subqueries, and so on

2020-11-03 Thread jackylau (Jira)
jackylau created FLINK-19950:


 Summary: LookupJoin cannot support views, subqueries, and so on
 Key: FLINK-19950
 URL: https://issues.apache.org/jira/browse/FLINK-19950
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.11.0
Reporter: jackylau
 Fix For: 1.12.0


{code:java}
val sql0 = "create view v1 AS SELECT * FROM user_table"

val sql = "SELECT T.id, T.len, T.content, D.name FROM src AS T JOIN v1 " +
  "for system_time as of T.proctime AS D ON T.id = D.id"

val sink = new TestingAppendSink
tEnv.executeSql(sql0)
tEnv.sqlQuery(sql).toAppendStream[Row].addSink(sink)
env.execute()
{code}
{code:java}
private void convertTemporalTable(Blackboard bb, SqlCall call) {
  final SqlSnapshot snapshot = (SqlSnapshot) call;
  final RexNode period = bb.convertExpression(snapshot.getPeriod());

  // convert inner query, could be a table name or a derived table
  SqlNode expr = snapshot.getTableRef();
  convertFrom(bb, expr);
  final TableScan scan = (TableScan) bb.root;

  final RelNode snapshotRel = relBuilder.push(scan).snapshot(period).build();

  bb.setRoot(snapshotRel, false);
}

{code}
Running the query above throws a ClassCastException at {{final TableScan scan = (TableScan) bb.root;}}, because a view or subquery is not converted to a plain {{TableScan}}.
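The failure mechanics can be reproduced outside Calcite with a minimal analogue; the class names below are illustrative stand-ins, not Calcite's actual types:

```java
// Minimal analogue of the failing cast (class names are stand-ins, not
// Calcite's): when the inner query is a view or subquery, the converted
// relational root is not a TableScan, so the unconditional downcast fails.
public class SnapshotCastDemo {
    public static class RelNode {}
    public static class TableScan extends RelNode {}       // a plain table reference
    public static class LogicalProject extends RelNode {}  // what a view/subquery expands to

    // Mirrors "final TableScan scan = (TableScan) bb.root;"
    public static TableScan castToScan(RelNode root) {
        return (TableScan) root; // ClassCastException when root is a LogicalProject
    }
}
```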

 





[DISCUSS]FLIP-150: Introduce Hybrid Source

2020-11-03 Thread Nicholas Jiang
Hi devs,

I'd like to start a new FLIP to introduce the Hybrid Source. The hybrid
source is a source that contains a list of concrete sources. The hybrid
source reads from each contained source in the defined order. It switches
from source A to the next source B when source A finishes.

In practice, many Flink jobs need to read data from multiple sources in
sequential order. Change Data Capture (CDC) and machine learning feature
backfill are two concrete scenarios of this consumption pattern. Users may
have to either run two different Flink jobs or have some hacks in the
SourceFunction to address such use cases. 

To support the above scenarios smoothly, a Flink job needs to first read 
historical data from HDFS and then switch to Kafka for real-time records. The 
hybrid source has several benefits from the user's perspective:

- Switching among multiple sources is easy, based on the switchable source 
implementations of the different connectors.
- Automatic switching is supported for user-defined switchable sources that 
constitute the hybrid source.
- It provides a complete and effective mechanism for smooth source migration 
between historical and real-time data.
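Independent of the Source API, the switching behavior can be sketched as a chained iterator that drains each contained source in order; the class below is an illustrative sketch only, not part of the proposed API:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative sketch only: reads from each contained source in the defined
// order and switches to the next source when the current one is exhausted.
public class HybridIterator<T> implements Iterator<T> {
    private final Iterator<Iterator<T>> sources;
    private Iterator<T> current = Collections.emptyIterator();

    public HybridIterator(List<Iterator<T>> sources) {
        this.sources = sources.iterator();
    }

    @Override
    public boolean hasNext() {
        // Switch from source A to the next source B when source A finishes.
        while (!current.hasNext() && sources.hasNext()) {
            current = sources.next();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

The real proposal of course also has to handle split/checkpoint state handover at the switch point, which this sketch deliberately omits.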

Therefore, in this discussion, we propose to introduce a “Hybrid Source” API
built on top of the new Source API (FLIP-27) to help users to smoothly
switch sources. For more detail, please refer to the FLIP design doc[1].

I'm looking forward to your feedback.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source

  

Best,
Nicholas Jiang





[jira] [Created] (FLINK-19951) PyFlink end-to-end test stuck in "Reading kafka messages"

2020-11-03 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-19951:
--

 Summary: PyFlink end-to-end test stuck in "Reading kafka messages"
 Key: FLINK-19951
 URL: https://issues.apache.org/jira/browse/FLINK-19951
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.12.0
Reporter: Robert Metzger


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=8837&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529

{code}
2020-11-03T08:18:10.2935249Z Nov 03 08:18:10 Test PyFlink DataStream job:
2020-11-03T08:18:10.2936216Z Nov 03 08:18:10 Preparing Kafka...
2020-11-03T08:18:10.2948091Z Nov 03 08:18:10 Downloading Kafka from 
https://archive.apache.org/dist/kafka/2.2.0/kafka_2.12-2.2.0.tgz
[... curl download progress output omitted ...]
2020-11-03T08:18:20.2751451Z Nov 03 08:18:20 Zookeeper Server has been started 
...
2020-11-03T08:18:22.0064118Z Nov 03 08:18:22 Waiting for broker...
2020-11-03T08:18:25.4758082Z Nov 03 08:18:25 Created topic 
test-python-data-stream-source.
2020-11-03T08:18:25.8324767Z Nov 03 08:18:25 Sending messages to Kafka...
2020-11-03T08:18:35.2954788Z Nov 03 08:18:35 >>Created topic 
test-python-data-stream-sink.
2020-11-03T08:18:54.8314099Z Nov 03 08:18:54 Job has been submitted with JobID 
1b0c317b47c69ee600937e1715ad9cce
2020-11-03T08:18:54.8348757Z Nov 03 08:18:54 Reading kafka messages...
2020-11-03T08:53:10.5246998Z 
==
2020-11-03T08:53:10.5249381Z === WARNING: This E2E Run took already 80% of the 
allocated time budget of 250 minutes ===
2020-11-03T08:53:10.5251343Z 
==
{code}





[jira] [Created] (FLINK-19952) Flink SecurityOptions uses deprecated methods many times

2020-11-03 Thread jackylau (Jira)
jackylau created FLINK-19952:


 Summary: Flink SecurityOptions uses deprecated methods many times
 Key: FLINK-19952
 URL: https://issues.apache.org/jira/browse/FLINK-19952
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.11.0
Reporter: jackylau
 Fix For: 1.12.0


We should declare the config options with an explicit type:
{code:java}
public static final ConfigOption<Boolean> KERBEROS_LOGIN_USETICKETCACHE =
 key("security.kerberos.login.use-ticket-cache")
 .booleanType()
 .defaultValue(true)
 .withDescription("Indicates whether to read from your Kerberos ticket cache.");
{code}
instead of 
{code:java}
public static final ConfigOption<Boolean> KERBEROS_LOGIN_USETICKETCACHE = 
key("security.kerberos.login.use-ticket-cache")
.defaultValue(true) 
.withDescription("Indicates whether to read from your Kerberos ticket cache.");
{code}
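The difference between the two styles can be sketched with a minimal typed-option builder; {{MiniOption}} below is an illustrative stand-in for Flink's ConfigOptions, not the real API, showing why declaring the type up front is preferable to the deprecated untyped path:

```java
// Illustrative stand-in for Flink's ConfigOptions builder (not the real API):
// declaring .booleanType() up front produces an option typed as Boolean,
// instead of relying on an untyped key(...).defaultValue(...) path.
public class MiniOption<T> {
    private final String key;
    private final T defaultValue;

    private MiniOption(String key, T defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    public static Builder key(String key) {
        return new Builder(key);
    }

    public String getKey() { return key; }
    public T getDefaultValue() { return defaultValue; }

    public static final class Builder {
        private final String key;
        private Builder(String key) { this.key = key; }

        // The explicit type declaration is what the typed builder style adds.
        public TypedBuilder<Boolean> booleanType() {
            return new TypedBuilder<>(key);
        }
    }

    public static final class TypedBuilder<T> {
        private final String key;
        private TypedBuilder(String key) { this.key = key; }

        public MiniOption<T> defaultValue(T value) {
            return new MiniOption<>(key, value);
        }
    }
}
```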
 





Re: [DISCUSS] Releasing StateFun hotfix version 2.2.1

2020-11-03 Thread Igal Shilman
Hi Gordon,
Thanks for driving this discussion!

I would go with the second suggestion - having two consecutive StateFun
releases 2.2.1 and 2.2.2, since the Flink-1.11.3 release
might take a while, and this hot-fix release is important enough to get out
as early as possible.

Cheers,
Igal.




On Mon, Nov 2, 2020 at 11:43 AM Tzu-Li (Gordon) Tai 
wrote:

> Hi,
>
> We’re currently thinking about releasing StateFun 2.2.1, to address a
> critical bug that causes restores from checkpoints / savepoints to fail
> under certain circumstances [1].
>
> To provide a bit more context, the full fix for this issue is two-fold:
>
>1. *Fix restoring from checkpoints / savepoints taken with the same
>StateFun version:* this has already been fixed in StateFun, with
>changes backported to `flink-statefun/release-2.2`.
>2. *Allow restoring from older savepoints taken with StateFun <=
>2.2.0:* this requires a few fixes to Flink around restoring heap-based
>timers [2] and iterating through key groups in restored raw keyed state
>streams [3]. These fixes will be included in Flink 1.11.3 [4], meaning that
>to fix this, StateFun will need to wait until Flink 1.11.3 is out and
>upgrade its Flink dependency.
>
> The main discussion point here is whether or not it makes sense for
> StateFun 2.2.1 to wait for Flink 1.11.3, so that both parts of the problems
> 1) and 2) can be solved together in a single hotfix release.
>
> The other option is to release StateFun 2.2.1 already with fixes for
> problem 1) only, and have another follow-up hotfix release 2.2.2 after
> Flink 1.11.3 is available.
>
> I propose to keep a close eye on the progress of Flink 1.11.3 (you can
> track progress on the 1.11.3 discussion thread [4]), and *make a decision
> here mid-week on Wednesday, Nov. 4th*.
> If by then we decide to not let StateFun 2.2.1 wait for Flink 1.11.3
> because it could take a while, we can start with a StateFun 2.2.1 RC right
> away; otherwise, if Flink 1.11.3 seems to be just around the corner, we can
> wait for a few more days.
>
> What do you think?
>
> Cheers,
> Gordon
>
> [1] https://issues.apache.org/jira/browse/FLINK-19692
> [2] https://github.com/apache/flink/pull/13761
> [3] https://github.com/apache/flink/pull/13772
> [4]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Apache-Flink-1-11-3-td45989.html
>


Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-11-03 Thread Stephan Ewen
@Steven would it be possible to initially copy some of the code into the
iceberg source and later replace it by a dependency on the Flink file
source?

On Mon, Nov 2, 2020 at 8:33 PM Steven Wu  wrote:

> Stephan, thanks a lot for explaining the file connector. that makes sense.
>
> I was asking because we were trying to reuse some of the implementations in
> the file source for Iceberg source. Flink Iceberg source lives in the
> Iceberg repo, which is not possible to code against the master branch of
> the Flink code.
>
> On Mon, Nov 2, 2020 at 3:31 AM Stephan Ewen  wrote:
>
> > Hi Steven!
> >
> > So far there are no plans to pick back the file system connector code.
> This
> > is still evolving and not finalized for 1.12, so I don't feel it is a
> good
> > candidate to be backported.
> > However, with the base connector changes backported, you should be able
> to
> > run the file connector code from master against 1.11.3.
> >
> > The collect() utils can be picked back, I see no issue with that (it is
> > isolated utilities).
> >
> > Best,
> > Stephan
> >
> >
> > On Mon, Nov 2, 2020 at 3:02 AM Steven Wu  wrote:
> >
> > > Basically, it would be great to get the latest code in the
> > > flink-connector-files (FLIP-27).
> > >
> > > On Sat, Oct 31, 2020 at 9:57 AM Steven Wu 
> wrote:
> > >
> > > > Stephan, it will be great if we can also backport the DataStreamUtils
> > > > related commits that help with collecting output from unbounded
> > streams.
> > > > e.g.
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/commit/09a7a66b7313fea64817fe960a8da1265b428efc
> > > >
> > > > I tried to copy and paste the code to unblock myself. but it quickly
> > got
> > > > into the rabbit hole of more and more code.
> > > >
> > > > On Fri, Oct 30, 2020 at 11:02 AM Stephan Ewen 
> > wrote:
> > > >
> > > >> I have started with backporting the source API changes. Some minor
> > > >> conflicts to solve, will need a bit more to finish this.
> > > >>
> > > >> On Fri, Oct 30, 2020 at 7:25 AM Tzu-Li (Gordon) Tai <
> > > tzuli...@apache.org>
> > > >> wrote:
> > > >>
> > > >> > @Stephan Ewen 
> > > >> > Are there already plans or ongoing efforts for backporting the
> list
> > of
> > > >> > FLIP-27 changes that you posted?
> > > >> >
> > > >> > On Thu, Oct 29, 2020 at 7:08 PM Xintong Song <
> tonysong...@gmail.com
> > >
> > > >> > wrote:
> > > >> >
> > > >> >> Hi folks,
> > > >> >>
> > > >> >> Just to provide some updates concerning the status on the
> > > >> >> test instabilities.
> > > >> >>
> > > >> >> Currently, we have 30 unresolved tickets labeled with `Affects
> > > Version`
> > > >> >> 1.11.x.
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-19775?filter=12348580&jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20affectedVersion%20in%20(1.11.0%2C%201.11.1%2C%201.11.2%2C%201.11.3)%20AND%20labels%20%3D%20test-stability%20ORDER%20BY%20created%20DESC
> > > >> >>
> > > >> >> Among the 30 tickets, 11 of them are:
> > > >> >> - Have occured in the recent 3 months
> > > >> >> - Not confirmed to be pure testability issues
> > > >> >> - Not confirmed to be rare condition cases
> > > >> >>
> > > >> >> It would be nice if someone familiar with these components can
> > take a
> > > >> look
> > > >> >> into these issues.
> > > >> >>
> > > >> >> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6)
> > > >> >> - https://issues.apache.org/jira/browse/FLINK-17912 (Kafka)
> > > >> >> - https://issues.apache.org/jira/browse/FLINK-17949 (Kafka)
> > > >> >> ⁃ https://issues.apache.org/jira/browse/FLINK-18444 (Kafka)
> > > >> >> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka)
> > > >> >> - https://issues.apache.org/jira/browse/FLINK-18648 (Kafka)
> > > >> >> - https://issues.apache.org/jira/browse/FLINK-18807 (Kafka)
> > > >> >> - https://issues.apache.org/jira/browse/FLINK-19369
> > (BlobClientTest)
> > > >> >> - https://issues.apache.org/jira/browse/FLINK-19436 (TPCDS)
> > > >> >> - https://issues.apache.org/jira/browse/FLINK-19690
> > (Format/Parquet)
> > > >> >> - https://issues.apache.org/jira/browse/FLINK-19775
> > > >> >> (SystemProcessingTimeServiceTest)
> > > >> >>
> > > >> >> Thank you~
> > > >> >>
> > > >> >> Xintong Song
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> On Thu, Oct 29, 2020 at 10:21 AM Jingsong Li <
> > jingsongl...@gmail.com
> > > >
> > > >> >> wrote:
> > > >> >>
> > > >> >> > +1 to backport the FLIP-27 adjustments to 1.11.x.
> > > >> >> >
> > > >> >> > If possible, that would be great. Many people are looking
> forward
> > > to
> > > >> the
> > > >> >> > FLIP-27 interface, but they don't want to take the risk to
> > upgrade
> > > to
> > > >> >> 1.12
> > > >> >> > (And wait 1.12). After all, 1.11 is a relatively stable
> version.
> > > >> >> >
> > > >> >> > Best,
> > > >> >> > Jingsong
> > > >> >> >
> > > >> >> > On Thu, Oct 29, 2020 at 1:24 AM Stephan Ewen  >
> > > >> wrote:
> > > >> >> >
> > > >> >> > > Thanks for starting this

[jira] [Created] (FLINK-19953) Translation in docs/ops/memory/mem_setup.zh.md

2020-11-03 Thread Matthias (Jira)
Matthias created FLINK-19953:


 Summary: Translation in docs/ops/memory/mem_setup.zh.md
 Key: FLINK-19953
 URL: https://issues.apache.org/jira/browse/FLINK-19953
 Project: Flink
  Issue Type: Improvement
  Components: chinese-translation
Reporter: Matthias
 Fix For: 1.12.0


I updated the documentation in 
[c6043a5|https://github.com/apache/flink/commit/c6043a5f0e190a196d5407c4497f42fe9c905f33].
The Chinese version {{docs/ops/memory/mem_setup.zh.md}} still contains some 
English sentences which need to be translated.





[jira] [Created] (FLINK-19954) Move execution deployment tracking logic from legacy EG code to SchedulerNG

2020-11-03 Thread Andrey Zagrebin (Jira)
Andrey Zagrebin created FLINK-19954:
---

 Summary: Move execution deployment tracking logic from legacy EG 
code to SchedulerNG
 Key: FLINK-19954
 URL: https://issues.apache.org/jira/browse/FLINK-19954
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination
Affects Versions: 1.12.0
Reporter: Andrey Zagrebin


FLINK-17075 introduced the execution state reconciliation between TM and JM. 
The reconciliation requires tracking of the execution deployment state. The 
tracking logic was added to the legacy code of EG state handling which is 
partially inactive as discussed in FLINK-19927. The recent state handling logic 
resides in the new SchedulerNG, currently DefaultScheduler.

We could reconsider how the execution tracking for reconciliation is integrated 
with the scheduling. I think the tracking logic could be moved from 
Execution#deploy and EG#notifyExecutionChange to either 
SchedulerNG#updateTaskExecutionState or DefaultScheduler#deployTaskSafe; the 
latter currently looks more natural to me. ExecutionVertexOperations.deploy 
could return the submission future, used to mark deployment completion in 
ExecutionDeploymentTracker, with Execution#getTerminalFuture used to stop the 
tracking. This would also be easier to unit test.
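The future-based handover suggested above can be sketched with plain CompletableFutures; the class and method names below are assumptions for the sake of the example, not the actual Flink classes:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: the submission future starts the tracking and the
// terminal future stops it. Names are assumptions, not Flink's actual classes.
public class DeploymentTracker {
    private final Map<String, Boolean> trackedExecutions = new ConcurrentHashMap<>();

    public void track(String executionId,
                      CompletableFuture<Void> submissionFuture,
                      CompletableFuture<Void> terminalFuture) {
        // Start tracking once the deployment has been submitted to the TM...
        submissionFuture.thenRun(() -> trackedExecutions.put(executionId, Boolean.TRUE));
        // ...and stop tracking once the execution reaches a terminal state.
        terminalFuture.thenRun(() -> trackedExecutions.remove(executionId));
    }

    public boolean isTracked(String executionId) {
        return trackedExecutions.containsKey(executionId);
    }
}
```

A scheduler-side component wired this way is straightforward to unit test: complete the two futures by hand and assert on the tracked set.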





Re: [DISCUSS] Releasing StateFun hotfix version 2.2.1

2020-11-03 Thread Robert Metzger
Thanks a lot for starting this thread.
How many users are affected by the problem? Is it somebody else besides the
initial issue reporter?
If it is just one person, I would suggest to rather help pushing the 1.11.3
release over the line or work on more StateFun features ;)



Re: [DISCUSS][docker] Adopt Jemalloc as default memory allocator for debian based Flink docker image

2020-11-03 Thread Xie Billy
Hi guys:
 refer:

https://stackoverflow.com/questions/13027475/cpu-and-memory-usage-of-jemalloc-as-compared-to-glibc-malloc
 code: https://github.com/jemalloc/jemalloc-experiments
  https://code.woboq.org/userspace/glibc/benchtests/




Best Regards!
Billy xie(谢志民)


On Fri, Oct 30, 2020 at 4:27 PM Yun Tang  wrote:

> Hi
>
> > Do you see a noticeable performance difference between the two?
> @ Stephan Ewen , as we already use jemalloc as default memory allocator in
> production, we do not have much experience to compare performace between
> glibc and jemalloc. And I did not take a look at the performance difference
> when I debug docker OOM. I'll have a try to run benchmark on docker with
> different allocators when I have time these days.
>
> @wang gang, yes, that is what I also observed when I pmap the memory when
> using glibc, many memory segments with 64MB size.
>
> @ Billy Xie, what kind of test case did you use? In my point of view,
> rather than which allocator uses more memory in some cases, we should care
> more about which one stays under the max limit in most cases.
>
> Best
> Yun Tang
> 
> From: Xie Billy 
> Sent: Friday, October 30, 2020 8:46
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS][docker] Adopt Jemalloc as default memory allocator
> for debian based Flink docker image
>
> After running the test application for a long time (>96 hours), glibc used
> less memory than jemalloc, almost half as much. You can refer to this test case.
>
> Best Regards!
> Billy xie(谢志民)
>
>
> On Fri, Oct 30, 2020 at 12:38 AM 王刚  wrote:
>
> > we also met the glibc memory leak. Many 64 MB memory segments appeared when
> > using the pmap command. I am +1 on setting jemalloc as default
> >
> > 发送自autohome
> > 
> > 发件人: Yu Li 
> > 发送时间: 2020-10-30 00:15:12
> > 收件人: dev 
> > 抄送: n...@ververica.com 
> > 主题: Re: [DISCUSS][docker] Adopt Jemalloc as default memory allocator for
> > debian based Flink docker image
> >
> > Thanks for sharing more thoughts guys.
> >
> > My experience with malloc libraries are also limited, and as mentioned in
> > my previous email [1], my major concern comes from some online
> information
> > [2] which indicates in some cases jemalloc will consume as much as twice
> > the memory than glibc, which is a huge cost from my point of view.
> However,
> > one may also argue that this post was written in 2015 and things might
> have
> > been improved a lot ever since.
> >
> > Nevertheless, as Till and Yun pointed out, I could also see the benefits
> of
> > setting jemalloc as default, so I don't have strong objections here. But
> > this actually is a change of the default memory allocator of our docker
> > image (previously glibc is the only choice) which definitely worth a big
> > release note.
> >
> > Best Regards,
> > Yu
> >
> > [1]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-docker-Adopt-Jemalloc-as-default-memory-allocator-for-debian-based-Flink-docker-image-tp45643p45692.html
> > [2] https://stackoverflow.com/a/33993215
> >
> >
> > On Thu, 29 Oct 2020 at 20:55, Stephan Ewen  > se...@apache.org>> wrote:
> >
> > > I think Till has a good point. The default should work well, because
> very
> > > few users will ever change it. Even on OOMs.
> > > So I am slightly in favor of going with jemalloc.
> > >
> > > @Yun Tang - Do you see a noticeable performance difference between the
> > two?
> > >
> > > On Thu, Oct 29, 2020 at 12:32 PM Yun Tang  > myas...@live.com>> wrote:
> > >
> > > > Hi
> > > >
> > > > I think Till's view deserves wider discussion.
> > > > I want to give my two cents when I debug with Nico on his reported
> > > RocksDB
> > > > OOM problem.
> > > > Jemalloc has the mechanism to profile memory allocation [1] which is
> > > > widely used to analysis memory leak.
> > > > Once we set jemalloc as default memory allocator, the frequency of
> OOM
> > > > behavior decreases obviously.
> > > >
> > > > Considering the OOM killed problem in k8s, change default memory
> > > allocator
> > > > as jemalloc could be something beneficial.
> > > >
> > > > [1]
> > https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-Profiling
> > > >
> > > > Best
> > > > Yun Tang
> > > >
> > > > 
> > > > From: Till Rohrmann  trohrm...@apache.org
> > >>
> > > > Sent: Thursday, October 29, 2020 18:34
> > > > To: dev mailto:dev@flink.apache.org>>
> > > > Cc: Yun Tang mailto:myas...@live.com>>
> > > > Subject: Re: [DISCUSS][docker] Adopt Jemalloc as default memory
> > allocator
> > > > for debian based Flink docker image
> > > >
> > > > Hi Yu,
> > > >
> > > > I see your point why you are in favour of continuing using glibc.
> > > >
> > > > I honestly don't have a lot of experience with malloc libraries so I
> > can
> > > > only argue from a general perspective: We know that glibc has some
> > > problems
> > > > wrt to memory fragmentation which can cause processes to exceed its
> 

Re: [DISCUSS] Releasing StateFun hotfix version 2.2.1

2020-11-03 Thread Tzu-Li (Gordon) Tai
Hi Robert,

So far we've only seen a single user report the issue, but the severity of
FLINK-19692 is actually pretty huge.
TL;DR: If a checkpoint / savepoint that contains feedback events (which is
considered normal under typical StateFun operations) is attempted to be
restored from, the restore would always fail.

That's why we came up with the discussion to potentially release a
"partial" solution with StateFun 2.2.1 already so that at least there is a
StateFun release available that works properly with failure recoveries,
and then after that release another follow-up StateFun hotfix release
2.2.2, which would include Flink 1.11.3, to address the remaining part of
the problem.

BR,
Gordon

On Tue, Nov 3, 2020 at 9:33 PM Robert Metzger  wrote:

> Thanks a lot for starting this thread.
> How many users are affected by the problem? Is it somebody else besides
> the initial issue reporter?
> If it is just one person, I would suggest to rather help pushing the
> 1.11.3 release over the line or work on more StateFun features ;)
>
> On Tue, Nov 3, 2020 at 11:58 AM Igal Shilman  wrote:
>
>> Hi Gordon,
>> Thanks for driving this discussion!
>>
>> I would go with the second suggestion - having two consecutive StateFun
>> releases 2.2.1 and 2.2.2, since the Flink-1.11.3 release
>> might take a while, and this hot-fix release is important enough to get
>> out
>> as early as possible.
>>
>> Cheers,
>> Igal.
>>
>>
>>
>>
>> On Mon, Nov 2, 2020 at 11:43 AM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>> > Hi,
>> >
>> > We’re currently thinking about releasing StateFun 2.2.1, to address a
>> > critical bug that causes restores from checkpoints / savepoints to fail
>> > under certain circumstances [1].
>> >
>> > To provide a bit more context, the full fix for this issue is two-fold:
>> >
>> >1. *Fix restoring from checkpoints / savepoints taken with the same
>> >StateFun version:* this has already been fixed in StateFun, with
>> >changes backported to `flink-statefun/release-2.2`.
>> >2. *Allow restoring from older savepoints taken with StateFun <=
>> >2.2.0:* this requires a few fixes to Flink around restoring
>> heap-based
>> >timers [2] and iterating through key groups in restored raw keyed
>> state
>> >streams [3]. These fixes will be included in Flink 1.11.3 [4],
>> meaning that
>> >to fix this, StateFun will need to wait until Flink 1.11.3 is out and
>> >upgrade its Flink dependency.
>> >
>> > The main discussion point here is whether or not it makes sense for
>> > StateFun 2.2.1 to wait for Flink 1.11.3, so that both parts of the
>> problems
>> > 1) and 2) can be solved together in a single hotfix release.
>> >
>> > The other option is to release StateFun 2.2.1 already with fixes for
>> > problem 1) only, and have another follow-up hotfix release 2.2.2 after
>> > Flink 1.11.3 is available.
>> >
>> > I propose to keep a close eye on the progress of Flink 1.11.3 (you can
>> > track progress on the 1.11.3 discussion thread [4]), and *make a
>> decision
>> > here mid-week on Wednesday, Nov. 4th*.
>> > If by then we decide to not let StateFun 2.2.1 wait for Flink 1.11.3
>> > because it could take a while, we can start with a StateFun 2.2.1 RC
>> right
>> > away; otherwise, if Flink 1.11.3 seems to be just around the corner, we
>> can
>> > wait for a few more days.
>> >
>> > What do you think?
>> >
>> > Cheers,
>> > Gordon
>> >
>> > [1] https://issues.apache.org/jira/browse/FLINK-19692
>> > [2] https://github.com/apache/flink/pull/13761
>> > [3] https://github.com/apache/flink/pull/13772
>> > [4]
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Apache-Flink-1-11-3-td45989.html
>> >
>>
>


Re: [DISCUSS] Releasing StateFun hotfix version 2.2.1

2020-11-03 Thread Robert Metzger
Hi Gordon,
thanks a lot for this clarification.

In this case I would vote for releasing StateFun 2.2.1 asap and not wait
for 1.11.3.

Thanks a lot for your efforts!


On Tue, Nov 3, 2020 at 3:38 PM Tzu-Li (Gordon) Tai 
wrote:

> Hi Robert,
>
> So far we've only seen a single user report the issue, but the severity of
> FLINK-19692 is actually pretty huge.
> TL;DR: If a checkpoint / savepoint that contains feedback events (which is
> considered normal under typical StateFun operations) is attempted to be
> restored from, the restore would always fail.
>
> That's why we came up with the discussion to potentially release a
> "partial" solution with StateFun 2.2.1 already so that at least there is a
> StateFun release available that works properly with failure recoveries,
> and then after that release another follow-up StateFun hotfix release
> 2.2.2, which would include Flink 1.11.3, to address the remaining part of
> the problem.
>
> BR,
> Gordon
>
> On Tue, Nov 3, 2020 at 9:33 PM Robert Metzger  wrote:
>
>> Thanks a lot for starting this thread.
>> How many users are affected by the problem? Is it somebody else besides
>> the initial issue reporter?
>> If it is just one person, I would suggest to rather help pushing the
>> 1.11.3 release over the line or work on more StateFun features ;)
>>
>> On Tue, Nov 3, 2020 at 11:58 AM Igal Shilman  wrote:
>>
>>> Hi Gordon,
>>> Thanks for driving this discussion!
>>>
>>> I would go with the second suggestion - having two consecutive StateFun
>>> releases 2.2.1 and 2.2.2, since the Flink-1.11.3 release
>>> might take a while, and this hot-fix release is important enough to get
>>> out
>>> as early as possible.
>>>
>>> Cheers,
>>> Igal.
>>>
>>>
>>>
>>>
>>> On Mon, Nov 2, 2020 at 11:43 AM Tzu-Li (Gordon) Tai >> >
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > We’re currently thinking about releasing StateFun 2.2.1, to address a
>>> > critical bug that causes restores from checkpoints / savepoints to fail
>>> > under certain circumstances [1].
>>> >
>>> > To provide a bit more context, the full fix for this issue is two-fold:
>>> >
>>> >1. *Fix restoring from checkpoints / savepoints taken with the same
>>> >StateFun version:* this has already been fixed in StateFun, with
>>> >changes backported to `flink-statefun/release-2.2`.
>>> >2. *Allow restoring from older savepoints taken with StateFun <=
>>> >2.2.0:* this requires a few fixes to Flink around restoring
>>> heap-based
>>> >timers [2] and iterating through key groups in restored raw keyed
>>> state
>>> >streams [3]. These fixes will be included in Flink 1.11.3 [4],
>>> meaning that
>>> >to fix this, StateFun will need to wait until Flink 1.11.3 is out
>>> and
>>> >upgrade its Flink dependency.
>>> >
>>> > The main discussion point here is whether or not it makes sense for
>>> > StateFun 2.2.1 to wait for Flink 1.11.3, so that both parts of the
>>> problems
>>> > 1) and 2) can be solved together in a single hotfix release.
>>> >
>>> > The other option is to release StateFun 2.2.1 already with fixes for
>>> > problem 1) only, and have another follow-up hotfix release 2.2.2 after
>>> > Flink 1.11.3 is available.
>>> >
>>> > I propose to keep a close eye on the progress of Flink 1.11.3 (you can
>>> > track progress on the 1.11.3 discussion thread [4]), and *make a
>>> decision
>>> > here mid-week on Wednesday, Nov. 4th*.
>>> > If by then we decide to not let StateFun 2.2.1 wait for Flink 1.11.3
>>> > because it could take a while, we can start with a StateFun 2.2.1 RC
>>> right
>>> > away; otherwise, if Flink 1.11.3 seems to be just around the corner,
>>> we can
>>> > wait for a few more days.
>>> >
>>> > What do you think?
>>> >
>>> > Cheers,
>>> > Gordon
>>> >
>>> > [1] https://issues.apache.org/jira/browse/FLINK-19692
>>> > [2] https://github.com/apache/flink/pull/13761
>>> > [3] https://github.com/apache/flink/pull/13772
>>> > [4]
>>> >
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Apache-Flink-1-11-3-td45989.html
>>> >
>>>
>>


[jira] [Created] (FLINK-19955) Document cross-version compatibility

2020-11-03 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-19955:
--

 Summary: Document cross-version compatibility
 Key: FLINK-19955
 URL: https://issues.apache.org/jira/browse/FLINK-19955
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Runtime / Coordination
Affects Versions: 1.12.0
Reporter: Robert Metzger


A user rightfully asked on the list what Flink guarantees across minor 
versions: 
https://lists.apache.org/thread.html/r29b3600848ff3e2966a83b6bdcd7b3fe2835be62a2cd44ddfcf31881%40%3Cuser.flink.apache.org%3E

I'd expect such information on this documentation page: 
https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-11-03 Thread Steven Wu
@Stephan Ewen  yeah, we can do that. don't worry about
it. your earlier email had the perfect explanation on why file source
shouldn't be backported.

On Tue, Nov 3, 2020 at 3:37 AM Stephan Ewen  wrote:

> @Steven would it be possible to initially copy some of the code into the
> iceberg source and later replace it by a dependency on the Flink file
> source?
>
> On Mon, Nov 2, 2020 at 8:33 PM Steven Wu  wrote:
>
> > Stephan, thanks a lot for explaining the file connector. that makes
> sense.
> >
> > I was asking because we were trying to reuse some of the implementations
> in
> > the file source for Iceberg source. Flink Iceberg source lives in the
> > Iceberg repo, which is not possible to code against the master branch of
> > the Flink code.
> >
> > On Mon, Nov 2, 2020 at 3:31 AM Stephan Ewen  wrote:
> >
> > > Hi Steven!
> > >
> > > So far there are no plans to pick back the file system connector code.
> > This
> > > is still evolving and not finalized for 1.12, so I don't feel it is a
> > good
> > > candidate to be backported.
> > > However, with the base connector changes backported, you should be able
> > to
> > > run the file connector code from master against 1.11.3.
> > >
> > > The collect() utils can be picked back, I see no issue with that (it is
> > > isolated utilities).
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > On Mon, Nov 2, 2020 at 3:02 AM Steven Wu  wrote:
> > >
> > > > Basically, it would be great to get the latest code in the
> > > > flink-connector-files (FLIP-27).
> > > >
> > > > On Sat, Oct 31, 2020 at 9:57 AM Steven Wu 
> > wrote:
> > > >
> > > > > Stephan, it will be great if we can also backport the
> DataStreamUtils
> > > > > related commits that help with collecting output from unbounded
> > > streams.
> > > > > e.g.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/commit/09a7a66b7313fea64817fe960a8da1265b428efc
> > > > >
> > > > > I tried to copy and paste the code to unblock myself. but it
> quickly
> > > got
> > > > > into the rabbit hole of more and more code.
> > > > >
> > > > > On Fri, Oct 30, 2020 at 11:02 AM Stephan Ewen 
> > > wrote:
> > > > >
> > > > >> I have started with backporting the source API changes. Some minor
> > > > >> conflicts to solve, will need a bit more to finish this.
> > > > >>
> > > > >> On Fri, Oct 30, 2020 at 7:25 AM Tzu-Li (Gordon) Tai <
> > > > tzuli...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >> > @Stephan Ewen 
> > > > >> > Are there already plans or ongoing efforts for backporting the
> > list
> > > of
> > > > >> > FLIP-27 changes that you posted?
> > > > >> >
> > > > >> > On Thu, Oct 29, 2020 at 7:08 PM Xintong Song <
> > tonysong...@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> >
> > > > >> >> Hi folks,
> > > > >> >>
> > > > >> >> Just to provide some updates concerning the status on the
> > > > >> >> test instabilities.
> > > > >> >>
> > > > >> >> Currently, we have 30 unresolved tickets labeled with `Affects
> > > > Version`
> > > > >> >> 1.11.x.
> > > > >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-19775?filter=12348580&jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20affectedVersion%20in%20(1.11.0%2C%201.11.1%2C%201.11.2%2C%201.11.3)%20AND%20labels%20%3D%20test-stability%20ORDER%20BY%20created%20DESC
> > > > >> >>
> > > > >> >> Among the 30 tickets, 11 of them are:
> > > > >> >> - Have occurred in the recent 3 months
> > > > >> >> - Not confirmed to be pure testability issues
> > > > >> >> - Not confirmed to be rare condition cases
> > > > >> >>
> > > > >> >> It would be nice if someone familiar with these components can
> > > take a
> > > > >> look
> > > > >> >> into these issues.
> > > > >> >>
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6)
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-17912 (Kafka)
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-17949 (Kafka)
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-18444 (Kafka)
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka)
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-18648 (Kafka)
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-18807 (Kafka)
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-19369
> > > (BlobClientTest)
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-19436 (TPCDS)
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-19690
> > > (Format/Parquet)
> > > > >> >> - https://issues.apache.org/jira/browse/FLINK-19775
> > > > >> >> (SystemProcessingTimeServiceTest)
> > > > >> >>
> > > > >> >> Thank you~
> > > > >> >>
> > > > >> >> Xintong Song
> > > > >> >>
> > > > >> >>
> > > > >> >>
> > > > >> >> On Thu, Oct 29, 2020 at 10:21 AM Jingsong Li <
> > > jingsongl...@gmail.com
> > > > >
> > > > >> >> wrote:
> > > > >> >>
> > > > >> >> > +1 to backport the FLIP-27 adjustments to 1.11.x.
> > > > >> >> >
> > > > >> >> > If possib

[DISCUSS] FLIP-151: Incremental snapshots for heap-based state backend

2020-11-03 Thread Khachatryan Roman
Hi devs,

I'd like to start a discussion of FLIP-151: Incremental snapshots for
heap-based state backend [1]

The heap backend, while limited to state sizes that fit into memory, also has
some advantages compared to the RocksDB backend:
1. Serialization once per checkpoint, not per state modification. This
allows "squashing" updates to the same keys
2. Shorter synchronous phase (compared to RocksDB incremental)
3. No need for sorting and compaction, no I/O amplification or JNI overhead
This can potentially give higher throughput and efficiency.
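A minimal sketch of point 1 — hypothetical code, not the FLIP's actual design: track which keys were dirtied since the last checkpoint, so a snapshot serializes each modified key once regardless of how many updates it received.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not the FLIP's design): a heap-backed map that records
// keys modified since the last checkpoint, so each checkpoint serializes only
// the delta, and repeated updates to one key are "squashed" into one entry.
public class IncrementalHeapState {
    private final Map<String, Long> state = new HashMap<>();
    private final Set<String> dirtyKeys = new HashSet<>();

    public void put(String key, long value) {
        state.put(key, value);
        dirtyKeys.add(key); // many updates to the same key stay one dirty entry
    }

    // Returns only the entries touched since the previous checkpoint.
    public Map<String, Long> checkpoint() {
        Map<String, Long> delta = new HashMap<>();
        for (String key : dirtyKeys) {
            delta.put(key, state.get(key));
        }
        dirtyKeys.clear();
        return delta;
    }
}
```

A real implementation would additionally need copy-on-write semantics so that the synchronous phase stays short while serialization happens asynchronously (point 2).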

However, Heap backend currently lacks incremental checkpoints. This FLIP
aims to add initial support for them.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-151%3A+Incremental+snapshots+for+heap-based+state+backend


Any feedback highly appreciated.

Regards,
Roman


[jira] [Created] (FLINK-19956) $ does not work on variables without being qualified in Scala

2020-11-03 Thread Rex Remind (Jira)
Rex Remind created FLINK-19956:
--

 Summary: $ does not work on variables without being qualified in 
Scala
 Key: FLINK-19956
 URL: https://issues.apache.org/jira/browse/FLINK-19956
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.11.2
 Environment: MacOS
Reporter: Rex Remind


This does not compile:
{code:scala}
val columnName = "bool_column"
table.filter($(columnName) === true)
{code}

This does:
{code:scala}
val columnName = "bool_column"
table.filter(Expressions.$(columnName) === true)
{code}

There's nothing obviously documented about using the latter.





[jira] [Created] (FLINK-19957) flink-sql-connector-hive incompatible with cdh6

2020-11-03 Thread Cheng Pan (Jira)
Cheng Pan created FLINK-19957:
-

 Summary: flink-sql-connector-hive incompatible with cdh6
 Key: FLINK-19957
 URL: https://issues.apache.org/jira/browse/FLINK-19957
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.11.2
 Environment: Flink 1.11.2

Hadoop 3.0.0-cdh6.3.1

Hive 2.1.1-cdh6.3.1
Reporter: Cheng Pan


According to the Flink docs, we should use flink-sql-connector-hive-2.2.0 (which 
should be compatible with Hive 2.0.0 - 2.2.0). In fact, we got an exception: 
Unrecognized Hadoop major version number: 3.0.0-cdh6.3.1;

If we use flink-sql-connector-hive-2.3.6 (which should be compatible with Hive 
2.3.0 - 2.3.6), we encounter another exception: org.apache.thrift.TApplicationException: 
Invalid method name: 'get_table_req'

If we copy flink-connector-hive_2.11-1.11.2.jar and hive-exec-2.1.1-cdh6.3.1.jar 
to flink/lib, it still does not work. Caused by: java.lang.ClassNotFoundException: 
com.facebook.fb303.FacebookService$Iface





Re: [ANNOUNCE] New Apache Flink Committer - Congxian Qiu

2020-11-03 Thread godfrey he
Congratulations! Congxian

Best,
Godfrey

Fabian Hueske  于2020年11月2日周一 下午7:00写道:

> Congrats Congxian!
>
> Cheers, Fabian
>
> Am Mo., 2. Nov. 2020 um 10:33 Uhr schrieb Yang Wang  >:
>
> > Congratulations Congxian!
> >
> > Best,
> > Yang
> >
> > Zhu Zhu  于2020年11月2日周一 下午5:14写道:
> >
> > > Congrats Congxian!
> > >
> > > Thanks,
> > > Zhu
> > >
> > > Pengcheng Liu  于2020年11月2日周一 下午5:01写道:
> > >
> > > > Congratulations Congxian!
> > > >
> > > > Matthias Pohl  于2020年11月2日周一 下午3:57写道:
> > > >
> > > > > Yup, congratulations Congxian!
> > > > >
> > > > > On Mon, Nov 2, 2020 at 8:46 AM Danny Chan 
> > > wrote:
> > > > >
> > > > > > Congrats, Doctor Qiu! Well deserved!
> > > > > >
> > > > > > Congxian Qiu  于2020年10月31日周六 下午9:45写道:
> > > > > >
> > > > > > > Thanks all for the support. It's a great honor for me.
> > > > > > >
> > > > > > > Best,
> > > > > > > Congxian
> > > > > > >
> > > > > > >
> > > > > > > Paul Lam  于2020年10月30日周五 下午3:18写道:
> > > > > > >
> > > > > > > > Congrats, Congxian! Well deserved!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Paul Lam
> > > > > > > >
> > > > > > > > > 2020年10月30日 15:12,Zhijiang  > > .INVALID>
> > > > > 写道:
> > > > > > > > >
> > > > > > > > > Congrats, Congxian!
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Congxian Qiu

2020-11-03 Thread Guowei Ma
Congratulations! Congxian

Best,
Guowei


On Wed, Nov 4, 2020 at 10:36 AM godfrey he  wrote:

> Congratulations! Congxian
>
> Best,
> Godfrey
>
> Fabian Hueske  于2020年11月2日周一 下午7:00写道:
>
> > Congrats Congxian!
> >
> > Cheers, Fabian
> >
> > Am Mo., 2. Nov. 2020 um 10:33 Uhr schrieb Yang Wang <
> danrtsey...@gmail.com
> > >:
> >
> > > Congratulations Congxian!
> > >
> > > Best,
> > > Yang
> > >
> > > Zhu Zhu  于2020年11月2日周一 下午5:14写道:
> > >
> > > > Congrats Congxian!
> > > >
> > > > Thanks,
> > > > Zhu
> > > >
> > > > Pengcheng Liu  于2020年11月2日周一 下午5:01写道:
> > > >
> > > > > Congratulations Congxian!
> > > > >
> > > > > Matthias Pohl  于2020年11月2日周一 下午3:57写道:
> > > > >
> > > > > > Yup, congratulations Congxian!
> > > > > >
> > > > > > On Mon, Nov 2, 2020 at 8:46 AM Danny Chan 
> > > > wrote:
> > > > > >
> > > > > > > Congrats, Doctor Qiu! Well deserved!
> > > > > > >
> > > > > > > Congxian Qiu  于2020年10月31日周六 下午9:45写道:
> > > > > > >
> > > > > > > > Thanks all for the support. It's a great honor for me.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Congxian
> > > > > > > >
> > > > > > > >
> > > > > > > > Paul Lam  于2020年10月30日周五 下午3:18写道:
> > > > > > > >
> > > > > > > > > Congrats, Congxian! Well deserved!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Paul Lam
> > > > > > > > >
> > > > > > > > > > 2020年10月30日 15:12,Zhijiang  > > > .INVALID>
> > > > > > 写道:
> > > > > > > > > >
> > > > > > > > > > Congrats, Congxian!
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-19958) Unified exception signature in Sink API

2020-11-03 Thread Guowei Ma (Jira)
Guowei Ma created FLINK-19958:
-

 Summary: Unified exception signature in Sink API
 Key: FLINK-19958
 URL: https://issues.apache.org/jira/browse/FLINK-19958
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream
Reporter: Guowei Ma


In the current Sink API, some methods do not declare any exception even though 
they intuitively should, for example `SinkWriter::write`. Other methods throw 
the general `Exception`, which might be too broad.

So in this PR we want to change all the methods that need to throw an exception 
to declare `IOException` instead.
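A sketch of the intended direction — a hypothetical interface, not the actual Flink Sink API: every lifecycle method uniformly declares `IOException` instead of a mix of "no checked exception" and the overly general `Exception`.

```java
import java.io.IOException;

// Hypothetical sketch (not the actual Flink API): unified exception signatures.
public class UnifiedSinkSketch {
    interface SinkWriter<InputT> {
        void write(InputT element) throws IOException;
        void flush() throws IOException;
        void close() throws IOException;
    }

    // A trivial writer that counts elements, to show the interface in use.
    static class CountingWriter implements SinkWriter<String> {
        int written = 0;
        @Override public void write(String element) throws IOException { written++; }
        @Override public void flush() throws IOException { /* nothing buffered */ }
        @Override public void close() throws IOException { /* nothing to release */ }
    }
}
```

Callers then handle a single, well-known checked exception type across the whole sink lifecycle instead of catch-all `Exception`.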





[jira] [Created] (FLINK-19959) Multiple input creation algorithm will deduce an incorrect input order if the inputs are related under PIPELINED shuffle mode

2020-11-03 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-19959:
---

 Summary: Multiple input creation algorithm will deduce an 
incorrect input order if the inputs are related under PIPELINED shuffle mode
 Key: FLINK-19959
 URL: https://issues.apache.org/jira/browse/FLINK-19959
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Caizhi Weng
 Fix For: 1.12.0


Consider the following SQL
{code:sql}
WITH
  T1 AS (SELECT x.a AS a, y.d AS b FROM y LEFT JOIN x ON y.d = x.a),
  T2 AS (SELECT a, b FROM (SELECT a, b FROM T1) UNION ALL (SELECT x.a AS a, x.b AS b FROM x))
SELECT * FROM T2 LEFT JOIN t ON T2.a = t.a
{code}

The multiple input creation algorithm will currently deduce the following plan 
under the PIPELINED shuffle mode:
{code}
MultipleInputNode(readOrder=[0,1,1,0], members=[
NestedLoopJoin(joinType=[LeftOuterJoin], where=[=(a, a0)], select=[a, b, a0, b0, c], build=[right])
:- Union(all=[true], union=[a, b])
:  :- Calc(select=[a, CAST(d) AS b])
:  :  +- NestedLoopJoin(joinType=[LeftOuterJoin], where=[=(d, a)], select=[d, a], build=[right])
:  : :- [#3] Calc(select=[d])
:  : +- [#4] Exchange(distribution=[broadcast])
:  +- [#2] Calc(select=[a, b])
+- [#1] Exchange(distribution=[broadcast])
])
:- Exchange(distribution=[broadcast])
:  +- BoundedStreamScan(table=[[default_catalog, default_database, t]], fields=[a, b, c])
:- Calc(select=[a, b])
:  +- LegacyTableSourceScan(table=[[default_catalog, default_database, x, source: [TestTableSource(a, b, c, nx)]]], fields=[a, b, c, nx], reuse_id=[1])
:- Calc(select=[d])
:  +- LegacyTableSourceScan(table=[[default_catalog, default_database, y, source: [TestTableSource(d, e, f, ny)]]], fields=[d, e, f, ny])
+- Exchange(distribution=[broadcast])
   +- Calc(select=[a])
      +- Reused(reference_id=[1])
{code}

It's obvious that the 2nd and the 4th input for the multiple input node should 
have the same input priority, otherwise a deadlock will occur.

This is because the current algorithm fails to consider the case where the 
inputs are related outside of the multiple input node.
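The constraint can be illustrated with a hypothetical checker, not Flink planner code: under PIPELINED shuffle, inputs of a multiple-input node that share an upstream outside the node must be read with the same priority, otherwise the upstream can block on one edge while the node waits on another.

```java
import java.util.List;

// Hypothetical checker (not Flink planner code): inputs related through a
// common upstream outside the multiple input node must share one read
// priority under PIPELINED shuffle, or a deadlock can occur.
public class ReadOrderCheck {
    static boolean isSafe(int[] readOrder, List<int[]> relatedGroups) {
        for (int[] group : relatedGroups) {
            for (int input : group) {
                if (readOrder[input] != readOrder[group[0]]) {
                    return false; // related inputs read at different priorities
                }
            }
        }
        return true;
    }
}
```

With the deduced readOrder=[0,1,1,0] and the 2nd and 4th inputs (indices 1 and 3) both derived from the reused scan of x, the check fails, which is exactly the bug described here.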





[jira] [Created] (FLINK-19960) Introduce PartitionFieldExtractor to extract partition field from split

2020-11-03 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-19960:


 Summary: Introduce PartitionFieldExtractor to extract partition 
field from split
 Key: FLINK-19960
 URL: https://issues.apache.org/jira/browse/FLINK-19960
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Runtime
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: 1.12.0


Filesystem: extract partition field from path

Hive: there is partition information in split





Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-11-03 Thread Tzu-Li (Gordon) Tai
> The collect() utils can be picked back, I see no issue with that (it is
isolated utilities).

Just checking on all the requested backports mentioned in this thread, and
figuring out which ones seem to still be unassigned / open.

Is someone working on backporting
https://github.com/apache/flink/commit/09a7a66b7313fea64817fe960a8da1265b428efc
to release-1.11 at the moment?

On Wed, Nov 4, 2020 at 1:39 AM Steven Wu  wrote:

> @Stephan Ewen  yeah, we can do that. don't worry about
> it. your earlier email had the perfect explanation on why file source
> shouldn't be backported.
>
> On Tue, Nov 3, 2020 at 3:37 AM Stephan Ewen  wrote:
>
> > @Steven would it be possible to initially copy some of the code into the
> > iceberg source and later replace it by a dependency on the Flink file
> > source?
> >
> > On Mon, Nov 2, 2020 at 8:33 PM Steven Wu  wrote:
> >
> > > Stephan, thanks a lot for explaining the file connector. that makes
> > sense.
> > >
> > > I was asking because we were trying to reuse some of the
> implementations
> > in
> > > the file source for Iceberg source. Flink Iceberg source lives in the
> > > Iceberg repo, which makes it impossible to code against the master branch
> > > of the Flink code.
> > >
> > > On Mon, Nov 2, 2020 at 3:31 AM Stephan Ewen  wrote:
> > >
> > > > Hi Steven!
> > > >
> > > > So far there are no plans to pick back the file system connector
> code.
> > > This
> > > > is still evolving and not finalized for 1.12, so I don't feel it is a
> > > good
> > > > candidate to be backported.
> > > > However, with the base connector changes backported, you should be
> able
> > > to
> > > > run the file connector code from master against 1.11.3.
> > > >
> > > > The collect() utils can be picked back, I see no issue with that (it
> is
> > > > isolated utilities).
> > > >
> > > > Best,
> > > > Stephan
> > > >
> > > >
> > > > On Mon, Nov 2, 2020 at 3:02 AM Steven Wu 
> wrote:
> > > >
> > > > > Basically, it would be great to get the latest code in the
> > > > > flink-connector-files (FLIP-27).
> > > > >
> > > > > On Sat, Oct 31, 2020 at 9:57 AM Steven Wu 
> > > wrote:
> > > > >
> > > > > > Stephan, it will be great if we can also backport the
> > DataStreamUtils
> > > > > > related commits that help with collecting output from unbounded
> > > > streams.
> > > > > > e.g.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/commit/09a7a66b7313fea64817fe960a8da1265b428efc
> > > > > >
> > > > > > I tried to copy and paste the code to unblock myself. but it
> > quickly
> > > > got
> > > > > > into the rabbit hole of more and more code.
> > > > > >
> > > > > > On Fri, Oct 30, 2020 at 11:02 AM Stephan Ewen 
> > > > wrote:
> > > > > >
> > > > > >> I have started with backporting the source API changes. Some
> minor
> > > > > >> conflicts to solve, will need a bit more to finish this.
> > > > > >>
> > > > > >> On Fri, Oct 30, 2020 at 7:25 AM Tzu-Li (Gordon) Tai <
> > > > > tzuli...@apache.org>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > @Stephan Ewen 
> > > > > >> > Are there already plans or ongoing efforts for backporting the
> > > list
> > > > of
> > > > > >> > FLIP-27 changes that you posted?
> > > > > >> >
> > > > > >> > On Thu, Oct 29, 2020 at 7:08 PM Xintong Song <
> > > tonysong...@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> >> Hi folks,
> > > > > >> >>
> > > > > >> >> Just to provide some updates concerning the status on the
> > > > > >> >> test instabilities.
> > > > > >> >>
> > > > > >> >> Currently, we have 30 unresolved tickets labeled with
> `Affects
> > > > > Version`
> > > > > >> >> 1.11.x.
> > > > > >> >>
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-19775?filter=12348580&jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20affectedVersion%20in%20(1.11.0%2C%201.11.1%2C%201.11.2%2C%201.11.3)%20AND%20labels%20%3D%20test-stability%20ORDER%20BY%20created%20DESC
> > > > > >> >>
> > > > > >> >> Among the 30 tickets, 11 of them are:
> > > > > >> >> - Have occurred in the recent 3 months
> > > > > >> >> - Not confirmed to be pure testability issues
> > > > > >> >> - Not confirmed to be rare condition cases
> > > > > >> >>
> > > > > >> >> It would be nice if someone familiar with these components
> can
> > > > take a
> > > > > >> look
> > > > > >> >> into these issues.
> > > > > >> >>
> > > > > >> >> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6)
> > > > > >> >> - https://issues.apache.org/jira/browse/FLINK-17912 (Kafka)
> > > > > >> >> - https://issues.apache.org/jira/browse/FLINK-17949 (Kafka)
> > > > > >> >> - https://issues.apache.org/jira/browse/FLINK-18444 (Kafka)
> > > > > >> >> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka)
> > > > > >> >> - https://issues.apache.org/jira/browse/FLINK-18648 (Kafka)
> > > > > >> >> - https://issues.apache.org/jira/browse/FLINK-18807 (Kafka)
> > > > > >> >> - https://issues.apache.

Re: [DISCUSS][docker] Adopt Jemalloc as default memory allocator for debian based Flink docker image

2020-11-03 Thread Yun Tang
Hi @ Billy

I found there are many benchmark cases in the two repos. Which benchmark case 
did you run?


Best
Yun Tang

From: Xie Billy 
Sent: Tuesday, November 3, 2020 22:08
To: dev@flink.apache.org 
Subject: Re: [DISCUSS][docker] Adopt Jemalloc as default memory allocator for 
debian based Flink docker image

Hi guys:
 refer:

https://stackoverflow.com/questions/13027475/cpu-and-memory-usage-of-jemalloc-as-compared-to-glibc-malloc
 code: https://github.com/jemalloc/jemalloc-experiments
  https://code.woboq.org/userspace/glibc/benchtests/




Best Regards!
Billy xie(谢志民)


On Fri, Oct 30, 2020 at 4:27 PM Yun Tang  wrote:

> Hi
>
> > Do you see a noticeable performance difference between the two?
> @ Stephan Ewen , as we already use jemalloc as default memory allocator in
> production, we do not have much experience comparing performance between
> glibc and jemalloc. And I did not take a look at the performance difference
> when I debug docker OOM. I'll have a try to run benchmark on docker with
> different allocators when I have time these days.
>
> @wang gang, yes, that is what I also observed when I ran pmap on the memory
> when using glibc: many memory segments of 64 MB.
>
> @ Billy Xie, what kind of test case did you use? In my point of view, rather
> than which allocator uses more memory in some cases, we should care more
> about which one behaves better under the memory limit in most cases.
>
> Best
> Yun Tang
> 
> From: Xie Billy 
> Sent: Friday, October 30, 2020 8:46
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS][docker] Adopt Jemalloc as default memory allocator
> for debian based Flink docker image
>
> After running the test application for a long time (>96 hours), glibc used
> less memory than jemalloc, almost half as much. You can refer to this test case.
>
> Best Regards!
> Billy xie(谢志民)
>
>
> On Fri, Oct 30, 2020 at 12:38 AM 王刚  wrote:
>
> > We also met the glibc memory leak. Many 64 MB memory segments appeared when
> > using the pmap command. I am +1 on setting jemalloc as default.
> >
> > 发送自autohome
> > 
> > 发件人: Yu Li 
> > 发送时间: 2020-10-30 00:15:12
> > 收件人: dev 
> > 抄送: n...@ververica.com 
> > 主题: Re: [DISCUSS][docker] Adopt Jemalloc as default memory allocator for
> > debian based Flink docker image
> >
> > Thanks for sharing more thoughts guys.
> >
> > My experience with malloc libraries are also limited, and as mentioned in
> > my previous email [1], my major concern comes from some online information
> > [2] which indicates that in some cases jemalloc will consume as much as twice
> > the memory of glibc, which is a huge cost from my point of view.
> However,
> > one may also argue that this post was written in 2015 and things might
> have
> > been improved a lot ever since.
> >
> > Nevertheless, as Till and Yun pointed out, I could also see the benefits
> of
> > setting jemalloc as default, so I don't have strong objections here. But
> > this actually is a change of the default memory allocator of our docker
> > image (previously glibc is the only choice) which definitely worth a big
> > release note.
> >
> > Best Regards,
> > Yu
> >
> > [1]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-docker-Adopt-Jemalloc-as-default-memory-allocator-for-debian-based-Flink-docker-image-tp45643p45692.html
> > [2] https://stackoverflow.com/a/33993215
> >
> >
> > On Thu, 29 Oct 2020 at 20:55, Stephan Ewen  > se...@apache.org>> wrote:
> >
> > > I think Till has a good point. The default should work well, because
> > > very few users will ever change it. Even on OOMs.
> > > So I am slightly in favor of going with jemalloc.
> > >
> > > @Yun Tang - Do you see a noticeable performance difference between the two?
> > >
> > > On Thu, Oct 29, 2020 at 12:32 PM Yun Tang <myas...@live.com> wrote:
> > >
> > > > Hi
> > > >
> > > > I think Till's view deserves wider discussion.
> > > > I want to give my two cents from debugging the RocksDB OOM problem
> > > > that Nico reported.
> > > > Jemalloc has a mechanism to profile memory allocation [1] which is
> > > > widely used to analyze memory leaks.
> > > > Once we set jemalloc as the default memory allocator, the frequency
> > > > of OOM behavior decreased noticeably.
> > > >
> > > > Considering the OOM-killed problem on k8s, changing the default
> > > > memory allocator to jemalloc could be beneficial.
> > > >
> > > > [1] https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-Profiling
> > > >
> > > > Best
> > > > Yun Tang
> > > >
> > > > 
> > > > From: Till Rohrmann <trohrm...@apache.org>
> > > > Sent: Thursday, October 29, 2020 18:34
> > > > To: dev <dev@flink.apache.org>
> > > > Cc: Yun Tang <myas...@live.com>
> > > > Subject: Re: [DISCUSS][docker] Adopt Jemalloc as default memory
> > > > allocator for debian based Flink docker image
> > > >
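[As a side note on the heap profiling Yun mentions above: a minimal sketch of enabling it for a preloaded jemalloc. The library path is the usual debian location and the options follow the jemalloc wiki; adjust for your build, which must be compiled with profiling support.]

```shell
# Enable jemalloc heap profiling for a JVM process (requires a jemalloc
# build with --enable-prof; option names follow the jemalloc wiki).
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2   # debian path, adjust as needed
export MALLOC_CONF="prof:true,lg_prof_interval:30,prof_prefix:/tmp/jeprof"

# Start the TaskManager as usual; a profile is dumped roughly every
# 2^30 bytes (~1 GiB) of allocation activity, then inspected with:
#   jeprof --text "$(which java)" /tmp/jeprof.*.heap
```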
>

[jira] [Created] (FLINK-19961) Bash e2e utility run_test_with_timeout is (sometimes?) not properly stopping its watchdog

2020-11-03 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-19961:
--

 Summary: Bash e2e utility run_test_with_timeout is (sometimes?)
not properly stopping its watchdog
 Key: FLINK-19961
 URL: https://issues.apache.org/jira/browse/FLINK-19961
 Project: Flink
  Issue Type: Bug
  Components: Test Infrastructure
Affects Versions: 1.12.0
Reporter: Robert Metzger


Symptoms: 
https://issues.apache.org/jira/browse/FLINK-19882?focusedCommentId=17223725&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17223725

Once a test has finished, the timeout watchdog should stop. However, it seems
that this doesn't always work.
Maybe this is caused by the on_exit trap being overwritten by the test?
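[The trap-overwriting hypothesis can be reproduced in a few lines of bash; the function and message names below are illustrative, not the actual harness code.]

```shell
# A second `trap ... EXIT` REPLACES the first handler instead of chaining
# it, so a test installing its own on_exit trap silently drops the
# harness's cleanup. Run the scenario in a command-substitution subshell
# so its EXIT trap fires and we can capture the output.
out=$(
  cleanup_watchdog() { echo "harness: stopping watchdog"; }
  trap cleanup_watchdog EXIT              # installed by the harness
  trap 'echo "test: own cleanup"' EXIT    # later installed by a test
)
echo "$out"   # prints: test: own cleanup  -- the watchdog cleanup never ran
```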



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-11-03 Thread Tzu-Li (Gordon) Tai
Thanks for the efforts so far with pushing for 1.11.3.

I'd like to provide a mid-week update on how we're looking with all the
pending blockers and backports:

*Blockers:*

Currently all blockers either have an open PR or have been merged. I'll
highlight below the blockers *that still require reviewing efforts for them
to move forward:*

- [FLINK-19909] Flink application in attach mode could not terminate when
the only job is canceled. PR: https://github.com/apache/flink/pull/13911
- [FLINK-19717] SourceReaderBase.pollNext may return END_OF_INPUT if
SplitReader.fetch throws. PR: https://github.com/apache/flink/pull/13776

The above PRs currently have no reviews at all yet, though they seem to
already have designated reviewers.


*Backports:*
- FLIP-27 robustness improvement backports: Stephan is currently working on
backporting several FLIP-27 changes. There are no PRs yet for the
backported changes.
- DataStreamUtils.collect() refactorings backport: Steven Wu mentioned to
backport this, but AFAIK this isn't assigned to anyone yet.

*ETAs*:

I'd like to request ETAs for the remaining backports, to prevent a creep in
the scope of this bugfix release.

We already have Flink users that would benefit from the fixes merged to
release-1.11, so technically speaking the backports should be considered
"nice-to-have" (to the best of my knowledge of the changes) and could
potentially be moved to a follow-up 1.11.4.
Most notably, the Stateful Functions project is already waiting on Flink
1.11.3 to address critical recovery issues (please see the StateFun 2.2.1
release discussion thread [1]).

@Stephan Ewen  @Becket Qin  could
you provide an ETA for the FLIP-27 backports? It would help to get a better
estimate to decide how we proceed here.

Cheers,
Gordon

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-StateFun-hotfix-version-2-2-1-td46239.html

On Wed, Nov 4, 2020 at 3:16 PM Tzu-Li (Gordon) Tai 
wrote:

> > The collect() utils can be picked back, I see no issue with that (it is
> > isolated utilities).
>
> Just checking on all the requested backports mentioned in this thread, and
> figuring out which ones seem to still be unassigned / open.
>
> Is someone working on backporting
> https://github.com/apache/flink/commit/09a7a66b7313fea64817fe960a8da1265b428efc
> to release-1.11 at the moment?
>
> On Wed, Nov 4, 2020 at 1:39 AM Steven Wu  wrote:
>
>> @Stephan Ewen  yeah, we can do that, don't worry about
>> it. Your earlier email had the perfect explanation of why the file source
>> shouldn't be backported.
>>
>> On Tue, Nov 3, 2020 at 3:37 AM Stephan Ewen  wrote:
>>
>> > @Steven would it be possible to initially copy some of the code into the
>> > iceberg source and later replace it by a dependency on the Flink file
>> > source?
>> >
>> > On Mon, Nov 2, 2020 at 8:33 PM Steven Wu  wrote:
>> >
>> > > Stephan, thanks a lot for explaining the file connector. That makes
>> > > sense.
>> > >
>> > > I was asking because we were trying to reuse some of the
>> > > implementations in the file source for the Iceberg source. The Flink
>> > > Iceberg source lives in the Iceberg repo, which makes it impossible
>> > > to code against the master branch of Flink.
>> > >
>> > > On Mon, Nov 2, 2020 at 3:31 AM Stephan Ewen  wrote:
>> > >
>> > > > Hi Steven!
>> > > >
>> > > > So far there are no plans to pick back the file system connector
>> > > > code. This is still evolving and not finalized for 1.12, so I don't
>> > > > feel it is a good candidate to be backported.
>> > > > However, with the base connector changes backported, you should be
>> > > > able to run the file connector code from master against 1.11.3.
>> > > >
>> > > > The collect() utils can be picked back; I see no issue with that
>> > > > (they are isolated utilities).
>> > > >
>> > > > Best,
>> > > > Stephan
>> > > >
>> > > >
>> > > > On Mon, Nov 2, 2020 at 3:02 AM Steven Wu 
>> wrote:
>> > > >
>> > > > > Basically, it would be great to get the latest code in the
>> > > > > flink-connector-files (FLIP-27).
>> > > > >
>> > > > > On Sat, Oct 31, 2020 at 9:57 AM Steven Wu 
>> > > wrote:
>> > > > >
>> > > > > > Stephan, it will be great if we can also backport the
>> > > > > > DataStreamUtils related commits that help with collecting
>> > > > > > output from unbounded streams, e.g.
>> > > > > > https://github.com/apache/flink/commit/09a7a66b7313fea64817fe960a8da1265b428efc
>> > > > > >
>> > > > > > I tried to copy and paste the code to unblock myself. but it
>> > quickly
>> > > > got
>> > > > > > into the rabbit hole of more and more code.
>> > > > > >
>> > > > > >> On Fri, Oct 30, 2020 at 11:02 AM Stephan Ewen wrote:
>> > > > > >
>> > > > > >> I have started with backporting the source API changes. Some
>> minor
>> > > > > >> conflicts to solve, will need a bit more to finish this.
>> > > > > >>
>> > > > >