Re: [ANNOUNCE] Apache Flink 1.10.1 released

2020-05-15 Thread Congxian Qiu
Thanks a lot for the release and your great job, Yu!
Also thanks to everyone who made this release possible!

Best,
Congxian


Yu Li wrote on Thursday, May 14, 2020 at 1:59 AM:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.10.1, which is the first bugfix release for the Apache Flink 1.10
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2020/05/12/release-1.10.1.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346891
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Regards,
> Yu
>


[jira] [Created] (FLINK-17719) Provide ChannelStateReader#hasStates for hints of reading channel states

2020-05-15 Thread Zhijiang (Jira)
Zhijiang created FLINK-17719:


 Summary: Provide ChannelStateReader#hasStates for hints of reading channel states
 Key: FLINK-17719
 URL: https://issues.apache.org/jira/browse/FLINK-17719
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing, Runtime / Task
Reporter: Zhijiang
Assignee: Zhijiang


Currently we rely on whether unaligned checkpointing is enabled to determine 
whether to read recovered channel states during task startup. This blocks 
recovery from previously written unaligned states when the current mode is 
aligned.

We can make `ChannelStateReader` provide a hint indicating whether there are 
any channel states to be read during startup, so that we never lose the 
chance to recover from them.
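To illustrate the proposed behavior change (a concept sketch in Python rather than Flink's Java runtime; the class and method names below mirror the proposal and are not existing Flink API):

```python
# Concept sketch only: Flink's real ChannelStateReader is a Java class in
# flink-runtime; nothing below is actual Flink API.

class ChannelStateReader:
    def __init__(self, recovered_channel_states):
        self._states = recovered_channel_states

    def has_states(self):
        # The proposed hint: are there channel states to read on startup?
        return len(self._states) > 0


def should_read_recovered_states(reader, unaligned_enabled):
    # Old behavior: gated purely on the checkpointing mode, i.e.
    #   return unaligned_enabled
    # Proposed behavior: recover whenever states actually exist, even if
    # the job has meanwhile switched back to aligned checkpoints.
    return reader.has_states()


# Restoring in aligned mode from a previous unaligned checkpoint still
# picks up the channel states:
print(should_read_recovered_states(
    ChannelStateReader(["channel-0", "channel-1"]), unaligned_enabled=False))
# prints True
```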



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17720) StreamSqlTests.test_execute_sql: "AssertionError: True is not false"

2020-05-15 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17720:
--

 Summary: StreamSqlTests.test_execute_sql: "AssertionError: True is not false"
 Key: FLINK-17720
 URL: https://issues.apache.org/jira/browse/FLINK-17720
 Project: Flink
  Issue Type: Bug
  Components: API / Python, Tests
Reporter: Robert Metzger


CI 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1347&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=455fddbf-5921-5b71-25ac-92992ad80b28

{code}
2020-05-15T02:05:53.8421159Z === FAILURES ===
2020-05-15T02:05:53.8421748Z ___ StreamSqlTests.test_execute_sql ___
2020-05-15T02:05:53.8422093Z 
2020-05-15T02:05:53.8422503Z self = 
2020-05-15T02:05:53.8427615Z 
2020-05-15T02:05:53.8428821Z def test_execute_sql(self):
2020-05-15T02:05:53.8429638Z t_env = self.t_env
2020-05-15T02:05:53.8429928Z table_result = t_env.execute_sql("create table tbl"
2020-05-15T02:05:53.8430212Z  "("
2020-05-15T02:05:53.8430522Z  "   a bigint,"
2020-05-15T02:05:53.8430850Z  "   b int,"
2020-05-15T02:05:53.8431308Z  "   c varchar"
2020-05-15T02:05:53.8431773Z  ") with ("
2020-05-15T02:05:53.8433165Z  "  'connector' = 'COLLECTION',"
2020-05-15T02:05:53.8433861Z  "   'is-bounded' = 'false'"
2020-05-15T02:05:53.8434298Z  ")")
2020-05-15T02:05:53.8434784Z self.assertIsNone(table_result.get_job_client())
2020-05-15T02:05:53.8435189Z self.assertIsNotNone(table_result.get_table_schema())
2020-05-15T02:05:53.8435594Z self.assertEquals(table_result.get_table_schema().get_field_names(), ["result"])
2020-05-15T02:05:53.8436100Z self.assertIsNotNone(table_result.get_result_kind())
2020-05-15T02:05:53.8436433Z self.assertEqual(table_result.get_result_kind(), ResultKind.SUCCESS)
2020-05-15T02:05:53.8437088Z table_result.print()
2020-05-15T02:05:53.8437368Z 
2020-05-15T02:05:53.8438070Z table_result = t_env.execute_sql("alter table tbl set ('k1' = 'a', 'k2' = 'b')")
2020-05-15T02:05:53.8438580Z self.assertIsNone(table_result.get_job_client())
2020-05-15T02:05:53.8439431Z self.assertIsNotNone(table_result.get_table_schema())
2020-05-15T02:05:53.8439954Z self.assertEquals(table_result.get_table_schema().get_field_names(), ["result"])
2020-05-15T02:05:53.8440385Z self.assertIsNotNone(table_result.get_result_kind())
2020-05-15T02:05:53.8440885Z self.assertEqual(table_result.get_result_kind(), ResultKind.SUCCESS)
2020-05-15T02:05:53.8441281Z table_result.print()
2020-05-15T02:05:53.8441648Z 
2020-05-15T02:05:53.8441871Z field_names = ["k1", "k2", "c"]
2020-05-15T02:05:53.8442175Z field_types = [DataTypes.BIGINT(), DataTypes.INT(), DataTypes.STRING()]
2020-05-15T02:05:53.8442496Z t_env.register_table_sink(
2020-05-15T02:05:53.8442705Z "sinks",
2020-05-15T02:05:53.8442986Z source_sink_utils.TestAppendSink(field_names, field_types))
2020-05-15T02:05:53.8443334Z table_result = t_env.execute_sql("insert into sinks select * from tbl")
2020-05-15T02:05:53.8443723Z self.assertIsNotNone(table_result.get_job_client())
2020-05-15T02:05:53.8444064Z self.assertIsNotNone(table_result.get_table_schema())
2020-05-15T02:05:53.8444393Z self.assertEquals(table_result.get_table_schema().get_field_names(),
2020-05-15T02:05:53.8444750Z ["default_catalog.default_database.sinks"])
2020-05-15T02:05:53.8445068Z self.assertIsNotNone(table_result.get_result_kind())
2020-05-15T02:05:53.8445430Z self.assertEqual(table_result.get_result_kind(), ResultKind.SUCCESS_WITH_CONTENT)
2020-05-15T02:05:53.8445796Z job_status = table_result.get_job_client().get_job_status().result()
2020-05-15T02:05:53.8446266Z >   self.assertFalse(job_status.is_globally_terminal_state())
2020-05-15T02:05:53.8446577Z E   AssertionError: True is not false
2020-05-15T02:05:53.8446750Z 
2020-05-15T02:05:53.8446965Z pyflink/table/tests/test_sql.py:98: AssertionError
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17721) AbstractHadoopFileSystemITTest.cleanupDirectoryWithRetry fails with AssertionError

2020-05-15 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17721:
--

 Summary: AbstractHadoopFileSystemITTest.cleanupDirectoryWithRetry fails with AssertionError
 Key: FLINK-17721
 URL: https://issues.apache.org/jira/browse/FLINK-17721
 Project: Flink
  Issue Type: Bug
  Components: FileSystems, Tests
Reporter: Robert Metzger


CI: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1343&view=logs&j=961f8f81-6b52-53df-09f6-7291a2e4af6a&t=2f99feaa-7a9b-5916-4c1c-5e61f395079e

{code}
[ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 34.079 s <<< FAILURE! - in org.apache.flink.fs.s3hadoop.HadoopS3FileSystemITCase
[ERROR] org.apache.flink.fs.s3hadoop.HadoopS3FileSystemITCase  Time elapsed: 21.334 s  <<< FAILURE!
java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertFalse(Assert.java:64)
    at org.junit.Assert.assertFalse(Assert.java:74)
    at org.apache.flink.runtime.fs.hdfs.AbstractHadoopFileSystemITTest.cleanupDirectoryWithRetry(AbstractHadoopFileSystemITTest.java:162)
    at org.apache.flink.runtime.fs.hdfs.AbstractHadoopFileSystemITTest.teardown(AbstractHadoopFileSystemITTest.java:149)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17722) "Failed to find the file" in "build_wheels" stage

2020-05-15 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17722:
--

 Summary: "Failed to find the file" in "build_wheels" stage
 Key: FLINK-17722
 URL: https://issues.apache.org/jira/browse/FLINK-17722
 Project: Flink
  Issue Type: Bug
  Components: API / Python, Build System / Azure Pipelines
Reporter: Robert Metzger
 Fix For: 1.11.0


CI 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1343&view=logs&j=fe7ebddc-3e2f-5c50-79ee-226c8653f218&t=b2830442-93c7-50ff-36f4-5b3e2dca8c83

{code}
Successfully built dill crcmod httplib2 hdfs oauth2client future avro-python3
Installing collected packages: six, pbr, mock, dill, typing, crcmod, numpy, pyarrow, python-dateutil, typing-extensions, fastavro, httplib2, protobuf, pymongo, docopt, idna, chardet, urllib3, requests, hdfs, pyparsing, pydot, pyasn1, pyasn1-modules, rsa, oauth2client, grpcio, future, avro-python3, pytz, apache-beam, cython
Successfully installed apache-beam-2.19.0 avro-python3-1.9.2.1 chardet-3.0.4 crcmod-1.7 cython-0.29.16 dill-0.3.1.1 docopt-0.6.2 fastavro-0.21.24 future-0.18.2 grpcio-1.29.0 hdfs-2.5.8 httplib2-0.12.0 idna-2.9 mock-2.0.0 numpy-1.18.4 oauth2client-3.0.0 pbr-5.4.5 protobuf-3.11.3 pyarrow-0.15.1 pyasn1-0.4.8 pyasn1-modules-0.2.8 pydot-1.4.1 pymongo-3.10.1 pyparsing-2.4.7 python-dateutil-2.8.1 pytz-2020.1 requests-2.23.0 rsa-4.0 six-1.14.0 typing-3.7.4.1 typing-extensions-3.7.4.2 urllib3-1.25.9
+ (( i++ ))
+ (( i<3 ))
+ (( i=0 ))
+ (( i<3 ))
+ /home/vsts/work/1/s/flink-python/dev/.conda/envs/3.5/bin/python setup.py bdist_wheel
Compiling pyflink/fn_execution/fast_coder_impl.pyx because it changed.
Compiling pyflink/fn_execution/fast_operations.pyx because it changed.
[1/2] Cythonizing pyflink/fn_execution/fast_coder_impl.pyx
[2/2] Cythonizing pyflink/fn_execution/fast_operations.pyx
Failed to find the file /home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/opt/flink-sql-client_*.jar.
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Rework History Server into Global Dashboard

2020-05-15 Thread Gyula Fóra
Hi Till!

I agree to some extent that managing multiple clusters is not Flink's
primary responsibility.

However, many (if not most) production users run Flink in per-job-cluster
mode, which gives better configurability and resource isolation than the
standalone/session modes. Still, the best job management experience is on
standalone clusters, where users see all the jobs and can interact with
them using just their unique job ids.

This is the mismatch we were trying to resolve here, to get the best of
both worlds. It of course only concerns production users running many
different jobs, so we can definitely call it an enterprise feature.

I agree that this would be new code to maintain, in contrast to the current
history server which "just works".

We are completely okay with not adding this to Flink just yet, as it will
be part of the next Cloudera Flink release anyway. We will test-run it
there, gather production feedback for the Flink community, and then we can
make a better decision once we see the real value.

Cheers,
Gyula



On Thu, May 14, 2020 at 3:36 PM Till Rohrmann  wrote:

> Hi Gyula,
>
> thanks for proposing this extension. I can see that such a feature could be
> helpful.
>
> However, I wouldn't consider the management of multiple clusters core to
> Flink. Managing a single cluster is already complex enough and given the
> available community capacity I would rather concentrate on doing this
> aspect right instead of adding more complexity and more code to maintain.
>
> Maybe we could add this feature as a Flink package instead. That way it
> would still be available to our users. If it gains enough traction then we
> can also add it to Flink later. What do you think?
>
> Cheers,
> Till
>
> On Wed, May 13, 2020 at 11:36 AM Gyula Fóra  wrote:
>
> > It seems that not everyone can see the screenshot in the email, so
> > here is a link:
> >
> > https://drive.google.com/open?id=1abrlpI976NFqOZSX20k2FoiAfVhBbER9
> >
> > > On Wed, May 13, 2020 at 11:29 AM Gyula Fóra  wrote:
> >
> > > Oops I forgot the screenshot, thanks Ufuk :D
> > >
> > >
> > > @Jeff Zhang : Yes, we simply call the individual cluster's REST
> > > endpoints, so it would also work with multiple Flink versions.
> > > Gyula
> > >
> > >
> > > On Wed, May 13, 2020 at 10:56 AM Jeff Zhang  wrote:
> > >
> > >> Hi Gyula,
> > >>
> > >> Big +1 for this, it would be very helpful for flink job and cluster
> > >> operations. Do you call the flink rest api to gather the job info? I
> > >> hope this history server could work with multiple versions of flink
> > >> as long as the flink rest api is compatible.
> > >>
> > >> Gyula Fóra wrote on Wednesday, May 13, 2020 at 4:13 PM:
> > >>
> > >> > Hi All!
> > >> >
> > >> > With the growing number of Flink streaming applications the current
> > >> > HS implementation is starting to lose its value. Users running
> > >> > streaming applications mostly care about what is running right now
> > >> > on the cluster, and a centralised view on history is not very
> > >> > useful.
> > >> >
> > >> > We have been experimenting with reworking the current HS into a
> > >> > Global Flink Dashboard that would show all running and
> > >> > completed/failed jobs on all the running Flink clusters the users
> > >> > have.
> > >> >
> > >> > In essence we would get a view similar to the current HS, but it
> > >> > would also show the running jobs with a link redirecting to the
> > >> > actual cluster-specific dashboard.
> > >> >
> > >> > This is how it looks now:
> > >> >
> > >> >
> > >> > In this version we took a very simple approach of introducing a
> > >> > cluster discovery abstraction to collect all the running Flink
> > >> > clusters (by listing yarn apps for instance).
> > >> >
> > >> > The main pages aggregating jobs from different clusters would then
> > >> > simply make calls to all clusters and aggregate the responses. Job
> > >> > specific endpoints would simply be routed to the correct target
> > >> > cluster. This way the changes required are localised to the current
> > >> > HS implementation and cluster rest endpoints don't need to be
> > >> > changed.
> > >> >
> > >> > In addition to getting a fully working global dashboard, this also
> > >> > gets us a fully functioning rest endpoint for accessing all jobs in
> > >> > all clusters without having to provide the clusterId (yarn app id
> > >> > for instance), which we can use to enhance the CLI experience in
> > >> > multi-cluster (lots of per-job clusters) environments. Please let
> > >> > us know what you think! Gyula
> > >>
> > >>
> > >> --
> > >> Best Regards
> > >>
> > >> Jeff Zhang
> > >>
> > >
> >
>
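The aggregation approach described in this thread — discover clusters, fan out to their REST endpoints, merge and tag the responses — can be sketched roughly as follows (illustration only; the class and function names are assumptions, not the actual implementation):

```python
# Rough sketch of the proposed Global Dashboard aggregation. The
# discovery class and the job-listing call are hypothetical stand-ins.

class StaticClusterDiscovery:
    """Stand-in for the cluster discovery abstraction (which might, for
    instance, list YARN applications); here just a fixed endpoint list."""

    def __init__(self, endpoints):
        self._endpoints = endpoints

    def list_clusters(self):
        return list(self._endpoints)


def aggregate_jobs(discovery, fetch_jobs):
    """Fan out to each cluster's REST API (fetch_jobs(endpoint) stands in
    for something like GET /jobs/overview) and merge the results, tagging
    each job with its origin cluster so job-specific requests can be
    routed back to the right dashboard."""
    jobs = []
    for endpoint in discovery.list_clusters():
        for job in fetch_jobs(endpoint):
            jobs.append(dict(job, cluster=endpoint))
    return jobs


# Example with canned responses instead of real REST calls:
responses = {
    "http://cluster-a:8081": [{"jid": "j1", "state": "RUNNING"}],
    "http://cluster-b:8081": [{"jid": "j2", "state": "FINISHED"}],
}
discovery = StaticClusterDiscovery(responses)
print([j["jid"] for j in aggregate_jobs(discovery, responses.__getitem__)])
# prints ['j1', 'j2']
```

Because each job record keeps its origin endpoint, job-specific calls can be proxied to the owning cluster without changing the per-cluster REST API, which matches the "changes localised to the HS" idea above.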


Re: [DISCUSS] increase "state.backend.fs.memory-threshold" from 1K to 100K

2020-05-15 Thread Stephan Ewen
I see, thanks for all the input.

I agree with Yun Tang that the use of UnionState is problematic and can
cause issues in conjunction with this.
However, most of the large-scale users I know who also struggle with
UnionState have already increased this threshold, because with the low
default they get an excessive number of small files and overwhelm their
HDFS / S3 / etc.

An intermediate solution could be to put the increased value into the
default configuration. That way, existing setups with existing configs will
not be affected, but new users / installations will have a simpler time.

Best,
Stephan


On Thu, May 14, 2020 at 9:20 PM Yun Tang  wrote:

> I tend not to be in favor of this proposal, as union state is somewhat
> abused in several popular source connectors (e.g. kafka), and increasing
> this value could lead to JM OOM when sending TDDs (task deployment
> descriptors) from the JM to TMs with large parallelism.
>
> After we collect union state and initialize the map list [1], we already
> have the union state ready to assign. At this point the memory footprint
> has not increased much, as the union state shared across tasks holds the
> same reference to a ByteStreamStateHandle. However, when we send the TDDs
> with the taskRestore to the TMs, Akka serializes those
> ByteStreamStateHandles within each TDD, which increases the memory
> footprint. If the source has a parallelism of 1024, each sub-task would
> then carry 1024 * 100KB of state handles. The total memory footprint
> cannot be ignored.
>
> If we plan to increase the default value of
> state.backend.fs.memory-threshold, we should first resolve the above
> case. In other words, this proposal is a trade-off which benefits perhaps
> 99% of users, but might harm the 1% with large-scale flink jobs.
>
>
> [1]
> https://github.com/apache/flink/blob/c1ea6fcfd05c72a68739bda8bd16a2d1c15522c0/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/RoundRobinOperatorStateRepartitioner.java#L64-L87
>
> Best
> Yun Tang
>
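The footprint concern described above can be made concrete with a back-of-the-envelope calculation (illustrative numbers only, following the 1024-parallelism example in the thread):

```python
# Back-of-the-envelope estimate of the union-state blow-up described
# above: with union state every subtask receives *all* state handles,
# so the serialized TDDs grow quadratically with parallelism.

def union_state_footprint(parallelism, bytes_per_subtask_state):
    per_tdd = parallelism * bytes_per_subtask_state  # each TDD carries all handles
    total = parallelism * per_tdd                    # one TDD per subtask
    return per_tdd, total

# 1024 source subtasks, each with 100 KB of state kept inline (the
# proposed threshold):
per_tdd, total = union_state_footprint(1024, 100 * 1024)
print(per_tdd // 2**20, "MB per TDD")  # 100 MB per TDD
print(total // 2**30, "GB in total")   # 100 GB in total
```

With the old 1 KB threshold the same arithmetic gives roughly 1 MB per TDD, which illustrates why the increase is safe for most jobs but risky for large-parallelism union state.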
>
> 
> From: Yu Li 
> Sent: Thursday, May 14, 2020 23:51
> To: Till Rohrmann 
> Cc: dev ; Piotr Nowojski 
> Subject: Re: [DISCUSS] increase "state.backend.fs.memory-threshold" from
> 1K to 100K
>
> TL;DR: I have some reservations but tend to be +1 for the proposal;
> meanwhile I suggest we pursue a more thorough solution in the long run.
>
> Please correct me if I'm wrong, but it seems the root cause of the issue
> is that too many small files are generated.
>
> I have some concerns about the session cluster case [1], as well as
> possible issues for users at large scale; otherwise I think increasing
> `state.backend.fs.memory-threshold` to 100K is a good choice, based on the
> assumption that a large portion of our users are running small jobs with
> small states.
>
> OTOH, maybe extending the solution [2] for the RocksDB small-file problem
> (as proposed by FLINK-11937 [3]) to also support operator state could be
> an alternative? We have already applied that solution to operator state in
> production, and it solved the HDFS NN RPC bottleneck problem on last
> year's Singles' Day.
>
> Best Regards,
> Yu
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-stable/concepts/glossary.html#flink-session-cluster
> [2]
>
> https://docs.google.com/document/d/1ukLfqNt44yqhDFL3uIhd68NevVdawccb6GflGNFzLcg
> <
> https://docs.google.com/document/d/1ukLfqNt44yqhDFL3uIhd68NevVdawccb6GflGNFzLcg/edit#heading=h.rl48knhoni0h
> >
> [3] https://issues.apache.org/jira/browse/FLINK-11937
>
>
> On Thu, 14 May 2020 at 21:45, Till Rohrmann  wrote:
>
> > I cannot say much about the concrete value but if our users have problems
> > with the existing default values, then it makes sense to me to change it.
> >
> > One thing to check could be whether it is possible to provide a
> > meaningful exception in case the state size exceeds the frame size. At
> > the moment, Flink should fail with a message saying that an rpc message
> > exceeds the maximum frame size. Maybe it is also possible to point the
> > user towards "state.backend.fs.memory-threshold" if the message exceeds
> > the frame size because of too much state.
> >
> > Cheers,
> > Till
> >
> > On Thu, May 14, 2020 at 2:34 PM Stephan Ewen  wrote:
> >
> >> The parameter "state.backend.fs.memory-threshold" decides when a state
> >> will become a file and when it will be stored inline with the metadata
> >> (to avoid excessive amounts of small files).
> >>
> >> By default, this threshold is 1K - so every state above that size
> >> becomes a file. For many cases, this threshold seems to be too low.
> >> There is an interesting talk with background on this from Scott Kidder:
> >> https://www.youtube.com/watch?v=gycq0cY3TZ0
> >>
> >> I wanted to discuss increasing this to 100K by default.
> >>
> >> Advantage:
> >>   - This should help many users out of the box, which otherwise see
> >> checkpointing problems on systems like S3, GCS, etc.
> >>
> >> Disadvantage:
>
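For reference, the option under discussion is set in flink-conf.yaml. A sketch of what the proposed change would look like (in Flink 1.10 the value is given in bytes; the exact accepted format may vary by Flink version):

```
# flink-conf.yaml -- illustration of the option under discussion.
# State smaller than this threshold is stored inline in the checkpoint
# metadata rather than as a separate file on S3/GCS/HDFS.
state.backend.fs.memory-threshold: 102400   # 100 KB instead of the 1 KB default
```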

[jira] [Created] (FLINK-17723) Written design for flink threading model and guarantees made to the various structural components

2020-05-15 Thread John Lonergan (Jira)
John Lonergan created FLINK-17723:
-

 Summary: Written design for flink threading model and guarantees made to the various structural components
 Key: FLINK-17723
 URL: https://issues.apache.org/jira/browse/FLINK-17723
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: John Lonergan


Do we have a written design for the threading model, including the guarantees 
made by the core framework in terms of threading and concurrency?

Looking at various existing components such as the JDBC and file sinks and 
other non-core facilities, I am having some difficulty understanding the 
intended design.

I want to understand the assumptions I can make about when certain functions 
will be called (for example JDBCOutputFormat open vs flush vs writeRecord vs 
close), and whether the calls will always come from the same thread or from 
some other thread, or whether they might happen concurrently, in order to 
verify the correctness of the code.

What guarantees are there?

Does a certain reference need to be volatile, or even synchronized, or not?

What is the design for threading?

If the intended design is not written down then we have to infer it from the 
code, and we will definitely come to different conclusions, and thus to bugs, 
leaks and other avoidable horrors.

It's really hard to write good MT code, and a strong design is necessary to 
provide a framework for the code.

There is some info at 
https://flink.apache.org/contributing/code-style-and-quality-common.html#concurrency-and-threading
but that isn't a design and doesn't say how things are meant to work. That 
page does, however, agree that writing MT code is very hard, which just 
underlines the need for a strong and detailed design for this aspect.

==

Another supporting example: when I see code like this ...

```
// FileOutputFormat
public void close() throws IOException {
    final FSDataOutputStream s = this.stream;
    if (s != null) {
        this.stream = null;
        s.close();
    }
}
```

My feeling is that the author wasn't sure what the right approach was.

I can only guess that the author was concerned that someone else was going to 
call the function concurrently, or mess with the class state by some other 
means. And if that were true, would this code even be MT-safe? Who knows? 
Ought there to be a volatile in there, or a plain old sync, or is none of the 
caution needed at all (because framework guarantees prevent the need)?

I find it worrying that I see a lot of code in the project that is similarly 
uncertain and inconsistent.
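The pattern in the quoted close() — snapshot the field, clear it, then close the snapshot — and an explicitly synchronized alternative can be contrasted in a small sketch (Python here purely for illustration; this is not Flink code, and which variant is actually required is exactly what a written threading design would answer):

```python
import threading

class Output:
    """Toy analogue of the quoted FileOutputFormat.close() idiom."""

    def __init__(self, stream):
        self.stream = stream
        self._lock = threading.Lock()

    def close_unlocked(self):
        # Mirrors the quoted Java: safe only if the framework guarantees
        # that a single thread calls close(); two racing callers could
        # both observe a non-None snapshot and close the stream twice.
        s = self.stream
        if s is not None:
            self.stream = None
            s.close()

    def close_locked(self):
        # The unambiguous variant: at most one caller wins the swap, so
        # the stream is closed exactly once regardless of threading model.
        with self._lock:
            s, self.stream = self.stream, None
        if s is not None:
            s.close()


class CountingStream:
    def __init__(self):
        self.closed = 0
    def close(self):
        self.closed += 1

stream = CountingStream()
out = Output(stream)
out.close_locked()
out.close_locked()
print(stream.closed)  # prints 1
```

The point of the sketch is not that the locked variant is needed, but that without a documented guarantee a reader cannot tell whether the unlocked version is correct.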




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17724) PyFlink end-to-end test fails with Cannot run program "venv.zip/.conda/bin/python": error=2, No such file or directory

2020-05-15 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17724:
--

 Summary: PyFlink end-to-end test fails with Cannot run program "venv.zip/.conda/bin/python": error=2, No such file or directory
 Key: FLINK-17724
 URL: https://issues.apache.org/jira/browse/FLINK-17724
 Project: Flink
  Issue Type: Bug
  Components: API / Python, Tests
Reporter: Robert Metzger
 Fix For: 1.11.0


CI: 
https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8001&view=logs&j=1f3ed471-1849-5d3c-a34c-19792af4ad16&t=2f5b54d0-1d28-5b01-d344-aa50ffe0cdf8

{code}
2020-05-15T09:21:20.5198725Z Verifying transaction: ...working... done
2020-05-15T09:21:20.5918557Z Executing transaction: ...working... done
2020-05-15T09:22:18.8776831Z DeprecationWarning: 'source deactivate' is deprecated. Use 'conda deactivate'.
2020-05-15T09:22:19.091Z Starting cluster.
2020-05-15T09:22:21.9516388Z Starting standalonesession daemon on host fv-az678.
2020-05-15T09:22:23.5833254Z Starting taskexecutor daemon on host fv-az678.
2020-05-15T09:22:23.6192099Z Waiting for Dispatcher REST endpoint to come up...
2020-05-15T09:22:24.6699447Z Waiting for Dispatcher REST endpoint to come up...
2020-05-15T09:22:26.0376695Z Waiting for Dispatcher REST endpoint to come up...
2020-05-15T09:22:27.1345574Z Waiting for Dispatcher REST endpoint to come up...
2020-05-15T09:22:28.1809673Z Dispatcher REST endpoint is up.
2020-05-15T09:22:28.1842051Z Test submitting python job:\n
2020-05-15T09:22:29.6520483Z Results directory: /tmp/result
2020-05-15T09:23:23.2171819Z Traceback (most recent call last):
2020-05-15T09:23:23.2174222Z   File "/home/vsts/work/1/s/flink-end-to-end-tests/flink-python-test/python/python_job.py", line 82, in 
2020-05-15T09:23:23.2174857Z word_count()
2020-05-15T09:23:23.2175685Z   File "/home/vsts/work/1/s/flink-end-to-end-tests/flink-python-test/python/python_job.py", line 76, in word_count
2020-05-15T09:23:23.2176310Z t_env.execute("word_count")
2020-05-15T09:23:23.2177228Z   File "/home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/opt/python/pyflink.zip/pyflink/table/table_environment.py", line 1049, in execute
2020-05-15T09:23:23.2179484Z   File "/home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/opt/python/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in __call__
2020-05-15T09:23:23.2181045Z   File "/home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/opt/python/pyflink.zip/pyflink/util/exceptions.py", line 147, in deco
2020-05-15T09:23:23.2182205Z   File "/home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/opt/python/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
2020-05-15T09:23:23.2182889Z py4j.protocol.Py4JJavaError: An error occurred while calling o2.execute.
2020-05-15T09:23:23.2183634Z : java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: b0e54b0a1c99b57e04ec32d7879437c4)
2020-05-15T09:23:23.2184387Z at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:290)
2020-05-15T09:23:23.2184960Z at org.apache.flink.table.api.internal.BatchTableEnvImpl.execute(BatchTableEnvImpl.scala:325)
2020-05-15T09:23:23.2185508Z at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-05-15T09:23:23.2186320Z at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-05-15T09:23:23.2186930Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-05-15T09:23:23.2187472Z at java.lang.reflect.Method.invoke(Method.java:498)
2020-05-15T09:23:23.2188027Z at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
2020-05-15T09:23:23.2188672Z at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
2020-05-15T09:23:23.2189287Z at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
2020-05-15T09:23:23.2189911Z at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
2020-05-15T09:23:23.2190546Z at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
2020-05-15T09:23:23.2191178Z at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
2020-05-15T09:23:23.2191702Z at java.lang.Thread.run(Thread.java:748)
2020-05-15T09:23:23.2192357Z Caused by: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: b0e54b0a1c99b57e04ec32d7879437c4)
2020-05-15T09:23:23.2193084Z at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
2020-05-15T09:23:
{code}

[jira] [Created] (FLINK-17725) FileUploadHandlerTest.testUploadCleanupOnFailure fails with "SocketTimeout timeout"

2020-05-15 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17725:
--

 Summary: FileUploadHandlerTest.testUploadCleanupOnFailure fails with "SocketTimeout timeout"
 Key: FLINK-17725
 URL: https://issues.apache.org/jira/browse/FLINK-17725
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Reporter: Robert Metzger


CI: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1392&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=4ed44b66-cdd6-5dcf-5f6a-88b07dda665d
(select the 1st attempt)

{code}
2020-05-15T09:01:40.0547958Z [ERROR] testUploadCleanupOnFailure(org.apache.flink.runtime.rest.FileUploadHandlerTest)  Time elapsed: 10.415 s  <<< ERROR!
2020-05-15T09:01:40.0548716Z java.net.SocketTimeoutException: timeout
2020-05-15T09:01:40.0549048Z at okio.Okio$4.newTimeoutException(Okio.java:227)
2020-05-15T09:01:40.0549361Z at okio.AsyncTimeout.exit(AsyncTimeout.java:284)
2020-05-15T09:01:40.0549688Z at okio.AsyncTimeout$2.read(AsyncTimeout.java:240)
2020-05-15T09:01:40.0552454Z at okio.RealBufferedSource.indexOf(RealBufferedSource.java:344)
2020-05-15T09:01:40.0554987Z at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:216)
2020-05-15T09:01:40.0555636Z at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:210)
2020-05-15T09:01:40.0556307Z at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
2020-05-15T09:01:40.0556856Z at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
2020-05-15T09:01:40.0557505Z at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
2020-05-15T09:01:40.0558021Z at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
2020-05-15T09:01:40.0558498Z at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
2020-05-15T09:01:40.0558932Z at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
2020-05-15T09:01:40.0559381Z at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
2020-05-15T09:01:40.0559803Z at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
2020-05-15T09:01:40.0560262Z at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
2020-05-15T09:01:40.0561022Z at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
2020-05-15T09:01:40.0561701Z at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
2020-05-15T09:01:40.0562439Z at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
2020-05-15T09:01:40.0563170Z at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
2020-05-15T09:01:40.0565934Z at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
2020-05-15T09:01:40.0566781Z at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
2020-05-15T09:01:40.0575046Z at okhttp3.RealCall.execute(RealCall.java:69)
2020-05-15T09:01:40.0575858Z at org.apache.flink.runtime.rest.FileUploadHandlerTest.testUploadCleanupOnFailure(FileUploadHandlerTest.java:250)
2020-05-15T09:01:40.0576567Z at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-05-15T09:01:40.0577242Z at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-05-15T09:01:40.0577979Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-05-15T09:01:40.0578594Z at java.lang.reflect.Method.invoke(Method.java:498)
2020-05-15T09:01:40.0579234Z at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2020-05-15T09:01:40.0580279Z at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2020-05-15T09:01:40.0581129Z at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2020-05-15T09:01:40.0581862Z at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2020-05-15T09:01:40.0582538Z at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2020-05-15T09:01:40.0583174Z at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
2020-05-15T09:01:40.0583934Z at org.junit.rules.RunRules.evaluate(RunRules.java:20)
2020-05-15T09:01:40.0584501Z at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
2020-05-15T09:01:40.0585282Z at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
2020-05-15T09:01:40.0586005Z at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
2020-05-15T09:01:40.058Z at org.junit.runners.ParentRunner$3.run(ParentRunne
{code}

Re: [ANNOUNCE] Apache Flink 1.10.1 released

2020-05-15 Thread Till Rohrmann
Thanks Yu for being our release manager and everyone else who made the
release possible!

Cheers,
Till

On Fri, May 15, 2020 at 9:15 AM Congxian Qiu  wrote:

> Thanks a lot for the release and your great job, Yu!
> Also thanks to everyone who made this release possible!
>
> Best,
> Congxian
>
>
> Yu Li wrote on Thu, May 14, 2020 at 1:59 AM:
>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.10.1, which is the first bugfix release for the Apache Flink
>> 1.10 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this bugfix release:
>> https://flink.apache.org/news/2020/05/12/release-1.10.1.html
>>
>> The full release notes are available in Jira:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346891
>>
>> We would like to thank all contributors of the Apache Flink community who
>> made this release possible!
>>
>> Regards,
>> Yu
>>
>


Re: [DISCUSS] Rework History Server into Global Dashboard

2020-05-15 Thread Till Rohrmann
This sounds like a good plan to me Gyula. And there is always the Flink
packages option available if we want to make it available earlier.

Cheers,
Till

On Fri, May 15, 2020 at 10:12 AM Gyula Fóra  wrote:

> Hi Till!
>
> I agree to some extent that managing multiple clusters is not Flink's
> primary responsibility.
>
> However many (if not most) production users use Flink in per-job-cluster
> mode which gives superior configurability and resource isolation than
> standalone/session modes.
> But still the best job management experience is on standalone clusters
> where users see all the jobs, and can interact with them purely using their
> unique job id.
>
> This is the mismatch we were trying to resolve here, to get the best of
> both worlds. This of course only concerns production users running many
> different jobs so we can
> definitely call it an enterprise feature.
>
> I agree that this would be new code to maintain in contrast to the current
> history server which "just works".
>
> We are completely okay with not adding this to Flink just yet, as it will
> be part of the next Cloudera Flink release anyways. We will test run it
> there and gather production feedback for the Flink community and can make a
> better decision afterwards when we see the real value.
>
> Cheers,
> Gyula
>
>
>
> On Thu, May 14, 2020 at 3:36 PM Till Rohrmann 
> wrote:
>
> > Hi Gyula,
> >
> > thanks for proposing this extension. I can see that such a feature
> > could be helpful.
> >
> > However, I wouldn't consider the management of multiple clusters core to
> > Flink. Managing a single cluster is already complex enough and given the
> > available community capacity I would rather concentrate on doing this
> > aspect right instead of adding more complexity and more code to maintain.
> >
> > Maybe we could add this feature as a Flink package instead. That way it
> > would still be available to our users. If it gains enough traction then
> we
> > can also add it to Flink later. What do you think?
> >
> > Cheers,
> > Till
> >
> > On Wed, May 13, 2020 at 11:36 AM Gyula Fóra 
> wrote:
> >
> > > It seems that not everyone can see the screenshot in the email, so here
> > is
> > > a link:
> > >
> > > https://drive.google.com/open?id=1abrlpI976NFqOZSX20k2FoiAfVhBbER9
> > >
> > > On Wed, May 13, 2020 at 11:29 AM Gyula Fóra 
> > wrote:
> > >
> > > > Oops I forgot the screenshot, thanks Ufuk :D
> > > >
> > > >
> > > > @Jeff Zhang: Yes, we simply call the individual cluster's REST
> > > > endpoints, so it would work with multiple Flink versions, yes.
> > > >
> > > > Gyula
> > > >
> > > >
> > > > On Wed, May 13, 2020 at 10:56 AM Jeff Zhang 
> wrote:
> > > >
> > > >> Hi Gyula,
> > > >>
> > > >> Big +1 for this, it would be very helpful for Flink jobs and cluster
> > > >> operations. Do you call the Flink REST API to gather the job info? I
> > > >> hope this history server could work with multiple versions of Flink
> > > >> as long as the Flink REST API is compatible.
> > > >>
> > > >> Gyula Fóra wrote on Wed, May 13, 2020 at 4:13 PM:
> > > >>
> > > >> > Hi All!
> > > >> >
> > > >> > With the growing number of Flink streaming applications the
> current
> > HS
> > > >> > implementation is starting to lose its value. Users running
> > streaming
> > > >> > applications mostly care about what is running right now on the
> > > cluster
> > > >> and
> > > >> > a centralised view on history is not very useful.
> > > >> >
> > > >> > We have been experimenting with reworking the current HS into a
> > Global
> > > >> > Flink Dashboard that would show all running and completed/failed
> > jobs
> > > on
> > > >> > all the running Flink clusters the users have.
> > > >> >
> > > >> > In essence we would get a view similar to the current HS but it
> > would
> > > >> also
> > > >> > show the running jobs with a link redirecting to the actual
> cluster
> > > >> > specific dashboard.
> > > >> >
> > > >> > This is how it looks now:
> > > >> >
> > > >> >
> > > >> > In this version we took a very simple approach of introducing a
> > > cluster
> > > >> > discovery abstraction to collect all the running Flink clusters
> (by
> > > >> listing
> > > >> > yarn apps for instance).
> > > >> >
> > > >> > The main pages aggregating jobs from different clusters would then
> > > >> simply
> > > >> > make calls to all clusters and aggregate the response. Job
> specific
> > > >> > endpoints would be simply routed to the correct target cluster.
> This
> > > way
> > > >> > the changes required are localised to the current HS
> implementation
> > > and
> > > >> > cluster rest endpoints don't need to be changed.
> > > >> >
> > > >> > In addition to getting a fully working global dashboard this also
> > gets
> > > >> us a
> > > >> > fully functioning rest endpoint for accessing all jobs in all
> > clusters
> > > >> > without having to provide the clusterId (yarn app id for instance)
> > > that
> > > >> we
> > > >> > can use to enhance CLI experience in multi cluster (lot of per-

[jira] [Created] (FLINK-17726) Scheduler should take care of tasks directly canceled by TaskManager

2020-05-15 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-17726:
---

 Summary: Scheduler should take care of tasks directly canceled by 
TaskManager
 Key: FLINK-17726
 URL: https://issues.apache.org/jira/browse/FLINK-17726
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.12.0
Reporter: Zhu Zhu
 Fix For: 1.12.0


JobManager will not trigger failure handling when receiving CANCELED task 
update. 
This is because CANCELED tasks are usually caused by another FAILED task. These 
CANCELED tasks will be restarted by the failover process triggered by the FAILED task.

However, if a task is directly CANCELED by TaskManager due to its own runtime 
issue, the task will not be recovered by JM and thus the job would hang.
This is a potential issue and we should avoid it.

A possible solution is to let the JobManager treat tasks transitioning to CANCELED 
from any state other than CANCELING as failed tasks. 
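
The proposed rule can be stated compactly: a CANCELED transition only skips failure handling when it comes from CANCELING, i.e. when the JobManager itself initiated the cancellation. A minimal sketch of that check, with illustrative names rather than the actual scheduler code:

```java
/** Simplified subset of Flink's execution states, for illustration only. */
enum ExecutionState { CREATED, SCHEDULED, DEPLOYING, RUNNING, CANCELING, CANCELED, FAILED }

class CanceledTaskPolicy {
    /**
     * A task that reaches CANCELED from CANCELING was canceled by the
     * JobManager itself (usually as part of failover for another FAILED
     * task), so no extra failure handling is needed. A task that reaches
     * CANCELED from any other state was canceled directly by the
     * TaskManager and must be treated as failed, otherwise the job hangs.
     */
    static boolean shouldTriggerFailover(ExecutionState previous, ExecutionState current) {
        return current == ExecutionState.CANCELED && previous != ExecutionState.CANCELING;
    }
}
```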



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Apache Flink 1.10.1 released

2020-05-15 Thread Dian Fu
Thanks Yu for managing this release and everyone else who made this release 
possible. Good work!

Regards,
Dian

> On May 15, 2020, at 6:26 PM, Till Rohrmann wrote:
> 
> Thanks Yu for being our release manager and everyone else who made the 
> release possible!
> 
> Cheers,
> Till
> 
> On Fri, May 15, 2020 at 9:15 AM Congxian Qiu wrote:
> Thanks a lot for the release and your great job, Yu!
> Also thanks to everyone who made this release possible!
> 
> Best,
> Congxian
> 
> 
> Yu Li wrote on Thu, May 14, 2020 at 1:59 AM:
> The Apache Flink community is very happy to announce the release of Apache 
> Flink 1.10.1, which is the first bugfix release for the Apache Flink 1.10 
> series.
> 
> Apache Flink® is an open-source stream processing framework for distributed, 
> high-performing, always-available, and accurate data streaming applications.
> 
> The release is available for download at:
> https://flink.apache.org/downloads.html 
> 
> 
> Please check out the release blog post for an overview of the improvements 
> for this bugfix release:
> https://flink.apache.org/news/2020/05/12/release-1.10.1.html 
> 
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346891
>  
> 
> 
> We would like to thank all contributors of the Apache Flink community who 
> made this release possible!
> 
> Regards,
> Yu



[jira] [Created] (FLINK-17727) Can't subsume checkpoint with no channel state in UC mode

2020-05-15 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-17727:
-

 Summary: Can't subsume checkpoint with no channel state in UC mode
 Key: FLINK-17727
 URL: https://issues.apache.org/jira/browse/FLINK-17727
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing, Runtime / Task
Affects Versions: 1.11.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.11.0


When there are no channel state handles, the underlying FS stream is still 
created.

On discard it is not deleted because it's not referenced by any state handles.
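
One way to avoid the dangling file is to create the underlying stream lazily, only once there is actually channel state to write. The following is an illustrative sketch under that assumption, not the actual checkpoint writer code:

```java
import java.util.List;

class ChannelStateWriterSketch {
    private boolean streamCreated = false;

    /** Only open the underlying FS stream when there are handles to write. */
    void write(List<byte[]> channelStateHandles) {
        if (channelStateHandles.isEmpty()) {
            // No channel state: never create the stream, so discard has
            // nothing unreferenced left behind to clean up.
            return;
        }
        openStream();
        // ... write the handles and register them with the checkpoint ...
    }

    private void openStream() {
        streamCreated = true; // stand-in for opening the real FS output stream
    }

    boolean isStreamCreated() {
        return streamCreated;
    }
}
```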



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Exact feature freeze date

2020-05-15 Thread Piotr Nowojski
Hi,

A couple of contributors asked to postpone cutting the release branch until 
Monday. What do you think about such an extension?

(+1 from my side)

Piotrek

> On 25 Apr 2020, at 21:24, Yu Li  wrote:
> 
> +1 for extending the feature freeze to May 15th.
> 
> Best Regards,
> Yu
> 
> 
> On Fri, 24 Apr 2020 at 14:43, Yuan Mei  wrote:
> 
>> +1
>> 
>> On Thu, Apr 23, 2020 at 4:10 PM Stephan Ewen  wrote:
>> 
>>> Hi all!
>>> 
>>> I want to bring up a discussion about when we want to do the feature
>> freeze
>>> for 1.11.
>>> 
>>> When kicking off the release cycle, we tentatively set the date to end of
>>> April, which would be in one week.
>>> 
>>> I can say from the features I am involved with (FLIP-27, FLIP-115,
>>> reviewing some state backend improvements, etc.) that it would be helpful
>>> to have two additional weeks.
>>> 
>>> When looking at various other feature threads, my feeling is that there
>> are
>>> more contributors and committers that could use a few more days.
>>> The last two months were quite exceptional and we did lose a bit of
>>> development speed here and there.
>>> 
>>> How do you think about making *May 15th* the feature freeze?
>>> 
>>> Best,
>>> Stephan
>>> 
>> 



[jira] [Created] (FLINK-17728) use sql parser to parse a statement in sql client

2020-05-15 Thread godfrey he (Jira)
godfrey he created FLINK-17728:
--

 Summary: use sql parser to parse a statement in sql client
 Key: FLINK-17728
 URL: https://issues.apache.org/jira/browse/FLINK-17728
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: godfrey he
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Apache Flink 1.10.1 released

2020-05-15 Thread Benchao Li
Thanks Yu for the great work, and everyone else who made this possible.

Dian Fu wrote on Fri, May 15, 2020 at 6:55 PM:

> Thanks Yu for managing this release and everyone else who made this
> release possible. Good work!
>
> Regards,
> Dian
>
> On May 15, 2020, at 6:26 PM, Till Rohrmann wrote:
>
> Thanks Yu for being our release manager and everyone else who made the
> release possible!
>
> Cheers,
> Till
>
> On Fri, May 15, 2020 at 9:15 AM Congxian Qiu 
> wrote:
>
>> Thanks a lot for the release and your great job, Yu!
>> Also thanks to everyone who made this release possible!
>>
>> Best,
>> Congxian
>>
>>
>> Yu Li wrote on Thu, May 14, 2020 at 1:59 AM:
>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.10.1, which is the first bugfix release for the Apache Flink
>>> 1.10 series.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data streaming
>>> applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements for this bugfix release:
>>> https://flink.apache.org/news/2020/05/12/release-1.10.1.html
>>>
>>> The full release notes are available in Jira:
>>>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346891
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>>
>>> Regards,
>>> Yu
>>>
>>
>

-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


[jira] [Created] (FLINK-17729) Make mandatory to have lib/, plugin/ and dist in yarn.provided.lib.dirs

2020-05-15 Thread Kostas Kloudas (Jira)
Kostas Kloudas created FLINK-17729:
--

 Summary: Make mandatory to have lib/, plugin/ and dist in 
yarn.provided.lib.dirs
 Key: FLINK-17729
 URL: https://issues.apache.org/jira/browse/FLINK-17729
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission, Deployment / YARN
Affects Versions: 1.11.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas
 Fix For: 1.11.0


This will make it mandatory for users of "yarn.provided.lib.dirs" to also include 
the lib/ and plugin/ directories and the flink-dist jar. If these are not included 
in the shared resources, the feature cannot be used and all dependencies of an 
application are expected to be shipped from the client (the previous behaviour). 
If they are provided, then they are going to be used and NOT what the user may 
have locally (e.g. different Flink or log4j versions).

The reason for this requirement is to avoid unpleasant surprises with 
classloading issues.
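
In practice the requirement translates into a layout roughly like the following. All paths and the application jar name are illustrative examples, not mandated locations, and the exact CLI invocation may differ:

```
hdfs:///flink/1.11.0/
├── lib/          # must contain the flink-dist jar plus the usual lib/ jars
└── plugins/      # the plugin directories

./bin/flink run-application -t yarn-application \
  -Dyarn.provided.lib.dirs="hdfs:///flink/1.11.0/lib;hdfs:///flink/1.11.0/plugins" \
  ./MyApplication.jar
```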



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Exact feature freeze date

2020-05-15 Thread Robert Metzger
I'm okay with that, but I would suggest agreeing on a time of day. What about Monday
morning in Europe?

On Fri, May 15, 2020 at 1:43 PM Piotr Nowojski  wrote:

> Hi,
>
> A couple of contributors asked to postpone cutting the release branch
> until Monday. What do you think about such an extension?
>
> (+1 from my side)
>
> Piotrek
>
> > On 25 Apr 2020, at 21:24, Yu Li  wrote:
> >
> > +1 for extending the feature freeze to May 15th.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Fri, 24 Apr 2020 at 14:43, Yuan Mei  wrote:
> >
> >> +1
> >>
> >> On Thu, Apr 23, 2020 at 4:10 PM Stephan Ewen  wrote:
> >>
> >>> Hi all!
> >>>
> >>> I want to bring up a discussion about when we want to do the feature
> >> freeze
> >>> for 1.11.
> >>>
> >>> When kicking off the release cycle, we tentatively set the date to end
> of
> >>> April, which would be in one week.
> >>>
> >>> I can say from the features I am involved with (FLIP-27, FLIP-115,
> >>> reviewing some state backend improvements, etc.) that it would be
> helpful
> >>> to have two additional weeks.
> >>>
> >>> When looking at various other feature threads, my feeling is that there
> >> are
> >>> more contributors and committers that could use a few more days.
> >>> The last two months were quite exceptional and we did lose a bit of
> >>> development speed here and there.
> >>>
> >>> How do you think about making *May 15th* the feature freeze?
> >>>
> >>> Best,
> >>> Stephan
> >>>
> >>
>
>


[jira] [Created] (FLINK-17730) HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart times out

2020-05-15 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17730:
--

 Summary: 
HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart
 times out
 Key: FLINK-17730
 URL: https://issues.apache.org/jira/browse/FLINK-17730
 Project: Flink
  Issue Type: Bug
  Components: Build System / Azure Pipelines, FileSystems, Tests
Reporter: Robert Metzger


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1374&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=34f486e1-e1e4-5dd2-9c06-bfdd9b9c74a8

After 5 minutes 
{code}
2020-05-15T06:56:38.1688341Z "main" #1 prio=5 os_prio=0 tid=0x7fa10800b800 
nid=0x1161 runnable [0x7fa110959000]
2020-05-15T06:56:38.1688709Zjava.lang.Thread.State: RUNNABLE
2020-05-15T06:56:38.1689028Zat 
java.net.SocketInputStream.socketRead0(Native Method)
2020-05-15T06:56:38.1689496Zat 
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
2020-05-15T06:56:38.1689921Zat 
java.net.SocketInputStream.read(SocketInputStream.java:171)
2020-05-15T06:56:38.1690316Zat 
java.net.SocketInputStream.read(SocketInputStream.java:141)
2020-05-15T06:56:38.1690723Zat 
sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
2020-05-15T06:56:38.1691196Zat 
sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
2020-05-15T06:56:38.1691608Zat 
sun.security.ssl.InputRecord.read(InputRecord.java:532)
2020-05-15T06:56:38.1692023Zat 
sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
2020-05-15T06:56:38.1692558Z- locked <0xb94644f8> (a 
java.lang.Object)
2020-05-15T06:56:38.1692946Zat 
sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
2020-05-15T06:56:38.1693371Zat 
sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
2020-05-15T06:56:38.1694151Z- locked <0xb9464d20> (a 
sun.security.ssl.AppInputStream)
2020-05-15T06:56:38.1694908Zat 
org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
2020-05-15T06:56:38.1695475Zat 
org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:198)
2020-05-15T06:56:38.1696007Zat 
org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176)
2020-05-15T06:56:38.1696509Zat 
org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
2020-05-15T06:56:38.1696993Zat 
com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
2020-05-15T06:56:38.1697466Zat 
com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
2020-05-15T06:56:38.1698069Zat 
com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
2020-05-15T06:56:38.1698567Zat 
com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
2020-05-15T06:56:38.1699041Zat 
com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
2020-05-15T06:56:38.1699624Zat 
com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
2020-05-15T06:56:38.1700090Zat 
com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
2020-05-15T06:56:38.1700584Zat 
com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107)
2020-05-15T06:56:38.1701282Zat 
com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
2020-05-15T06:56:38.1701800Zat 
com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125)
2020-05-15T06:56:38.1702328Zat 
com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
2020-05-15T06:56:38.1702804Zat 
org.apache.hadoop.fs.s3a.S3AInputStream.lambda$read$3(S3AInputStream.java:445)
2020-05-15T06:56:38.1703270Zat 
org.apache.hadoop.fs.s3a.S3AInputStream$$Lambda$42/1204178174.execute(Unknown 
Source)
2020-05-15T06:56:38.1703677Zat 
org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
2020-05-15T06:56:38.1704090Zat 
org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
2020-05-15T06:56:38.1704607Zat 
org.apache.hadoop.fs.s3a.Invoker$$Lambda$23/1991724700.execute(Unknown Source)
2020-05-15T06:56:38.1705115Zat 
org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
2020-05-15T06:56:38.1705551Zat 
org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
2020-05-15T06:56:38.1705937Zat 
org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:231)
2020-05-15T06:56:38.1706363Zat 
org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:441)
2020-05-15T06:56:38.1707052Z- locked <0xb7d98b60> (a 
org.apache.hadoop.fs.s3a.S3AInputStream)
2020-05-15T06:56:38.1707438Zat 
java.io.DataInputStream.read(DataInputStream.java:149)
2020-05-15T06:56:38.1707904Zat 
org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.re

[jira] [Created] (FLINK-17731) Artifact X already exists for build Y

2020-05-15 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17731:
--

 Summary: Artifact X already exists for build Y
 Key: FLINK-17731
 URL: https://issues.apache.org/jira/browse/FLINK-17731
 Project: Flink
  Issue Type: Bug
  Components: Build System / Azure Pipelines, Tests
Reporter: Robert Metzger
Assignee: Robert Metzger


Example: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1392&view=logs&j=54a39f72-3d67-5c07-dce2-774f9b8414bc&t=b26c14d3-94a7-5850-bd1e-9b6578f16671&l=37



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17732) RescalingITCase.testSavepointRescalingOutPartitionedOperatorStateList

2020-05-15 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17732:
--

 Summary: 
RescalingITCase.testSavepointRescalingOutPartitionedOperatorStateList
 Key: FLINK-17732
 URL: https://issues.apache.org/jira/browse/FLINK-17732
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Task
Reporter: Robert Metzger


CI: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1413&view=logs&j=16ccbdb7-2a3e-53da-36eb-fb718edc424a&t=cf61ce33-6fba-5fbe-2c0c-e41c4013e891

{code}
2020-05-15T12:09:16.9432669Z [ERROR] 
testSavepointRescalingOutPartitionedOperatorStateList[backend = 
filesystem](org.apache.flink.test.checkpointing.RescalingITCase)  Time elapsed: 
180.189 s  <<< ERROR!
2020-05-15T12:09:16.9433577Z java.util.concurrent.TimeoutException
2020-05-15T12:09:16.9434238Zat 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
2020-05-15T12:09:16.9435119Zat 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
2020-05-15T12:09:16.9436062Zat 
org.apache.flink.test.checkpointing.RescalingITCase.testSavepointRescalingPartitionedOperatorState(RescalingITCase.java:473)
2020-05-15T12:09:16.9437313Zat 
org.apache.flink.test.checkpointing.RescalingITCase.testSavepointRescalingOutPartitionedOperatorStateList(RescalingITCase.java:427)
2020-05-15T12:09:16.9438112Zat 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-05-15T12:09:16.9438858Zat 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-05-15T12:09:16.9439611Zat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-05-15T12:09:16.9440367Zat 
java.lang.reflect.Method.invoke(Method.java:498)
2020-05-15T12:09:16.9441502Zat 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2020-05-15T12:09:16.9442020Zat 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2020-05-15T12:09:16.9442535Zat 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2020-05-15T12:09:16.9442984Zat 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2020-05-15T12:09:16.9443557Zat 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2020-05-15T12:09:16.9444014Zat 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
2020-05-15T12:09:16.9444379Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2020-05-15T12:09:16.9444901Zat 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
2020-05-15T12:09:16.9445314Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
2020-05-15T12:09:16.9445812Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
2020-05-15T12:09:16.9446289Zat 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2020-05-15T12:09:16.9446660Zat 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2020-05-15T12:09:16.9447097Zat 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
2020-05-15T12:09:16.9447478Zat 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
2020-05-15T12:09:16.9447915Zat 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
2020-05-15T12:09:16.9448284Zat 
org.junit.runners.ParentRunner.run(ParentRunner.java:363)
2020-05-15T12:09:16.9448689Zat 
org.junit.runners.Suite.runChild(Suite.java:128)
2020-05-15T12:09:16.9449062Zat 
org.junit.runners.Suite.runChild(Suite.java:27)
2020-05-15T12:09:16.9449405Zat 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2020-05-15T12:09:16.9450063Zat 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2020-05-15T12:09:16.9450620Zat 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
2020-05-15T12:09:16.9451313Zat 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
2020-05-15T12:09:16.9451741Zat 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
2020-05-15T12:09:16.9452218Zat 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2020-05-15T12:09:16.9452694Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2020-05-15T12:09:16.9453065Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2020-05-15T12:09:16.9453468Zat 
org.junit.runners.ParentRunner.run(ParentRunner.java:363)
2020-05-15T12:09:16.9454027Zat 
org.junit.runners.Suite.runChild(Suite.java:128)
2020-05-15T12:09:16.9454408Zat 
org.junit.runners.Suite.runChild(Suite.java:27)
2020-05-15T12:09:16.9454844Zat 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2020-05-15T12:09:16.9455269Zat 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2020-05-15T12:09:16.9455742Zat 
org.j

Re: [DISCUSS] Exact feature freeze date

2020-05-15 Thread Till Rohrmann
+1 from my side to extend the feature freeze until Monday morning.

Cheers,
Till

On Fri, May 15, 2020 at 2:04 PM Robert Metzger  wrote:

> I'm okay with that, but I would suggest agreeing on a time of day. What about Monday
> morning in Europe?
>
> On Fri, May 15, 2020 at 1:43 PM Piotr Nowojski 
> wrote:
>
> > Hi,
> >
> > A couple of contributors asked to postpone cutting the release branch
> > until Monday. What do you think about such an extension?
> >
> > (+1 from my side)
> >
> > Piotrek
> >
> > > On 25 Apr 2020, at 21:24, Yu Li  wrote:
> > >
> > > +1 for extending the feature freeze to May 15th.
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Fri, 24 Apr 2020 at 14:43, Yuan Mei  wrote:
> > >
> > >> +1
> > >>
> > >> On Thu, Apr 23, 2020 at 4:10 PM Stephan Ewen 
> wrote:
> > >>
> > >>> Hi all!
> > >>>
> > >>> I want to bring up a discussion about when we want to do the feature
> > >> freeze
> > >>> for 1.11.
> > >>>
> > >>> When kicking off the release cycle, we tentatively set the date to
> end
> > of
> > >>> April, which would be in one week.
> > >>>
> > >>> I can say from the features I am involved with (FLIP-27, FLIP-115,
> > >>> reviewing some state backend improvements, etc.) that it would be
> > helpful
> > >>> to have two additional weeks.
> > >>>
> > >>> When looking at various other feature threads, my feeling is that
> there
> > >> are
> > >>> more contributors and committers that could use a few more days.
> > >>> The last two months were quite exceptional and we did lose a bit
> of
> > >>> development speed here and there.
> > >>>
> > >>> How do you think about making *May 15th* the feature freeze?
> > >>>
> > >>> Best,
> > >>> Stephan
> > >>>
> > >>
> >
> >
>


[jira] [Created] (FLINK-17733) Add documentation for real-time hive

2020-05-15 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-17733:


 Summary: Add documentation for real-time hive
 Key: FLINK-17733
 URL: https://issues.apache.org/jira/browse/FLINK-17733
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Jingsong Lee
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17734) Add specialized collecting sink function

2020-05-15 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-17734:
---

 Summary: Add specialized collecting sink function
 Key: FLINK-17734
 URL: https://issues.apache.org/jira/browse/FLINK-17734
 Project: Flink
  Issue Type: New Feature
  Components: API / DataStream
Reporter: Caizhi Weng
 Fix For: 1.11.0


A specialized collecting sink is needed to implement the algorithms in 
FLINK-14807
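
In spirit, a collecting sink buffers records so that the client can pull them back and iterate over the results. A minimal sketch of the idea in plain Java, assuming nothing about the actual FLINK-14807 design:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative collecting sink: buffers records so a client can drain them later. */
class CollectingSink<T> {
    private final List<T> buffer = new ArrayList<>();

    /** In Flink this role is played by SinkFunction#invoke; here a plain method. */
    void invoke(T value) {
        buffer.add(value);
    }

    /** Client side: returns everything collected so far and resets the buffer. */
    List<T> drain() {
        List<T> copy = new ArrayList<>(buffer);
        buffer.clear();
        return copy;
    }
}
```

A real implementation additionally has to deal with fault tolerance and with transporting the buffered records from the cluster back to the client, which is what makes a specialized sink necessary.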



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Interested in applying as a technical writer

2020-05-15 Thread Marta Paes Moreira
Hi, Deepak.

Thanks for the introduction — it's cool to see that you're interested in
contributing to Flink as part of GSoD!

We're looking forward to receiving your application! Let us know if you
have any questions, in the meantime.

Marta

On Thu, May 14, 2020 at 11:35 PM Deepak Vohra 
wrote:

> I am interested in applying as a technical writer to the Apache Flink
> project in Google Season of Docs. In the project exploration phase I would
> like to introduce myself as a potential applicant (when the application
> opens). I have experience using several data processing frameworks and have
> published dozens of articles and a few books on the same. Some books on
> similar topics:
> 1. Practical Hadoop Ecosystem:
> https://www.amazon.com/gp/product/B01M0NAHU3/ref=dbs_a_def_rwt_hsch_vapi_tkin_p1_i5
> 2. Apache HBase Primer:
> https://www.amazon.com/gp/product/B01MTOSTAB/ref=dbs_a_def_rwt_bibl_vppi_i1
> I have also published 5 other books on Docker and Kubernetes; Kubernetes
> being a commonly used deployment platform for Apache Flink.
> regards, Deepak


[jira] [Created] (FLINK-17735) Add specialized collecting iterator to Blink planner

2020-05-15 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-17735:
---

 Summary: Add specialized collecting iterator to Blink planner
 Key: FLINK-17735
 URL: https://issues.apache.org/jira/browse/FLINK-17735
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Caizhi Weng
 Fix For: 1.11.0


A specialized collecting iterator is needed to implement the algorithms in 
FLINK-14807.

As the legacy planner does not have data structures like 
{{ResettableExternalBuffer}} from the Blink planner, we're not implementing the 
algorithms in the legacy planner. The legacy planner will just use the current 
collect implementation.



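The "resettable" quality mentioned above — iterating the same buffered results more than once — can be sketched in plain Java. This is only an illustration of the intent; Blink's {{ResettableExternalBuffer}} additionally spills to disk, and all names below are hypothetical:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ResettableIteratorSketch<T> implements Iterator<T> {
    private final List<T> buffer; // fully materialized results
    private int pos;

    public ResettableIteratorSketch(List<T> buffer) {
        this.buffer = buffer;
    }

    @Override
    public boolean hasNext() {
        return pos < buffer.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.get(pos++);
    }

    /** Restart iteration from the first buffered element. */
    public void reset() {
        pos = 0;
    }

    public static void main(String[] args) {
        ResettableIteratorSketch<String> it =
                new ResettableIteratorSketch<>(List.of("a", "b"));
        StringBuilder out = new StringBuilder();
        while (it.hasNext()) {
            out.append(it.next());
        }
        it.reset(); // second pass over the same buffered data
        while (it.hasNext()) {
            out.append(it.next());
        }
        System.out.println(out); // prints abab
    }
}
```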


Re: [DISCUSS] Stability guarantees for @PublicEvolving classes

2020-05-15 Thread Till Rohrmann
I completely agree that there are many other aspects of our guarantees and
processes around the @Public and @PublicEvolving classes which need to be
discussed and properly defined. For the sake of keeping this discussion
thread narrowly scoped, I would suggest starting a separate discussion
about the following points (not exhaustive):

- What should be annotated with @Public and @PublicEvolving?
- Process for transforming @PublicEvolving into @Public; How to ensure
that @PublicEvolving will eventually be promoted to @Public?
- Process of retiring a @Public/@PublicEvolving API

I will start a vote thread about the change I proposed here which is to
ensure API and binary compatibility for @PublicEvolving classes between
bugfix releases (x.y.z and x.y.u).

Cheers,
Till

On Fri, May 15, 2020 at 6:33 AM Zhu Zhu  wrote:

> +1 for "API + binary compatibility for @PublicEvolving classes for all bug
> fix
> releases in a minor release (x.y.z is compatible to x.y.u)"
>
> This @PublicEvolving annotation would then be a hard limit on changes.
> So it's important to rethink the policy towards using it, as Stephan
> proposed.
>
> I think any Flink interfaces that are visible to users should be explicitly
> marked as @Public or @PublicEvolving.
> Any other interfaces should not be marked as @Public/@PublicEvolving.
> This would be essential for us to check whether we are breaking any
> user-facing interfaces unexpectedly.
> The only exception would be the case where we had to expose a method/class
> due to implementation limitations; such a method/class should be explicitly
> marked as @Internal.
>
> Thanks,
> Zhu Zhu
>
> Yun Tang  于2020年5月15日周五 上午11:41写道:
>
> > +1 for this idea, and I also like Xintong's suggestion to make it
> > explicit when the @PublicEvolving API could upgrade to the @Public API.
> > If we have the rule to upgrade the API stability level but do not define
> > a clear timeline, I'm afraid not everyone has the enthusiasm to do the
> > upgrade.
> >
> > My minor suggestion is that I think two major releases (where a major
> > release is x.y.0, as Chesnay clarified) might be a bit quick. From the
> > release history [1], Flink bumps its major version every 3 ~ 6 months,
> > so a gap of two major releases could be as short as half a year.
> > I think half a year might be a bit too soon for users to collect enough
> > feedback, and upgrading the API stability level every 3 major versions
> > would be better.
> >
> > [1] https://flink.apache.org/downloads.html#flink
> >
> > Best
> > Yun Tang
> >
> >
> > 
> > From: Xintong Song 
> > Sent: Friday, May 15, 2020 11:04
> > To: dev 
> > Subject: Re: [DISCUSS] Stability guarantees for @PublicEvolving classes
> >
> > ### Documentation on API compatibility policies
> >
> > Do we have any formal documentation about the API compatibility policies?
> > The only things I found are:
> >
> >- In the release announcement (take 1.10.0 as an example) [1]:
> >"This version is API-compatible with previous 1.x releases for APIs
> >annotated with the @Public annotation."
> >- JavaDoc for Public [2] and PublicEvolving [3].
> >
> > I think we should have formal documentation that clearly states our
> > policies for API compatibility.
> >
> >- What do the annotations mean
> >- In what circumstance would the APIs remain compatible / become
> >incompatible
> >- How do APIs retire (e.g., first deprecated then removed?)
> >
> > Maybe there is already such kind of documentation that I overlooked? If
> so,
> > we probably want to make it more explicit and easy-to-find.
> >
> > ### @Public vs. @PublicEvolving for new things
> >
> > I share Stephan's concern that, with @PublicEvolving used for every new
> > feature and rarely upgraded to @Public, we are practically making no
> > compatibility guarantee between minor versions (x.y.* / x.z.*). On the
> > other hand, I think in many circumstances we do need some time to collect
> > feedbacks for new features before we have enough confidence to make the
> > commitment that our APIs are stable. Therefore, it makes more sense to me
> > to first make new features @PublicEvolving and then upgrade to @Public in
> > the next one or two releases (unless there's a good reason to further
> > postpone it).
> >
> > I think the key point is how do we make sure the @PublicEvolving features
> > upgrade to @Public. Maybe we can add a parameter to indicate the expected
> > upgrading version. E.g., a new feature introduced in release 1.10.0 might
> > be annotated as @PublicEvolving("1.12.0"), indicating that it is expected
> > to be upgraded to @Public in release 1.12.0. We can check the annotations
> > against the version automatically, forcing us to either upgrade the feature
> > to @Public or explicitly postpone it by modifying the annotation
> parameter
> > (if there's a good reason).
> >
> > Additionally, we can do something similar for deprecated features / APIs,
> > reminding us to remove things annotated as @Deprecated at a certain time.
> >
> > Thank you~
> >
> > Xint
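Xintong's idea above — annotating a @PublicEvolving API with the release in which promotion to @Public is expected, so a build-time check can flag overdue promotions — could look roughly like the following. This is an illustrative sketch, not Flink's actual annotation; the name `expectedStableSince` is invented for the example:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class PublicEvolvingSketch {

    /**
     * Illustrative annotation carrying the release in which the API is
     * expected to be promoted to @Public. Not Flink's actual annotation.
     */
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR})
    @interface PublicEvolving {
        /** Release in which promotion to @Public is expected, e.g. "1.12.0". */
        String expectedStableSince() default "";
    }

    @PublicEvolving(expectedStableSince = "1.12.0")
    static class ExampleApi {}

    public static void main(String[] args) {
        // A checker could scan all annotated classes this way and compare
        // the recorded release against the version being built.
        PublicEvolving a = ExampleApi.class.getAnnotation(PublicEvolving.class);
        System.out.println(a.expectedStableSince()); // prints 1.12.0
    }
}
```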

Re: [DISCUSS] increase "state.backend.fs.memory-threshold" from 1K to 100K

2020-05-15 Thread Yun Tang
Please correct me if I am wrong: "put the increased value into the default 
configuration" means we will update it in the default flink-conf.yaml but 
still leave the default value of `state.backend.fs.memory-threshold` as it 
was previously?
It seems I did not get why existing setups with existing configs would 
not be affected.

The concern I raised is because one of our large-scale jobs, with a 
1024-parallelism source using union state, met the JM OOM problem when we 
increased this value.
I think if we introduce memory control when serializing TDDs asynchronously 
[1], we could be much more confident in increasing this configuration, as the 
memory footprint at that time is expanded by many serialized TDDs.


[1] 
https://github.com/apache/flink/blob/32bd0944d0519093c0a4d5d809c6636eb3a7fc31/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L752

Best
Yun Tang


From: Stephan Ewen 
Sent: Friday, May 15, 2020 16:53
To: dev 
Cc: Till Rohrmann ; Piotr Nowojski 
Subject: Re: [DISCUSS] increase "state.backend.fs.memory-threshold" from 1K to 
100K

I see, thanks for all the input.

I agree with Yun Tang that the use of UnionState is problematic and can
cause issues in conjunction with this.
However, most of the large-scale users I know that also struggle with
UnionState have also increased this threshold, because with this low
threshold, they get an excess number of small files and overwhelm their
HDFS / S3 / etc.

An intermediate solution could be to put the increased value into the
default configuration. That way, existing setups with existing configs will
not be affected, but new users / installations will have a simpler time.

Best,
Stephan


On Thu, May 14, 2020 at 9:20 PM Yun Tang  wrote:

> I tend not to be in favor of this proposal, as union state is somewhat
> abused in several popular source connectors (e.g. Kafka), and increasing
> this value could lead to JM OOM when sending the tdd from the JM to TMs
> with large parallelism.
>
> After we collect union state and initialize the map list [1], we already
> have the union state ready to assign. At this time, the memory footprint
> has not increased too much, as the union state shared across tasks holds
> the same reference to the ByteStreamStateHandle. However, when we send the
> tdd with the taskRestore to TMs, akka will serialize those
> ByteStreamStateHandles within the tdd, which increases the memory
> footprint. If the source has a parallelism of 1024, any one of the
> sub-tasks would then have 1024*100KB of state handles. The sum of the
> total memory footprint cannot be ignored.
>
> If we plan to increase the default value of
> state.backend.fs.memory-threshold, we should first resolve the above case.
> In other words, this proposal could be a trade-off, which benefits perhaps
> 99% of users, but might bring harmful effects to the 1% of users with
> large-scale Flink jobs.
>
>
> [1]
> https://github.com/apache/flink/blob/c1ea6fcfd05c72a68739bda8bd16a2d1c15522c0/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/RoundRobinOperatorStateRepartitioner.java#L64-L87
>
> Best
> Yun Tang
>
>
> 
> From: Yu Li 
> Sent: Thursday, May 14, 2020 23:51
> To: Till Rohrmann 
> Cc: dev ; Piotr Nowojski 
> Subject: Re: [DISCUSS] increase "state.backend.fs.memory-threshold" from
> 1K to 100K
>
> TL;DR: I have some reservations but tend to be +1 for the proposal,
> meanwhile suggest we have a more thorough solution in the long run.
>
> Please correct me if I'm wrong, but it seems the root cause of the issue is
> too many small files generated.
>
> I have some concerns for the case of session cluster [1], as well as
> possible issues for users at large scale, otherwise I think increasing
> `state.backend.fs.memory-threshold` to 100K is a good choice, based on the
> assumption that a large portion of our users are running small jobs with
> small states.
>
> OTOH, maybe extending the solution [2] of resolving RocksDB small file
> problem (as proposed by FLINK-11937 [3]) to also support operator state
> could be an alternative? We have already applied the solution in production
> for operator state and solved the HDFS NN RPC bottleneck problem on last
> year's Singles' day.
>
> Best Regards,
> Yu
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-stable/concepts/glossary.html#flink-session-cluster
> [2]
>
> https://docs.google.com/document/d/1ukLfqNt44yqhDFL3uIhd68NevVdawccb6GflGNFzLcg
> <
> https://docs.google.com/document/d/1ukLfqNt44yqhDFL3uIhd68NevVdawccb6GflGNFzLcg/edit#heading=h.rl48knhoni0h
> >
> [3] https://issues.apache.org/jira/browse/FLINK-11937
>
>
> On Thu, 14 May 2020 at 21:45, Till Rohrmann  wrote:
>
> > I cannot say much about the concrete value but if our users have problems
> > with the existing default values, then it makes sense to me to change it.
> >
> > One thing to check could be whether it is possible to provide a
> meaningful
> > exception in case that the state size exceeds the fram

[VOTE] Guarantee that @PublicEvolving classes are API and binary compatible across bug fix releases (x.y.u and x.y.v)

2020-05-15 Thread Till Rohrmann
Dear community,

with reference to the dev ML thread about guaranteeing API and binary
compatibility for @PublicEvolving classes across bug fix releases [1] I
would like to start a vote about it.

The proposal is that the Flink community starts to guarantee
that @PublicEvolving classes will be API and binary compatible across bug
fix releases of the same minor version. This means that a version x.y.u is
API and binary compatible to x.y.v with u <= v wrt all @PublicEvolving
classes.

The voting options are the following:

* +1, Provide the above described guarantees
* -1, Do not provide the above described guarantees (please provide
specific comments)

The vote will be open for at least 72 hours. It is adopted by majority
approval with at least 3 PMC affirmative votes.

[1]
https://lists.apache.org/thread.html/rb0d0f887b291a490ed3773352c90ddf5f11e3d882dc501e3b8cf0ed0%40%3Cdev.flink.apache.org%3E

Cheers,
Till


[jira] [Created] (FLINK-17736) Add flink CEP examples

2020-05-15 Thread dengziming (Jira)
dengziming created FLINK-17736:
--

 Summary: Add flink CEP examples
 Key: FLINK-17736
 URL: https://issues.apache.org/jira/browse/FLINK-17736
 Project: Flink
  Issue Type: Improvement
  Components: Examples
Reporter: dengziming


There is no Flink CEP example yet; we can add one.



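An example of the kind this ticket asks for would use Flink's CEP `Pattern` API. As a self-contained illustration of the matching idea only — not Flink's actual CEP API, and with invented event names — a plain-Java sketch of "a start event followed by an end event":

```java
import java.util.ArrayList;
import java.util.List;

public class CepSketch {

    /** Emit "start->end" each time a start event is followed by an end event. */
    static List<String> matchStartThenEnd(List<String> events) {
        List<String> matches = new ArrayList<>();
        String pendingStart = null;
        for (String e : events) {
            if (e.startsWith("start")) {
                pendingStart = e; // open a partial match
            } else if (e.startsWith("end") && pendingStart != null) {
                matches.add(pendingStart + "->" + e); // pattern completed
                pendingStart = null;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> events = List.of("start1", "noise", "end1", "start2", "end2");
        System.out.println(matchStartThenEnd(events)); // prints [start1->end1, start2->end2]
    }
}
```

In FlinkCEP proper, the same pattern would be declared with `Pattern.begin(...)` followed by a `followedBy(...)` step and applied to a DataStream, which is what the requested example would demonstrate.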


Re: [DISCUSS] Exact feature freeze date

2020-05-15 Thread Jingsong Li
+1 for Monday morning.

Best,
Jingsong Lee

On Fri, May 15, 2020 at 8:45 PM Till Rohrmann  wrote:

> +1 from my side to extend the feature freeze until Monday morning.
>
> Cheers,
> Till
>
> On Fri, May 15, 2020 at 2:04 PM Robert Metzger 
> wrote:
>
> > I'm okay, but I would suggest to agree on a time of day. What about
> Monday
> > morning in Europe?
> >
> > On Fri, May 15, 2020 at 1:43 PM Piotr Nowojski 
> > wrote:
> >
> > > Hi,
> > >
> > > A couple of contributors asked to postpone cutting the release branch
> > > until Monday; what do you think about such an extension?
> > >
> > > (+1 from my side)
> > >
> > > Piotrek
> > >
> > > > On 25 Apr 2020, at 21:24, Yu Li  wrote:
> > > >
> > > > +1 for extending the feature freeze to May 15th.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Fri, 24 Apr 2020 at 14:43, Yuan Mei 
> wrote:
> > > >
> > > >> +1
> > > >>
> > > >> On Thu, Apr 23, 2020 at 4:10 PM Stephan Ewen 
> > wrote:
> > > >>
> > > >>> Hi all!
> > > >>>
> > > >>> I want to bring up a discussion about when we want to do the
> feature
> > > >> freeze
> > > >>> for 1.11.
> > > >>>
> > > >>> When kicking off the release cycle, we tentatively set the date to
> > end
> > > of
> > > >>> April, which would be in one week.
> > > >>>
> > > >>> I can say from the features I am involved with (FLIP-27, FLIP-115,
> > > >>> reviewing some state backend improvements, etc.) that it would be
> > > helpful
> > > >>> to have two additional weeks.
> > > >>>
> > > >>> When looking at various other feature threads, my feeling is that
> > there
> > > >> are
> > > >>> more contributors and committers that could use a few more days.
> > > >>> The last two months were quite exceptional, and we did lose a bit
> > of
> > > >>> development speed here and there.
> > > >>>
> > > >>> How do you think about making *May 15th* the feature freeze?
> > > >>>
> > > >>> Best,
> > > >>> Stephan
> > > >>>
> > > >>
> > >
> > >
> >
>


-- 
Best, Jingsong Lee


Re: [DISCUSS] Stability guarantees for @PublicEvolving classes

2020-05-15 Thread Till Rohrmann
The vote thread can be found here
https://lists.apache.org/thread.html/rc58099fb0e31d0eac951a7bbf7f8bda8b7b65c9ed0c04622f5333745%40%3Cdev.flink.apache.org%3E
.

Cheers,
Till

On Fri, May 15, 2020 at 3:03 PM Till Rohrmann  wrote:

> I completely agree that there are many other aspects of our guarantees and
> processes around the @Public and @PublicEvolving classes which need to be
> discussed and properly defined. For the sake of keeping this discussion
> thread narrowly scoped, I would suggest starting a separate discussion
> about the following points (not exhaustive):
>
> - What should be annotated with @Public and @PublicEvolving?
> - Process for transforming @PublicEvolving into @Public; How to ensure
> that @PublicEvolving will eventually be promoted to @Public?
> - Process of retiring a @Public/@PublicEvolving API
>
> I will start a vote thread about the change I proposed here which is to
> ensure API and binary compatibility for @PublicEvolving classes between
> bugfix releases (x.y.z and x.y.u).
>
> Cheers,
> Till
>
> On Fri, May 15, 2020 at 6:33 AM Zhu Zhu  wrote:
>
>> +1 for "API + binary compatibility for @PublicEvolving classes for all bug
>> fix
>> releases in a minor release (x.y.z is compatible to x.y.u)"
>>
>> This @PublicEvolving annotation would then be a hard limit on changes.
>> So it's important to rethink the policy towards using it, as Stephan
>> proposed.
>>
>> I think any Flink interfaces that are visible to users should be
>> explicitly marked as @Public or @PublicEvolving.
>> Any other interfaces should not be marked as @Public/@PublicEvolving.
>> This would be essential for us to check whether we are breaking any
>> user-facing interfaces unexpectedly.
>> The only exception would be the case where we had to expose a method/class
>> due to implementation limitations; such a method/class should be
>> explicitly marked as @Internal.
>>
>> Thanks,
>> Zhu Zhu
>>
>> Yun Tang  于2020年5月15日周五 上午11:41写道:
>>
>> > +1 for this idea, and I also like Xintong's suggestion to make it
>> > explicit when the @PublicEvolving API could upgrade to the @Public API.
>> > If we have the rule to upgrade the API stability level but do not
>> > define a clear timeline, I'm afraid not everyone has the enthusiasm to
>> > do the upgrade.
>> >
>> > My minor suggestion is that I think two major releases (where a major
>> > release is x.y.0, as Chesnay clarified) might be a bit quick. From the
>> > release history [1], Flink bumps its major version every 3 ~ 6 months,
>> > so a gap of two major releases could be as short as half a year.
>> > I think half a year might be a bit too soon for users to collect enough
>> > feedback, and upgrading the API stability level every 3 major versions
>> > would be better.
>> >
>> > [1] https://flink.apache.org/downloads.html#flink
>> >
>> > Best
>> > Yun Tang
>> >
>> >
>> > 
>> > From: Xintong Song 
>> > Sent: Friday, May 15, 2020 11:04
>> > To: dev 
>> > Subject: Re: [DISCUSS] Stability guarantees for @PublicEvolving classes
>> >
>> > ### Documentation on API compatibility policies
>> >
>> > Do we have any formal documentation about the API compatibility
>> policies?
>> > The only things I found are:
>> >
>> >- In the release announcement (take 1.10.0 as an example) [1]:
>> >"This version is API-compatible with previous 1.x releases for APIs
>> >annotated with the @Public annotation."
>> >- JavaDoc for Public [2] and PublicEvolving [3].
>> >
>> > I think we should have formal documentation that clearly states our
>> > policies for API compatibility.
>> >
>> >- What do the annotations mean
>> >- In what circumstance would the APIs remain compatible / become
>> >incompatible
>> >- How do APIs retire (e.g., first deprecated then removed?)
>> >
>> > Maybe there is already such kind of documentation that I overlooked? If
>> so,
>> > we probably want to make it more explicit and easy-to-find.
>> >
>> > ### @Public vs. @PublicEvolving for new things
>> >
>> > I share Stephan's concern that, with @PublicEvolving used for every new
>> > feature and rarely upgraded to @Public, we are practically making no
>> > compatibility guarantee between minor versions (x.y.* / x.z.*). On the
>> > other hand, I think in many circumstances we do need some time to
>> collect
>> > feedbacks for new features before we have enough confidence to make the
>> > commitment that our APIs are stable. Therefore, it makes more sense to
>> me
>> > to first make new features @PublicEvolving and then upgrade to @Public
>> in
>> > the next one or two releases (unless there's a good reason to further
>> > postpone it).
>> >
>> > I think the key point is how do we make sure the @PublicEvolving
>> features
>> > upgrade to @Public. Maybe we can add a parameter to indicate the
>> expected
>> > upgrading version. E.g., a new feature introduced in release 1.10.0
>> might
>> > be annotated as @PublicEvolving("1.12.0"), indicating that it is
>> expected
>> > to be upgraded to @Public in release 1.12.0. We can check the
>> 

Re: [VOTE] Guarantee that @PublicEvolving classes are API and binary compatible across bug fix releases (x.y.u and x.y.v)

2020-05-15 Thread Thomas Weise
+1


On Fri, May 15, 2020 at 6:15 AM Till Rohrmann  wrote:

> Dear community,
>
> with reference to the dev ML thread about guaranteeing API and binary
> compatibility for @PublicEvolving classes across bug fix releases [1] I
> would like to start a vote about it.
>
> The proposal is that the Flink community starts to guarantee
> that @PublicEvolving classes will be API and binary compatible across bug
> fix releases of the same minor version. This means that a version x.y.u is
> API and binary compatible to x.y.v with u <= v wrt all @PublicEvolving
> classes.
>
> The voting options are the following:
>
> * +1, Provide the above described guarantees
> * -1, Do not provide the above described guarantees (please provide
> specific comments)
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval with at least 3 PMC affirmative votes.
>
> [1]
>
> https://lists.apache.org/thread.html/rb0d0f887b291a490ed3773352c90ddf5f11e3d882dc501e3b8cf0ed0%40%3Cdev.flink.apache.org%3E
>
> Cheers,
> Till
>


Re: [VOTE] Guarantee that @PublicEvolving classes are API and binary compatible across bug fix releases (x.y.u and x.y.v)

2020-05-15 Thread Jingsong Li
+1 (non-binding)

Best,
Jingsong Lee

On Fri, May 15, 2020 at 9:26 PM Thomas Weise  wrote:

> +1
>
>
> On Fri, May 15, 2020 at 6:15 AM Till Rohrmann 
> wrote:
>
> > Dear community,
> >
> > with reference to the dev ML thread about guaranteeing API and binary
> > compatibility for @PublicEvolving classes across bug fix releases [1] I
> > would like to start a vote about it.
> >
> > The proposal is that the Flink community starts to guarantee
> > that @PublicEvolving classes will be API and binary compatible across bug
> > fix releases of the same minor version. This means that a version x.y.u
> is
> > API and binary compatible to x.y.v with u <= v wrt all @PublicEvolving
> > classes.
> >
> > The voting options are the following:
> >
> > * +1, Provide the above described guarantees
> > * -1, Do not provide the above described guarantees (please provide
> > specific comments)
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval with at least 3 PMC affirmative votes.
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/rb0d0f887b291a490ed3773352c90ddf5f11e3d882dc501e3b8cf0ed0%40%3Cdev.flink.apache.org%3E
> >
> > Cheers,
> > Till
> >
>


-- 
Best, Jingsong Lee


Re: [DISCUSS] Exact feature freeze date

2020-05-15 Thread Yun Tang
+1 for Monday morning in Europe.

Best
Yun Tang

From: Jingsong Li 
Sent: Friday, May 15, 2020 21:17
To: dev 
Subject: Re: [DISCUSS] Exact feature freeze date

+1 for Monday morning.

Best,
Jingsong Lee

On Fri, May 15, 2020 at 8:45 PM Till Rohrmann  wrote:

> +1 from my side to extend the feature freeze until Monday morning.
>
> Cheers,
> Till
>
> On Fri, May 15, 2020 at 2:04 PM Robert Metzger 
> wrote:
>
> > I'm okay, but I would suggest to agree on a time of day. What about
> Monday
> > morning in Europe?
> >
> > On Fri, May 15, 2020 at 1:43 PM Piotr Nowojski 
> > wrote:
> >
> > > Hi,
> > >
> > > A couple of contributors asked to postpone cutting the release branch
> > > until Monday; what do you think about such an extension?
> > >
> > > (+1 from my side)
> > >
> > > Piotrek
> > >
> > > > On 25 Apr 2020, at 21:24, Yu Li  wrote:
> > > >
> > > > +1 for extending the feature freeze to May 15th.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Fri, 24 Apr 2020 at 14:43, Yuan Mei 
> wrote:
> > > >
> > > >> +1
> > > >>
> > > >> On Thu, Apr 23, 2020 at 4:10 PM Stephan Ewen 
> > wrote:
> > > >>
> > > >>> Hi all!
> > > >>>
> > > >>> I want to bring up a discussion about when we want to do the
> feature
> > > >> freeze
> > > >>> for 1.11.
> > > >>>
> > > >>> When kicking off the release cycle, we tentatively set the date to
> > end
> > > of
> > > >>> April, which would be in one week.
> > > >>>
> > > >>> I can say from the features I am involved with (FLIP-27, FLIP-115,
> > > >>> reviewing some state backend improvements, etc.) that it would be
> > > helpful
> > > >>> to have two additional weeks.
> > > >>>
> > > >>> When looking at various other feature threads, my feeling is that
> > there
> > > >> are
> > > >>> more contributors and committers that could use a few more days.
> > > >>> The last two months were quite exceptional, and we did lose a bit
> > of
> > > >>> development speed here and there.
> > > >>>
> > > >>> How do you think about making *May 15th* the feature freeze?
> > > >>>
> > > >>> Best,
> > > >>> Stephan
> > > >>>
> > > >>
> > >
> > >
> >
>


--
Best, Jingsong Lee


[jira] [Created] (FLINK-17737) KeyedStateCheckpointingITCase fails in UC mode

2020-05-15 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-17737:
-

 Summary: KeyedStateCheckpointingITCase fails in UC mode
 Key: FLINK-17737
 URL: https://issues.apache.org/jira/browse/FLINK-17737
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing, Runtime / Task, Tests
Affects Versions: 1.11.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.11.0








Re: [DISCUSS] Exact feature freeze date

2020-05-15 Thread Zhijiang
+1 for Monday morning in Europe.

Best,
Zhijiang


--
From:Yun Tang 
Send Time:2020年5月15日(星期五) 21:58
To:dev 
Subject:Re: [DISCUSS] Exact feature freeze date

+1 for Monday morning in Europe.

Best
Yun Tang

From: Jingsong Li 
Sent: Friday, May 15, 2020 21:17
To: dev 
Subject: Re: [DISCUSS] Exact feature freeze date

+1 for Monday morning.

Best,
Jingsong Lee

On Fri, May 15, 2020 at 8:45 PM Till Rohrmann  wrote:

> +1 from my side to extend the feature freeze until Monday morning.
>
> Cheers,
> Till
>
> On Fri, May 15, 2020 at 2:04 PM Robert Metzger 
> wrote:
>
> > I'm okay, but I would suggest to agree on a time of day. What about
> Monday
> > morning in Europe?
> >
> > On Fri, May 15, 2020 at 1:43 PM Piotr Nowojski 
> > wrote:
> >
> > > Hi,
> > >
> > > A couple of contributors asked to postpone cutting the release branch
> > > until Monday; what do you think about such an extension?
> > >
> > > (+1 from my side)
> > >
> > > Piotrek
> > >
> > > > On 25 Apr 2020, at 21:24, Yu Li  wrote:
> > > >
> > > > +1 for extending the feature freeze to May 15th.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Fri, 24 Apr 2020 at 14:43, Yuan Mei 
> wrote:
> > > >
> > > >> +1
> > > >>
> > > >> On Thu, Apr 23, 2020 at 4:10 PM Stephan Ewen 
> > wrote:
> > > >>
> > > >>> Hi all!
> > > >>>
> > > >>> I want to bring up a discussion about when we want to do the
> feature
> > > >> freeze
> > > >>> for 1.11.
> > > >>>
> > > >>> When kicking off the release cycle, we tentatively set the date to
> > end
> > > of
> > > >>> April, which would be in one week.
> > > >>>
> > > >>> I can say from the features I am involved with (FLIP-27, FLIP-115,
> > > >>> reviewing some state backend improvements, etc.) that it would be
> > > helpful
> > > >>> to have two additional weeks.
> > > >>>
> > > >>> When looking at various other feature threads, my feeling is that
> > there
> > > >> are
> > > >>> more contributors and committers that could use a few more days.
> > > >>> The last two months were quite exceptional, and we did lose a bit
> > of
> > > >>> development speed here and there.
> > > >>>
> > > >>> How do you think about making *May 15th* the feature freeze?
> > > >>>
> > > >>> Best,
> > > >>> Stephan
> > > >>>
> > > >>
> > >
> > >
> >
>


--
Best, Jingsong Lee



Re: [VOTE] Guarantee that @PublicEvolving classes are API and binary compatible across bug fix releases (x.y.u and x.y.v)

2020-05-15 Thread Zhijiang
Sounds good, +1.

Best,
Zhijiang


--
From:Thomas Weise 
Send Time:2020年5月15日(星期五) 21:33
To:dev 
Subject:Re: [VOTE] Guarantee that @PublicEvolving classes are API and binary 
compatible across bug fix releases (x.y.u and x.y.v)

+1


On Fri, May 15, 2020 at 6:15 AM Till Rohrmann  wrote:

> Dear community,
>
> with reference to the dev ML thread about guaranteeing API and binary
> compatibility for @PublicEvolving classes across bug fix releases [1] I
> would like to start a vote about it.
>
> The proposal is that the Flink community starts to guarantee
> that @PublicEvolving classes will be API and binary compatible across bug
> fix releases of the same minor version. This means that a version x.y.u is
> API and binary compatible to x.y.v with u <= v wrt all @PublicEvolving
> classes.
>
> The voting options are the following:
>
> * +1, Provide the above described guarantees
> * -1, Do not provide the above described guarantees (please provide
> specific comments)
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval with at least 3 PMC affirmative votes.
>
> [1]
>
> https://lists.apache.org/thread.html/rb0d0f887b291a490ed3773352c90ddf5f11e3d882dc501e3b8cf0ed0%40%3Cdev.flink.apache.org%3E
>
> Cheers,
> Till
>



[jira] [Created] (FLINK-17738) Remove legacy scheduler option

2020-05-15 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-17738:
-

 Summary: Remove legacy scheduler option
 Key: FLINK-17738
 URL: https://issues.apache.org/jira/browse/FLINK-17738
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.11.0


In order to no longer support the {{LegacyScheduler}}, we can remove the 
{{jobmanager.scheduler}} option as a first step. This would also help the 
FLIP-27 effort, since FLIP-27 is only supported by the new scheduler.





[jira] [Created] (FLINK-17739) ResultPartitionTest.testInitializeMoreStateThanBuffer is unstable

2020-05-15 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-17739:
--

 Summary: ResultPartitionTest.testInitializeMoreStateThanBuffer is 
unstable
 Key: FLINK-17739
 URL: https://issues.apache.org/jira/browse/FLINK-17739
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Reporter: Piotr Nowojski
 Fix For: 1.11.0


When run in a loop, it throws after ~50-100 runs:

{noformat}
java.lang.AssertionError: 
Expected :2
Actual   :1


at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at 
org.apache.flink.runtime.io.network.partition.ResultPartitionTest.testInitializeMoreStateThanBuffer(ResultPartitionTest.java:525)
{noformat}






Re: [VOTE] Guarantee that @PublicEvolving classes are API and binary compatible across bug fix releases (x.y.u and x.y.v)

2020-05-15 Thread Ufuk Celebi
+1

– Ufuk


On Fri, May 15, 2020 at 4:54 PM Zhijiang 
wrote:

> Sounds good, +1.
>
> Best,
> Zhijiang
>
>
> --
> From:Thomas Weise 
> Send Time:2020年5月15日(星期五) 21:33
> To:dev 
> Subject:Re: [VOTE] Guarantee that @PublicEvolving classes are API and
> binary compatible across bug fix releases (x.y.u and x.y.v)
>
> +1
>
>
> On Fri, May 15, 2020 at 6:15 AM Till Rohrmann 
> wrote:
>
> > Dear community,
> >
> > with reference to the dev ML thread about guaranteeing API and binary
> > compatibility for @PublicEvolving classes across bug fix releases [1] I
> > would like to start a vote about it.
> >
> > The proposal is that the Flink community starts to guarantee
> > that @PublicEvolving classes will be API and binary compatible across bug
> > fix releases of the same minor version. This means that a version x.y.u
> is
> > API and binary compatible to x.y.v with u <= v wrt all @PublicEvolving
> > classes.
> >
> > The voting options are the following:
> >
> > * +1, Provide the above described guarantees
> > * -1, Do not provide the above described guarantees (please provide
> > specific comments)
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval with at least 3 PMC affirmative votes.
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/rb0d0f887b291a490ed3773352c90ddf5f11e3d882dc501e3b8cf0ed0%40%3Cdev.flink.apache.org%3E
> >
> > Cheers,
> > Till
> >
>
>


Re: [VOTE] Guarantee that @PublicEvolving classes are API and binary compatible across bug fix releases (x.y.u and x.y.v)

2020-05-15 Thread Yuan Mei
+1

On Sat, May 16, 2020 at 12:21 AM Ufuk Celebi  wrote:

> +1
>
> – Ufuk
>
>
> On Fri, May 15, 2020 at 4:54 PM Zhijiang  .invalid>
> wrote:
>
> > Sounds good, +1.
> >
> > Best,
> > Zhijiang
> >
> >
> > --
> > From:Thomas Weise 
> > Send Time:2020年5月15日(星期五) 21:33
> > To:dev 
> > Subject:Re: [VOTE] Guarantee that @PublicEvolving classes are API and
> > binary compatible across bug fix releases (x.y.u and x.y.v)
> >
> > +1
> >
> >
> > On Fri, May 15, 2020 at 6:15 AM Till Rohrmann 
> > wrote:
> >
> > > Dear community,
> > >
> > > with reference to the dev ML thread about guaranteeing API and binary
> > > compatibility for @PublicEvolving classes across bug fix releases [1] I
> > > would like to start a vote about it.
> > >
> > > The proposal is that the Flink community starts to guarantee
> > > that @PublicEvolving classes will be API and binary compatible across
> bug
> > > fix releases of the same minor version. This means that a version x.y.u
> > is
> > > API and binary compatible to x.y.v with u <= v wrt all @PublicEvolving
> > > classes.
> > >
> > > The voting options are the following:
> > >
> > > * +1, Provide the above described guarantees
> > > * -1, Do not provide the above described guarantees (please provide
> > > specific comments)
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval with at least 3 PMC affirmative votes.
> > >
> > > [1]
> > >
> > >
> >
> https://lists.apache.org/thread.html/rb0d0f887b291a490ed3773352c90ddf5f11e3d882dc501e3b8cf0ed0%40%3Cdev.flink.apache.org%3E
> > >
> > > Cheers,
> > > Till
> > >
> >
> >
>


Re: [VOTE] Guarantee that @PublicEvolving classes are API and binary compatible across bug fix releases (x.y.u and x.y.v)

2020-05-15 Thread Yangze Guo
+1

Best,
Yangze Guo

On Sat, May 16, 2020 at 12:26 AM Yuan Mei  wrote:
>
> +1
>
> On Sat, May 16, 2020 at 12:21 AM Ufuk Celebi  wrote:
>
> > +1
> >
> > – Ufuk
> >
> >
> > On Fri, May 15, 2020 at 4:54 PM Zhijiang  > .invalid>
> > wrote:
> >
> > > Sounds good, +1.
> > >
> > > Best,
> > > Zhijiang
> > >
> > >
> > > --
> > > From:Thomas Weise 
> > > Send Time:2020年5月15日(星期五) 21:33
> > > To:dev 
> > > Subject:Re: [VOTE] Guarantee that @PublicEvolving classes are API and
> > > binary compatible across bug fix releases (x.y.u and x.y.v)
> > >
> > > +1
> > >
> > >
> > > On Fri, May 15, 2020 at 6:15 AM Till Rohrmann 
> > > wrote:
> > >
> > > > Dear community,
> > > >
> > > > with reference to the dev ML thread about guaranteeing API and binary
> > > > compatibility for @PublicEvolving classes across bug fix releases [1] I
> > > > would like to start a vote about it.
> > > >
> > > > The proposal is that the Flink community starts to guarantee
> > > > that @PublicEvolving classes will be API and binary compatible across
> > bug
> > > > fix releases of the same minor version. This means that a version x.y.u
> > > is
> > > > API and binary compatible to x.y.v with u <= v wrt all @PublicEvolving
> > > > classes.
> > > >
> > > > The voting options are the following:
> > > >
> > > > * +1, Provide the above described guarantees
> > > > * -1, Do not provide the above described guarantees (please provide
> > > > specific comments)
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by majority
> > > > approval with at least 3 PMC affirmative votes.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> > https://lists.apache.org/thread.html/rb0d0f887b291a490ed3773352c90ddf5f11e3d882dc501e3b8cf0ed0%40%3Cdev.flink.apache.org%3E
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > >
> > >
> >


[jira] [Created] (FLINK-17740) Remove flink-container/kubernetes

2020-05-15 Thread Andrey Zagrebin (Jira)
Andrey Zagrebin created FLINK-17740:
---

 Summary: Remove flink-container/kubernetes
 Key: FLINK-17740
 URL: https://issues.apache.org/jira/browse/FLINK-17740
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Docker, Deployment / Kubernetes
Reporter: Andrey Zagrebin
Assignee: Chesnay Schepler
 Fix For: 1.11.0


FLINK-17161 added Kubernetes integration examples for Job Cluster.
FLINK-17656 copies the job service yaml from flink-container/kubernetes to the 
e2e Kubernetes Job Cluster test, making it independent.
Therefore, we no longer need flink-container/kubernetes and it can be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17741) Create integration or e2e test for out of order (savepoint) barriers

2020-05-15 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-17741:
-

 Summary: Create integration or e2e test for out of order 
(savepoint) barriers
 Key: FLINK-17741
 URL: https://issues.apache.org/jira/browse/FLINK-17741
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: Roman Khachatryan


The unaligned checkpoints MVP assumes at most one barrier at a time. This is 
ensured by max-concurrent-checkpoints on the CheckpointCoordinator.

However, it seems possible that the CheckpointCoordinator considers a checkpoint 
canceled and starts a new one while some old barriers are still flowing through 
the graph.

As of now, this situation isn't tested.
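
The race described here is easy to model in a few lines (a toy simulation with hypothetical names, not Flink's actual classes): a task must recognize and discard a barrier belonging to an already-superseded checkpoint.

```python
# Toy model of the race described above (hypothetical simplified model,
# not Flink's real classes): the coordinator may cancel checkpoint N and
# start N+1 while a barrier for N is still travelling through the graph.

class ToyTask:
    def __init__(self):
        self.current_checkpoint = 0
        self.processed = []

    def on_barrier(self, checkpoint_id):
        # A barrier older than the newest one seen is stale and must be
        # ignored, otherwise it would corrupt the newer checkpoint.
        if checkpoint_id < self.current_checkpoint:
            return "ignored-stale"
        self.current_checkpoint = checkpoint_id
        self.processed.append(checkpoint_id)
        return "aligned"

task = ToyTask()
assert task.on_barrier(1) == "aligned"
# Coordinator cancels checkpoint 1 and starts checkpoint 2; the task
# happens to receive the new barrier first ...
assert task.on_barrier(2) == "aligned"
# ... and then a late barrier from the cancelled checkpoint 1 arrives.
assert task.on_barrier(1) == "ignored-stale"
```

A test along these lines, but against the real CheckpointCoordinator and task network stack, is what the ticket asks for.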





[jira] [Created] (FLINK-17742) Blob file not removed after task was cancelled in standalone mode

2020-05-15 Thread lisen (Jira)
lisen created FLINK-17742:
-

 Summary: Blob file not removed after task was cancelled in 
standalone mode
 Key: FLINK-17742
 URL: https://issues.apache.org/jira/browse/FLINK-17742
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.10.0
Reporter: lisen
 Fix For: 1.10.0


In standalone mode, I submit a job with my user jars. As we know, the jars are 
uploaded to the blob server, and each TaskExecutor downloads the blob files to 
local disk, so the jobmanager node and the taskmanager nodes each hold a copy 
of the blob files at the same time. The problem is that when the job finishes 
or is canceled, the jobmanager node's blob files are removed but the 
taskmanager's are not. I think there is a risk of the local disk filling up 
here.





[jira] [Created] (FLINK-17743) Python e2e failed by Cannot run program "venv.zip/.conda/bin/python"

2020-05-15 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-17743:


 Summary: Python e2e failed by Cannot run program 
"venv.zip/.conda/bin/python"
 Key: FLINK-17743
 URL: https://issues.apache.org/jira/browse/FLINK-17743
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: Jingsong Lee
 Fix For: 1.11.0


[https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_apis/build/builds/1457/logs/158]

[https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_apis/build/builds/1435/logs/157]





[jira] [Created] (FLINK-17744) StreamContextEnvironment#execute cannot call JobListener#onJobExecuted

2020-05-15 Thread lisen (Jira)
lisen created FLINK-17744:
-

 Summary: StreamContextEnvironment#execute cannot call 
JobListener#onJobExecuted
 Key: FLINK-17744
 URL: https://issues.apache.org/jira/browse/FLINK-17744
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission
Affects Versions: 1.10.0
Reporter: lisen
 Fix For: 1.10.0


When I register a JobListener with the stream environment, I expect 
JobListener#onJobExecuted to be invoked when the job finishes or is cancelled. 
But in StreamContextEnvironment, JobListener#onJobExecuted is currently never 
called.
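
The expected behavior can be sketched with a minimal listener callback (hypothetical names; the real StreamContextEnvironment and JobListener are Flink Java APIs): an environment that holds registered listeners should invoke onJobExecuted once execution completes.

```python
# Minimal listener-callback sketch of the behavior the report expects
# (hypothetical simplified model, not Flink's actual API).

class ToyJobListener:
    def __init__(self):
        self.executed_results = []

    def on_job_executed(self, result):
        self.executed_results.append(result)

class ToyEnvironment:
    def __init__(self):
        self.listeners = []

    def register_job_listener(self, listener):
        self.listeners.append(listener)

    def execute(self):
        result = "job-finished"  # stand-in for a JobExecutionResult
        # This is the callback the report says StreamContextEnvironment skips:
        for listener in self.listeners:
            listener.on_job_executed(result)
        return result

env = ToyEnvironment()
listener = ToyJobListener()
env.register_job_listener(listener)
env.execute()
assert listener.executed_results == ["job-finished"]
```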





Re: [DISCUSS] Exact feature freeze date

2020-05-15 Thread Danny Chan
+1 for Monday morning in Europe.

Best,
Danny Chan
在 2020年5月15日 +0800 PM10:51,Zhijiang ,写道:
> +1 for Monday morning in Europe.
>
> Best,
> Zhijiang
>
>
> --
> From:Yun Tang 
> Send Time:2020年5月15日(星期五) 21:58
> To:dev 
> Subject:Re: [DISCUSS] Exact feature freeze date
>
> +1 for Monday morning in Europe.
>
> Best
> Yun Tang
> 
> From: Jingsong Li 
> Sent: Friday, May 15, 2020 21:17
> To: dev 
> Subject: Re: [DISCUSS] Exact feature freeze date
>
> +1 for Monday morning.
>
> Best,
> Jingsong Lee
>
> On Fri, May 15, 2020 at 8:45 PM Till Rohrmann  wrote:
>
> > +1 from my side extend the feature freeze until Monday morning.
> >
> > Cheers,
> > Till
> >
> > On Fri, May 15, 2020 at 2:04 PM Robert Metzger 
> > wrote:
> >
> > > I'm okay, but I would suggest to agree on a time of day. What about
> > Monday
> > > morning in Europe?
> > >
> > > On Fri, May 15, 2020 at 1:43 PM Piotr Nowojski 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Couple of contributors asked for extending cutting the release branch
> > > > until Monday, what do you think about such extension?
> > > >
> > > > (+1 from my side)
> > > >
> > > > Piotrek
> > > >
> > > > > On 25 Apr 2020, at 21:24, Yu Li  wrote:
> > > > >
> > > > > +1 for extending the feature freeze to May 15th.
> > > > >
> > > > > Best Regards,
> > > > > Yu
> > > > >
> > > > >
> > > > > On Fri, 24 Apr 2020 at 14:43, Yuan Mei 
> > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > On Thu, Apr 23, 2020 at 4:10 PM Stephan Ewen 
> > > wrote:
> > > > > >
> > > > > > > Hi all!
> > > > > > >
> > > > > > > I want to bring up a discussion about when we want to do the
> > feature
> > > > > > freeze
> > > > > > > for 1.11.
> > > > > > >
> > > > > > > When kicking off the release cycle, we tentatively set the date to
> > > end
> > > > of
> > > > > > > April, which would be in one week.
> > > > > > >
> > > > > > > I can say from the features I am involved with (FLIP-27, FLIP-115,
> > > > > > > reviewing some state backend improvements, etc.) that it would be
> > > > helpful
> > > > > > > to have two additional weeks.
> > > > > > >
> > > > > > > When looking at various other feature threads, my feeling is that
> > > there
> > > > > > are
> > > > > > > more contributors and committers that could use a few more days.
> > > > > > > The last two months were quite exceptional, and we did lose a bit
> > > > > > > of development speed here and there.
> > > > > > >
> > > > > > > How do you think about making *May 15th* the feature freeze?
> > > > > > >
> > > > > > > Best,
> > > > > > > Stephan
> > > > > > >
> > > > > >
> > > >
> > > >
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


[jira] [Created] (FLINK-17745) PackagedProgram's extractedTempLibraries and jarfiles may be duplicated

2020-05-15 Thread lisen (Jira)
lisen created FLINK-17745:
-

 Summary: PackagedProgram's extractedTempLibraries and jarfiles may 
be duplicated
 Key: FLINK-17745
 URL: https://issues.apache.org/jira/browse/FLINK-17745
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission
Affects Versions: 1.10.0
Reporter: lisen
 Fix For: 1.10.0


When I submit a Flink app with a fat jar, PackagedProgram extracts temp 
libraries from the fat jar and adds them to pipeline.jars, so pipeline.jars 
contains both the fat jar and the extracted temp libraries. I don't think we 
should add the fat jar to pipeline.jars if extractedTempLibraries is not empty.
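
The proposed fix can be sketched as a small helper (hypothetical function, not Flink's actual PackagedProgram code): ship the extracted libraries instead of the fat jar whenever extraction produced any.

```python
# Sketch of the de-duplication the report proposes (hypothetical helper,
# not Flink's actual PackagedProgram logic): if temp libraries were
# extracted from the fat jar, avoid shipping the fat jar a second time.

def build_pipeline_jars(fat_jar, extracted_temp_libraries):
    if extracted_temp_libraries:
        # The extracted nested libraries already cover the fat jar's
        # dependencies, so the fat jar is not added again.
        return list(extracted_temp_libraries)
    # No nested libraries were extracted: the fat jar itself must be shipped.
    return [fat_jar]

assert build_pipeline_jars("app-fat.jar", ["lib/a.jar", "lib/b.jar"]) == [
    "lib/a.jar",
    "lib/b.jar",
]
assert build_pipeline_jars("app-fat.jar", []) == ["app-fat.jar"]
```

Whether dropping the fat jar entirely is safe (its own classes must still reach the cluster somehow) is exactly what the ticket discussion would need to settle.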





[jira] [Created] (FLINK-17746) PackagedProgram's extractedTempLibraries and jarfiles may be duplicated

2020-05-15 Thread lisen (Jira)
lisen created FLINK-17746:
-

 Summary: PackagedProgram's extractedTempLibraries and jarfiles may 
be duplicated
 Key: FLINK-17746
 URL: https://issues.apache.org/jira/browse/FLINK-17746
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission
Affects Versions: 1.10.0
Reporter: lisen
 Fix For: 1.10.0


When I submit a Flink app with a fat jar, PackagedProgram extracts temp 
libraries from the fat jar and adds them to pipeline.jars, so pipeline.jars 
contains both the fat jar and the extracted temp libraries. I don't think we 
should add the fat jar to pipeline.jars if extractedTempLibraries is not empty.





[jira] [Created] (FLINK-17747) PackagedProgram's extractedTempLibraries and jarfiles may be duplicated

2020-05-15 Thread lisen (Jira)
lisen created FLINK-17747:
-

 Summary: PackagedProgram's extractedTempLibraries and jarfiles may 
be duplicated
 Key: FLINK-17747
 URL: https://issues.apache.org/jira/browse/FLINK-17747
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission
Affects Versions: 1.10.0
Reporter: lisen
 Fix For: 1.10.0


When I submit a Flink app with a fat jar, PackagedProgram extracts temp 
libraries from the fat jar and adds them to pipeline.jars, so pipeline.jars 
contains both the fat jar and the extracted temp libraries. I don't think we 
should add the fat jar to pipeline.jars if extractedTempLibraries is not empty.





[jira] [Created] (FLINK-17748) Remove registration of TableSource/TableSink in Table Env

2020-05-15 Thread Kurt Young (Jira)
Kurt Young created FLINK-17748:
--

 Summary: Remove registration of TableSource/TableSink in Table Env 
 Key: FLINK-17748
 URL: https://issues.apache.org/jira/browse/FLINK-17748
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Kurt Young
Assignee: Zhenghua Gao
 Fix For: 1.11.0








[jira] [Created] (FLINK-17749) Remove fromTableSource method from TableEnvironment

2020-05-15 Thread Kurt Young (Jira)
Kurt Young created FLINK-17749:
--

 Summary: Remove fromTableSource method from TableEnvironment 
 Key: FLINK-17749
 URL: https://issues.apache.org/jira/browse/FLINK-17749
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Kurt Young
Assignee: Kurt Young
 Fix For: 1.11.0








Re: [DISCUSS] Exact feature freeze date

2020-05-15 Thread Congxian Qiu
+1 for Monday morning in Europe.
Best,
Congxian


Danny Chan  于2020年5月16日周六 上午10:40写道:

> +1 for Monday morning in Europe.
>
> Best,
> Danny Chan
> 在 2020年5月15日 +0800 PM10:51,Zhijiang  .invalid>,写道:
> > +1 for Monday morning in Europe.
> >
> > Best,
> > Zhijiang
> >
> >
> > --
> > From:Yun Tang 
> > Send Time:2020年5月15日(星期五) 21:58
> > To:dev 
> > Subject:Re: [DISCUSS] Exact feature freeze date
> >
> > +1 for Monday morning in Europe.
> >
> > Best
> > Yun Tang
> > 
> > From: Jingsong Li 
> > Sent: Friday, May 15, 2020 21:17
> > To: dev 
> > Subject: Re: [DISCUSS] Exact feature freeze date
> >
> > +1 for Monday morning.
> >
> > Best,
> > Jingsong Lee
> >
> > On Fri, May 15, 2020 at 8:45 PM Till Rohrmann 
> wrote:
> >
> > > +1 from my side extend the feature freeze until Monday morning.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, May 15, 2020 at 2:04 PM Robert Metzger 
> > > wrote:
> > >
> > > > I'm okay, but I would suggest to agree on a time of day. What about
> > > Monday
> > > > morning in Europe?
> > > >
> > > > On Fri, May 15, 2020 at 1:43 PM Piotr Nowojski 
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Couple of contributors asked for extending cutting the release
> branch
> > > > > until Monday, what do you think about such extension?
> > > > >
> > > > > (+1 from my side)
> > > > >
> > > > > Piotrek
> > > > >
> > > > > > On 25 Apr 2020, at 21:24, Yu Li  wrote:
> > > > > >
> > > > > > +1 for extending the feature freeze to May 15th.
> > > > > >
> > > > > > Best Regards,
> > > > > > Yu
> > > > > >
> > > > > >
> > > > > > On Fri, 24 Apr 2020 at 14:43, Yuan Mei 
> > > wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > On Thu, Apr 23, 2020 at 4:10 PM Stephan Ewen  >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi all!
> > > > > > > >
> > > > > > > > I want to bring up a discussion about when we want to do the
> > > feature
> > > > > > > freeze
> > > > > > > > for 1.11.
> > > > > > > >
> > > > > > > > When kicking off the release cycle, we tentatively set the
> date to
> > > > end
> > > > > of
> > > > > > > > April, which would be in one week.
> > > > > > > >
> > > > > > > > I can say from the features I am involved with (FLIP-27,
> FLIP-115,
> > > > > > > > reviewing some state backend improvements, etc.) that it
> would be
> > > > > helpful
> > > > > > > > to have two additional weeks.
> > > > > > > >
> > > > > > > > When looking at various other feature threads, my feeling is
> that
> > > > there
> > > > > > > are
> > > > > > > > more contributors and committers that could use a few more
> days.
> > > > > > > > The last two months were quite exceptional, and we did lose
> > > > > > > > a bit of development speed here and there.
> > > > > > > >
> > > > > > > > How do you think about making *May 15th* the feature freeze?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stephan
> > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best, Jingsong Lee
> >
>


Re: [DISCUSS] increase "state.backend.fs.memory-threshold" from 1K to 100K

2020-05-15 Thread Congxian Qiu
Hi,

Overall, I agree with increasing this value, but a default of 100K may be too
large in my opinion.

I want to share some more information from my side.

The small-files problem is indeed one that many users encounter in production.
The states (keyed state and operator state) can become small files in DFS, but
increasing the value of `state.backend.fs.memory-threshold` may run into the
JM OOM problem that Yun described previously.
We've tried increasing this value in our production environment, but some
connectors that use UnionState prevented us from doing so: the memory consumed
by these jobs can be very large (in our case, with thousands of parallelism
and `state.backend.fs.memory-threshold` set to 64K, the JM consumed 10G+ of
memory). So in the end we used the solution proposed in FLINK-11937 [1] for
both keyed state and operator state.


[1] https://issues.apache.org/jira/browse/FLINK-11937
Best,
Congxian
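
The mechanism under discussion can be sketched in a few lines (a simplified model for illustration; the function name below is hypothetical, while Flink's actual logic lives in the FsStateBackend checkpoint streams): state chunks at or below the threshold stay inline in the checkpoint metadata, and larger ones become separate DFS files. A low threshold therefore produces many small files, while a high one inflates what the JM must hold and serialize.

```python
# Simplified model of `state.backend.fs.memory-threshold` (illustrative
# sketch, not Flink's real implementation): state chunks up to the
# threshold stay inline as a byte-array handle in the checkpoint
# metadata, larger chunks become separate files on DFS.

MEMORY_THRESHOLD = 100 * 1024  # the proposed 100K default

def close_state_stream(data: bytes):
    if len(data) <= MEMORY_THRESHOLD:
        return ("byte-handle", data)  # inline: no small DFS file created
    # Hypothetical DFS path for illustration only.
    return ("file-handle", "dfs://checkpoints/chunk")

kind_small, _ = close_state_stream(b"x" * 1024)        # 1 KB -> inline
kind_large, _ = close_state_stream(b"x" * 200 * 1024)  # 200 KB -> DFS file
assert kind_small == "byte-handle"
assert kind_large == "file-handle"
```

This is why the thread frames the default as a trade-off: every chunk that moves from "file-handle" to "byte-handle" is one fewer small file on HDFS/S3, but one more blob that travels through JM memory.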


Yun Tang  于2020年5月15日周五 下午9:09写道:

> Please correct me if I am wrong: does "put the increased value into the
> default configuration" mean we will update it in the default
> flink-conf.yaml but still leave the default value of
> `state.backend.fs.memory-threshold` as before?
> It seems I did not get why existing setups with existing configs would
> not be affected.
>
> The concern I raised is because one of our large-scale jobs, with a
> 1024-parallelism source using union state, met the JM OOM problem when we
> increased this value.
> I think if we introduce memory control when serializing TDDs
> asynchronously [1], we could be much more confident about increasing this
> configuration, as the memory footprint is expanded at that point by the
> many serialized TDDs.
>
>
> [1]
> https://github.com/apache/flink/blob/32bd0944d0519093c0a4d5d809c6636eb3a7fc31/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L752
>
> Best
> Yun Tang
>
> 
> From: Stephan Ewen 
> Sent: Friday, May 15, 2020 16:53
> To: dev 
> Cc: Till Rohrmann ; Piotr Nowojski <
> pi...@ververica.com>
> Subject: Re: [DISCUSS] increase "state.backend.fs.memory-threshold" from
> 1K to 100K
>
> I see, thanks for all the input.
>
> I agree with Yun Tang that the use of UnionState is problematic and can
> cause issues in conjunction with this.
> However, most of the large-scale users I know that also struggle with
> UnionState have also increased this threshold, because with this low
> threshold, they get an excess number of small files and overwhelm their
> HDFS / S3 / etc.
>
> An intermediate solution could be to put the increased value into the
> default configuration. That way, existing setups with existing configs will
> not be affected, but new users / installations will have a simper time.
>
> Best,
> Stephan
>
>
> On Thu, May 14, 2020 at 9:20 PM Yun Tang  wrote:
>
> > I tend to be not in favor of this proposal, as union state is somewhat
> > abused in several popular source connectors (e.g. Kafka), and increasing
> > this value could lead to JM OOM when sending TDDs from the JM to TMs at
> > large parallelism.
> >
> > After we collect the union state and initialize the map list [1], we
> > already have the union state ready to assign. At this point the memory
> > footprint has not increased much, as the union state shared across tasks
> > holds the same reference to a ByteStreamStateHandle. However, when we
> > send the TDD with the taskRestore to the TMs, Akka serializes those
> > ByteStreamStateHandles within the TDD, which increases the memory
> > footprint. If the source has a parallelism of 1024, every sub-task would
> > then hold 1024 * 100KB of state handles. The total memory footprint
> > cannot be ignored.
> >
> > If we plan to increase the default value of
> > state.backend.fs.memory-threshold, we should first resolve the above
> > case. In other words, this proposal is a trade-off which benefits
> > perhaps 99% of users but might harm the 1% with large-scale Flink jobs.
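
The estimate in the quoted paragraph can be checked with back-of-envelope arithmetic (figures taken from the email; actual serialized sizes would also include TDD and Akka overhead):

```python
# Back-of-envelope check of the union-state estimate quoted above
# (assumed figures from the email, not measured values).

parallelism = 1024          # source sub-tasks, each registering union state
inline_handle = 100 * 1024  # bytes per inline handle at a 100K threshold

# With union state, every sub-task is assigned ALL handles on restore:
per_subtask = parallelism * inline_handle     # inline bytes in one TDD
total_serialized = per_subtask * parallelism  # across all TDDs on the JM

assert per_subtask == 100 * 1024 * 1024               # 100 MiB per sub-task
assert total_serialized == 100 * 1024 * 1024 * 1024   # ~100 GiB from the JM
```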
> >
> >
> > [1]
> >
> https://github.com/apache/flink/blob/c1ea6fcfd05c72a68739bda8bd16a2d1c15522c0/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/RoundRobinOperatorStateRepartitioner.java#L64-L87
> >
> > Best
> > Yun Tang
> >
> >
> > 
> > From: Yu Li 
> > Sent: Thursday, May 14, 2020 23:51
> > To: Till Rohrmann 
> > Cc: dev ; Piotr Nowojski 
> > Subject: Re: [DISCUSS] increase "state.backend.fs.memory-threshold" from
> > 1K to 100K
> >
> > TL;DR: I have some reservations but tend to be +1 for the proposal,
> > meanwhile suggest we have a more thorough solution in the long run.
> >
> > Please correct me if I'm wrong, but it seems the root cause of the issue
> is
> > too many small files generated.
> >
> > I have some concerns for the case of session cluster [1], as well as
> > possible issues for users at large scale, otherwise I think increasing
> > `state.backend.fs.memory-threshold` to 100K is a good choice, based on
> the
> > assu

Google Season of Docs

2020-05-15 Thread Amr Maghraby
Dear Apache Flink,
My name is Amr Maghraby. I am a recent graduate from AAST college, where I
ranked first in my class with a CGPA of 3.92. I joined the international ROV
competition in the US and placed second worldwide, and last summer I took part
in Google Summer of Code 2019 and did good work. I have also participated in
the ACM ACPC and Hash Code problem-solving competitions. May I apply for GSoD?
Waiting for your reply.
Thanks,
Amr Maghraby


regarding Google Season of Docs

2020-05-15 Thread Yuvraj Manral
Respected sir/madam,

I came across the projects proposed by Apache Flink for Season of Docs 2020.
I am new to the organisation, but I really liked the project ideas and
would love to start contributing and prepare my proposal for Season of Docs.

Please guide me through: where should I start, and how should I proceed?
Thanking you in anticipation,

Yuvraj Manral 
RSVP


Interested In Google Season of Docs!

2020-05-15 Thread Roopal Jain
Hello Flink Dev Community!

I am interested in participating in Google Season of Docs for Apache Flink.
I went through the FLIP-60 detailed proposal and thought this was something
I could do well. I am currently working as a software engineer and have a
B.E. in Computer Engineering from one of India's reputed engineering
colleges. I have prior open-source experience, including mentoring for Google
Summer of Code and Google Code-in.
I have work experience with Apache Spark and a good grasp of SQL, Java,
and Python.
Could you please guide me on how to get started?

Thanks & Regards,
Roopal Jain


Re: [DISCUSS] Stability guarantees for @PublicEvolving classes

2020-05-15 Thread Congxian Qiu
Sorry for jumping in late.

+1 to keeping @PublicEvolving compatibility between bug-fix
releases (x.y.a -> x.y.b). As a user, I always think of these as bug-fix
releases; breaking compatibility between them may surprise users.

As previous emails asked: how and when will a @PublicEvolving class
become @Public? I'm not sure we can have a technical solution to
enforce such a rule. (In my opinion, checking such changes -- from
@PublicEvolving to @Public -- manually may not be easy.)

Best
Congxian


Till Rohrmann  于2020年5月15日周五 下午9:18写道:

> The vote thread can be found here
>
> https://lists.apache.org/thread.html/rc58099fb0e31d0eac951a7bbf7f8bda8b7b65c9ed0c04622f5333745%40%3Cdev.flink.apache.org%3E
> .
>
> Cheers,
> Till
>
> On Fri, May 15, 2020 at 3:03 PM Till Rohrmann 
> wrote:
>
> > I completely agree that there are many other aspect of our guarantees and
> > processes around the @Public and @PublicEvolving classes which need to be
> > discussed and properly defined. For the sake of keeping this discussion
> > thread narrowly scoped, I would suggest to start a separate discussion
> > about the following points (not exhaustive):
> >
> > - What should be annotated with @Public and @PublicEvolving?
> > - Process for transforming @PublicEvolving into @Public; How to ensure
> > that @PublicEvolving will eventually be promoted to @Public?
> > - Process of retiring a @Public/@PublicEvolving API
> >
> > I will start a vote thread about the change I proposed here which is to
> > ensure API and binary compatibility for @PublicEvolving classes between
> > bugfix releases (x.y.z and x.y.u).
> >
> > Cheers,
> > Till
> >
> > On Fri, May 15, 2020 at 6:33 AM Zhu Zhu  wrote:
> >
> >> +1 for "API + binary compatibility for @PublicEvolving classes for all
> bug
> >> fix
> >> releases in a minor release (x.y.z is compatible to x.y.u)"
> >>
> >> This @PublicEvolving annotation would then be a hard limit on changes.
> >> So it's important to rethink the policy towards using it, as Stephan
> >> proposed.
> >>
> >> I think any Flink interfaces that are visible to users should be
> >> explicitly marked as @Public or @PublicEvolving.
> >> Any other interfaces should not be marked as @Public/@PublicEvolving.
> >> This would be essential for us to check whether we are breaking any
> >> user-facing interfaces unexpectedly.
> >> The only exception would be the case where we had to expose a
> >> method/class due to implementation limitations; it should then be
> >> explicitly marked as @Internal.
> >>
> >> Thanks,
> >> Zhu Zhu
> >>
> >> Yun Tang  于2020年5月15日周五 上午11:41写道:
> >>
> >> > +1 for this idea, and I also like Xintong's suggestion to make it
> >> > explicit when a @PublicEvolving API can upgrade to a @Public API.
> >> > If we have a rule to upgrade the API stability level but do not
> >> > define a clear timeline, I'm afraid not everyone will have the
> >> > enthusiasm to perform the upgrade.
> >> >
> >> > A minor suggestion: I think two major releases (which is x.y.0 as
> >> > Chesnay clarified) might be a bit quick. From the release history [1],
> >> > Flink bumps the major version every 3~6 months, so a two-major-release
> >> > gap could be as short as half a year.
> >> > I think half a year might be too short for users to collect
> >> > enough feedback, and upgrading the API stability level every 3 major
> >> > versions should be better.
> >> >
> >> > [1] https://flink.apache.org/downloads.html#flink
> >> >
> >> > Best
> >> > Yun Tang
> >> >
> >> >
> >> > 
> >> > From: Xintong Song 
> >> > Sent: Friday, May 15, 2020 11:04
> >> > To: dev 
> >> > Subject: Re: [DISCUSS] Stability guarantees for @PublicEvolving
> classes
> >> >
> >> > ### Documentation on API compatibility policies
> >> >
> >> > Do we have any formal documentation about the API compatibility
> >> policies?
> >> > The only things I found are:
> >> >
> >> >- In the release announcement (take 1.10.0 as an example) [1]:
> >> >"This version is API-compatible with previous 1.x releases for APIs
> >> >annotated with the @Public annotation."
> >> >- JavaDoc for Public [2] and PublicEvolving [3].
> >> >
> >> > I think we should have formal documentation that clearly states our
> >> > policies for API compatibility.
> >> >
> >> >- What does the annotations mean
> >> >- In what circumstance would the APIs remain compatible / become
> >> >incompatible
> >> >- How do APIs retire (e.g., first deprecated then removed?)
> >> >
> >> > Maybe there is already such kind of documentation that I overlooked?
> If
> >> so,
> >> > we probably want to make it more explicit and easy-to-find.
> >> >
> >> > ### @Public vs. @PublicEvolving for new things
> >> >
> >> > I share Stephan's concern that, with @PublicEvolving used for every
> new
> >> > feature and rarely upgraded to @Public, we are practically making no
> >> > compatibility guarantee between minor versions (x.y.* / x.z.*). On the
> >> > other hand, I think in many 

Re: [VOTE] Guarantee that @PublicEvolving classes are API and binary compatible across bug fix releases (x.y.u and x.y.v)

2020-05-15 Thread Congxian Qiu
+1 (non-binding)
Best,
Congxian


Yangze Guo  于2020年5月16日周六 上午12:51写道:

> +1
>
> Best,
> Yangze Guo
>
> On Sat, May 16, 2020 at 12:26 AM Yuan Mei  wrote:
> >
> > +1
> >
> > On Sat, May 16, 2020 at 12:21 AM Ufuk Celebi  wrote:
> >
> > > +1
> > >
> > > – Ufuk
> > >
> > >
> > > On Fri, May 15, 2020 at 4:54 PM Zhijiang  > > .invalid>
> > > wrote:
> > >
> > > > Sounds good, +1.
> > > >
> > > > Best,
> > > > Zhijiang
> > > >
> > > >
> > > > --
> > > > From:Thomas Weise 
> > > > Send Time:2020年5月15日(星期五) 21:33
> > > > To:dev 
> > > > Subject:Re: [VOTE] Guarantee that @PublicEvolving classes are API and
> > > > binary compatible across bug fix releases (x.y.u and x.y.v)
> > > >
> > > > +1
> > > >
> > > >
> > > > On Fri, May 15, 2020 at 6:15 AM Till Rohrmann 
> > > > wrote:
> > > >
> > > > > Dear community,
> > > > >
> > > > > with reference to the dev ML thread about guaranteeing API and
> binary
> > > > > compatibility for @PublicEvolving classes across bug fix releases
> [1] I
> > > > > would like to start a vote about it.
> > > > >
> > > > > The proposal is that the Flink community starts to guarantee
> > > > > that @PublicEvolving classes will be API and binary compatible
> across
> > > bug
> > > > > fix releases of the same minor version. This means that a version
> x.y.u
> > > > is
> > > > > API and binary compatible to x.y.v with u <= v wrt all
> @PublicEvolving
> > > > > classes.
> > > > >
> > > > > The voting options are the following:
> > > > >
> > > > > * +1, Provide the above described guarantees
> > > > > * -1, Do not provide the above described guarantees (please provide
> > > > > specific comments)
> > > > >
> > > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > > > approval with at least 3 PMC affirmative votes.
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> https://lists.apache.org/thread.html/rb0d0f887b291a490ed3773352c90ddf5f11e3d882dc501e3b8cf0ed0%40%3Cdev.flink.apache.org%3E
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > >
> > > >
> > >
>


Re: [PROPOSAL] Google Season of Docs 2020.

2020-05-15 Thread Robert Metzger
FYI: I'm a moderator of the dev@ list, and we've received about 5 emails
from applicants that were not subscribed to the list.
Initially, I rejected their messages, asking them to subscribe and send the
email again. None of them have done so.
That's why I am now accepting new applicant messages directly. However, this
means that they won't receive email responses if we just reply to the list.

tl;dr: Please use "Reply to all" and make sure the applicant's email
address is included when responding to any of those applications. Thanks :)

On Tue, May 12, 2020 at 11:28 AM Till Rohrmann  wrote:

> This is great news :-) Thanks Marta for driving this effort!
>
> On Mon, May 11, 2020 at 4:22 PM Sivaprasanna 
> wrote:
>
> > Awesome. Great job.
> >
> > On Mon, 11 May 2020 at 7:22 PM, Seth Wiesman 
> wrote:
> >
> > > Thank you for putting this together Marta!
> > >
> > > On Mon, May 11, 2020 at 8:35 AM Fabian Hueske 
> wrote:
> > >
> > > > Thanks Marta and congratulations!
> > > >
> > > > Am Mo., 11. Mai 2020 um 14:55 Uhr schrieb Robert Metzger <
> > > > rmetz...@apache.org>:
> > > >
> > > > > Awesome :)
> > > > > Thanks a lot for driving this Marta!
> > > > >
> > > > > Nice to see Flink (by virtue of having Apache as part of the name)
> so
> > > > high
> > > > > on the list, with other good open source projects :)
> > > > >
> > > > >
> > > > > On Mon, May 11, 2020 at 2:18 PM Marta Paes Moreira <
> > > ma...@ververica.com>
> > > > > wrote:
> > > > >
> > > > > > I'm happy to announce that we were *accepted* into this year's
> > Google
> > > > > > Season of Docs!
> > > > > >
> > > > > > The list of projects was published today [1]. The next step is
> for
> > > > > > technical writers to reach out (May 11th-June 8th) and apply
> (June
> > > > > 9th-July
> > > > > > 9th).
> > > > > >
> > > > > > Thanks again to everyone involved! I'm really looking forward to
> > > > kicking
> > > > > > off the project in September.
> > > > > >
> > > > > > [1]
> > https://developers.google.com/season-of-docs/docs/participants/
> > > > > >
> > > > > > Marta
> > > > > >
> > > > > > On Thu, Apr 30, 2020 at 5:14 PM Marta Paes Moreira <
> > > > ma...@ververica.com>
> > > > > > wrote:
> > > > > >
> > > > > > > The application to Season of Docs 2020 is close to being
> > finalized.
> > > > > I've
> > > > > > > created a PR with the application announcement for the Flink
> blog
> > > [1]
> > > > > (as
> > > > > > > required by Google OSS).
> > > > > > >
> > > > > > > Thanks a lot to everyone who pitched in — and special thanks to
> > > > > Aljoscha
> > > > > > > and Seth for volunteering as mentors!
> > > > > > >
> > > > > > > I'll send an update to this thread once the results are out
> (May
> > > > 11th).
> > > > > > >
> > > > > > > [1] https://github.com/apache/flink-web/pull/332
> > > > > > >
> > > > > > > On Mon, Apr 27, 2020 at 9:28 PM Seth Wiesman <
> > sjwies...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Marta,
> > > > > > >>
> > > > > > >> I think this is a great idea, I'd be happy to help mentor a
> > table
> > > > > > >> documentation project.
> > > > > > >>
> > > > > > >> Seth
> > > > > > >>
> > > > > > >> On Thu, Apr 23, 2020 at 8:38 AM Marta Paes Moreira <
> > > > > ma...@ververica.com
> > > > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Thanks for the feedback!
> > > > > > >> >
> > > > > > >> > So far, the projects on the table are:
> > > > > > >> >
> > > > > > >> >1. Improving the Table API/SQL documentation.
> > > > > > >> >2. Improving the documentation about Deployments.
> > > > > > >> >3. Restructuring and standardizing the documentation
> about
> > > > > > >> Connectors.
> > > > > > >> >4. Finishing the Chinese translation.
> > > > > > >> >
> > > > > > >> > I think 2. would require a lot of technical knowledge about
> > > Flink,
> > > > > > which
> > > > > > >> > might not be a good fit for GSoD (as discussed last year).
> > > > > > >> >
> > > > > > >> > As for mentors, we have:
> > > > > > >> >
> > > > > > >> >- Aljoscha (Table API/SQL)
> > > > > > >> >- Till (Deployments)
> > > > > > >> >- Stephan also said he'd be happy to participate as a
> > mentor
> > > if
> > > > > > >> needed.
> > > > > > >> >
> > > > > > >> > For the translation project, I'm pulling in the people
> > involved
> > > in
> > > > > > last
> > > > > > >> > year's thread (Jark and Jincheng), as we would need two
> > > > > > chinese-speaking
> > > > > > >> > mentors.
> > > > > > >> >
> > > > > > >> > I'll follow up with a draft proposal early next week, once
> we
> > > > reach
> > > > > a
> > > > > > >> > consensus and have enough mentors (2 per project). Thanks
> > again!
> > > > > > >> >
> > > > > > >> > Marta
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Fri, Apr 17, 2020 at 2:53 PM Till Rohrmann <
> > > > trohrm...@apache.org
> > > > > >
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Thanks for driving this effort Marta.
> > > > > > >> > >
> > > > > > >> > > I'd be up