Re: [ANNOUNCE] New Documentation Style Guide

2020-02-17 Thread Becket Qin
The guideline is very practical! I like it! Thanks for putting it together,
Aljoscha.

Jiangjie (Becket) Qin

On Mon, Feb 17, 2020 at 10:04 AM Xintong Song  wrote:

> Thanks for the summary!
>
> I've read through the guidelines and found them very helpful. Many of the
> questions I had while working on the 1.10 docs are answered in the
> guideline. It also inspires me with questions I had never thought of,
> especially the language style part.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Sun, Feb 16, 2020 at 12:55 AM Zhijiang
>  wrote:
>
> > Thanks for bringing this great and valuable document.
> >
> > I read through the document and was inspired especially by some sections
> > in "Voice and Tone" and "General Guiding Principles".
> > I think it is not only helpful for writing Flink documents, but also
> > provides guidance that benefits other writing.
> > It also reminded me to extend the Flink glossary if necessary.
> >
> > Best,
> > Zhijiang
> >
> >
> > --
> > From:Jingsong Li 
> > Send Time:2020 Feb. 15 (Sat.) 23:21
> > To:dev 
> > Subject:Re: [ANNOUNCE] New Documentation Style Guide
> >
> > Thanks for the great work!
> >
> > In 1.10, I modified and reviewed some documents. In that process, I was
> > sometimes confused about what the standard way to write is and what
> > reads correctly to users. The docs style guide now tells me. I learned
> > a lot.
> >
> > Best,
> > Jingsong Lee
> >
> > On Sat, Feb 15, 2020 at 10:00 PM Dian Fu  wrote:
> >
> > > Thanks for the great work! This is very helpful to keep the
> documentation
> > > style consistent across the whole project. It's also very helpful for
> > > non-native English contributors like me.
> > >
> > > > On Feb 15, 2020, at 3:42 PM, Jark Wu wrote:
> > > >
> > > > Great summary! Thanks for adding the translation specification in it.
> > > > I learned a lot from the guide.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Fri, 14 Feb 2020 at 23:39, Aljoscha Krettek 
> > > wrote:
> > > >
> > > >> Hi Everyone,
> > > >>
> > > >> we just merged a new style guide for documentation writing:
> > > >> https://flink.apache.org/contributing/docs-style.html.
> > > >>
> > > >> Anyone who is writing documentation or is planning to do so should
> > check
> > > >> this out. Please open a Jira Issue or respond here if you have any
> > > >> comments or questions.
> > > >>
> > > >> Some of the most important points in the style guide are:
> > > >>
> > > >>  - We should use direct language and address the reader as you
> instead
> > > >> of passive constructions. Please read the guide if you want to
> > > >> understand what this means.
> > > >>
> > > >>  - We should use "alert blocks" instead of simple inline alert tags.
> > > >> Again, please refer to the guide to see what this means exactly if
> > > >> you're not sure.
> > > >>
> > > >> There's plenty more and some interesting links about
> > > >> technical/documentation writing as well.
> > > >>
> > > >> Best,
> > > >> Aljoscha
> > > >>
> > >
> > >
> >
> > --
> > Best, Jingsong Lee
> >
> >
>


[jira] [Created] (FLINK-16113) ExpressionReducer shouldn't escape the reduced string value

2020-02-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-16113:
---

 Summary: ExpressionReducer shouldn't escape the reduced string 
value
 Key: FLINK-16113
 URL: https://issues.apache.org/jira/browse/FLINK-16113
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.10.1


ExpressionReducer shouldn't escape the reduced string value; the escaping 
should only happen in code generation. Otherwise, the output result is 
incorrect.

Here is a simple example to reproduce the problem:


{code:scala}
  val smallTupleData3: Seq[(Int, Long, String)] = {
val data = new mutable.MutableList[(Int, Long, String)]
data.+=((1, 1L, "你好"))
data.+=((2, 2L, "你好"))
data.+=((3, 2L, "你好世界"))
data
  }

  @Test
  def test(): Unit = {
val t = env.fromCollection(smallTupleData3)
  .toTable(tEnv, 'a, 'b, 'c)
tEnv.createTemporaryView("MyTable", t)
val sqlQuery = s"select * from MyTable where c = '你好'"

val result = tEnv.sqlQuery(sqlQuery).toAppendStream[Row]
val sink = new TestingAppendSink
result.addSink(sink)
env.execute()
println(sink.getAppendResults.mkString("\n"))
  }
{code}

The output:

{code:java}
1,1,\u4F60\u597D
2,2,\u4F60\u597D
{code}
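
With the escaping moved to code generation only, the expected output of the 
test above is the raw strings:

{code}
1,1,你好
2,2,你好
{code}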

This is also mentioned in user mailing list: 
http://apache-flink.147419.n8.nabble.com/ParquetTableSource-blink-table-planner-tp1696p1720.html




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[RESULT][VOTE] FLIP-97: Support scalar vectorized Python UDF in PyFlink

2020-02-17 Thread Dian Fu
Hi all,

Thank you all for the discussion and votes.
So far, we have
  - 3 binding +1 votes (Jincheng, Hequn, Dian)
  - 1 non-binding +1 vote (Jingsong)
  - No -1 votes

The voting time has passed and there are enough +1 votes. Therefore, I'm happy 
to announce that FLIP-97[1] has been accepted.

Thanks,
Dian

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink

[jira] [Created] (FLINK-16114) Support Scalar Vectorized Python UDF in PyFlink

2020-02-17 Thread Dian Fu (Jira)
Dian Fu created FLINK-16114:
---

 Summary: Support Scalar Vectorized Python UDF in PyFlink
 Key: FLINK-16114
 URL: https://issues.apache.org/jira/browse/FLINK-16114
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


Scalar Python UDFs are already supported in Flink 1.10 
([FLIP-58|https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table])
 and operate one row at a time: the Java operator serializes one input row to 
bytes and sends it to the Python worker; the Python worker deserializes the 
input row and evaluates the Python UDF with it; the result row is serialized 
and sent back to the Java operator.

This approach suffers from the following problems:
 # High serialization/deserialization overhead
 # It's difficult to leverage the popular Python libraries used by data 
scientists, such as Pandas and NumPy, which provide high-performance data 
structures and functions.

We want to introduce vectorized Python UDFs to address these problems. For a 
vectorized Python UDF, a batch of rows is transferred between the JVM and the 
Python VM in columnar format. The batch is converted to a collection of 
pandas.Series and given to the vectorized Python UDF, which can then leverage 
popular Python libraries such as Pandas and NumPy for its implementation.

More details can be found in 
[FLIP-97|https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16115) Aliyun oss filesystem could not work with plugin mechanism

2020-02-17 Thread Yang Wang (Jira)
Yang Wang created FLINK-16115:
-

 Summary: Aliyun oss filesystem could not work with plugin mechanism
 Key: FLINK-16115
 URL: https://issues.apache.org/jira/browse/FLINK-16115
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem
Affects Versions: 1.10.0
Reporter: Yang Wang


From release 1.9 on, Flink suggests that users load all filesystems, including 
oss, via the plugin mechanism. However, this does not work for the oss 
filesystem. The root cause is that it does not shade 
{{org.apache.flink.runtime.fs.hdfs}} and {{org.apache.flink.runtime.util}}, so 
these classes will always be loaded by the system classloader, which leads to 
the following exception.

 
{code:java}
2020-02-17 17:28:47,247 ERROR 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Could not start 
cluster entrypoint StandaloneSessionClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to 
initialize the cluster entrypoint StandaloneSessionClusterEntrypoint.
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
at 
org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint.main(StandaloneSessionClusterEntrypoint.java:64)
Caused by: java.lang.NoSuchMethodError: 
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.<init>(Lorg/apache/flink/fs/shaded/hadoop3/org/apache/hadoop/fs/FileSystem;)V
at 
org.apache.flink.fs.osshadoop.OSSFileSystemFactory.create(OSSFileSystemFactory.java:85)
at 
org.apache.flink.core.fs.PluginFileSystemFactory.create(PluginFileSystemFactory.java:61)
at 
org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:441)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
at 
org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:100)
at 
org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:89)
at 
org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:125)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:305)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:263)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:207)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
at 
org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
... 2 more
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16116) Remove shading from oss filesystems build

2020-02-17 Thread Yang Wang (Jira)
Yang Wang created FLINK-16116:
-

 Summary: Remove shading from oss filesystems build
 Key: FLINK-16116
 URL: https://issues.apache.org/jira/browse/FLINK-16116
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem
Reporter: Yang Wang


Since Flink uses the plugin mechanism to load all filesystems, class conflicts 
will no longer be a problem. So, just like for S3, I suggest removing the 
shading for the oss filesystem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New Documentation Style Guide

2020-02-17 Thread Yu Li
I think the guide itself is a great example of the new style! Thanks for
driving this, Aljoscha!

Best Regards,
Yu




Re: [ANNOUNCE] New Documentation Style Guide

2020-02-17 Thread jincheng sun
Thanks for this great job Aljoscha!

Best,
Jincheng





[jira] [Created] (FLINK-16117) Avoid register source in TableTestBase#addTableSource

2020-02-17 Thread Zhenghua Gao (Jira)
Zhenghua Gao created FLINK-16117:


 Summary: Avoid register source in TableTestBase#addTableSource
 Key: FLINK-16117
 URL: https://issues.apache.org/jira/browse/FLINK-16117
 Project: Flink
  Issue Type: Sub-task
Reporter: Zhenghua Gao


This affects thousands of unit tests:

1) explainSourceAsString of CatalogSourceTable changes

2) JoinTest#testUDFInJoinCondition: SQL keywords must be escaped

3) GroupWindowTest#testTimestampEventTimeTumblingGroupWindowWithProperties: 
Reference to a rowtime or proctime window required

4) SetOperatorsTest#testInWithProject: legacy type vs. new type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16118) testDynamicTableFunction fails

2020-02-17 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-16118:
-

 Summary: testDynamicTableFunction fails
 Key: FLINK-16118
 URL: https://issues.apache.org/jira/browse/FLINK-16118
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.11.0
Reporter: Roman Khachatryan


https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_apis/build/builds/5186/logs/16

 
{code:java}
2020-02-14T14:46:56.8515984Z [ERROR] 
testDynamicTableFunction(org.apache.flink.table.planner.runtime.stream.sql.FunctionITCase)
 Time elapsed: 3.452 s <<< FAILURE!
 2020-02-14T14:46:56.8517003Z java.lang.AssertionError:
 2020-02-14T14:46:56.8517232Z
 2020-02-14T14:46:56.8517485Z Expected: <[Test is a string, 42, null]>
 2020-02-14T14:46:56.8517739Z but: was <[42, Test is a string, null]>
 2020-02-14T14:46:56.8518067Z at 
org.apache.flink.table.planner.runtime.stream.sql.FunctionITCase.testDynamicTableFunction(FunctionITCase.java:611){code}
 

 

The change was to enable chaining of the ContinuousFileReaderOperator 
(https://github.com/apache/flink/pull/11097).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2020-02-17 Thread Robert Metzger
@Leonard: On Azure, I'm not splitting the execution of the end-to-end tests
anymore, so we won't have the overhead of compiling the same profile multiple
times.


@all: We have recently merged a first version of the Azure configuration
files to Flink [1]. This will allow us to build pull requests with all the
additional checks we had in place for Travis as well.
In the next few days, I'm going to build pushes and the nightly crons on
Azure as well.

From now on, you can set up Azure Pipelines for your own Flink fork as
well, and execute end-to-end tests there quite easily [2].
I'll be closely monitoring the new setup in the coming days. Expect some
smaller issues while not all pull requests have my changes (at some point,
I will change a configuration in Azure, which will break builds that do not
have my changes).
Once Azure is stable, and we have the same features as the Travis build,
we'll stop processing builds on Travis.


[1] https://github.com/apache/flink/pull/10976
[2]
https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines#id-[preview]AzurePipelines-Runningendtoendtests:

On Mon, Dec 9, 2019 at 2:16 PM Leonard Xu  wrote:

> +1 for the migration.
> *10 parallel builds with 300-minute timeouts* is very useful for tasks
> that take a long time, like e2e tests.
> And on Travis, it looks like we compile the entire project for every cron
> task even if they use the same profile, e.g.:
>  `name: e2e - misc - hadoop 2.8
>   name: e2e - ha - hadoop 2.8
>   name: e2e - sticky - hadoop 2.8
>   name: e2e - checkpoints - hadoop 2.8
>   name: e2e - container - hadoop 2.8
>   name: e2e - heavy - hadoop 2.8
>   name: e2e - tpcds - hadoop 2.8`
> We will compile the entire project with profile `hadoop 2.8` 7 times, and
> every task will take about 25 minutes.
> @robert @chesnay Should we consider compiling once for multiple cron tasks
> that share the same profile in the new Azure Pipelines?
>
> Best,
> Leonard Xu
>
> > On Dec 9, 2019, at 11:57, Congxian Qiu  wrote:
> >
> > +1 for migrating to Azure pipelines as this can have shorter build time,
> > and faster response.
> >
> > Best,
> > Congxian
> >
> >
> > Xiyuan Wang wrote on Mon, Dec 9, 2019 at 10:13 AM:
> >
> >> Hi Robert,
> >>  Thanks for bringing up this topic. The 2 ARM machines (16 cores) which I
> >> donated are just for the POC test. We (Huawei) can donate more once we
> >> move to the official Azure pipeline. :)
> >>
> >> Robert Metzger wrote on Fri, Dec 6, 2019 at 3:25 AM:
> >>
> >>> Thanks for your comments Yun.
> >>> If there's strong support for idea 2, it would actually make my
> >>> life easier: the migration would be easier to do.
> >>>
> >>> I also noticed that the uploads to transfer.sh were broken, but this
> >> should
> >>> be fixed in the "rmetzger.flink" builds (coming from rmetzger/flink).
> The
> >>> builds in "flink-ci.flink" (coming from flink-ci/flink) might have
> >> troubles
> >>> with transfer.sh.
> >>>
> >>>
> >>> On Thu, Dec 5, 2019 at 5:50 PM Yun Tang  wrote:
> >>>
>  Hi Robert
> 
>  Really exciting to see this new, more powerful CI tool to get rid of the
>  50-minute limit of the Travis CI free account.
> 
>  After reading the wiki, I support idea 2 of AZP-setup version-2.
> 
>  However, after digging into some failing builds at
>  https://dev.azure.com/rmetzger/Flink/_build , I found we cannot view the
>  logs of some IT cases, which previously would be uploaded by
>  travis_watchdog to transfer.sh.
>  I think this feature is also easy to implement in AZP, right?
> 
>  Best
>  Yun Tang
> 
>  On 12/6/19, 12:19 AM, "Robert Metzger"  wrote:
> 
> I've created a first draft of my plans in the wiki:
> 
> 
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines
>  .
> I'm looking forward to your comments.
> 
> On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger <
> >> rmetz...@apache.org>
>  wrote:
> 
> > Thank you all for the positive feedback. I will start putting
>  together a
> > page in the wiki.
> >
> > @Jark: Azure Pipelines provides a free services, that is even
> >>> better
>  than
> > what Travis provides for free: 10 parallel builds with 6 hours
>  timeouts.
> >
> > @Chesnay: I will answer your questions in the yet-to-be-written
> > documentation in the wiki.
> >
> >
> > On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise  >>>
>  wrote:
> >
> >> +1 I had good experiences with Azure pipelines in the past.
> >>
> >> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <
>  aljos...@apache.org>
> >> wrote:
> >>
> >>> +1
> >>>
> >>> Thanks for the effort! The tooling seems to be quite a bit
> >> nicer
>  and I
> >>> like that we can grow by adding more machines.
> >>>
> >>> Best,
> >>> Aljoscha
> >>>
>  On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> 
> 

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

2020-02-17 Thread Timo Walther

Hi Kurt,

no, there is no JIRA ticket yet. But in any case, I think it is better to 
have good testing infrastructure that abstracts source generation, sink 
generation, testing data, etc. Even if we introduce tableEnv.values(), it 
will not solve everything, because time-based operations might need 
time attributes and so on.
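
For illustration, defining a test source inline could look roughly like
this (a sketch only: the method name, the row() helper, and the signature
were still under discussion at this point and are assumptions, not a final
API):

    // Sketch: an inline source for tests, assuming a fromValues()-style API
    // lands (would need e.g. a static import of a row() expression helper).
    Table input = tEnv.fromValues(
        row(1, 1L, "Hello"),
        row(2, 2L, "World"));
    tEnv.createTemporaryView("MyTable", input);

Such an API would remove the need to register a TableSource just to get
test data into a query.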


Using DDL in tests should also be avoided because strings are even more 
difficult to maintain.


Regards,
Timo


On 08.02.20 04:29, Kurt Young wrote:

Hi Timo,

tableEnv.fromElements/values() sounds good, do we have a jira ticket to
track the issue?

Best,
Kurt


On Fri, Feb 7, 2020 at 10:56 PM Timo Walther  wrote:


Hi Kurt,

Dawid is currently working on making a tableEnv.fromElements/values()
kind of source possible in the future. We can use this to replace some
of the tests. Otherwise I guess we should come up with a better test
infrastructure to make defining sources unnecessary.

Regards,
Timo


On 07.02.20 11:24, Kurt Young wrote:

Thanks all for your feedback. Since no objection has been raised, I've created
https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.

Since this issue requires lots of test adjustments before it can really
happen, it won't be done in a short time. Feel free to give feedback anytime,
here or in Jira, if you have other opinions.

Best,
Kurt


On Wed, Feb 5, 2020 at 8:26 PM Kurt Young  wrote:


Hi Zhenghua,

After removing TableSource::getTableSchema, during optimization, I could
imagine the schema information might come from relational nodes such as
TableScan.


Best,
Kurt


On Wed, Feb 5, 2020 at 8:24 PM Kurt Young  wrote:


Hi Jingsong,

Yes, the current TableFactory is not ideal for users either. I think we
should also spend some time in 1.11 improving the usability of
TableEnvironment when users try to read or write something. Automatic schema
inference would be one of these improvements. Apart from this, we also
support converting a DataStream to a Table, which can serve some flexible
requirements to read or write data.

Best,
Kurt


On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao  wrote:


+1 to remove these methods.

One concern about invocations of TableSource::getTableSchema:
By removing such methods, we can stop calling TableSource::getTableSchema
in some places (such as
BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
ConnectorCatalogTable, TableSourceQueryOperation).

But in other places we need the field types and names of the table source
(such as BatchExecLookupJoinRule/StreamExecLookupJoinRule,
PushProjectIntoTableSourceScanRule, CommonLookupJoin). So how should we deal
with this?

*Best Regards,*
*Zhenghua Gao*


On Wed, Feb 5, 2020 at 2:36 PM Kurt Young  wrote:


Hi all,

I'd like to bring up a discussion about removing the registration of
TableSource and TableSink in TableEnvironment as well as in
ConnectTableDescriptor. The affected methods would be:

TableEnvironment::registerTableSource
TableEnvironment::fromTableSource
TableEnvironment::registerTableSink
ConnectTableDescriptor::registerTableSource
ConnectTableDescriptor::registerTableSink
ConnectTableDescriptor::registerTableSourceAndSink

(Most of them are already deprecated, except for
TableEnvironment::fromTableSource, which was intended to be deprecated but
was missed by accident).

FLIP-64 [1] already explained why we want to deprecate TableSource &
TableSink in the user-facing interface. In short, these interfaces should
only read & write the physical representation of the table, and they no
longer fit well now that we have introduced logical table fields such as
computed columns and watermarks.

Another reason is that exposing registerTableSource in TableEnvironment
inverts the whole SQL protocol. A TableSource should be used as a reader of
a table; it should rely on metadata information held by the framework, which
eventually comes from DDL or a ConnectDescriptor. But if we register a
TableSource with TableEnvironment, we have no choice but to rely on
TableSource::getTableSchema. This makes the design obscure: sometimes the
TableSource should trust the information coming from the framework, but
sometimes it should also generate its own schema information.

Furthermore, if the authority over schema information is not clear, it will
make things much more complicated if we want to improve Table API usability,
such as by introducing automatic schema inference, in the near future.

Since this is a breaking API change, I've also included the user mailing
list to gather more feedback.

Best,
Kurt

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

















[Discuss] Update the pull request description template.

2020-02-17 Thread Xintong Song
Hi all,

It seems our PR description template is a bit outdated, and I would like to
propose updating it.

I was working on a Kubernetes-related PR and realized that our PR
description template does not mention the new Kubernetes integration in the
question about deployment-related changes. Currently it is as follows:

> Anything that affects deployment or recovery: JobManager (and its
> components), Checkpointing, Yarn/Mesos, ZooKeeper:
>

In addition to the outdated content, there might be other things we want
to add to the template. For example, I would suggest adding a question about
whether the PR introduces any memory allocations, so we can review them
carefully and avoid problems due to unaccounted memory allocations like
FLINK-15981. (To be fair, for FLINK-15981 the memory allocation was
introduced before we started to account for all memory usage, but noticing
such memory allocations early should help us prevent similar problems in
the future.)
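
For illustration only (the exact wording is open for discussion), the
updated entry might look like this:

> Anything that affects deployment or recovery: JobManager (and its
> components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper:
> Does this PR introduce any new memory allocations? If so, where are
> they accounted for?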

Therefore, I'd also like to use this discussion thread to collect ideas on
how you think the template should be updated.

Looking forward to your feedback~!

Thank you~

Xintong Song


Re: [ANNOUNCE] New Documentation Style Guide

2020-02-17 Thread Aljoscha Krettek
Just to be clear: I didn't write this style guide. Marta (in cc) wrote 
this, I just finally merged it.


Best,
Aljoscha



Re: [Discuss] Update the pull request description template.

2020-02-17 Thread Chesnay Schepler
I think it should just be removed since 99% of pull requests ignore it 
anyway.







[jira] [Created] (FLINK-16119) Port base RelNode classes from Scala to Java

2020-02-17 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-16119:
---

 Summary: Port base RelNode classes from Scala to Java
 Key: FLINK-16119
 URL: https://issues.apache.org/jira/browse/FLINK-16119
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Legacy Planner, Table SQL / Planner
Reporter: Hequn Cheng


Currently, when adding new Flink RelNodes, we have to write them in Scala due 
to the problem that we can't use the implemented methods of a Scala trait from 
Java ([see 
details|https://alvinalexander.com/scala/how-to-wrap-scala-traits-used-accessed-java-classes-methods]).
 Take DataStreamCorrelate as an example: it extends both CommonCorrelate and 
DataStreamRel, so we can't convert DataStreamCorrelate to Java directly. 

It would be great if we could convert these base RelNode classes 
(CommonCorrelate, DataStreamRel, etc.) from Scala to Java so that we can add 
new Java RelNodes and convert the existing RelNodes to Java.
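
As a rough sketch of the target shape (the class names mirror those above, but 
all method names and bodies are made up for illustration), Java 8 default 
methods and plain abstract classes can carry the shared logic that currently 
lives in the Scala traits:

{code:java}
// Sketch only: illustrative signatures, not the actual planner APIs.
interface DataStreamRel {
    // A default method can replace an implemented Scala trait method,
    // so Java subclasses inherit it without any glue code.
    default String nodeDescription() {
        return getClass().getSimpleName();
    }
}

abstract class CommonCorrelate {
    // Shared correlate logic reused by both planners.
    protected String correlateName() {
        return "correlate";
    }
}

// With both bases in Java, new RelNodes no longer have to be Scala:
class DataStreamCorrelate extends CommonCorrelate implements DataStreamRel {
}
{code}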

CC [~twalthr]








--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Discuss] Update the pull request description template.

2020-02-17 Thread Congxian Qiu
JFYI, there is an issue[1] which I think is related to this thread
[1] https://issues.apache.org/jira/browse/FLINK-15977

Best,
Congxian




Re: [ANNOUNCE] New Documentation Style Guide

2020-02-17 Thread Hequn Cheng
Thanks Marta and Aljoscha for the detailed document! This is very helpful.

Best,
Hequn



[RESULT][VOTE] Release flink-shaded 10.0, release candidate #3

2020-02-17 Thread Chesnay Schepler

I'm happy to announce that we have unanimously approved this release.

There are 8 approving votes, 3 of which are binding:

- Yu (non-binding)
- Ufuk (binding)
- Zhu (non-binding)
- Hequn (non-binding)
- Dian (non-binding)
- Robert (binding)
- Zhijiang (non-binding)
- Jincheng (binding)

There are no disapproving votes.

Thanks everyone!

On 17/02/2020 08:12, jincheng sun wrote:

+1 (binding)

With simple checks:
   - Downloaded the source and built successfully
   - Verified the signature and checksum

Best,
Jincheng


Zhijiang wrote on Sun, Feb 16, 2020 at 12:11 AM:


+1 (non-binding)

- Checked the release notes
- Downloaded the source and built successfully
- Verified the signature and checksum
- Verified the artifacts in the Maven Central Repository

Best,
Zhijiang


--
From:Robert Metzger 
Send Time:2020 Feb. 14 (Fri.) 21:35
To:dev 
Subject:Re: [VOTE] Release flink-shaded 10.0, release candidate #3

+1 (binding)

- Checked some artifacts in the staging repo
  - checked license documentation
- source release is binary free

On Fri, Feb 14, 2020 at 8:01 AM Dian Fu  wrote:


+1 (non-binding)

- Verified the signature and checksum
- Checked the release note that all the tickets included in this release
are there
- Checked the website PR and it LGTM
- Checked the notice file of the newly added module
flink-shade-zookeeper-3 and it LGTM

Regards,
Dian


On Feb 14, 2020, at 2:58 PM, Hequn Cheng wrote:

Thank you Chesnay for the release!

+1 (non-binding)

- The problem that exists in RC1 has been resolved.
- Release notes looks good.
- Built from source archive successfully.
- Check commit history manually. Nothing looks weird.
- Signatures and hash are correct.
- All artifacts have been deployed to the maven central repository.
- The website pull request looks good

Best, Hequn

On Fri, Feb 14, 2020 at 1:14 AM Zhu Zhu  wrote:


+1 (non-binding)

- checked release notes, JIRA tickets and commit history
- verified the signature and checksum
- checked the maven central artifacts
  * examined the zookeeper shaded jars (both 3.4.10 and 3.5.6); curator and
zookeeper classes are there and shaded
- built from the source archive as well as the git tag
- checked the website pull request

Thanks,
Zhu Zhu

Ufuk Celebi wrote on Fri, Feb 14, 2020 at 12:32 AM:


PS: Also verified the NOTICE changes since the last RC.

On Thu, Feb 13, 2020 at 5:25 PM Ufuk Celebi  wrote:


Hey Chesnay,

+1 (binding).

- Verified checksum ✅
- Verified signature ✅
- Jira changelog looks good to me ✅
- Website PR looks good to me ✅
- Verified no unshaded dependencies (except the Hadoop modules, which I
think is expected) ✅
- Verified dependency management fix FLINK-15540
(commons-collections:3.2.2 as expected) ✅
- Verified pom exclusion fix FLINK-15815 (no META-INF/maven except for
flink-shaded-force-shading and the Hadoop modules, which I think is
expected) ✅

– Ufuk

On Thu, Feb 13, 2020 at 3:08 PM Yu Li  wrote:

+1 (non-binding)

Checked issues listed in release notes: ok
Checked sums and signatures: ok
Checked the Maven Central artifacts: ok
Built from source: ok (8u101, 11.0.4)
Built from source (with -Dshade-sources): ok (8u101, 11.0.4)
Checked contents of zookeeper shaded jars: ok
- no unshaded classes
- shading pattern is correct
Checked website pull request listing the new release: ok

Best Regards,
Yu


On Wed, 12 Feb 2020 at 22:09, Chesnay Schepler 
wrote:

Hi everyone,
Please review and vote on the release candidate #3 for the version 10.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:

* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org
[2], which are signed with the key with fingerprint 11D464BA [3],
* all artifacts to be deployed to the Maven Central Repository [4],

* source code tag "release-10.0-rc3" [5],
* website pull request listing the new release [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Chesnay

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346746
[2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-10.0-rc3/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1337
[5]
https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=tag;h=refs/tags/release-10.0-rc3
[6] https://github.com/apache/flink-web/pull/304












[jira] [Created] (FLINK-16120) Update the hive section of the connectors doc

2020-02-17 Thread Rui Li (Jira)
Rui Li created FLINK-16120:
--

 Summary: Update the hive section of the connectors doc
 Key: FLINK-16120
 URL: https://issues.apache.org/jira/browse/FLINK-16120
 Project: Flink
  Issue Type: Task
  Components: Connectors / Hive, Documentation
Reporter: Rui Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16121) Introduce ArrowReader and ArrowWriter for Arrow format data read and write

2020-02-17 Thread Dian Fu (Jira)
Dian Fu created FLINK-16121:
---

 Summary: Introduce ArrowReader and ArrowWriter for Arrow format 
data read and write
 Key: FLINK-16121
 URL: https://issues.apache.org/jira/browse/FLINK-16121
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


As the title describes, the aim of this JIRA is to introduce classes such as 
ArrowReader, which is used to read the execution results of vectorized Python 
UDFs, and ArrowWriter, which is used to convert Flink rows to Arrow format 
before sending them to the Python worker for vectorized Python UDF execution.
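
As a rough sketch of the shape these classes could take (all method names and 
generics below are assumptions; the JIRA only fixes the class names):

{code:java}
// Sketch only: reads rows out of the Arrow batches returned by the Python worker.
interface ArrowReader<OUT> {
    /** Read the row at the given position of the current Arrow record batch. */
    OUT read(int rowId);
}

// Sketch only: buffers Flink rows into an Arrow batch before it is sent to Python.
interface ArrowWriter<IN> {
    /** Append one Flink row to the in-progress Arrow record batch. */
    void write(IN row);
    /** Finish the current batch so it can be handed to the Python worker. */
    void finish();
}
{code}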



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Hotfixes on the master

2020-02-17 Thread Robert Metzger
Hi all,

I would like to revive this very old thread on the topic of "unreviewed
hotfixes on master" again.
Out of the 35 commits listed on the latest commits page on GitHub, 18 have
the tag "hotfix", on the next page it is 9, then 16, 17, ...
In the last 140 commits, 42% were hotfixes.

For the sake of this discussion, let's distinguish between two types of
hotfixes:
a) *reviewed hotfix commits*: They have been reviewed through a pull
request, then committed to master.
b) *unreviewed hotfix commits*: These have been pushed straight to master,
without a review.

It's quite difficult to find out whether a hotfix has been reviewed or not
(because many hotfix commits are reviewed & pushed as part of a PR), but
here are some recent examples of commits where I could not find evidence of
a pull request:

// these could probably be combined into one JIRA ticket, as they affect the
same component + they touch dependencies
47a1725ae14a772ba8590ee97dffd7fdf5bc04b2 [hotfix][docs][conf] Log included
packages / excluded classes
a5894677d95336a67d5539584b9204bcdd14fac5 [hotfix][docs][conf] Setup logging
for generator
325927064542c2d018f9da33660c1cdf57e0e382 [hotfix][docs][conf] Add query
service port to port section
3c696a34145e838c046805b36553a50ec9bfbda0 [hotfix][docs][conf] Add query
service port to port section

// dependency change
736ebc0b40abab88902ada3f564777c3ade03001 [hotfix][build] Remove various
unused test dependencies

// more than a regeneration / typo / compile error change
30b5f6173e688ea20b82226db6923db19dec29a5 [hotfix][tests] Adjust
FileUtilsTest.testDeleteSymbolicLinkDirectory() to handle unsupported
situations in Windows
fc59aa4ecc2a7170bfda14ffadf0a30aa2b793bf [FLINK-16065][core] Unignore
FileUtilsTest.testDeleteDirectoryConcurrently()

// dependency changes
fe7145787a7f36b21aad748ffea4ee8ab03c02b7 [hotfix][build] Remove unused
akka-testkit dependencies
dd34b050e8e7bd4b03ad0870a432b1631e1c0e9d [hotfix][build] Remove unused
shaded-asm7 dependencies

// dependency changes
244d2db78307cd7dff1c60a664046adb6fe5c405 [hotfix][web][build] Cleanup
dependencies

In my opinion, everything that is not a typo, a compile error (breaking the
master), or something generated (like parts of the docs) should go through a
quick pull request.
Why? I don't think many people review changes in the commit log the way they
review pull request changes.

In addition to that, I propose to prefix hotfixes that have been added as
part of a ticket with that ticket number.
So instead of "[hotfix] Harden kubernetes test", we do "[FLINK-13978][minor]
Harden kubernetes test".
Why? For people checking the commit history, it is much easier to see if a
hotfix has been reviewed as part of a JIRA ticket review, or whether it is
a "hotpush" hotfix.

For changes that are too small for a JIRA ticket, but need a review, I
propose to use the "[minor]" tag. A good example of such a change is this:
https://github.com/apache/flink/commit/0dc4e767c9c48ac58430a59d05185f2b071f53f5

By tagging minor changes accordingly in the pull requests, it is also
easier for fellow committers to quickly check them.

Summary:
[FLINK-]: regular, reviewed change
[FLINK-][minor]: minor, unrelated changes reviewed with a regular ticket
[minor]: minor, reviewed change
[hotfix]: unreviewed change that fixes a typo, compile error or something
generated


What's your opinion on this?


On Sat, May 28, 2016 at 1:36 PM Vasiliki Kalavri 
wrote:

> Hi all,
>
> in principle I agree with Max. I personally avoid hotfixes and always open
> a PR, even for javadoc improvements.
>
> I believe the main problem is that we don't have a clear definition of what
> constitutes a "hotfix". Ideally, even cosmetic changes and documentation
> should be reviewed; I've seen documentation added as a hotfix that had
> spelling mistakes, which led to another hotfix... Using hotfixes to do
> major refactoring or add features is absolutely unacceptable, in my view.
> On the other hand, with the current PR load it's not practical to ban
> hotfixes all together.
>
> I would suggest to update our contribution guidelines with some definition
> of a hotfix. We could add a list of questions to ask before pushing one.
> e.g.:
> - does the change fix a spelling mistake in the docs? => hotfix
> - does the change add a missing javadoc? => hotfix
> - does the change improve a comment? => hotfix?
> - is the change a small refactoring in a code component you are maintainer
> of? => hotfix
> - did you change code in a component you are not very familiar with / not
> the maintainer of? => open PR
> - is this major refactoring? (e.g. more than X lines of code) => open PR
> - does it fix a trivial bug? => open JIRA and PR
>
> and so on...
>
> What do you think?
>
> Cheers,
> -V.
>
> On 27 May 2016 at 17:40, Greg Hogan  wrote:
>
> > Max,
> >
> > I certainly agree that hotfixes are not ideal for large refactorings and
> > new features. Some thoughts ...
> >
> > A hotfix should be maven verified, as should a re

[jira] [Created] (FLINK-16122) build system: transfer.sh uploads are unstable (February 2020)

2020-02-17 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-16122:
--

 Summary: build system: transfer.sh uploads are unstable (February 
2020)
 Key: FLINK-16122
 URL: https://issues.apache.org/jira/browse/FLINK-16122
 Project: Flink
  Issue Type: Task
  Components: Build System
Reporter: Robert Metzger


This issue has been brought up on the dev@ list: 
https://lists.apache.org/thread.html/rb6661e419b869f040e66a4dd46022fd11961e8e5aebe646b2260f6f8%40%3Cdev.flink.apache.org%3E

Issues:
- timeouts
- logs not available





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Remove transfer.sh from CI setup

2020-02-17 Thread Robert Metzger
Tracking this here: https://issues.apache.org/jira/browse/FLINK-16122

On Sat, Feb 15, 2020 at 8:12 PM Robert Metzger  wrote:

> I agree that we need to fix this.
>
> We could either misuse the "build artifact" feature of azure pipelines to
> publish the logs, or we set up something simple for Flink (like running an
> instance of https://github.com/lachs0r/0x0 or
> https://github.com/dutchcoders/transfer.sh :) )
>
> On Fri, Feb 14, 2020 at 8:44 PM Chesnay Schepler 
> wrote:
>
>> The S3 setup only works in the apache repo though; not on contributor
>> branches or PR builds.
>>
>> We can tighten the timeouts (already talked to Robert about that), at
>> which point it doesn't hurt.
>>
>> On 14/02/2020 18:28, Stephan Ewen wrote:
>> > Hi all!
>> >
>> > I propose to remove the log upload via transfer.sh and rely on the S3
>> > upload instead.
>> >
>> > The reason is that transfer.sh seems to be very unreliable (times out in
>> > many profiles recently) and it seems that we also often don't get
>> access to
>> > uploaded logs (errors out on the transfer.sh website).
>> >
>> > Best,
>> > Stephan
>> >
>>
>>


[jira] [Created] (FLINK-16123) Add routable Kafka connector

2020-02-17 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-16123:


 Summary: Add routable Kafka connector
 Key: FLINK-16123
 URL: https://issues.apache.org/jira/browse/FLINK-16123
 Project: Flink
  Issue Type: Task
  Components: Stateful Functions
Reporter: Igal Shilman


In some cases it is beneficial to associate a stateful function instance with a 
key in a Kafka topic.

In that case, a simplified Kafka ingress definition can be introduced. 

Consider the following example:

Imagine a Kafka topic named "signups" (1) where the keys are utf8 strings 
representing user ids, and the values are Protobuf messages of type (2) 
com.user.foo.bar.greeter.SignupMessage.

We would like to have a stateful function of type (3)
{code:java}
FunctionType(com.user.foo.bar, SignupProcessor){code}
to be invoked for each incoming signup message.

The following spec definition:
{code:java}

  - ingress:
      meta:
        type: org.apache.flink.statefun.sdk.kafka/routable-kafka-connector
        id: com.user.foo.bar/greeter
      spec:
        properties:
          - consumer.group: greeter
        topics:
          - signups: (1)
              typeUrl: (2) "com.user.foo.bar.greeter.SignupMessage"
              target: (3) "com.user.foo.bar/SignupProcessor"

{code}


This defines a Kafka ingress that consumes from the signups topic and produces 
a Routable Protobuf message with the following type and properties:

{code}
message Routable {
   Address target; (1)
   Any payload;
}
{code}
Where:
(1) is Address(FunctionType(com.user.foo.bar, SignupProcessor),  )
(2) the Any's typeUrl would be com.user.foo.bar.greeter.SignupMessage, and the 
value bytes would come directly from the consumer record value bytes

This would require an additional AutoRoutable router that basically forwards 
the payload to the target address.
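
A minimal sketch of such a router (the Routable message type and its accessors 
are assumptions based on the proto above; Router and Downstream are the 
StateFun SDK routing interfaces):

{code:java}
// Sketch only: forwards each Routable to the address it carries.
// Assumes: import org.apache.flink.statefun.sdk.Address;
//          import org.apache.flink.statefun.sdk.io.Router;
final class AutoRoutableRouter implements Router<Routable> {
    @Override
    public void route(Routable message, Downstream<Routable> downstream) {
        // The target address is read from the message itself (see the proto
        // above); in practice the Any payload might be unpacked before forwarding.
        Address target = message.getTarget();
        downstream.forward(target, message);
    }
}
{code}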

 




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

2020-02-17 Thread Bowen Li
Hi all,

If there's no more comments, I would like to kick off a vote for this FLIP
[1].

FYI, the flip number is changed to 93 since there was a race condition of
taking 92.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog

On Wed, Jan 22, 2020 at 11:05 AM Bowen Li  wrote:

> Hi Flavio,
>
> First, this is a generic question on how flink-jdbc is set up, not
> specific to jdbc catalog, thus is better to be on its own thread.
>
> But to just quickly answer your question, you need to see where the
> incompatibility is. There may be incompatibility in 1) the jdbc drivers or 2)
> the databases. 1) is fairly stable and backward-compatible. 2) normally has
> to do with your queries, not the driver.
>
>
>
> On Tue, Jan 21, 2020 at 3:21 PM Flavio Pompermaier 
> wrote:
>
>> Hi all,
>> I'm happy to see a lot of interest in easing the integration with JDBC data
>> sources. Maybe this could be a rare situation (not in my experience
>> however..) but what if I have to connect to the same type of source (e.g.
>> Mysql) with 2 incompatible versions...? How can I load the 2 (or more)
>> connector jars without causing conflicts?
>>
>> Il Mar 14 Gen 2020, 23:32 Bowen Li  ha scritto:
>>
>> > Hi devs,
>> >
>> > I've updated the wiki according to feedbacks. Please take another look.
>> >
>> > Thanks!
>> >
>> >
>> > On Fri, Jan 10, 2020 at 2:24 PM Bowen Li  wrote:
>> >
>> > > Thanks everyone for the prompt feedback. Please see my response below.
>> > >
>> > > > In Postgress, the TIME/TIMESTAMP WITH TIME ZONE has the
>> > > java.time.Instant semantic, and should be mapped to Flink's
>> > TIME/TIMESTAMP
>> > > WITH LOCAL TIME ZONE
>> > >
>> > > Zhenghua, you are right that pg's 'timestamp with timezone' should be
>> > > translated into flink's 'timestamp with local timezone'. I don't find
>> > 'time
>> > > with (local) timezone' though, so we may not support that type from
>> pg in
>> > > Flink.
>> > >
>> > > > I suggest that the parameters can be completely consistent with the
>> > > JDBCTableSource / JDBCTableSink. If you take a look at the JDBC api:
>> > > "DriverManager.getConnection".
>> > > That allows "default db, username, pwd" things to be optional. They can
>> > > be included in the URL. Of course the JDBC api also allows establishing
>> > > connections to different databases in a db instance. So I think we don't
>> > > need to provide a "base_url", we can just provide a real "url", to be
>> > > consistent with the JDBC api.
>> > >
>> > > Jingsong, what I'm saying is a builder can be added on demand later if
>> > > there's enough user requesting it, and doesn't need to be a core part
>> of
>> > > the FLIP.
>> > >
>> > > Besides, unfortunately Postgres doesn't allow changing databases via
>> > JDBC.
>> > >
>> > > JDBC provides different connecting options as you mentioned, but I'd like
>> > > to keep our design and API simple and avoid having to handle extra
>> > > parsing logic.
>> > > And it doesn't shut the door for what you proposed as a future effort.
>> > >
>> > > > Since the PostgreSQL does not have catalog but schema under
>> database,
>> > > why not mapping the PG-database to Flink catalog and PG-schema to
>> Flink
>> > > database
>> > >
>> > > Danny, because 1) there are frequent use cases where users want to
>> switch
>> > > databases or referencing objects across databases in a pg instance 2)
>> > > schema is an optional namespace layer in pg, it always has a default
>> > value
>> > > ("public") and can be invisible to users if they'd like to as shown in
>> > the
>> > > FLIP 3) as you mentioned it is specific to postgres, and I don't feel
>> > it's
>> > > necessary to map Postgres substantially different than others DBMSs
>> with
>> > > additional complexity
>> > >
>> > > >'base_url' configuration: We are following the configuration format
>> > > guideline [1] which suggest to use dash (-) instead of underline (_).
>> And
>> > > I'm a little confused the meaning of "base_url" at the first glance,
>> > > another idea is split it into several configurations: 'driver',
>> > 'hostname',
>> > > 'port'.
>> > >
>> > > Jark, I agree we should use "base-url" in the yaml config.
>> > >
>> > > I'm not sure about having hostname and port separately because you can
>> > > specify multiple hosts with ports in jdbc, like
>> > > "jdbc:dbms/host1:port1,host2:port2/", for connection failovers.
>> > Separating
>> > > them would make configurations harder.
>> > >
>> > > I will add clear doc and example to avoid any possible confusion.
>> > >
>> > > > 'default-database' is optional, then which database will be used or
>> > what
>> > > is the behavior when the default database is not selected.
>> > >
>> > > This should be DBMS specific. For postgres, it will be the 
>> > > database.
>> > >
>> > >
>> > > On Thu, Jan 9, 2020 at 9:48 PM Zhenghua Gao  wrote:
>> > >
>> > >> Hi Bowen, Thanks for driving this.
>> > >> I think it would be very convenience to use tables in external DBs
>> with
>> > >> JDBC Catalog.

[DISCUSS] AssignerWithPeriodicWatermarks with max delay

2020-02-17 Thread Eduardo Winpenny Tejedor
Hi all,

I've been using Apache Flink for the last few months but I'm new to
the dev community. I'd like to contribute code (and possibly more) to
the community and I've been advised a good starting point would be
suggesting improvements for those areas that I found lacking. I'll
create a separate [DISCUSS] thread for each of those (if this is
indeed the process!).

-- Problem statement --

In my use cases I've had to output data at regular (event time)
intervals, regardless of whether there have been any events flowing
through the app. For those occasions when no events flow I've been
happy to delay the emission of data for some time. This amount of time
is reasonable and still several times larger than the worst delays of
my event bus. It also meets the business requirements :)

Flink's documentation suggests marking a source as temporarily idle
for such occasions, but to the best of my knowledge that will not advance
the watermark if there are no events at all flowing through the system.


-- Proposed solution --

Provide all implementations of AssignerWithPeriodicWatermarks in the
Flink project with a mechanism to specify a max time delay after which the
watermark will advance even if no events have been processed. The
watermark would always stay as far behind the current time as the
specified delay when advanced in this way.

To achieve backward compatibility I suggest providing the
implementations of AssignerWithPeriodicWatermarks with a builder
method that'll allow specifying said max delay; a sketch of the intended
behavior follows. Other options for introducing this change in a
non-invasive way are welcome.
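
To make the idea concrete, here is a minimal sketch of the proposed behavior
against the Flink 1.10 interfaces; the class, its fields, and the timestamp
extractor parameter are invented for illustration, not an API proposal:
{code:java}
import java.util.function.ToLongFunction;

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

/**
 * Illustration only: advances the watermark off the wall clock when the
 * source has been silent for longer than maxIdleDelayMillis. Flink ignores
 * watermarks lower than the last emitted one, so a temporary regression of
 * the computed value is harmless.
 */
public class MaxDelayWatermarkAssigner<T> implements AssignerWithPeriodicWatermarks<T> {

    private final ToLongFunction<T> timestampExtractor; // not Serializable; fine for a sketch
    private final long maxIdleDelayMillis;

    private long maxTimestampSeen = Long.MIN_VALUE + 1;
    private long lastEventWallClock = System.currentTimeMillis();

    public MaxDelayWatermarkAssigner(ToLongFunction<T> timestampExtractor, long maxIdleDelayMillis) {
        this.timestampExtractor = timestampExtractor;
        this.maxIdleDelayMillis = maxIdleDelayMillis;
    }

    @Override
    public long extractTimestamp(T element, long previousElementTimestamp) {
        long ts = timestampExtractor.applyAsLong(element);
        maxTimestampSeen = Math.max(maxTimestampSeen, ts);
        lastEventWallClock = System.currentTimeMillis();
        return ts;
    }

    @Override
    public Watermark getCurrentWatermark() {
        long now = System.currentTimeMillis();
        if (now - lastEventWallClock > maxIdleDelayMillis) {
            // No events for longer than the max delay: advance based on
            // processing time, keeping the watermark maxIdleDelay behind now.
            return new Watermark(now - maxIdleDelayMillis);
        }
        // Normal case: the watermark trails the largest timestamp seen so far.
        return new Watermark(maxTimestampSeen - 1);
    }
}
{code}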

I'm hoping for your suggestions/comments/questions.

Thanks,
Eduardo


Re: [DISCUSS] Improve history server with log support

2020-02-17 Thread Rong Rong
Hi All,

Thank you all for the prompt feedback. Based on the discussion I think
this seems to be a very useful feature.
I would start an initial draft of a design doc (or should it be a FLIP?)
and share it with the community.



Hi Yang, Thanks for the interest and thanks for sharing the ideas.

In fact, I couldn't agree more with this point:

> So I am thinking whether we could provide a unified api and storage format
> for logs. Then we could add different implementations for each storage type
> and use it in the history server. And users could get the logs from the
> history server just as if the Flink cluster were running.


This was our initial intention, since
1. utilizing the Flink HS the same way as a RUNNING cluster is very tempting
based on our user feedback: there's no learning curve because the UI looks
almost exactly the same!
2. each cluster environment handles log aggregation a bit differently. It
would always be best to unify the API and let each individual cluster
module extend it.
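
As a purely illustrative sketch of what such a unified API could look like
(every name below is invented; nothing like this exists in Flink):
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

/** Hypothetical: unified log access for finished jobs, one impl per cluster type. */
public interface HistoryServerLogProvider {

    /** Lists the log files archived for a finished job, e.g. one per TaskManager. */
    List<String> listLogs(String jobId) throws IOException;

    /** Opens one archived log so the HistoryServer web UI can serve it. */
    InputStream openLog(String jobId, String logName) throws IOException;
}

// A YARN implementation would read from the log-aggregation dir on HDFS; a K8s
// implementation from whatever store the sidecar/daemon set writes to.
{code}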


There is one caveat of utilizing the Flink HS for this use case in our initial
study/experiment:
In our YARN cluster, we observed a few, but not negligible, failures that are
neither due to the job nor due to Flink itself - these are related to
hardware failures or network connection issues. In such cases there would be
no time for the JM to upload the ArchivedExecutionGraph to the underlying
filesystem. Our thought is to periodically make archives to the HS
filesystem, but this is only a thought and there are still many details to
iron out.

We will share the design doc soon; we would love to hear more of your
ideas and we look forward to your feedback.


Thanks,
Rong


On Sun, Feb 16, 2020 at 7:02 PM Yang Wang  wrote:

>  Hi Rong Rong,
>
>
> Thanks for starting this discussion. I think the log is an important part
> of improving the user experience of Flink. The logs are very important for
> debugging problems or checking the expected output. Some users, especially
> for machine learning, print global steps or residuals to the logs. Whether
> the application finished successfully or not, the logs should still be
> accessible.
>
>
> Currently, when deploying Flink on Yarn, the application logs will be
> aggregated to HDFS
> on a configured path classified by host. The command `yarn application
> logs` could be used
> to get the logs to local.
>
>
> For K8s deployment, daemon set or sidecar container could be used to
> collect logs to
> persistent storage(e.g. HDFS, S3, elastic search, etc.).
>
>
> So I am thinking whether we could provide a unified api and storage format
> for logs. Then we could add different implementations for each storage type
> and use it in the history server. And users could get the logs from the
> history server just as if the Flink cluster were running.
>
>
>
> Best,
> Yang
>
> Venkata Sanath Muppalla  于2020年2月15日周六 下午3:19写道:
>
> > @Xiaogang Could you please share more details about the trace mechanism you
> > mentioned? As Rong mentioned, we are also working on something similar.
> >
> > On Fri, Feb 14, 2020, 9:12 AM Rong Rong  wrote:
> >
> > > Thank you for the prompt feedbacks
> > >
> > > @Aljoscha. Yes you are absolutely correct - adding Hadoop dependency to
> > > cluster runtime component is definitely not what we are proposing.
> > > We were trying to see how the community thinks about the idea of adding
> > log
> > > support into History server.
> > >   - The reference to this JIRA ticket is more on the intention rather
> > than
> > > the solution. -  in fact the intention is slightly different, we were
> > > trying to put it in the history server while the original JIRA proposed
> > to
> > > add it in the live runtime modules.
> > >   - IMO, in order to support different cluster environments: the
> generic
> > > cluster component should only provide an interface, where each cluster
> > impl
> > > module should extend from.
> > >
> > >
> > > @Xiaogang, thank you for bringing up the idea of utilizing a trace
> > system.
> > >
> > > The event tracing would definitely provide additional, in fact more
> > > valuable information for debugging purposes.
> > > In fact we were also internally experimenting with the idea similar to
> > > Spark's ListenerInterface [1] to capture some of the important messages
> > > sent via akka.
> > > But we are still in a very early preliminary stage, thus we haven't
> > > included them in this discussion.
> > >
> > > We would love to hear more regarding the trace system you proposed. Could
> > > you share more information regarding this?
> > > Such as how the live events would be listened to; how the trace would be
> > > collected/stored; etc.
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/scheduler/SparkListener.html
> > >
> > > Thanks,
> > > Rong
> > >
> > >
> > > On Thu, Feb 13, 2020 at 7:33 AM Aljoscha Krettek 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > what's the difference in approach to the mentioned related Jira Issu

[jira] [Created] (FLINK-16124) Add a AWS Kinesis Stateful Functions Ingress

2020-02-17 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-16124:
---

 Summary: Add a AWS Kinesis Stateful Functions Ingress
 Key: FLINK-16124
 URL: https://issues.apache.org/jira/browse/FLINK-16124
 Project: Flink
  Issue Type: New Feature
  Components: Stateful Functions
Affects Versions: statefun-1.1
Reporter: Tzu-Li (Gordon) Tai


AWS Kinesis is also a popularly used source for Apache Flink applications, 
given its capability to reset the consumer position to a specific offset, 
which works well with Flink's fault-tolerance model. This also applies to 
Stateful Functions, and shipping a Kinesis ingress will 
also ease the use of Stateful Functions for AWS users.
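
A purely hypothetical sketch of what defining such an ingress could look like,
mirroring the existing KafkaIngressBuilder; the Kinesis builder, its options,
and the SignupMessage/SignupDeserializer types are all invented here:
{code:java}
import org.apache.flink.statefun.sdk.io.IngressIdentifier;
import org.apache.flink.statefun.sdk.io.IngressSpec;

// Hypothetical Kinesis analogue of the existing Kafka ingress builder.
IngressIdentifier<SignupMessage> id =
    new IngressIdentifier<>(SignupMessage.class, "com.example", "signups");

IngressSpec<SignupMessage> spec =
    KinesisIngressBuilder.forIdentifier(id)            // invented class
        .withAwsRegion("us-west-2")                    // invented option
        .withStream("signups")                         // invented option
        .withDeserializer(SignupDeserializer.class)    // invented deserializer
        .build();
{code}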





Re: [DISCUSS] Improve history server with log support

2020-02-17 Thread SHI Xiaogang
Hi all,

Thanks a lot for your interest. We are very interested in contributing the
trace system to the community. We will draft a design document and share it
soon.

The trace system is actually a complement to the existing metric and logging
systems, and definitely cannot replace the logging system. The proposal
here, in my opinion, is a good improvement to the existing logging system. The
trace system is not much related to the proposal, so we can discuss the trace
system in a separate thread.

Regarding the proposal here, I agree that we should unify the API to
collect logs in different clusters. I am really looking forward to Rong's
design.

Regards,
Xiaogang


[jira] [Created] (FLINK-16125) Make zookeeper.connect optional for Kafka connectors

2020-02-17 Thread Jiangjie Qin (Jira)
Jiangjie Qin created FLINK-16125:


 Summary: Make zookeeper.connect optional for Kafka connectors
 Key: FLINK-16125
 URL: https://issues.apache.org/jira/browse/FLINK-16125
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Affects Versions: 1.10.0
Reporter: Jiangjie Qin


FLINK-14649 accidentally changed the connector option {{zookeeper.connect}} 
from optional to required for all the Kafka connector versions, while it is 
only required for 0.8. 

The fix would be to make it optional again. This does mean that people who are 
using Kafka 0.8 might miss this option and get an error from Kafka code instead 
of Flink code, but given that Kafka 0.8 probably has a small user base now and 
users will still get an error either way, I think it is fine.
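
For context, a sketch of the table descriptor usage this affects, assuming the
Flink 1.10 descriptor API (topic and addresses are placeholders):
{code:java}
import org.apache.flink.table.descriptors.Kafka;

// After FLINK-14649 the validator rejects this descriptor unless
// "zookeeper.connect" is set, even for connector versions newer than 0.8;
// the fix proposed here makes the property optional again.
Kafka kafka = new Kafka()
    .version("universal")
    .topic("my-topic")
    .property("bootstrap.servers", "localhost:9092")
    // only genuinely needed for the 0.8 connector:
    // .property("zookeeper.connect", "localhost:2181")
    .property("group.id", "my-group");
{code}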





Would you please give me the permission as a contributor?

2020-02-17 Thread bin dong
Hi men,

I wanna contribute to Apache Flink.
Would you please give me the permission as a contributor?
My JIRA ID is dongbin


Re: Would you please give me the permission as a contributor?

2020-02-17 Thread Benchao Li
Hi bin,

Welcome to the community!

You no longer need contributor permissions. You can simply create a JIRA
ticket and ask to be assigned to work on it.
Please also take a look at Flink's contribution guidelines [1] for more
information.

[1] https://flink.apache.org/contributing/how-to-contribute.html


bin dong  于2020年2月18日周二 上午10:57写道:

> Hi men,
>
> I wanna contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is dongbin
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: [DISCUSS][TABLE] Issue with package structure in the Table API

2020-02-17 Thread Jingsong Li
Thanks for bringing this discussion.

+1 to performing this big change as early as possible.

You answered my question about why we need "_root_". I don't like this import
either.

And it is very strange that expressionDsl is in api, but can only work with
api.scala. (Because scala extends ImplicitExpressionConversions)

About transitioning gradually over 3 releases: can users rewrite to the right
way in one release, or must they update again and again?

Best,
Jingsong Lee

On Fri, Feb 14, 2020 at 12:12 AM Timo Walther  wrote:

> Hi everyone,
>
> thanks for bringing our offline discussion to the mailing list, Dawid.
> This is a very bad mistake that has been made in the past. In general,
> we should discourage putting the terms "java" and "scala" in package
> names as this has side effects on Scala imports.
>
> I really don't like forcing users to put a "_root_" in their imports. It
> also happended to me a couple of times while developing Flink code that
> I was sitting in front of my IDE wondering why the code doesn't compile.
>
> I'm also in favor of performing this big change as early as possible. I'm
> sure Table API users are already quite annoyed by all the
> changes/refactorings happening. Changing the imports twice or three
> times is even more cumbersome.
>
> Having to import just "org.apache.flink.table.api._" is a big usability
> plus for new users and especially interactive shell/notebook users.
>
> Regards,
> Timo
>
>
> On 13.02.20 14:39, Dawid Wysakowicz wrote:
> > Hi devs,
> >
> > I wanted to bring up a problem that we have in our package structure.
> >
> > As a result of https://issues.apache.org/jira/browse/FLINK-13045 we
> > started advertising importing two packages in the scala API:
> > import org.apache.flink.table.api._
> > import org.apache.flink.table.api.scala._
> >
> > The intention was that the first package (org.apache.flink.table.api)
> > would contain all api classes that are required to work with the unified
> > TableEnvironment. Such as TableEnvironment, Table, Session, Slide and
> > expressionDsl. The second package (org.apache.flink.table.api.scala._)
> > would've been an optional package that contain bridging conversions
> > between Table and DataStream/DataSet APIs including the more specific
> > StreamTableEnvironment and BatchTableEnvironment.
> >
> > The part missing in the original plan was to move all expressions
> > implicit conversions to the org.apache.flink.table.api package. Without
> > that step users of pure table program (that do not use the
> > table-api-scala-bridge module) cannot use the Expression DSL. Therefore
> > we should try to move those expressions as soon as possible.
> >
> > The problem with this approach is that it clashes with common imports of
> > classes from java.* and scala.* packages. Users are forced to write:
> >
> > import org.apache.flink.table.api._
> > import org.apache.flink.table.api.scala._
> > import _root_.scala.collection.mutable.ArrayBuffer
> > import _root_.java.lang.Integer
> >
> > Besides being cumbersome, it also messes up the macro based type
> > extraction (org.apache.flink.api.scala#createTypeInformation) for all
> > classes from scala.* packages. I don't fully understand the reasons for
> > it, but the createTypeInformation somehow drops the _root_ for
> > WeakTypeTags. So e.g. for a call:
> > createTypeInformation[_root_.scala.collection.mutable.ArrayBuffer] it
> > actually tries to construct a TypeInformation for
> > org.apache.flink.table.api.scala.collection.mutable.ArrayBuffer, which
> > obviously fails.
> >
> >
> >
> > What I would suggest for a target solution is to have:
> >
> > 1. for users of unified Table API with Scala ExpressionDSL
> >
> > import org.apache.flink.table.api._ (for TableEnvironment, Tumble etc.
> > and expressions)
> >
> > 2. for users of Table API with scala's bridging conversions
> >
> > import org.apache.flink.table.api._ (for Tumble etc. and expressions)
> > import org.apache.flink.table.api.bridge.scala._ (for bridging
> > conversions and StreamTableEnvironment)
> >
> > 3. for users of unified Table API with Java ExpressionDSL
> >
> > import org.apache.flink.table.api.* (for TableEnvironment, Tumble etc.)
> > import org.apache.flink.table.api.Expressions.* (for Expression dsl)
> >
> > 4. for users of Table API with java's bridging conversions
> >
> > import org.apache.flink.table.api.* (for Tumble etc.)
> > import org.apache.flink.table.api.Expressions.* (for Expression dsl)
> > import org.apache.flink.table.api.bridge.java.*
> >
> > To have that working we need to:
> > * move the scala expression DSL to org.apache.flink.table.api package in
> > table-api-scala module
> > * move all classes from org.apache.flink.table.api.scala and
> > org.apache.flink.table.api.java packages to
> > org.apache.flink.table.api.bridge.scala and
> > org.apache.flink.table.api.bridge.java accordingly and drop the former
> > packages
> >
> > The biggest question I have is how do we want to perform that
> > transition. If we d
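
To make the target solution concrete, a sketch of the import groups for case 4
above (Java Table API with bridging conversions); the
org.apache.flink.table.api.bridge packages do not exist yet, and note that in
Java the Expression DSL would need a static import:
{code:java}
// Hypothetical end state, case 4:
import org.apache.flink.table.api.*;                     // TableEnvironment, Tumble, ...
import static org.apache.flink.table.api.Expressions.*;  // Java expression DSL
import org.apache.flink.table.api.bridge.java.*;         // StreamTableEnvironment, conversions
{code}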

[jira] [Created] (FLINK-16126) Translate all connector related pages into Chinese

2020-02-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-16126:
---

 Summary: Translate all connector related pages into Chinese
 Key: FLINK-16126
 URL: https://issues.apache.org/jira/browse/FLINK-16126
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Reporter: Jark Wu
 Fix For: 1.11.0


Translate all connector related pages into Chinese, including pages under 
`docs/dev/connectors/` and `docs/ops/filesystems/`. 

Connector pages under the Batch API are not in the plan, because they will be 
dropped in the future. 





[jira] [Created] (FLINK-16127) Translate "Fault Tolerance Guarantees" page of connectors into Chinese

2020-02-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-16127:
---

 Summary: Translate "Fault Tolerance Guarantees" page of connectors 
into Chinese
 Key: FLINK-16127
 URL: https://issues.apache.org/jira/browse/FLINK-16127
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation
Reporter: Jark Wu


The page url is 
https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/connectors/guarantees.html
The markdown file is located in flink/docs/dev/connectors/guarantees.zh.md





[jira] [Created] (FLINK-16128) Translate "Google Cloud PubSub" page into Chinese

2020-02-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-16128:
---

 Summary: Translate "Google Cloud PubSub" page into Chinese
 Key: FLINK-16128
 URL: https://issues.apache.org/jira/browse/FLINK-16128
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation
Reporter: Jark Wu


The page url is 
https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/connectors/pubsub.html
The markdown file is located in flink/docs/dev/connectors/pubsub.zh.md





[jira] [Created] (FLINK-16129) Translate "Overview" page of "File Systems" into Chinese

2020-02-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-16129:
---

 Summary: Translate "Overview" page of "File Systems" into Chinese 
 Key: FLINK-16129
 URL: https://issues.apache.org/jira/browse/FLINK-16129
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation
Reporter: Jark Wu


The page url is 
https://ci.apache.org/projects/flink/flink-docs-master/zh/ops/filesystems/

The markdown file is located in flink/docs/ops/filesystems/index.zh.md





[jira] [Created] (FLINK-16130) Translate "Common Configurations" page of "File Systems" into Chinese

2020-02-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-16130:
---

 Summary: Translate "Common Configurations" page of "File Systems" 
into Chinese 
 Key: FLINK-16130
 URL: https://issues.apache.org/jira/browse/FLINK-16130
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation
Reporter: Jark Wu


The page url is 
https://ci.apache.org/projects/flink/flink-docs-master/zh/ops/filesystems/common.html

The markdown file is located in flink/docs/ops/filesystems/common.zh.md





[jira] [Created] (FLINK-16131) Translate "Amazon S3" page of "File Systems" into Chinese

2020-02-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-16131:
---

 Summary: Translate "Amazon S3" page of "File Systems" into Chinese 
 Key: FLINK-16131
 URL: https://issues.apache.org/jira/browse/FLINK-16131
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation
Reporter: Jark Wu


The page url is 
https://ci.apache.org/projects/flink/flink-docs-master/zh/ops/filesystems/s3.html

The markdown file is located in flink/docs/ops/filesystems/s3.zh.md





[jira] [Created] (FLINK-16132) Translate "Aliyun OSS" page of "File Systems" into Chinese

2020-02-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-16132:
---

 Summary: Translate "Aliyun OSS" page of "File Systems" into 
Chinese 
 Key: FLINK-16132
 URL: https://issues.apache.org/jira/browse/FLINK-16132
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation
Reporter: Jark Wu


The page url is 
https://ci.apache.org/projects/flink/flink-docs-master/zh/ops/filesystems/oss.html

The markdown file is located in flink/docs/ops/filesystems/oss.zh.md





[jira] [Created] (FLINK-16133) Translate "Azure Blob Storage" page of "File Systems" into Chinese

2020-02-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-16133:
---

 Summary: Translate "Azure Blob Storage" page of "File Systems" 
into Chinese 
 Key: FLINK-16133
 URL: https://issues.apache.org/jira/browse/FLINK-16133
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation
Reporter: Jark Wu


The page url is 
https://ci.apache.org/projects/flink/flink-docs-master/zh/ops/filesystems/azure.html

The markdown file is located in flink/docs/ops/filesystems/azure.zh.md


