[jira] [Created] (FLINK-16431) Pass build profile into end to end test script on Azure

2020-03-05 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-16431:
--

 Summary: Pass build profile into end to end test script on Azure
 Key: FLINK-16431
 URL: https://issues.apache.org/jira/browse/FLINK-16431
 Project: Flink
  Issue Type: Bug
  Components: Build System / Azure Pipelines
Reporter: Robert Metzger
Assignee: Robert Metzger


The nightly test scripts assume that they have access to {{$PROFILE}}, which 
does not seem to be the case.
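
As an illustration of the failure mode (purely a sketch — the function name, error handling, and profile value are hypothetical, not the actual Flink CI code), the script could fail fast when the profile was not passed in:

```shell
# Hypothetical guard for a nightly e2e script: fail fast when the CI
# definition did not pass the build profile into the script's environment.
check_profile() {
  if [ -z "${PROFILE:-}" ]; then
    echo "ERROR: \$PROFILE is not set; export it from the Azure job definition" >&2
    return 1
  fi
  echo "Using build profile: $PROFILE"
}

# Simulating what the Azure job definition would need to provide:
PROFILE="-Dhadoop.version=2.8.3" check_profile
```

With the guard in place, a misconfigured pipeline produces one clear error instead of tests silently running with the wrong profile.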



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16432) Building Hive connector gives problems

2020-03-05 Thread Niels Basjes (Jira)
Niels Basjes created FLINK-16432:


 Summary: Building Hive connector gives problems
 Key: FLINK-16432
 URL: https://issues.apache.org/jira/browse/FLINK-16432
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.10.0
Reporter: Niels Basjes


When building the current Flink source I keep running into problems with the 
Hive connector.
The problems center on dependencies that are not available by default:
- org.pentaho:pentaho-aggdesigner-algorithm
- javax.jms:jms





[jira] [Created] (FLINK-16433) TableEnvironment doesn't clear buffered operations when it fails to translate the operation

2020-03-05 Thread Rui Li (Jira)
Rui Li created FLINK-16433:
--

 Summary: TableEnvironment doesn't clear buffered operations when 
it fails to translate the operation
 Key: FLINK-16433
 URL: https://issues.apache.org/jira/browse/FLINK-16433
 Project: Flink
  Issue Type: Bug
Reporter: Rui Li








[jira] [Created] (FLINK-16434) Add document to explain how to pack hive with their own hive dependencies

2020-03-05 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-16434:


 Summary: Add document to explain how to pack hive with their own 
hive dependencies
 Key: FLINK-16434
 URL: https://issues.apache.org/jira/browse/FLINK-16434
 Project: Flink
  Issue Type: Task
  Components: Connectors / Hive, Documentation
Reporter: Jingsong Lee
 Fix For: 1.11.0








[jira] [Created] (FLINK-16435) Fix IDE static check

2020-03-05 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-16435:


 Summary: Fix IDE static check
 Key: FLINK-16435
 URL: https://issues.apache.org/jira/browse/FLINK-16435
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Huang Xingbo
 Fix For: 1.11.0


We will replace the {{since}} decorator with {{versionadded}} to fix the IDE 
static check in PyFlink.





[jira] [Created] (FLINK-16436) Update Apache downloads link due to INFRA structural changes

2020-03-05 Thread Yu Li (Jira)
Yu Li created FLINK-16436:
-

 Summary: Update Apache downloads link due to INFRA structural 
changes
 Key: FLINK-16436
 URL: https://issues.apache.org/jira/browse/FLINK-16436
 Project: Flink
  Issue Type: Task
  Components: Project Website
Affects Versions: 1.11.0
Reporter: Yu Li
 Fix For: 1.11.0


As titled: the INFRA team has sent an email to the PMC members of Apache 
projects, titled "[NOTICE] Structural changes to Apache downloads", to remind 
projects to change links to Apache downloads from `www.apache.org/dist/` to 
`https://downloads.apache.org/`. Below is a quote of the main content of the 
email:
{quote}
As of March 2020, we are deprecating www.apache.org/dist/ in favor of
https://downloads.apache.org/ for backup downloads as well as signature
and checksum verification. The primary driver has been splitting up web
site visits and downloads to gain better control and offer a better
service for both downloads and web site visits.

As stated, this does not impact end-users, and should have a minimal
impact on projects, as our download selectors as well as visits to
www.apache.org/dist/ have been adjusted to make use of
downloads.apache.org instead. We do however ask that projects, in their
own time-frame, change references on their own web sites from
www.apache.org/dist/ to downloads.apache.org wherever such references
may exist, to complete the switch in full. We will NOT be turning off
www.apache.org/dist/ in the near future, but would greatly appreciate if
projects could help us transition away from the old URLs in their
documentation and on their download pages.

The standard way of uploading releases[1] will STILL apply, however
there may be a short delay (<= 15 minutes) between releasing and
releases showing up on downloads.apache.org for technical reasons.
{quote}

This JIRA aims to change all download URL references in our flink-web 
project accordingly.
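
A rough sketch of such a link rewrite (the helper function and sample text are illustrative; the real change would be applied across the flink-web sources):

```shell
# Sketch: rewrite deprecated Apache download URLs to the new host.
# The sed expression also upgrades a plain http:// scheme to https://.
update_download_links() {
  sed -E 's|https?://www\.apache\.org/dist/|https://downloads.apache.org/|g'
}

echo 'Signatures: http://www.apache.org/dist/flink/flink-1.10.0/KEYS' \
  | update_download_links
```

Running the filter over a page rewrites the old host to `https://downloads.apache.org/` while leaving the rest of the path untouched.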





Re: [DISCUSS] FLIP-111: Docker image unification

2020-03-05 Thread Yangze Guo
Thanks for opening this FLIP and summarizing the current state of the
Dockerfiles, Andrey! +1 for this idea.

I have some minor comments / questions:
- Regarding the flink_docker_utils#install_flink function, I think it
should also support building from a local dist and from a user-defined
archive.
- It seems that install_shaded_hadoop could be an option of install_flink.
- Should we support Java 11? Currently, most of the Dockerfiles are based
on Java 8.
- I do not understand how to set config options through
"flink_docker_utils configure". Does this step happen during the image
build or at container start? If it happens during the image build, there
would be a new image every time we change the config. If it is just part
of the container entrypoint, I think there is no need to add a configure
command; we could just add all dynamic config options to the args list of
"start_jobmaster"/"start_session_jobmanager". Am I understanding this
correctly?
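
To illustrate the first point, here is a sketch of how an install function could dispatch on the source type (the function name is borrowed from the FLIP; the dispatch logic and supported source kinds are hypothetical):

```shell
# Hypothetical install_flink supporting a release URL, a user-defined
# archive, or a local dist directory, dispatched on the source argument.
install_flink() {
  target="$1" source="$2"
  case "$source" in
    http://*|https://*) echo "would download $source into $target" ;;
    *.tgz|*.tar.gz)     tar -xzf "$source" -C "$target" ;;   # user archive
    *)                  cp -r "$source/." "$target" ;;       # local dist dir
  esac
}

# Demo with a fake local dist directory:
dist=$(mktemp -d); target=$(mktemp -d)
echo "flink" > "$dist/VERSION"
install_flink "$target" "$dist"
cat "$target/VERSION"
```

The same entry point then covers released images, CI builds from a freshly built dist, and custom archives.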


Best,
Yangze Guo


On Wed, Mar 4, 2020 at 5:34 PM Andrey Zagrebin  wrote:
>
> Hi All,
>
> If you have ever touched the docker topic in Flink, you
> probably noticed that we have multiple places in docs and repos which
> address its various concerns.
>
> We have prepared a FLIP [1] to simplify the perception of docker topic in
> Flink by users. It mostly advocates for an approach of extending official
> Flink image from the docker hub. For convenience, it can come with a set of
> bash utilities and documented examples of their usage. The utilities allow
> to:
>
>- run the docker image in various modes (single job, session master,
>task manager etc)
>- customise the extending Dockerfile
>- and its entry point
>
> Eventually, the FLIP suggests to remove all other user facing Dockerfiles
> and building scripts from Flink repo, move all docker docs to
> apache/flink-docker and adjust existing docker use cases to refer to this
> new approach (mostly Kubernetes now).
>
> The first contributed version of Flink docker integration also contained
> example and docs for the integration with Bluemix in IBM cloud. We also
> suggest to maintain it outside of Flink repository (cc Markus Müller).
>
> Thanks,
> Andrey
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification


Re: [DISCUSS] FLIP-76: Unaligned checkpoints

2020-03-05 Thread Arvid Heise
Dear devs,

we conducted some POCs and updated the FLIP accordingly [1].

Key changes:
- POC showed that it is viable to spill only on checkpoint (in contrast to
spilling continuously to avoid overload of external systems)
- Greatly revised/refined recovery and rescaling
- Sketched the required components for persisting/recovery
- Refined migration plan

Since this is the second iteration with no big changes and promising POCs,
we would like to move to voting rather quickly unless we receive concerns
until tomorrow.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints

On Fri, Feb 28, 2020 at 3:30 PM Yu Li  wrote:

> Thanks for the further feedback Zhijiang and Piotr! I think this is a great
> feature and will watch the progress. Please also feel free to involve me in
> discussions/reviews on state-related part. Thanks.
>
> Best Regards,
> Yu
>
>
> On Thu, 27 Feb 2020 at 23:24, Piotr Nowojski  wrote:
>
> > Hi Yu,
> >
> > Re 4.
> >
> > Dynamic switching between unaligned and aligned checkpoints based on some
> > kind of thresholds (timeout, or checkpoint size) is definitely one of the
> > very first improvement that we want to tackle after implementing the MVP.
> > Depending on the time constraints, dynamic switching can make to 1.11 or
> > not. It’s hard to tell for me at this point of time.
> >
> > Piotrek
> >
> > > On 26 Feb 2020, at 15:59, Zhijiang  .INVALID>
> > wrote:
> > >
> > > Thanks for the further explanations, Yu!
> > >
> > > 1. The in-flight buffer spilling process is indeed handled
> > > asynchronously. While a buffer has not finished spilling, it is not
> > > recycled for reuse. Your understanding is right. I guess I misunderstood
> > > your previous concern about additional memory consumption from the
> > > perspective of buffer usage. My point about no additional memory
> > > consumption is from the perspective of the total network memory size,
> > > which would not increase as a result.
> > >
> > > 2. We treat the in-flight buffers as input & output state, equivalent to
> > > existing operator state, and try to make use of all the existing
> > > mechanisms for state handling and assignment during recovery. So I guess
> > > local recovery should be a similar case. I will think through whether
> > > there is any special work to do around local recovery, and then clarify
> > > it in the FLIP after we reach an agreement internally. BTW, this FLIP
> > > has not been finalized yet.
> > >
> > > 3. Yes, the previous proposal is for measuring how many in-flight
> > > buffers are to be spilled, which refers to the data size if we really
> > > take that approach. I think the options proposed in the FLIP are initial
> > > thoughts on the various possibilities. Which way we take for the first
> > > version, I guess we need to further finalize before voting.
> > >
> > > 4. I think there probably exist the requirements or scenarios from users
> > > that you mentioned. Actually we have not finalized the way of switching
> > > to unaligned checkpoints yet. Anyway, we could provide an option for
> > > users to try out this feature at the beginning, although it might not be
> > > the most ideal one. Another input is that we know the motivation for
> > > unaligned checkpoints comes from backpressure scenarios, but they might
> > > also perform well without backpressure, and even shortened the
> > > checkpoint duration without obvious performance regression in our
> > > previous POC testing. So backpressure might not be the only factor for
> > > switching to the unaligned way in practice, I guess. Anyway, your inputs
> > > are valuable for us to make the final decision.
> > >
> > > Best,
> > > Zhijiang
> > >
> > >
> > >
> > >
> > > --
> > > From:Yu Li 
> > > Send Time:2020 Feb. 26 (Wed.) 15:59
> > > To:dev ; Zhijiang 
> > > Subject:Re: [DISCUSS] FLIP-76: Unaligned checkpoints
> > >
> > > Hi Zhijiang,
> > >
> > > Thanks for the quick reply!
> > >
> > > For the 1st question, please allow me to confirm, that when doing
> > asynchronous checkpointing, disk spilling should happen in background in
> > parallel with receiving/sending new data, or else it would become
> > synchronous, right? Based on such assumption, some copy-on-write like
> > mechanism would be necessary to make sure the new updates won't modify
> the
> > to-be-checkpointed data, and this is where the additional memory
> > consumption comes from.
> > >
> > > About point #2, I suggest we write it down in the FLIP document about
> > local recovery support (if reach a consensus here), to make sure it won't
> > be neglected in later implementation (I believe there're still some work
> to
> > do following existing local recovery mechanism). What do you think?
> > >
> > > For the 3rd topic, do you mean UNALIGNED_WITH_MAX_INFLIGHT_DATA would
> > set some kind of threshold about "how much in-flight data to checkpoint"?
> > If so, could you f

Re: [DISCUSS] FLIP-111: Docker image unification

2020-03-05 Thread Yang Wang
Hi Andrey,


Thanks for driving this significant FLIP. From the user ML, we can also
see that many users run Flink in container environments, so the docker
image is a very basic requirement. Just as you say, we should provide a
unified place for all the various usages (e.g. session, job, native K8s,
swarm, etc.).


> About docker utils

I really like the idea of providing some utils for the Dockerfile and
entry point. The `flink_docker_utils` will help to build the image more
easily. I am not sure about `flink_docker_utils start_jobmaster`, though.
Do you mean that when we build a docker image, we need to add
`RUN flink_docker_utils start_jobmaster` in the Dockerfile? Why do we
need this?


> About docker entry point

I agree with you that the docker entry point could be more powerful, with
more functionality. Mostly, it is about overriding the config options. If
we support dynamic properties, I think that is more convenient for users,
without any learning curve:
`docker run flink session_jobmanager -D rest.bind-port=8081`
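
A minimal sketch of how such an entrypoint could translate dynamic properties into config entries at container start (the helper name and parsing are hypothetical, not the actual FLIP-111 code):

```shell
# Hypothetical entrypoint fragment: append "-D key=value" dynamic
# properties to a flink-conf.yaml before launching the process.
apply_dynamic_properties() {
  conf_file="$1"; shift
  while [ $# -gt 0 ]; do
    case "$1" in
      -D) printf '%s: %s\n' "${2%%=*}" "${2#*=}" >> "$conf_file"; shift 2 ;;
      *)  shift ;;   # ignore the command and other args in this sketch
    esac
  done
}

conf=$(mktemp)
apply_dynamic_properties "$conf" session_jobmanager -D rest.bind-port=8081
cat "$conf"
```

Because the rewrite happens at container start, no new image is needed per configuration change — the concern raised earlier in this thread.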


> About the logging

Updating the `log4j-console.properties` to support multiple appenders is a
better option. Currently, the native K8s documentation suggests that users
debug with the logs in this way [1]. However, there are also some
problems: the stderr and stdout of the JM/TM processes cannot be forwarded
to the docker container console.


[1].
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#log-files


Best,
Yang




Andrey Zagrebin  于2020年3月4日周三 下午5:34写道:

> Hi All,
>
> If you have ever touched the docker topic in Flink, you
> probably noticed that we have multiple places in docs and repos which
> address its various concerns.
>
> We have prepared a FLIP [1] to simplify the perception of docker topic in
> Flink by users. It mostly advocates for an approach of extending official
> Flink image from the docker hub. For convenience, it can come with a set of
> bash utilities and documented examples of their usage. The utilities allow
> to:
>
>- run the docker image in various modes (single job, session master,
>task manager etc)
>- customise the extending Dockerfile
>- and its entry point
>
> Eventually, the FLIP suggests to remove all other user facing Dockerfiles
> and building scripts from Flink repo, move all docker docs to
> apache/flink-docker and adjust existing docker use cases to refer to this
> new approach (mostly Kubernetes now).
>
> The first contributed version of Flink docker integration also contained
> example and docs for the integration with Bluemix in IBM cloud. We also
> suggest to maintain it outside of Flink repository (cc Markus Müller).
>
> Thanks,
> Andrey
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification
>


[jira] [Created] (FLINK-16437) Make SlotManager allocate resources from ResourceManager at the worker granularity.

2020-03-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-16437:


 Summary: Make SlotManager allocate resources from ResourceManager 
at the worker granularity.
 Key: FLINK-16437
 URL: https://issues.apache.org/jira/browse/FLINK-16437
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Xintong Song
 Fix For: 1.11.0


This is the first step of FLINK-14106, including all the major changes inside 
the SlotManager and the changes to the RM/SM interfaces, except for the changes 
to metrics and status.

At the end of this step, the SlotManager should allocate resources from the 
ResourceManager with a WorkerResourceSpec instead of a slot ResourceProfile. At 
this step, the WorkerResourceSpec will not be used yet, and the active RMs will 
always use `ActiveResourceManager#taskExecutorProcessSpec` for requesting TMs. 
We will change that in subsequent steps.





[jira] [Created] (FLINK-16438) Make YarnResourceManager start workers using WorkerResourceSpec requested by SlotManager

2020-03-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-16438:


 Summary: Make YarnResourceManager start workers using 
WorkerResourceSpec requested by SlotManager
 Key: FLINK-16438
 URL: https://issues.apache.org/jira/browse/FLINK-16438
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song


This means YarnResourceManager no longer:
 - is aware of the default task executor resources
 - assumes that all workers are identical





[jira] [Created] (FLINK-16439) Make KubernetesResourceManager start workers using WorkerResourceSpec requested by SlotManager

2020-03-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-16439:


 Summary: Make KubernetesResourceManager start workers using 
WorkerResourceSpec requested by SlotManager
 Key: FLINK-16439
 URL: https://issues.apache.org/jira/browse/FLINK-16439
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Reporter: Xintong Song
 Fix For: 1.11.0


This means KubernetesResourceManager no longer:
 - is aware of the default task executor resources
 - assumes that all workers are identical





[jira] [Created] (FLINK-16440) Extend SlotManager metrics and status for dynamic slot allocation.

2020-03-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-16440:


 Summary: Extend SlotManager metrics and status for dynamic slot 
allocation.
 Key: FLINK-16440
 URL: https://issues.apache.org/jira/browse/FLINK-16440
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Metrics
Reporter: Xintong Song
 Fix For: 1.11.0


* Create a slotManagerMetricGroup under the resourceManagerMetricGroup, pass it 
into the SM, and register the slot-related metrics there.
 ** This allows registering different metrics for different SM implementations.
 ** For backwards compatibility, the slotManagerMetricGroup should have the 
same path as the resourceManagerMetricGroup.

 * Extend ResourceOverview and TaskManagerInfo to contain the TM total / free / 
allocated resources.
 ** Methods need to be added to the SM for getting the TM resource status.
 ** For SlotManagerImpl, the existing methods for getting the number of 
registered / free slots need no changes; the TM resource status can be computed 
from the TaskExecutorProcessSpec, the slot profiles, and the number of free 
slots.





Re: Flink dev blog

2020-03-05 Thread Robert Metzger
+1 to Arvid's proposal.



On Thu, Mar 5, 2020 at 4:14 AM Xingbo Huang  wrote:

> Thanks for this proposal.
>
> As a new contributor to Flink, it would be very helpful to have such blogs
> for us to understand the future of Flink and get involved
>
> BTW, I have a question whether the dev blog needs a template like FLIP.
>
> Of course, there is no doubt that dev blogs do not need to be as formal as
> FLIP, but templates can be more helpful for developers to understand
> articles.
>
> Best,
>
> Xingbo
>
>
> Arvid Heise  于2020年3月5日周四 上午2:55写道:
>
> > I see that the majority would like to have an uncomplicated process to
> > publish an article first to gather feedback and then like to have
> polished
> > versions on the blog with official review process.
> >
> > Then, the obvious solution is to have a process that is two-fold:
> > * First a draft is published and reviewed by peers. The draft could be
> > polished in smaller increments including proof-reading by native-level
> > writers.
> > * Second, when the draft converged enough, we would then make an official
> > pull request for the dev blog, which would (hopefully) be merged rather
> > quickly.
> >
> > For the draft, we would have a wiki subarea "Engine room", which would be
> > the default location for such drafts. Pages in the wiki would allow for a
> > gradual polishing and may even live comparably long if the author does
> not
> > find the time for polishing. The information is in a semi-published
> state,
> > where devs and experts can already find and use it, but it would not
> > attract as many views as in a blog.
> >
> > But I'd explicitly also allow drafts to go directly to a PR (with risk of
> > having many iterations). I'd even say that if someone feels more
> > comfortable to online editors such as google docs and has enough
> reviewers
> > for that, they could go with it. Here, the author needs to ensure a
> timely
> > progress or revert to the wiki, since all intermediate versions are
> > effectively hidden for non-reviewers.
> >
> > Would the community agree with this approach or do you have concerns? If
> no
> > major concerns are raised, I'd start preparation with the wiki on Monday
> > (03/09/2020).
> >
> > I'd raise the issue about wiki and blog structure, when we got some
> > articles to avoid too many concurrent discussions.
> >
> >
> > On Wed, Mar 4, 2020 at 5:54 PM Zhijiang  > .invalid>
> > wrote:
> >
> > > Big +1 for this proposal and second Ufuk's feeling!
> > >
> > > I guess "Engine room" section in Wiki would attract lots of technical
> > > fans.:)
> > >
> > > Best,
> > > Zhijiang
> > >
> > >
> > > --
> > > From:Yu Li 
> > > Send Time:2020 Mar. 4 (Wed.) 14:42
> > > To:dev 
> > > Cc:vthinkxie 
> > > Subject:Re: Flink dev blog
> > >
> > > Big +1 on adding a dev blog and starting with wiki. And +1 to promote
> the
> > > fully polished articles to blog web with a formal process.
> > >
> > > The latter one also brings up another good-to-have improvement that
> > adding
> > > categories and navigation in our blog so people could easily find
> > different
> > > topics like release-announcement/events/tech-articles, etc. but I think
> > > we'd better open another thread to keep this one on track (smile).
> > >
> > > I'd also like to add one potential topic around in-production practice
> of
> > > using RocksDB state backend (which seems to be a popular topic in ML
> > > discussions), such as how to enable and monitor RocksDB metrics and do
> > > debugging/perf-tuning with the metrics/logs, and introduce
> > > internals/details around the RocksDB memory management mechanism.
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Wed, 4 Mar 2020 at 11:07, Xintong Song 
> wrote:
> > >
> > > > I also like Ufuk's idea.
> > > >
> > > > The wiki allows people to post on their works in a quick and easier
> > way.
> > > > For me and probably many other Chinese folks, writing and polishing a
> > > > formal article in English usually takes a long time, of which a
> > > significant
> > > > portion is spent on polishing the language. If the blog does not
> > require
> > > > such formal and high quality languages, I believe it will make
> things a
> > > lot
> > > > easier and encourage more people to share their ideas. Besides, it
> also
> > > > avoids putting more review workloads on committers.
> > > >
> > > > Regarding promoting wiki post to the main blog, I think the wiki
> > > feedbacks
> > > > (comment, likes, etc.) could be a great input. We can also contact
> the
> > > > original author before promoting posts to the main blog to refine the
> > > > article (responding to the wiki comments, polishing languages, adding
> > > > latest updates, etc.).
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Wed, Mar 4, 2020 at 10:25 AM Jark Wu  wrote:
> > > >
> > > > > +1 for this.
> > > > >
> > > > > Regarding to the place to hold blogs. Personal

[jira] [Created] (FLINK-16441) Allow users to override flink-conf parameters from SQL CLI environment

2020-03-05 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-16441:
--

 Summary: Allow users to override flink-conf parameters from SQL 
CLI environment
 Key: FLINK-16441
 URL: https://issues.apache.org/jira/browse/FLINK-16441
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Gyula Fora


There is currently no way to override Flink configuration parameters when 
using the SQL CLI.
The configuration section of the environment YAML should provide a way of 
doing so, as this is a very important requirement for multi-user / multi-app 
Flink client environments.





[jira] [Created] (FLINK-16442) Make MesosResourceManager start workers using WorkerResourceSpec requested by SlotManager

2020-03-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-16442:


 Summary: Make MesosResourceManager start workers using 
WorkerResourceSpec requested by SlotManager
 Key: FLINK-16442
 URL: https://issues.apache.org/jira/browse/FLINK-16442
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Mesos
Reporter: Xintong Song


This means MesosResourceManager no longer:
 - is aware of the default task executor resources
 - assumes that all workers are identical

TBH, I'm not sure how many use cases we have that need a different slot 
allocation strategy for the Mesos deployment. I think we can discuss whether 
we want to do this step or not.





[DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Jark Wu
Hi everyone,

We just noticed that every time a pull request gets merged with the "Squash
and merge" button, GitHub drops the original authorship information and
changes "authored" to whoever merged the PR.

We found this happened in #11102 [1] and #11302 [2]. It seems to be a
long-outstanding issue; GitHub is aware of it but has not attempted to fix
it [3][4].

Before this behavior, "authored" was the original author and "committed"
was the one who merged the PR, which was a pretty good record of both the
contributor's contribution and the commit information.

From the perspective of contributors, it's really frustrating if their
authorship information gets lost. Considering we don't know when GitHub
will fix it, I propose to disable the "Squash and merge" button (and also
the "Rebase and merge" button) until it is fixed.

However, I'm not sure how to disable it. Can it be disabled via the GitHub
UI by someone with administrator permission? Or is .asf.yaml [5] the right
way?

What do you think?

Best,
Jark

[1]: https://github.com/apache/flink/pull/11102
[2]: https://github.com/apache/flink/pull/11302
[3]: https://github.com/chdsbd/kodiak/issues/300#issuecomment-595016815
[4]: https://github.com/isaacs/github/issues/1750
[5]:
https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-Mergebuttons
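
The dropped-authorship symptom can be demonstrated locally in a throwaway repository (the names, emails, and commit message below are illustrative):

```shell
# Demo: a squash-merge-style commit where the committer identity differs
# from the original author -- the record GitHub's button fails to preserve.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.name=Merger -c user.email=merger@example.com \
    commit -q --allow-empty -m "squashed PR" \
    --author="Contributor <contributor@example.com>"
# List commits whose author and committer differ:
git -C "$repo" log --format='%ae %ce' |
  awk '$1 != $2 { print "author=" $1 " committer=" $2 }'
```

A proper squash merge keeps both fields, as above; the complaint in this thread is that the GitHub button overwrites the author field with the merger as well.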


Re: Flink dev blog

2020-03-05 Thread Jark Wu
+1 to Arvid's proposal.

On Thu, 5 Mar 2020 at 18:13, Robert Metzger  wrote:

> +1 to Arvid's proposal.
>
>
>
> On Thu, Mar 5, 2020 at 4:14 AM Xingbo Huang  wrote:
>
> > Thanks for this proposal.
> >
> > As a new contributor to Flink, it would be very helpful to have such
> blogs
> > for us to understand the future of Flink and get involved
> >
> > BTW, I have a question whether the dev blog needs a template like FLIP.
> >
> > Of course, there is no doubt that dev blogs do not need to be as formal
> as
> > FLIP, but templates can be more helpful for developers to understand
> > articles.
> >
> > Best,
> >
> > Xingbo
> >
> >
> > Arvid Heise  于2020年3月5日周四 上午2:55写道:
> >
> > > I see that the majority would like to have an uncomplicated process to
> > > publish an article first to gather feedback and then like to have
> > polished
> > > versions on the blog with official review process.
> > >
> > > Then, the obvious solution is to have a process that is two-fold:
> > > * First a draft is published and reviewed by peers. The draft could be
> > > polished in smaller increments including proof-reading by native-level
> > > writers.
> > > * Second, when the draft converged enough, we would then make an
> official
> > > pull request for the dev blog, which would (hopefully) be merged rather
> > > quickly.
> > >
> > > For the draft, we would have a wiki subarea "Engine room", which would
> be
> > > the default location for such drafts. Pages in the wiki would allow
> for a
> > > gradual polishing and may even live comparably long if the author does
> > not
> > > find the time for polishing. The information is in a semi-published
> > state,
> > > where devs and experts can already find and use it, but it would not
> > > attract as many views as in a blog.
> > >
> > > But I'd explicitly also allow drafts to go directly to a PR (with risk
> of
> > > having many iterations). I'd even say that if someone feels more
> > > comfortable to online editors such as google docs and has enough
> > reviewers
> > > for that, they could go with it. Here, the author needs to ensure a
> > timely
> > > progress or revert to the wiki, since all intermediate versions are
> > > effectively hidden for non-reviewers.
> > >
> > > Would the community agree with this approach or do you have concerns?
> If
> > no
> > > major concerns are raised, I'd start preparation with the wiki on
> Monday
> > > (03/09/2020).
> > >
> > > I'd raise the issue about wiki and blog structure, when we got some
> > > articles to avoid too many concurrent discussions.
> > >
> > >
> > > On Wed, Mar 4, 2020 at 5:54 PM Zhijiang  > > .invalid>
> > > wrote:
> > >
> > > > Big +1 for this proposal and second Ufuk's feeling!
> > > >
> > > > I guess "Engine room" section in Wiki would attract lots of technical
> > > > fans.:)
> > > >
> > > > Best,
> > > > Zhijiang
> > > >
> > > >
> > > > --
> > > > From:Yu Li 
> > > > Send Time:2020 Mar. 4 (Wed.) 14:42
> > > > To:dev 
> > > > Cc:vthinkxie 
> > > > Subject:Re: Flink dev blog
> > > >
> > > > Big +1 on adding a dev blog and starting with wiki. And +1 to promote
> > the
> > > > fully polished articles to blog web with a formal process.
> > > >
> > > > The latter one also brings up another good-to-have improvement that
> > > adding
> > > > categories and navigation in our blog so people could easily find
> > > different
> > > > topics like release-announcement/events/tech-articles, etc. but I
> think
> > > > we'd better open another thread to keep this one on track (smile).
> > > >
> > > > I'd also like to add one potential topic around in-production
> practice
> > of
> > > > using RocksDB state backend (which seems to be a popular topic in ML
> > > > discussions), such as how to enable and monitor RocksDB metrics and
> do
> > > > debugging/perf-tuning with the metrics/logs, and introduce
> > > > internals/details around the RocksDB memory management mechanism.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Wed, 4 Mar 2020 at 11:07, Xintong Song 
> > wrote:
> > > >
> > > > > I also like Ufuk's idea.
> > > > >
> > > > > The wiki allows people to post on their works in a quick and easier
> > > way.
> > > > > For me and probably many other Chinese folks, writing and
> polishing a
> > > > > formal article in English usually takes a long time, of which a
> > > > significant
> > > > > portion is spent on polishing the language. If the blog does not
> > > require
> > > > > such formal and high quality languages, I believe it will make
> > things a
> > > > lot
> > > > > easier and encourage more people to share their ideas. Besides, it
> > also
> > > > > avoids putting more review workloads on committers.
> > > > >
> > > > > Regarding promoting wiki post to the main blog, I think the wiki
> > > > feedbacks
> > > > > (comment, likes, etc.) could be a great input. We can also contact
> > the
> > > > > original author before promoting posts to the main blo

Re: SerializableHadoopConfiguration

2020-03-05 Thread Stephan Ewen
Do we have more cases of "common Hadoop Utils"?

If yes, does it make sense to create a "flink-hadoop-utils" module with
exactly such classes? It would have an optional dependency on
"flink-shaded-hadoop".

On Wed, Mar 4, 2020 at 9:12 AM Till Rohrmann  wrote:

> Hi Sivaprasanna,
>
> we don't upload the source jars for the flink-shaded modules. However you
> can build them yourself and install by cloning the flink-shaded repository
> [1] and then call `mvn package -Dshade-sources`.
>
> [1] https://github.com/apache/flink-shaded
>
> Cheers,
> Till
>
> On Tue, Mar 3, 2020 at 6:29 PM Sivaprasanna 
> wrote:
>
> > BTW, can we leverage flink-shaded-hadoop-2? Reason why I ask, if any
> Flink
> > module is going to use Hadoop in any way, it will most probably include
> > flink-shaded-hadoop-2 as a dependency.
> > However, flink-shaded modules don't have any source files. Is that a
> strict
> > convention that the community follows?
> >
> > -
> > Sivaprasanna
> >
> > On Tue, Mar 3, 2020 at 10:48 PM Sivaprasanna 
> > wrote:
> >
> > > Hi Arvid,
> > >
> > > Thanks for the quick reply. Yes, it actually makes sense to avoid
> Hadoop
> > > dependencies from getting into Flink's core modules but I also wonder
> if
> > it
> > > will be an overkill to add flink-hadoop-fs as a dependency just because
> > we
> > > want to use a utility class from that module.
> > >
> > > -
> > > Sivaprasanna
> > >
> > > On Tue, Mar 3, 2020 at 4:17 PM Arvid Heise 
> wrote:
> > >
> > >> Hi Sivaprasanna,
> > >>
> > >> we actually want to remove Hadoop from all core modules, so we could
> not
> > >> place it in some very common place like flink-core.
> > >>
> > >> But I think the module flink-hadoop-fs could be a fitting place.
> > >>
> > >> On Tue, Mar 3, 2020 at 11:25 AM Sivaprasanna <
> sivaprasanna...@gmail.com
> > >
> > >> wrote:
> > >>
> > >> > Hi
> > >> >
> > >> > The flink-sequence-file module has a class named
> > >> > SerializableHadoopConfiguration[1] which is nothing but a wrapper
> > class
> > >> for
> > >> > Hadoop Configuration. I believe this class can be moved to a common
> > >> module
> > >> > since this is not necessarily tightly coupled with sequence-file
> > module,
> > >> > and also because it can be used by many other modules, for ex.
> > >> > flink-compress. Thoughts?
> > >> >
> > >> > -
> > >> > Sivaprasanna
> > >> >
> > >>
> > >
> >
>
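
Till's quoted build tip can be sketched end to end roughly like this (a sketch, not verified here; the repository URL is the one linked as [1] in the thread, the rest is standard Maven usage):

```shell
# Clone the flink-shaded repository referenced as [1] above.
git clone https://github.com/apache/flink-shaded.git
cd flink-shaded

# Build the shaded artifacts and also produce source jars, using the
# -Dshade-sources flag Till mentions.
mvn package -Dshade-sources

# Install into the local Maven repository so downstream builds can
# resolve the locally built source jars.
mvn install -Dshade-sources
```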


Re: Flink dev blog

2020-03-05 Thread Dian Fu
+1 to Arvid's proposal

> 在 2020年3月5日,下午6:49,Jark Wu  写道:
> 
> +1 to Arvid's proposal.
> 
> On Thu, 5 Mar 2020 at 18:13, Robert Metzger  wrote:
> 
>> +1 to Arvid's proposal.
>> 
>> 
>> 
>> On Thu, Mar 5, 2020 at 4:14 AM Xingbo Huang  wrote:
>> 
>>> Thanks a lot for this proposal.
>>> 
>>> As a new contributor to Flink, it would be very helpful to have such
>> blogs
>>> for us to understand the future of Flink and get involved
>>> 
>>> BTW, I have a question: does the dev blog need a template like FLIP?
>>> 
>>> Of course, There is no doubt that dev blogs do not need to be as formal
>> as
>>> FLIP, but templates can be more helpful for developers to understand
>>> articles.
>>> 
>>> Best,
>>> 
>>> Xingbo
>>> 
>>> 
>>> Arvid Heise  于2020年3月5日周四 上午2:55写道:
>>> 
 I see that the majority would like to have an uncomplicated process to
 publish an article first to gather feedback and then like to have
>>> polished
 versions on the blog with official review process.
 
 Then, the obvious solution is to have a process that is two-fold:
 * First a draft is published and reviewed by peers. The draft could be
 polished in smaller increments including proof-reading by native-level
 writers.
 * Second, when the draft converged enough, we would then make an
>> official
 pull request for the dev blog, which would (hopefully) be merged rather
 quickly.
 
 For the draft, we would have a wiki subarea "Engine room", which would
>> be
 the default location for such drafts. Pages in the wiki would allow
>> for a
 gradual polishing and may even live comparably long if the author does
>>> not
 find the time for polishing. The information is in a semi-published
>>> state,
 where devs and experts can already find and use it, but it would not
 attract as many views as in a blog.
 
 But I'd explicitly also allow drafts to go directly to a PR (with risk
>> of
 having many iterations). I'd even say that if someone feels more
 comfortable with online editors such as Google Docs and has enough
>>> reviewers
 for that, they could go with it. Here, the author needs to ensure a
>>> timely
 progress or revert to the wiki, since all intermediate versions are
 effectively hidden for non-reviewers.
 
 Would the community agree with this approach or do you have concerns?
>> If
>>> no
 major concerns are raised, I'd start preparation with the wiki on
>> Monday
 (03/09/2020).
 
 I'd raise the issue about wiki and blog structure, when we got some
 articles to avoid too many concurrent discussions.
 
 
 On Wed, Mar 4, 2020 at 5:54 PM Zhijiang >>> .invalid>
 wrote:
 
> Big +1 for this proposal and second Ufuk's feeling!
> 
> I guess "Engine room" section in Wiki would attract lots of technical
> fans.:)
> 
> Best,
> Zhijiang
> 
> 
> --
> From:Yu Li 
> Send Time:2020 Mar. 4 (Wed.) 14:42
> To:dev 
> Cc:vthinkxie 
> Subject:Re: Flink dev blog
> 
> Big +1 on adding a dev blog and starting with wiki. And +1 to promote
>>> the
> fully polished articles to blog web with a formal process.
> 
> The latter one also brings up another good-to-have improvement that
 adding
> categories and navigation in our blog so people could easily find
 different
> topics like release-announcement/events/tech-articles, etc. but I
>> think
> we'd better open another thread to keep this one on track (smile).
> 
> I'd also like to add one potential topic around in-production
>> practice
>>> of
> using RocksDB state backend (which seems to be a popular topic in ML
> discussions), such as how to enable and monitor RocksDB metrics and
>> do
> debugging/perf-tuning with the metrics/logs, and introduce
> internals/details around the RocksDB memory management mechanism.
> 
> Best Regards,
> Yu
> 
> 
> On Wed, 4 Mar 2020 at 11:07, Xintong Song 
>>> wrote:
> 
>> I also like Ufuk's idea.
>> 
>> The wiki allows people to post on their works in a quick and easier
 way.
>> For me and probably many other Chinese folks, writing and
>> polishing a
>> formal article in English usually takes a long time, of which a
> significant
>> portion is spent on polishing the language. If the blog does not
 require
>> such formal and high quality languages, I believe it will make
>>> things a
> lot
>> easier and encourage more people to share their ideas. Besides, it
>>> also
>> avoids putting more review workloads on committers.
>> 
>> Regarding promoting wiki post to the main blog, I think the wiki
> feedbacks
>> (comment, likes, etc.) could be a great input. We can also contact
>>> the
>> original author before promoting posts to the main blog to refine
>> the
>> article (resp

Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Xingbo Huang
Hi Jark,

Thanks for bringing up this discussion. Good catch. Agree that we can
disable "Squash and merge" (also the other buttons) for now.

There is a guideline on how to do that in
https://help.github.com/en/github/administering-a-repository/configuring-commit-squashing-for-pull-requests
.

Best,
Xingbo

Jark Wu  于2020年3月5日周四 下午6:42写道:

> Hi everyone,
>
> We just noticed that everytime a pull request gets merged with the "Squash
> and merge" button,
> GitHub drops the original authorship information and changes "authored" to
> whoever merged the PR.
>
> We found this happened in #11102 [1] and #11302 [2]. It seems that it is a
> long outstanding issue
> and GitHub is aware of it but doesn't make an attempt to fix it [3][4].
>
> Before this behavior, "authored" is the original author and  "committed" is
> the one who merged the PR,
> which was pretty good to record the contributor's contribution and the
> committed information.
>
> From the perspective of contributors, it’s really frustrating if their
> authorship information gets lost.
> Considering we don't know when GitHub will fix it, I propose to disable
> "Squash and merge" button
> (and also "Rebase and merge" button) before it is fixed.
>
> However, I'm not sure how to disable it. Can it be disabled via the GitHub UI
> by whoever has administrator permission?
> Or .asf.yaml [5] is the right way?
>
> What do you think?
>
> Best,
> Jark
>
> [1]: https://github.com/apache/flink/pull/11102
> [2]: https://github.com/apache/flink/pull/11302
> [3]: https://github.com/chdsbd/kodiak/issues/300#issuecomment-595016815
> [4]: https://github.com/isaacs/github/issues/1750
> [5]:
>
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-Mergebuttons
>
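
The .asf.yaml route mentioned in [5] would look roughly like the fragment below (a sketch based on the feature page linked above; the flag values here encode what this thread proposes, i.e. keep only plain merge commits):

```yaml
# .asf.yaml at the repository root -- sketch, assuming the
# "merge buttons" feature documented in [5].
github:
  enabled_merge_buttons:
    squash: false   # disable "Squash and merge"
    rebase: false   # disable "Rebase and merge"
    merge: true     # keep "Create a merge commit"
```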


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Jingsong Li
Thanks for deep investigation.

+1 to disable "Squash and merge" button now.
But I think this is a very serious problem; it affects too many GitHub
users. GitHub should deal with it quickly.

Best,
Jingsong Lee

On Thu, Mar 5, 2020 at 7:21 PM Xingbo Huang  wrote:

> Hi Jark,
>
> Thanks for bringing up this discussion. Good catch. Agree that we can
> disable "Squash and merge"(also the other buttons) for now.
>
> There is a guideline on how to do that in
>
> https://help.github.com/en/github/administering-a-repository/configuring-commit-squashing-for-pull-requests
> .
>
> Best,
> Xingbo
>
> Jark Wu  于2020年3月5日周四 下午6:42写道:
>
> > Hi everyone,
> >
> > We just noticed that everytime a pull request gets merged with the
> "Squash
> > and merge" button,
> > GitHub drops the original authorship information and changes "authored"
> to
> > whoever merged the PR.
> >
> > We found this happened in #11102 [1] and #11302 [2]. It seems that it is
> a
> > long outstanding issue
> > and GitHub is aware of it but doesn't make an attempt to fix it [3][4].
> >
> > Before this behavior, "authored" is the original author and  "committed"
> is
> > the one who merged the PR,
> > which was pretty good to record the contributor's contribution and the
> > committed information.
> >
> > From the perspective of contributors, it’s really frustrated if their
> > authorship information gets lost.
> > Considering we don't know when GitHub will fix it, I propose to disable
> > "Squash and merge" button
> > (and also "Rebase and merge" button) before it is fixed.
> >
> > However, I'm not sure how to disable it. Can it be disabled by GitHub UI
> if
> > who has administrator permission?
> > Or .asf.yaml [5] is the right way?
> >
> > What do you think?
> >
> > Best,
> > Jark
> >
> > [1]: https://github.com/apache/flink/pull/11102
> > [2]: https://github.com/apache/flink/pull/11302
> > [3]: https://github.com/chdsbd/kodiak/issues/300#issuecomment-595016815
> > [4]: https://github.com/isaacs/github/issues/1750
> > [5]:
> >
> >
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-Mergebuttons
> >
>


-- 
Best, Jingsong Lee


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Yadong Xie
Hi Jark
There is a conversation about this here:
https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797
I think GitHub will fix it soon; it is a bug, not a feature :).

Jingsong Li  于2020年3月5日周四 下午8:32写道:

> Thanks for deep investigation.
>
> +1 to disable "Squash and merge" button now.
> But I think this is a very serious problem, It affects too many GitHub
> workers. Github should deal with it quickly?
>
> Best,
> Jingsong Lee
>
> On Thu, Mar 5, 2020 at 7:21 PM Xingbo Huang  wrote:
>
> > Hi Jark,
> >
> > Thanks for bringing up this discussion. Good catch. Agree that we can
> > disable "Squash and merge"(also the other buttons) for now.
> >
> > There is a guideline on how to do that in
> >
> >
> https://help.github.com/en/github/administering-a-repository/configuring-commit-squashing-for-pull-requests
> > .
> >
> > Best,
> > Xingbo
> >
> > Jark Wu  于2020年3月5日周四 下午6:42写道:
> >
> > > Hi everyone,
> > >
> > > We just noticed that everytime a pull request gets merged with the
> > "Squash
> > > and merge" button,
> > > GitHub drops the original authorship information and changes "authored"
> > to
> > > whoever merged the PR.
> > >
> > > We found this happened in #11102 [1] and #11302 [2]. It seems that it
> is
> > a
> > > long outstanding issue
> > > and GitHub is aware of it but doesn't make an attempt to fix it [3][4].
> > >
> > > Before this behavior, "authored" is the original author and
> "committed"
> > is
> > > the one who merged the PR,
> > > which was pretty good to record the contributor's contribution and the
> > > committed information.
> > >
> > > From the perspective of contributors, it’s really frustrated if their
> > > authorship information gets lost.
> > > Considering we don't know when GitHub will fix it, I propose to disable
> > > "Squash and merge" button
> > > (and also "Rebase and merge" button) before it is fixed.
> > >
> > > However, I'm not sure how to disable it. Can it be disabled by GitHub
> UI
> > if
> > > who has administrator permission?
> > > Or .asf.yaml [5] is the right way?
> > >
> > > What do you think?
> > >
> > > Best,
> > > Jark
> > >
> > > [1]: https://github.com/apache/flink/pull/11102
> > > [2]: https://github.com/apache/flink/pull/11302
> > > [3]:
> https://github.com/chdsbd/kodiak/issues/300#issuecomment-595016815
> > > [4]: https://github.com/isaacs/github/issues/1750
> > > [5]:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-Mergebuttons
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Stephan Ewen
Big +1 to disable it.

I have never been a fan, it has always caused problems:
  - Merge commits
  - weird alias emails
  - lost author information
  - commit message misses the "This closes #" line to track back
commits to PRs/reviews.

The button goes against best practice; it should go away.

Best,
Stephan


On Thu, Mar 5, 2020 at 1:51 PM Yadong Xie  wrote:

> Hi Jark
> There is a conversation about this here:
>
> https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797
> I think GitHub will fix it soon, it is a bug, not a feature :).
>
> Jingsong Li  于2020年3月5日周四 下午8:32写道:
>
> > Thanks for deep investigation.
> >
> > +1 to disable "Squash and merge" button now.
> > But I think this is a very serious problem, It affects too many GitHub
> > workers. Github should deal with it quickly?
> >
> > Best,
> > Jingsong Lee
> >
> > On Thu, Mar 5, 2020 at 7:21 PM Xingbo Huang  wrote:
> >
> > > Hi Jark,
> > >
> > > Thanks for bringing up this discussion. Good catch. Agree that we can
> > > disable "Squash and merge"(also the other buttons) for now.
> > >
> > > There is a guideline on how to do that in
> > >
> > >
> >
> https://help.github.com/en/github/administering-a-repository/configuring-commit-squashing-for-pull-requests
> > > .
> > >
> > > Best,
> > > Xingbo
> > >
> > > Jark Wu  于2020年3月5日周四 下午6:42写道:
> > >
> > > > Hi everyone,
> > > >
> > > > We just noticed that everytime a pull request gets merged with the
> > > "Squash
> > > > and merge" button,
> > > > GitHub drops the original authorship information and changes
> "authored"
> > > to
> > > > whoever merged the PR.
> > > >
> > > > We found this happened in #11102 [1] and #11302 [2]. It seems that it
> > is
> > > a
> > > > long outstanding issue
> > > > and GitHub is aware of it but doesn't make an attempt to fix it
> [3][4].
> > > >
> > > > Before this behavior, "authored" is the original author and
> > "committed"
> > > is
> > > > the one who merged the PR,
> > > > which was pretty good to record the contributor's contribution and
> the
> > > > committed information.
> > > >
> > > > From the perspective of contributors, it’s really frustrated if their
> > > > authorship information gets lost.
> > > > Considering we don't know when GitHub will fix it, I propose to
> disable
> > > > "Squash and merge" button
> > > > (and also "Rebase and merge" button) before it is fixed.
> > > >
> > > > However, I'm not sure how to disable it. Can it be disabled by GitHub
> > UI
> > > if
> > > > who has administrator permission?
> > > > Or .asf.yaml [5] is the right way?
> > > >
> > > > What do you think?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > [1]: https://github.com/apache/flink/pull/11102
> > > > [2]: https://github.com/apache/flink/pull/11302
> > > > [3]:
> > https://github.com/chdsbd/kodiak/issues/300#issuecomment-595016815
> > > > [4]: https://github.com/isaacs/github/issues/1750
> > > > [5]:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-Mergebuttons
> > > >
> > >
> >
> >
> > --
> > Best, Jingsong Lee
> >
>
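
Until the button behavior is fixed, a committer-side merge that keeps the contributor as author can be sketched like this (the PR number is the one from the thread; branch names and the FLINK-XXXXX placeholder are hypothetical):

```shell
# Fetch the PR head into a local branch (GitHub exposes pull/<id>/head).
git fetch origin pull/11102/head:pr-11102
git checkout pr-11102

# Replay (and, with -i, optionally squash) onto master. git rebase
# preserves each commit's original author and records the merger only
# as committer -- exactly the split the GitHub button loses.
git rebase -i master

# Amend the final message so the commit links back to the review, as
# Stephan notes above; --amend keeps the author unless --reset-author.
git commit --amend -m "[FLINK-XXXXX] ... This closes #11102."

# Fast-forward master so the history stays linear without merge commits.
git checkout master
git merge --ff-only pr-11102
```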


Re: SerializableHadoopConfiguration

2020-03-05 Thread Sivaprasanna
Hi Stephan,

I guess it is a valid point to have something like 'flink-hadoop-utils'.
Maybe a [DISCUSS] thread can be started to understand what the community
thinks?

On Thu, Mar 5, 2020 at 4:22 PM Stephan Ewen  wrote:

> Do we have more cases of "common Hadoop Utils"?
>
> If yes, does it make sense to create a "flink-hadoop-utils" module with
> exactly such classes? It would have an optional dependency on
> "flink-shaded-hadoop".
>
> On Wed, Mar 4, 2020 at 9:12 AM Till Rohrmann  wrote:
>
> > Hi Sivaprasanna,
> >
> > we don't upload the source jars for the flink-shaded modules. However you
> > can build them yourself and install by cloning the flink-shaded
> repository
> > [1] and then call `mvn package -Dshade-sources`.
> >
> > [1] https://github.com/apache/flink-shaded
> >
> > Cheers,
> > Till
> >
> > On Tue, Mar 3, 2020 at 6:29 PM Sivaprasanna 
> > wrote:
> >
> > > BTW, can we leverage flink-shaded-hadoop-2? Reason why I ask, if any
> > Flink
> > > module is going to use Hadoop in any way, it will most probably include
> > > flink-shaded-hadoop-2 as a dependency.
> > > However, flink-shaded modules don't have any source files. Is that a
> > strict
> > > convention that the community follows?
> > >
> > > -
> > > Sivaprasanna
> > >
> > > On Tue, Mar 3, 2020 at 10:48 PM Sivaprasanna <
> sivaprasanna...@gmail.com>
> > > wrote:
> > >
> > > > Hi Arvid,
> > > >
> > > > Thanks for the quick reply. Yes, it actually makes sense to avoid
> > Hadoop
> > > > dependencies from getting into Flink's core modules but I also wonder
> > if
> > > it
> > > > will be an overkill to add flink-hadoop-fs as a dependency just
> because
> > > we
> > > > want to use a utility class from that module.
> > > >
> > > > -
> > > > Sivaprasanna
> > > >
> > > > On Tue, Mar 3, 2020 at 4:17 PM Arvid Heise 
> > wrote:
> > > >
> > > >> Hi Sivaprasanna,
> > > >>
> > > >> we actually want to remove Hadoop from all core modules, so we could
> > not
> > > >> place it in some very common place like flink-core.
> > > >>
> > > >> But I think the module flink-hadoop-fs could be a fitting place.
> > > >>
> > > >> On Tue, Mar 3, 2020 at 11:25 AM Sivaprasanna <
> > sivaprasanna...@gmail.com
> > > >
> > > >> wrote:
> > > >>
> > > >> > Hi
> > > >> >
> > > >> > The flink-sequence-file module has a class named
> > > >> > SerializableHadoopConfiguration[1] which is nothing but a wrapper
> > > class
> > > >> for
> > > >> > Hadoop Configuration. I believe this class can be moved to a
> common
> > > >> module
> > > >> > since this is not necessarily tightly coupled with sequence-file
> > > module,
> > > >> > and also because it can be used by many other modules, for ex.
> > > >> > flink-compress. Thoughts?
> > > >> >
> > > >> > -
> > > >> > Sivaprasanna
> > > >> >
> > > >>
> > > >
> > >
> >
>
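
The wrapper pattern under discussion can be sketched without any Hadoop dependency by swapping in a stand-in config class. Everything below is illustrative: `FakeConfig` imitates the non-serializable `org.apache.hadoop.conf.Configuration`, and the wrapper mirrors the `writeObject`/`readObject` approach of the class Sivaprasanna references in [1]:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class Demo {

    /** Stand-in for org.apache.hadoop.conf.Configuration: holds key/value
     *  pairs but is deliberately NOT Serializable itself. */
    static final class FakeConfig {
        final Map<String, String> props = new HashMap<>();
        void set(String k, String v) { props.put(k, v); }
        String get(String k) { return props.get(k); }
    }

    /** Makes the non-serializable config Java-serializable by writing its
     *  entries manually in writeObject/readObject, mirroring the pattern
     *  of the class discussed in this thread. */
    static final class SerializableConfigWrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient FakeConfig config;

        SerializableConfigWrapper(FakeConfig config) { this.config = config; }
        FakeConfig get() { return config; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeInt(config.props.size());
            for (Map.Entry<String, String> e : config.props.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            config = new FakeConfig();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                config.set(in.readUTF(), in.readUTF());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FakeConfig original = new FakeConfig();
        original.set("fs.defaultFS", "hdfs://namenode:8020");

        // Round-trip the wrapper through Java serialization.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new SerializableConfigWrapper(original));
        }
        SerializableConfigWrapper copy;
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            copy = (SerializableConfigWrapper) ois.readObject();
        }
        if (!"hdfs://namenode:8020".equals(copy.get().get("fs.defaultFS"))) {
            throw new AssertionError("config entry lost during serialization");
        }
        System.out.println("round trip ok");
    }
}
```

The actual class in flink-sequence-file delegates to Hadoop's own `write(DataOutput)`/`readFields(DataInput)` methods (Configuration is a Writable), which this sketch imitates with a plain map.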


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread tison
To implement it, file a JIRA ticket with INFRA [1].

Best,
tison.
[1] https://issues.apache.org/jira/projects/INFRA


Stephan Ewen  于2020年3月5日周四 下午8:57写道:

> Big +1 to disable it.
>
> I have never been a fan, it has always caused problems:
>   - Merge commits
>   - weird alias emails
>   - lost author information
>   - commit message misses the "This closes #" line to track back
> commits to PRs/reviews.
>
> The button goes against best practice, it should go away.
>
> Best,
> Stephan
>
>
> On Thu, Mar 5, 2020 at 1:51 PM Yadong Xie  wrote:
>
> > Hi Jark
> > There is a conversation about this here:
> >
> >
> https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797
> > I think GitHub will fix it soon, it is a bug, not a feature :).
> >
> > Jingsong Li  于2020年3月5日周四 下午8:32写道:
> >
> > > Thanks for deep investigation.
> > >
> > > +1 to disable "Squash and merge" button now.
> > > But I think this is a very serious problem, It affects too many GitHub
> > > workers. Github should deal with it quickly?
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Thu, Mar 5, 2020 at 7:21 PM Xingbo Huang 
> wrote:
> > >
> > > > Hi Jark,
> > > >
> > > > Thanks for bringing up this discussion. Good catch. Agree that we can
> > > > disable "Squash and merge"(also the other buttons) for now.
> > > >
> > > > There is a guideline on how to do that in
> > > >
> > > >
> > >
> >
> https://help.github.com/en/github/administering-a-repository/configuring-commit-squashing-for-pull-requests
> > > > .
> > > >
> > > > Best,
> > > > Xingbo
> > > >
> > > > Jark Wu  于2020年3月5日周四 下午6:42写道:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > We just noticed that everytime a pull request gets merged with the
> > > > "Squash
> > > > > and merge" button,
> > > > > GitHub drops the original authorship information and changes
> > "authored"
> > > > to
> > > > > whoever merged the PR.
> > > > >
> > > > > We found this happened in #11102 [1] and #11302 [2]. It seems that
> it
> > > is
> > > > a
> > > > > long outstanding issue
> > > > > and GitHub is aware of it but doesn't make an attempt to fix it
> > [3][4].
> > > > >
> > > > > Before this behavior, "authored" is the original author and
> > > "committed"
> > > > is
> > > > > the one who merged the PR,
> > > > > which was pretty good to record the contributor's contribution and
> > the
> > > > > committed information.
> > > > >
> > > > > From the perspective of contributors, it’s really frustrated if
> their
> > > > > authorship information gets lost.
> > > > > Considering we don't know when GitHub will fix it, I propose to
> > disable
> > > > > "Squash and merge" button
> > > > > (and also "Rebase and merge" button) before it is fixed.
> > > > >
> > > > > However, I'm not sure how to disable it. Can it be disabled by
> GitHub
> > > UI
> > > > if
> > > > > who has administrator permission?
> > > > > Or .asf.yaml [5] is the right way?
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > [1]: https://github.com/apache/flink/pull/11102
> > > > > [2]: https://github.com/apache/flink/pull/11302
> > > > > [3]:
> > > https://github.com/chdsbd/kodiak/issues/300#issuecomment-595016815
> > > > > [4]: https://github.com/isaacs/github/issues/1750
> > > > > [5]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-Mergebuttons
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best, Jingsong Lee
> > >
> >
>


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-05 Thread Wei Zhong
Hi Dawid,

Thanks for your suggestion. 

After some investigation, there are two designs in my mind about how to defer 
the instantiation of temporary system function and temporary catalog function 
to compile time.

1. FunctionCatalog accepts both FunctionDefinitions and uninstantiated 
temporary functions. The uninstantiated temporary functions will be 
instantiated when compiling. There is no public API change in this design, but 
the FunctionCatalog needs to store and process both FunctionDefinitions and 
uninstantiated temporary functions.

2. FunctionCatalog accepts only uninstantiated temporary functions. In this 
design we need to remove those APIs that accepts FunctionDefinitions from 
TableEnvironment, i.e. `void createTemporaryFunction(String path, 
UserDefinedFunction functionInstance)` and `void 
createTemporarySystemFunction(String name, UserDefinedFunction 
functionInstance)`. But the FunctionCatalog only needs to store and process 
uninstantiated temporary functions.

As I don't know the details about the plan to store temporary functions as 
catalog functions instead of FunctionDefinitions, I'm not sure which solution 
fits more. It would be great if you could share more details or share some 
thoughts on these two solutions?

Best,
Wei

> 在 2020年3月4日,16:17,Dawid Wysakowicz  写道:
> 
> Hi all,
> I had a really quick look and from my perspective the proposal looks fine.
> I share Jarks opinion that the instantiation could be done at a later
> stage. I agree with Wei it requires some changes in the internal
> implementation of the FunctionCatalog, to store temporary functions as
> catalog functions instead of FunctionDefinitions, but we have that on our
> agenda anyway. I would suggest investigating if we could do that as part of
> this flip already. Nevertheless this in theory can be also done later.
> 
> Best,
> Dawid
> 
> On Mon, 2 Mar 2020, 14:58 Jark Wu,  wrote:
> 
>> Thanks for the explanation, Wei!
>> 
>> On Mon, 2 Mar 2020 at 20:59, Wei Zhong  wrote:
>> 
>>> Hi Jark,
>>> 
>>> Thanks for your suggestion.
>>> 
>>> Actually, the timing of starting a Python process depends on the UDF
>> type,
>>> because the Python process is used to provide the necessary information
>> to
>>> instantiate the FunctionDefinition object of the Python UDF. For catalog
>>> function, the FunctionDefinition will be instantiated when compiling the
>>> job, which means the Python process is required during the compilation
>>> instead of the registeration. For temporary system function and temporary
>>> catalog function, the FunctionDefinition will be instantiated during the
>>> UDF registeration, so the Python process need to be started at that time.
>>> 
>>> But this FLIP will only support registering the temporary system function
>>> and temporary catalog function in SQL DDL because registering Python UDF
>> to
>>> catalog is not supported yet. We plan to support the registeration of
>>> Python catalog function (via Table API and SQL DDL) in a separate FLIP.
>>> I'll add a non-goal section to the FLIP page to illustrate this.
>>> 
>>> Best,
>>> Wei
>>> 
>>> 
 在 2020年3月2日,15:11,Jark Wu  写道:
 
 Hi Weizhong,
 
 Thanks for proposing this feature. In geneal, I'm +1 from the table's
>>> view.
 
 I have one suggestion: I think registering the Python function into the
>> catalog
 doesn't need to start up a Python process (the "High Level Sequence
>> Diagram"
 in your FLIP).
 Because only meta-information is persisted into catalog, we don't need
>> to
 store "return type", "input types" into catalog.
 I guess the python process is required when compiling a SQL job.
 
 Best,
 Jark
 
 
 
 On Fri, 28 Feb 2020 at 19:04, Benchao Li  wrote:
 
> Big +1 for this feature.
> 
> We built our SQL platform on Java Table API, and most common UDF are
> implemented in Java. However some python developers are not familiar
>>> with
> Java/Scala, and it's very inconvenient for these users to use UDF in
>>> SQL.
> 
> Wei Zhong  于2020年2月28日周五 下午6:58写道:
> 
>> Thank for your reply Dan!
>> 
>> By the way, this FLIP is closely related to the SQL API.  @Jark Wu <
>> imj...@gmail.com> @Timo  could you please take a
>> look?
>> 
>> Thanks,
>> Wei
>> 
>>> 在 2020年2月25日,16:25,zoudan  写道:
>>> 
>>> +1 for supporting Python UDF in Java/Scala Table API.
>>> This is a great feature and would be helpful for python users!
>>> 
>>> Best,
>>> Dan Zou
>>> 
>>> 
>> 
>> 
> 
> --
> 
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking
>>> University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
> 
> 
>>> 
>>> 
>> 
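
For context, the DDL shape the FLIP targets looks roughly as below (a sketch: the `LANGUAGE PYTHON` clause is the one FLIP-106 proposes, while the function names and the module path are made up):

```sql
-- Temporary system function (identifiers are hypothetical).
CREATE TEMPORARY SYSTEM FUNCTION py_upper
  AS 'my_udfs.upper' LANGUAGE PYTHON;

-- Temporary catalog function (catalog/database names are hypothetical).
CREATE TEMPORARY FUNCTION mycatalog.mydb.py_upper
  AS 'my_udfs.upper' LANGUAGE PYTHON;
```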



Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Yadong Xie
Hi Jark

I think the GitHub UI cannot disable both the "Squash and merge" and the
"Rebase and merge" buttons at the same time if the repository has any
protected branch (according to GitHub's rules).

If we leave only the "merge commits" button, that goes against the
"require a linear commit history" rule described here:
https://help.github.com/en/github/administering-a-repository/requiring-a-linear-commit-history

tison  于2020年3月5日周四 下午9:04写道:

> For implement it, file a JIRA ticket in INFRA [1]
>
> Best,
> tison.
> [1] https://issues.apache.org/jira/projects/INFRA
>
>
> Stephan Ewen  于2020年3月5日周四 下午8:57写道:
>
> > Big +1 to disable it.
> >
> > I have never been a fan, it has always caused problems:
> >   - Merge commits
> >   - weird alias emails
> >   - lost author information
> >   - commit message misses the "This closes #" line to track back
> > commits to PRs/reviews.
> >
> > The button goes against best practice, it should go away.
> >
> > Best,
> > Stephan
> >
> >
> > On Thu, Mar 5, 2020 at 1:51 PM Yadong Xie  wrote:
> >
> > > Hi Jark
> > > There is a conversation about this here:
> > >
> > >
> >
> https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797
> > > I think GitHub will fix it soon, it is a bug, not a feature :).
> > >
> > > Jingsong Li  于2020年3月5日周四 下午8:32写道:
> > >
> > > > Thanks for deep investigation.
> > > >
> > > > +1 to disable "Squash and merge" button now.
> > > > But I think this is a very serious problem, It affects too many
> GitHub
> > > > workers. Github should deal with it quickly?
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread tison
Hi Yadong,

Maybe we should first reach out to the INFRA team and see the reply from their side.

Since the actual operator is the INFRA team, in the dev mailing list we can
focus on the motivation and
wait for the reply.

Best,
tison.


Yadong Xie  wrote on Thu, Mar 5, 2020 at 9:29 PM:

> Hi Jark
>
> I think the GitHub UI cannot disable both the "Squash and merge" and
> "Rebase and merge" buttons at the same time if there exists any protected
> branch in the repository (according to GitHub rules).
>
> If we only leave the "merge and commit" button, it will go against the
> requiring-a-linear-commit-history rule here
>
> https://help.github.com/en/github/administering-a-repository/requiring-a-linear-commit-history
>
tison  wrote on Thu, Mar 5, 2020 at 9:04 PM:
>
> > For implement it, file a JIRA ticket in INFRA [1]
> >
> > Best,
> > tison.
> > [1] https://issues.apache.org/jira/projects/INFRA
> >
> >
Stephan Ewen  wrote on Thu, Mar 5, 2020 at 8:57 PM:
> >
> > > Big +1 to disable it.
> > >
> > > I have never been a fan, it has always caused problems:
> > >   - Merge commits
> > >   - weird alias emails
> > >   - lost author information
> > >   - commit message misses the "This closes #" line to track back
> > > commits to PRs/reviews.
> > >
> > > The button goes against best practice, it should go away.
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > On Thu, Mar 5, 2020 at 1:51 PM Yadong Xie  wrote:
> > >
> > > > Hi Jark
> > > > There is a conversation about this here:
> > > >
> > > >
> > >
> >
> https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797
> > > > I think GitHub will fix it soon, it is a bug, not a feature :).
> > > >
Jingsong Li  wrote on Thu, Mar 5, 2020 at 8:32 PM:
> > > >
> > > > > Thanks for deep investigation.
> > > > >
> > > > > +1 to disable "Squash and merge" button now.
> > > > > But I think this is a very serious problem, It affects too many
> > GitHub
> > > > > workers. Github should deal with it quickly?
> > > > >
> > > > > Best,
> > > > > Jingsong Lee
> > > > >
> > > > > On Thu, Mar 5, 2020 at 7:21 PM Xingbo Huang 
> > > wrote:
> > > > >
> > > > > > Hi Jark,
> > > > > >
> > > > > > Thanks for bringing up this discussion. Good catch. Agree that we
> > can
> > > > > > disable "Squash and merge"(also the other buttons) for now.
> > > > > >
> > > > > > There is a guideline on how to do that in
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://help.github.com/en/github/administering-a-repository/configuring-commit-squashing-for-pull-requests
> > > > > > .
> > > > > >
> > > > > > Best,
> > > > > > Xingbo
> > > > > >
Jark Wu  wrote on Thu, Mar 5, 2020 at 6:42 PM:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > We just noticed that every time a pull request gets merged with
> > the
> > > > > > "Squash
> > > > > > > and merge" button,
> > > > > > > GitHub drops the original authorship information and changes
> > > > "authored"
> > > > > > to
> > > > > > > whoever merged the PR.
> > > > > > >
> > > > > > > We found this happened in #11102 [1] and #11302 [2]. It seems
> > that
> > > it
> > > > > is
> > > > > > a
> > > > > > > long outstanding issue
> > > > > > > and GitHub is aware of it but doesn't make an attempt to fix it
> > > > [3][4].
> > > > > > >
> > > > > > > Before this behavior, "authored" is the original author and
> > > > > "committed"
> > > > > > is
> > > > > > > the one who merged the PR,
> > > > > > > which was pretty good to record the contributor's contribution
> > and
> > > > the
> > > > > > > committed information.
> > > > > > >
> > > > > > > From the perspective of contributors, it’s really frustrating if
> > > their
> > > > > > > authorship information gets lost.
> > > > > > > Considering we don't know when GitHub will fix it, I propose to
> > > > disable
> > > > > > > "Squash and merge" button
> > > > > > > (and also "Rebase and merge" button) before it is fixed.
> > > > > > >
> > > > > > > However, I'm not sure how to disable it. Can it be disabled by
> > > GitHub
> > > > > UI
> > > > > > if
> > > > > > > who has administrator permission?
> > > > > > > Or .asf.yaml [5] is the right way?
> > > > > > >
> > > > > > > What do you think?
> > > > > > >
> > > > > > > Best,
> > > > > > > Jark
> > > > > > >
> > > > > > > [1]: https://github.com/apache/flink/pull/11102
> > > > > > > [2]: https://github.com/apache/flink/pull/11302
> > > > > > > [3]:
> > > > > https://github.com/chdsbd/kodiak/issues/300#issuecomment-595016815
> > > > > > > [4]: https://github.com/isaacs/github/issues/1750
> > > > > > > [5]:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-Mergebuttons
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best, Jingsong Lee
> > > > >
> > > >
> > >
> >
>
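
For reference, the .asf.yaml feature page linked as [5] above describes a merge-button section along these lines (a sketch only — values are illustrative, e.g. keeping plain merge commits available, and INFRA should confirm the exact keys):

```yaml
# .asf.yaml at the repository root -- merge-button fragment only.
# Keys as described on the linked INFRA wiki page; values illustrative.
github:
  enabled_merge_buttons:
    merge: true    # keep plain merge commits available
    squash: false  # disable until GitHub fixes the authorship bug
    rebase: false
```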


[jira] [Created] (FLINK-16443) Fix wrong fix for user-code CheckpointExceptions

2020-03-05 Thread Stephan Ewen (Jira)
Stephan Ewen created FLINK-16443:


 Summary: Fix wrong fix for user-code CheckpointExceptions
 Key: FLINK-16443
 URL: https://issues.apache.org/jira/browse/FLINK-16443
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Reporter: Stephan Ewen
 Fix For: 1.11.0


The problem of having exceptions that exist only in the user-code classloader was 
fixed by proactively serializing them inside the {{CheckpointException}}. That 
means all consumers of {{CheckpointException}} now need to be aware of that 
and unwrap the serializable exception.

I believe the right way to fix this would have been to use a 
SerializedException in the {{DeclineCheckpoint}} message instead, which would 
have localized the change to the actual problem: RPC transport.

I would suggest reverting https://github.com/apache/flink/pull/9742 and instead 
applying the change described above.
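
The pattern referred to above — serialize the throwable eagerly so only bytes and JDK types cross the RPC boundary — can be sketched in plain Java. This is a minimal illustration of the idea, not Flink's actual {{SerializedThrowable}} or {{DeclineCheckpoint}} code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Sketch: ship a Throwable across RPC as bytes, so the transport layer never
// has to deserialize a (possibly user-code-only) exception class itself.
public class SerializedExceptionDemo {

    // Serialize the cause eagerly on the sender side.
    static byte[] serialize(Throwable t) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(t);
        }
        return bos.toByteArray();
    }

    // Deserialize lazily on the receiver, where the right classloader is known.
    static Throwable deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Throwable) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] onTheWire = serialize(new RuntimeException("checkpoint declined in user code"));
        Throwable restored = deserialize(onTheWire);
        System.out.println(restored.getMessage());
    }
}
```

A real implementation would additionally keep the original message and stack trace as plain strings, so the receiver can still log something useful when the concrete exception class is absent from its classpath.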



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16444) Count the read/write/seek/next latency of RocksDB as metrics

2020-03-05 Thread Yun Tang (Jira)
Yun Tang created FLINK-16444:


 Summary: Count the read/write/seek/next latency of RocksDB as 
metrics
 Key: FLINK-16444
 URL: https://issues.apache.org/jira/browse/FLINK-16444
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Reporter: Yun Tang
 Fix For: 1.11.0


Currently, users cannot know the read/write/seek/next latency of RocksDB. We 
could add these helpful metrics to understand overall state performance. To avoid 
affecting performance too much, we could introduce a counter and record 
the latency only for a sampled subset of actions.
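
The sampling idea could look roughly like this — a hypothetical sketch where the class name, method names, and sampling interval are made up and not part of any Flink API:

```java
import java.util.function.Supplier;

// Sketch: time a state access only every N-th call, so the System.nanoTime()
// overhead is not paid on the hot path for every single access.
public class SampledLatencyCounter {
    private final int sampleInterval;   // e.g. measure 1 out of every 100 calls
    private long callCount;
    private long samples;
    private long totalLatencyNanos;

    public SampledLatencyCounter(int sampleInterval) {
        this.sampleInterval = sampleInterval;
    }

    public <T> T measure(Supplier<T> action) {
        if (++callCount % sampleInterval != 0) {
            return action.get();            // fast path: no timing at all
        }
        long start = System.nanoTime();
        T result = action.get();
        totalLatencyNanos += System.nanoTime() - start;
        samples++;
        return result;
    }

    public long averageLatencyNanos() {
        return samples == 0 ? 0 : totalLatencyNanos / samples;
    }

    public long getSamples() { return samples; }

    public static void main(String[] args) {
        SampledLatencyCounter counter = new SampledLatencyCounter(100);
        for (int i = 0; i < 1000; i++) {
            counter.measure(() -> "value"); // stands in for a RocksDB get()
        }
        System.out.println(counter.getSamples()); // 10 of 1000 calls were timed
    }
}
```

The exported metric would then be the sampled average (or a histogram of the samples) rather than a per-access measurement.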



--


Re: [PROPOSAL] Reverse the dependency from flink-streaming-java to flink-client

2020-03-05 Thread Aljoscha Krettek

Hi,

thanks for starting the discussion, Tison!

I'd like to fix this dependency mess rather sooner than later, but we do 
have to consider the fact that we are breaking the dependency setup of 
users. If they only had a dependency on flink-streaming-java before 
but used classes from flink-clients they would have to explicitly add 
this dependency now.


Let's see what others think.

Best,
Aljoscha

On 05.03.20 02:53, tison wrote:

Hi devs,

Here is a proposal to reverse the dependency from flink-streaming-java to
flink-client, for a proper
module dependency graph. Since it changes current structure, it should be
discussed publicly.

The original idea is that flink-streaming-java should act as an API-only
module, just like
its batch companion flink-java. If a Flink user wants to write a
minimal DataStream
program, the only dependency should be flink-streaming-java.

However, as currently implemented, flink-client and even
flink-runtime are transitively pulled
in when a user depends on flink-streaming-java. These dependencies come
in as

flink-client:
   - previously, ClusterClient, which is removed by FLIP-73 Executors
   - accidentally, ProgramInvocationException, we just throw in place as it
is accessible.
   - transitively, flink-optimizer, for one utility.
   - transitively, flink-java, for several utilities.
flink-runtime:
   - mainly for JobGraph generating.

With a previous discussion with @Aljoscha Krettek  our
goal is briefly making flink-streaming-java
an API only module. As a first step we can break the dependency from
flink-streaming-java to
flink-client[1][2].

With this first step, continuously we factor out common utilities in
flink-java to
flink-core and eventually eliminate dependencies from streaming to batch;
while
orthogonally, we factor out job compilation logic into
flink-streaming-compiler module and
break the dependency to flink-runtime. The final dependency graph will be:


flink-client -> flink-streaming-compiler -> flink-runtime
  \->
flink-streaming-java

Looking forward to your feedback. Basically whether or not it is in a right
direction, and if so,
how the community integrates this proposal.

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-15090
[2] https://issues.apache.org/jira/browse/FLINK-16427



Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation

2020-03-05 Thread Kostas Kloudas
Also from my side, +1 to start voting.

Cheers,
Kostas

On Thu, Mar 5, 2020 at 7:45 AM tison  wrote:
>
> +1 to start voting.
>
> Best,
> tison.
>
>
> Yang Wang  wrote on Thu, Mar 5, 2020 at 2:29 PM:
>>
>> Hi Peter,
>> Really thanks for your response.
>>
>> Hi all @Kostas Kloudas @Zili Chen @Peter Huang @Rong Rong
>> It seems that we have reached an agreement. The “application mode” is 
>> regarded as the enhanced “per-job”. It is
>> orthogonal with “cluster deploy”. Currently, we bind the “per-job” to 
>> `run-user-main-on-client` and “application mode”
>> to `run-user-main-on-cluster`.
>>
>> Do you have other concerns to moving FLIP-85 to voting?
>>
>>
>> Best,
>> Yang
>>
>> Peter Huang  wrote on Thu, Mar 5, 2020 at 12:48 PM:
>>>
>>> Hi Yang and Kostas,
>>>
>>> Thanks for the clarification. It makes more sense to me if the long term 
>>> goal is to replace per job mode to application mode
>>>  in the future (at the time that multiple execute can be supported). Before 
>>> that, It will be better to keep the concept of
>>>  application mode internally. As Yang suggested, User only need to use a 
>>> `-R/-- remote-deploy` cli option to launch
>>> a per job cluster with the main function executed in cluster entry-point.  
>>> +1 for the execution plan.
>>>
>>>
>>>
>>> Best Regards
>>> Peter Huang
>>>
>>>
>>>
>>>
>>> On Tue, Mar 3, 2020 at 7:11 AM Yang Wang  wrote:

 Hi Peter,

 Having the application mode does not mean we will drop the cluster-deploy
 option. I just want to share some thoughts about “Application Mode”.


 1. The application mode could cover the per-job sematic. Its lifecyle is 
 bound
 to the user `main()`. And all the jobs in the user main will be executed 
 in a same
 Flink cluster. In first phase of FLIP-85 implementation, running user main 
 on the
 cluster side could be supported in application mode.

 2. Maybe in the future, we also need to support multiple `execute()` on 
 client side
 in a same Flink cluster. Then the per-job mode will evolve to application 
 mode.

 3. From user perspective, only a `-R/-- remote-deploy` cli option is 
 visible. They
 are not aware of the application mode.

 4. In the first phase, the application mode is working as “per-job”(only 
 one job in
 the user main). We just leave more potential for the future.


 I am not against with calling it “cluster deploy mode” if you all think it 
 is clearer for users.



 Best,
 Yang

 Kostas Kloudas  wrote on Tue, Mar 3, 2020 at 6:49 PM:
>
> Hi Peter,
>
> I understand your point. This is why I was also a bit torn about the
> name and my proposal was a bit aligned with yours (something along the
> lines of "cluster deploy" mode).
>
> But many of the other participants in the discussion suggested the
> "Application Mode". I think that the reasoning is that now the user's
> Application is more self-contained.
> It will be submitted to the cluster and the user can just disconnect.
> In addition, as discussed briefly in the doc, in the future there may
> be better support for multi-execute applications which will bring us
> one step closer to the true "Application Mode". But this is how I
> interpreted their arguments, of course they can also express their
> thoughts on the topic :)
>
> Cheers,
> Kostas
>
> On Mon, Mar 2, 2020 at 6:15 PM Peter Huang  
> wrote:
> >
> > Hi Kostas,
> >
> > Thanks for updating the wiki. We have aligned with the implementations 
> > in the doc. But I feel it is still a little bit confusing of the naming 
> > from a user's perspective. It is well known that Flink support per job 
> > cluster and session cluster. The concept is in the layer of how a job 
> > is managed within Flink. The method introduced util now is a kind of 
> > mixing job and session cluster to promising the implementation 
> > complexity. We probably don't need to label it as Application Model as 
> > the same layer of per job cluster and session cluster. Conceptually, I 
> > think it is still a cluster mode implementation for per job cluster.
> >
> > To minimize the confusion of users, I think it would be better just an 
> > option of per job cluster for each type of cluster manager. How do you 
> > think?
> >
> >
> > Best Regards
> > Peter Huang
> >
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Mar 2, 2020 at 7:22 AM Kostas Kloudas  
> > wrote:
> >>
> >> Hi Yang,
> >>
> >> The difference between per-job and application mode is that, as you
> >> described, in the per-job mode the main is executed on the client
> >> while in the application mode, the main is executed on the cluster.
> >> I do not think we have to offer "application mode" with running the
> >> main on the client side as th

Re: [PROPOSAL] Reverse the dependency from flink-streaming-java to flink-client

2020-03-05 Thread Stephan Ewen
+1 to this fix, in general.

If the main issue is that users now have to add "flink-clients" explicitly,
then I think this is okay, provided we spell it out prominently in the release
notes, make sure the quickstarts etc. are updated, and have a good error
message when client/runtime classes are not found.
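
For illustration, the explicit dependency a user would then add to their pom.xml might look like this (artifact name with the usual Scala suffix as in current Flink releases; the version property is only an example):

```xml
<!-- Dependency that flink-streaming-java would no longer pull in transitively. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>${flink.version}</version>
</dependency>
```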

On Thu, Mar 5, 2020 at 2:56 PM Aljoscha Krettek  wrote:

> Hi,
>
> thanks for starting the discussion, Tison!
>
> I'd like to fix this dependency mess rather sooner than later, but we do
> have to consider the fact that we are breaking the dependency setup of
> users. If they only had a dependency on flink-streaming-java before
> but used classes from flink-clients they would have to explicitly add
> this dependency now.
>
> Let's see what others think.
>
> Best,
> Aljoscha


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Zhijiang
+1 for disabling "Squash and merge" if it is feasible to do that.

The possible benefit of using this button is that it saves some effort squashing 
intermediate "[fixup]" commits during PR review.
But it would bring more potential problems, as mentioned below: missing author 
information, the missing "This closes #" message, etc. 
It might even produce an unexpectedly formatted long commit description if the 
text box is not handled carefully.

Best,
Zhijiang


--
From:tison 
Send Time:2020 Mar. 5 (Thu.) 21:34
To:dev 
Subject:Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on 
GitHub

Hi Yadong,

Maybe we should first reach out to the INFRA team and see the reply from their side.

Since the actual operator is the INFRA team, in the dev mailing list we can
focus on the motivation and
wait for the reply.

Best,
tison.



Re: Flink dev blog

2020-03-05 Thread Zhijiang
Thanks for this proposal, Arvid!
+1, and looking forward to the wiki structure and more blogs to follow.

Best,
Zhijiang


--
From:Dian Fu 
Send Time:2020 Mar. 5 (Thu.) 19:08
To:dev 
Subject:Re: Flink dev blog

+1 to Arvid's proposal

> On Mar 5, 2020, at 6:49 PM, Jark Wu  wrote:
> 
> +1 to Arvid's proposal.
> 
> On Thu, 5 Mar 2020 at 18:13, Robert Metzger  wrote:
> 
>> +1 to Arvid's proposal.
>> 
>> 
>> 
>> On Thu, Mar 5, 2020 at 4:14 AM Xingbo Huang  wrote:
>> 
>>> Thanks for this proposal.
>>> 
>>> As a new contributor to Flink, it would be very helpful to have such
>> blogs
>>> for us to understand the future of Flink and get involved
>>> 
>>> BTW, I have a question whether the dev blog needs a template like FLIP.
>>> 
>>> Of course, There is no doubt that dev blogs do not need to be as formal
>> as
>>> FLIP, but templates can be more helpful for developers to understand
>>> articles.
>>> 
>>> Best,
>>> 
>>> Xingbo
>>> 
>>> 
 Arvid Heise  wrote on Thu, Mar 5, 2020 at 2:55 AM:
>>> 
 I see that the majority would like to have an uncomplicated process to
 publish an article first to gather feedback and then like to have
>>> polished
 versions on the blog with official review process.
 
 Then, the obvious solution is to have a process that is two-fold:
 * First a draft is published and reviewed by peers. The draft could be
 polished in smaller increments including proof-reading by native-level
 writers.
 * Second, when the draft converged enough, we would then make an
>> official
 pull request for the dev blog, which would (hopefully) be merged rather
 quickly.
 
 For the draft, we would have a wiki subarea "Engine room", which would
>> be
 the default location for such drafts. Pages in the wiki would allow
>> for a
 gradual polishing and may even live comparably long if the author does
>>> not
 find the time for polishing. The information is in a semi-published
>>> state,
 where devs and experts can already find and use it, but it would not
 attract as many views as in a blog.
 
 But I'd explicitly also allow drafts to go directly to a PR (with risk
>> of
 having many iterations). I'd even say that if someone feels more
 comfortable to online editors such as google docs and has enough
>>> reviewers
 for that, they could go with it. Here, the author needs to ensure a
>>> timely
 progress or revert to the wiki, since all intermediate versions are
 effectively hidden for non-reviewers.
 
 Would the community agree with this approach or do you have concerns?
>> If
>>> no
 major concerns are raised, I'd start preparation with the wiki on
>> Monday
 (03/09/2020).
 
 I'd raise the issue about wiki and blog structure, when we got some
 articles to avoid too many concurrent discussions.
 
 
 On Wed, Mar 4, 2020 at 5:54 PM Zhijiang >>> .invalid>
 wrote:
 
> Big +1 for this proposal and second Ufuk's feeling!
> 
> I guess "Engine room" section in Wiki would attract lots of technical
> fans.:)
> 
> Best,
> Zhijiang
> 
> 
> --
> From:Yu Li 
> Send Time:2020 Mar. 4 (Wed.) 14:42
> To:dev 
> Cc:vthinkxie 
> Subject:Re: Flink dev blog
> 
> Big +1 on adding a dev blog and starting with wiki. And +1 to promote
>>> the
> fully polished articles to blog web with a formal process.
> 
> The latter one also brings up another good-to-have improvement that
 adding
> categories and navigation in our blog so people could easily find
 different
> topics like release-announcement/events/tech-articles, etc. but I
>> think
> we'd better open another thread to keep this one on track (smile).
> 
> I'd also like to add one potential topic around in-production
>> practice
>>> of
> using RocksDB state backend (which seems to be a popular topic in ML
> discussions), such as how to enable and monitor RocksDB metrics and
>> do
> debugging/perf-tuning with the metrics/logs, and introduce
> internals/details around the RocksDB memory management mechanism.
> 
> Best Regards,
> Yu
> 
> 
> On Wed, 4 Mar 2020 at 11:07, Xintong Song 
>>> wrote:
> 
>> I also like Ufuk's idea.
>> 
>> The wiki allows people to post on their works in a quick and easier
 way.
>> For me and probably many other Chinese folks, writing and
>> polishing a
>> formal article in English usually takes a long time, of which a
> significant
>> portion is spent on polishing the language. If the blog does not
 require
>> such formal and high quality languages, I believe it will make
>>> things a
> lot
>> easier and encourage more people to share their ideas. Besides, it
>>> also
>> avoids putting more review workloads on committers.
>

Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Robert Metzger
+1 for disabling this feature for now.

Thanks a lot for spotting this!

On Thu, Mar 5, 2020 at 3:54 PM Zhijiang 
wrote:

> +1 for disabling "Squash and merge" if feasible to do that.
>
> The possible benefit to use this button is for saving some efforts to
> squash some intermediate "[fixup]" commits during PR review.
> But it would bring more potential problems as mentioned below, missing
> author information and message of "This closes #", etc.
> Even it might cause unexpected format of long commit content description
> if not handled carefully in the text box.
>
> Best,
> Zhijiang

Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Jark Wu
Hi all,

Thanks for the feedback. To clarify: the motivation to disable
"Squash and merge" is solely this regression/bug that loses the
author information.
If GitHub fixes it later, I think it makes sense to bring the button
back.

Hi Stephan & Zhijiang,

To be honest, I love the "Squash and merge" button and often use it. It
saves me a lot of time to merge PRs, because pulling and pushing commits in
China is very unstable.

I don't think the potential problems you mentioned really apply.
For "Squash and merge":
 - "Merge commits": there are no merge commits, because GitHub squashes
the commits, rebases the result, and then adds it to the master branch.
 - "This closes #" line to track back: when you click "Squash and
merge", it lets you edit the title and description, so you can add the
"This closes #" message to the description just as you would in
local git. Besides, GitHub automatically appends "(#)" to the title,
which also helps with tracking.

Best,
Jark

On Thu, 5 Mar 2020 at 23:36, Robert Metzger  wrote:

> +1 for disabling this feature for now.
>
> Thanks a lot for spotting this!
>
> On Thu, Mar 5, 2020 at 3:54 PM Zhijiang  .invalid>
> wrote:
>
> > +1 for disabling "Squash and merge" if feasible to do that.
> >
> > The possible benefit to use this button is for saving some efforts to
> > squash some intermediate "[fixup]" commits during PR review.
> > But it would bring more potential problems as mentioned below, missing
> > author information and message of "This closes #", etc.
> > Even it might cause unexpected format of long commit content description
> > if not handled carefully in the text box.
> >
> > Best,
> > Zhijiang
> >
> >
> > --
> > From:tison 
> > Send Time:2020 Mar. 5 (Thu.) 21:34
> > To:dev 
> > Subject:Re: [DISCUSS] Disable "Squash and merge" button for Flink
> > repository on GitHub
> >
> > Hi Yadong,
> >
> > Maybe we firstly reach out INFRA team and see the reply from their side.
> >
> > Since the actual operator is INFRA team, in the dev mailing list we can
> > focus on motivation and
> > wait for the reply.
> >
> > Best,
> > tison.
> >
> >
> > Yadong Xie  于2020年3月5日周四 下午9:29写道:
> >
> > > Hi Jark
> > >
> > > I think GitHub UI can not disable both the "Squash and merge" button
> and
> > > "Rebase and merge" at the same time if there exists any protected
> branch
> > in
> > > the repository(according to github rules).
> > >
> > > If we only left "merge and commits" button, it will against requiring a
> > > linear commit history rules here
> > >
> > >
> >
> https://help.github.com/en/github/administering-a-repository/requiring-a-linear-commit-history
> > >
> > > tison  于2020年3月5日周四 下午9:04写道:
> > >
> > > > For implement it, file a JIRA ticket in INFRA [1]
> > > >
> > > > Best,
> > > > tison.
> > > > [1] https://issues.apache.org/jira/projects/INFRA
> > > >
> > > >
> > > > Stephan Ewen  于2020年3月5日周四 下午8:57写道:
> > > >
> > > > > Big +1 to disable it.
> > > > >
> > > > > I have never been a fan, it has always caused problems:
> > > > >   - Merge commits
> > > > >   - weird alias emails
> > > > >   - lost author information
> > > > >   - commit message misses the "This closes #" line to track
> back
> > > > > commits to PRs/reviews.
> > > > >
> > > > > The button goes against best practice, it should go away.
> > > > >
> > > > > Best,
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Thu, Mar 5, 2020 at 1:51 PM Yadong Xie 
> > wrote:
> > > > >
> > > > > > Hi Jark
> > > > > > There is a conversation about this here:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797
> > > > > > I think GitHub will fix it soon, it is a bug, not a feature :).
> > > > > >
> > > > > > Jingsong Li  于2020年3月5日周四 下午8:32写道:
> > > > > >
> > > > > > > Thanks for deep investigation.
> > > > > > >
> > > > > > > +1 to disable "Squash and merge" button now.
> > > > > > > But I think this is a very serious problem, It affects too many
> > > > GitHub
> > > > > > > workers. Github should deal with it quickly?
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong Lee
> > > > > > >
> > > > > > > On Thu, Mar 5, 2020 at 7:21 PM Xingbo Huang <
> hxbks...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jark,
> > > > > > > >
> > > > > > > > Thanks for bringing up this discussion. Good catch. Agree
> that
> > we
> > > > can
> > > > > > > > disable "Squash and merge"(also the other buttons) for now.
> > > > > > > >
> > > > > > > > There is a guideline on how to do that in
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://help.github.com/en/github/administering-a-repository/configuring-commit-squashing-for-pull-requests
> > > > > > > > .
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Xingbo
> > > > > > > >
> > > > > > > > Jark Wu  于2020年3月5日周四 下午6:42写道:

[jira] [Created] (FLINK-16445) Raise japicmp.referenceVersion to 1.10.0

2020-03-05 Thread Gary Yao (Jira)
Gary Yao created FLINK-16445:


 Summary: Raise japicmp.referenceVersion to 1.10.0
 Key: FLINK-16445
 URL: https://issues.apache.org/jira/browse/FLINK-16445
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.11.0
Reporter: Gary Yao
Assignee: Gary Yao
 Fix For: 1.11.0


In {{pom.xml}}, change property {{japicmp.referenceVersion}} to {{1.10.0}}
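A sketch of the corresponding {{pom.xml}} fragment (the property name and value are from this ticket; the surrounding layout is illustrative):

```xml
<properties>
  <!-- Baseline for japicmp binary-compatibility checks:
       compare the public API against the latest released version. -->
  <japicmp.referenceVersion>1.10.0</japicmp.referenceVersion>
</properties>
```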





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Piotr Nowojski
Hi,

If it’s really not preserving ownership (I didn’t notice the problem before), 
+1 for removing “squash and merge”.

However -1 for removing “rebase and merge”. I didn’t see any issues with it and 
I’m using it constantly.

Piotrek

> On 5 Mar 2020, at 16:40, Jark Wu  wrote:
> 
> Hi all,
> 
> Thanks for the feedbacks. But I want to clarify the motivation to disable
> "Squash and merge" is just because of the regression/bug of the missing
> author information.
> If GitHub fixes this later, I think it makes sense to bring this button
> back.
> 
> Hi Stephan & Zhijiang,
> 
> To be honest, I love the "Squash and merge" button and often use it. It
> saves me a lot of time to merge PRs, because pulling and pushing commits in
> China is very unstable.
> 
> I don't think the potential problems you mentioned is a "problem".
> For "Squash and merge",
> - "Merge commits": there is no "merge" commits, because GitHub will squash
> commits and rebase the commit and then add to the master branch.
> - "This closes #" line to track back: when you click "Squash and
> merge", it allows you to edit the title and description, so you can
> add "This closes #" message to the description the same with in the
> local git. Besides, GitHub automatically append "(#)" after the title,
> which is also helpful to track.
> 
> Best,
> Jark
> 
> On Thu, 5 Mar 2020 at 23:36, Robert Metzger  wrote:
> 
>> +1 for disabling this feature for now.
>> 
>> Thanks a lot for spotting this!
>> 
>> On Thu, Mar 5, 2020 at 3:54 PM Zhijiang > .invalid>
>> wrote:
>> 
>>> +1 for disabling "Squash and merge" if feasible to do that.
>>> 
>>> The possible benefit to use this button is for saving some efforts to
>>> squash some intermediate "[fixup]" commits during PR review.
>>> But it would bring more potential problems as mentioned below, missing
>>> author information and message of "This closes #", etc.
>>> Even it might cause unexpected format of long commit content description
>>> if not handled carefully in the text box.
>>> 
>>> Best,
>>> Zhijiang
>>> 
>>> 
>>> --
>>> From:tison 
>>> Send Time:2020 Mar. 5 (Thu.) 21:34
>>> To:dev 
>>> Subject:Re: [DISCUSS] Disable "Squash and merge" button for Flink
>>> repository on GitHub
>>> 
>>> Hi Yadong,
>>> 
>>> Maybe we firstly reach out INFRA team and see the reply from their side.
>>> 
>>> Since the actual operator is INFRA team, in the dev mailing list we can
>>> focus on motivation and
>>> wait for the reply.
>>> 
>>> Best,
>>> tison.
>>> 
>>> 
>>> Yadong Xie  于2020年3月5日周四 下午9:29写道:
>>> 
 Hi Jark
 
 I think GitHub UI can not disable both the "Squash and merge" button
>> and
 "Rebase and merge" at the same time if there exists any protected
>> branch
>>> in
 the repository(according to github rules).
 
 If we only left "merge and commits" button, it will against requiring a
 linear commit history rules here
 
 
>>> 
>> https://help.github.com/en/github/administering-a-repository/requiring-a-linear-commit-history
 
 tison  于2020年3月5日周四 下午9:04写道:
 
> For implement it, file a JIRA ticket in INFRA [1]
> 
> Best,
> tison.
> [1] https://issues.apache.org/jira/projects/INFRA
> 
> 
> Stephan Ewen  于2020年3月5日周四 下午8:57写道:
> 
>> Big +1 to disable it.
>> 
>> I have never been a fan, it has always caused problems:
>>  - Merge commits
>>  - weird alias emails
>>  - lost author information
>>  - commit message misses the "This closes #" line to track
>> back
>> commits to PRs/reviews.
>> 
>> The button goes against best practice, it should go away.
>> 
>> Best,
>> Stephan
>> 
>> 
>> On Thu, Mar 5, 2020 at 1:51 PM Yadong Xie 
>>> wrote:
>> 
>>> Hi Jark
>>> There is a conversation about this here:
>>> 
>>> 
>> 
> 
 
>>> 
>> https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797
>>> I think GitHub will fix it soon, it is a bug, not a feature :).
>>> 
>>> Jingsong Li  于2020年3月5日周四 下午8:32写道:
>>> 
 Thanks for deep investigation.
 
 +1 to disable "Squash and merge" button now.
 But I think this is a very serious problem, It affects too many
> GitHub
 workers. Github should deal with it quickly?
 
 Best,
 Jingsong Lee
 
 On Thu, Mar 5, 2020 at 7:21 PM Xingbo Huang <
>> hxbks...@gmail.com>
>> wrote:
 
> Hi Jark,
> 
> Thanks for bringing up this discussion. Good catch. Agree
>> that
>>> we
> can
> disable "Squash and merge"(also the other buttons) for now.
> 
> There is a guideline on how to do that in
> 
> 
 
>>> 
>> 
> 
 
>>> 
>> https://help.github.com/en/github/administering-a-repository/configuring-commit-squashing-f

[jira] [Created] (FLINK-16446) Add rate limiting feature for FlinkKafkaConsumer

2020-03-05 Thread Zou (Jira)
Zou created FLINK-16446:
---

 Summary: Add rate limiting feature for FlinkKafkaConsumer
 Key: FLINK-16446
 URL: https://issues.apache.org/jira/browse/FLINK-16446
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Reporter: Zou


There is a rate limiting feature in FlinkKafkaConsumer010 and 
FlinkKafkaConsumer011, but not in FlinkKafkaConsumer. We could add this 
feature to FlinkKafkaConsumer as well.
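For reference, the 0.10/0.11 consumers expose this through a rate-limiter hook backed by a Guava rate limiter (if memory serves). Conceptually it is a token bucket; below is a minimal, self-contained sketch of the idea — not Flink's actual API or implementation:

```java
import java.util.concurrent.TimeUnit;

/**
 * Minimal token-bucket rate limiter, illustrating the idea behind
 * per-consumer record/byte rate limiting. Not Flink's implementation.
 */
final class TokenBucket {
    private final double tokensPerNano; // refill rate
    private final double capacity;      // maximum burst size
    private double tokens;
    private long lastRefill;

    TokenBucket(double permitsPerSecond, double burstCapacity) {
        this.tokensPerNano = permitsPerSecond / TimeUnit.SECONDS.toNanos(1);
        this.capacity = burstCapacity;
        this.tokens = burstCapacity;
        this.lastRefill = System.nanoTime();
    }

    /** Consumes one token and returns true if one is available. */
    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill proportionally to elapsed time, capped at the burst size.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * tokensPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(100, 5);
        int granted = 0;
        for (int i = 0; i < 10; i++) {
            if (bucket.tryAcquire()) granted++;
        }
        System.out.println("granted " + granted + " of 10 immediate requests");
    }
}
```

A consumer could consult such a limiter once per fetched record (or once per fetched byte) in its fetch loop, blocking or backing off when `tryAcquire()` returns false.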



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Stephan Ewen
It looks like this feature still messes up email addresses, for example if
you do a "git log | grep noreply" in the repo.

Don't most PRs consist anyway of multiple commits where we want to
preserve the "refactor" and "feature" differentiation in the history, rather
than squash everything?

On Thu, Mar 5, 2020 at 4:54 PM Piotr Nowojski  wrote:

> Hi,
>
> If it’s really not preserving ownership (I didn’t notice the problem
> before), +1 for removing “squash and merge”.
>
> However -1 for removing “rebase and merge”. I didn’t see any issues with
> it and I’m using it constantly.
>
> Piotrek
>
> > On 5 Mar 2020, at 16:40, Jark Wu  wrote:
> >
> > Hi all,
> >
> > Thanks for the feedbacks. But I want to clarify the motivation to disable
> > "Squash and merge" is just because of the regression/bug of the missing
> > author information.
> > If GitHub fixes this later, I think it makes sense to bring this button
> > back.
> >
> > Hi Stephan & Zhijiang,
> >
> > To be honest, I love the "Squash and merge" button and often use it. It
> > saves me a lot of time to merge PRs, because pulling and pushing commits
> in
> > China is very unstable.
> >
> > I don't think the potential problems you mentioned is a "problem".
> > For "Squash and merge",
> > - "Merge commits": there is no "merge" commits, because GitHub will
> squash
> > commits and rebase the commit and then add to the master branch.
> > - "This closes #" line to track back: when you click "Squash and
> > merge", it allows you to edit the title and description, so you can
> > add "This closes #" message to the description the same with in the
> > local git. Besides, GitHub automatically append "(#)" after the
> title,
> > which is also helpful to track.
> >
> > Best,
> > Jark
> >
> > On Thu, 5 Mar 2020 at 23:36, Robert Metzger  wrote:
> >
> >> +1 for disabling this feature for now.
> >>
> >> Thanks a lot for spotting this!
> >>
> >> On Thu, Mar 5, 2020 at 3:54 PM Zhijiang  >> .invalid>
> >> wrote:
> >>
> >>> +1 for disabling "Squash and merge" if feasible to do that.
> >>>
> >>> The possible benefit to use this button is for saving some efforts to
> >>> squash some intermediate "[fixup]" commits during PR review.
> >>> But it would bring more potential problems as mentioned below, missing
> >>> author information and message of "This closes #", etc.
> >>> Even it might cause unexpected format of long commit content
> description
> >>> if not handled carefully in the text box.
> >>>
> >>> Best,
> >>> Zhijiang
> >>>
> >>>
> >>> --
> >>> From:tison 
> >>> Send Time:2020 Mar. 5 (Thu.) 21:34
> >>> To:dev 
> >>> Subject:Re: [DISCUSS] Disable "Squash and merge" button for Flink
> >>> repository on GitHub
> >>>
> >>> Hi Yadong,
> >>>
> >>> Maybe we firstly reach out INFRA team and see the reply from their
> side.
> >>>
> >>> Since the actual operator is INFRA team, in the dev mailing list we can
> >>> focus on motivation and
> >>> wait for the reply.
> >>>
> >>> Best,
> >>> tison.
> >>>
> >>>
> >>> Yadong Xie  于2020年3月5日周四 下午9:29写道:
> >>>
>  Hi Jark
> 
>  I think GitHub UI can not disable both the "Squash and merge" button
> >> and
>  "Rebase and merge" at the same time if there exists any protected
> >> branch
> >>> in
>  the repository(according to github rules).
> 
>  If we only left "merge and commits" button, it will against requiring
> a
>  linear commit history rules here
> 
> 
> >>>
> >>
> https://help.github.com/en/github/administering-a-repository/requiring-a-linear-commit-history
> 
>  tison  于2020年3月5日周四 下午9:04写道:
> 
> > For implement it, file a JIRA ticket in INFRA [1]
> >
> > Best,
> > tison.
> > [1] https://issues.apache.org/jira/projects/INFRA
> >
> >
> > Stephan Ewen  于2020年3月5日周四 下午8:57写道:
> >
> >> Big +1 to disable it.
> >>
> >> I have never been a fan, it has always caused problems:
> >>  - Merge commits
> >>  - weird alias emails
> >>  - lost author information
> >>  - commit message misses the "This closes #" line to track
> >> back
> >> commits to PRs/reviews.
> >>
> >> The button goes against best practice, it should go away.
> >>
> >> Best,
> >> Stephan
> >>
> >>
> >> On Thu, Mar 5, 2020 at 1:51 PM Yadong Xie 
> >>> wrote:
> >>
> >>> Hi Jark
> >>> There is a conversation about this here:
> >>>
> >>>
> >>
> >
> 
> >>>
> >>
> https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797
> >>> I think GitHub will fix it soon, it is a bug, not a feature :).
> >>>
> >>> Jingsong Li  于2020年3月5日周四 下午8:32写道:
> >>>
>  Thanks for deep investigation.
> 
>  +1 to disable "Squash and merge" button now.
>  But I think this is a very serious problem, It affects too many
> > GitHub
>  workers. Github sho

Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-05 Thread Bowen Li
> I have some hesitation, because the actual version number can better
reflect the actual dependency. For example, if the user also knows the
field hiveVersion[1]. He may enter the wrong hiveVersion because of the
name, or he may have the wrong expectation for the hive built-in functions.

Sorry, I'm not sure my proposal was understood correctly.

What I'm saying is: your original proposal suggested, for example,
naming the module "flink-connector-hive-1.2" to support Hive 1.0.0 -
1.2.2, i.e. a name carrying the highest Hive version it supports. I'm
suggesting naming it "flink-connector-hive-1.0" instead, a name carrying
the lowest Hive version it supports.

What do you think?
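Either way, the parent pom would end up with one thin module per supported range — a hypothetical sketch using the lowest-supported-version naming (the module names here are illustrative, taken from the ranges discussed above):

```xml
<!-- Hypothetical module list; each module is essentially just a pom.xml
     pinning the Hive client dependency for its version range. -->
<modules>
  <module>flink-connector-hive-1.0</module>  <!-- Hive 1.0.0 - 1.2.2 -->
  <module>flink-connector-hive-2.0</module>  <!-- Hive 2.0 - 2.2 -->
  <module>flink-connector-hive-2.3</module>  <!-- Hive 2.3.x -->
  <module>flink-connector-hive-3.0</module>  <!-- Hive 3+ -->
</modules>
```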



On Wed, Mar 4, 2020 at 11:14 PM Jingsong Li  wrote:

> Hi Bowen, thanks for your reply.
>
> > will there be a base module like "flink-connector-hive-base" which holds
> all the common logic of these proposed modules
>
> Maybe we don't need, their implementation is only "pom.xml". Different
> versions have different dependencies.
>
> > it's more common to set the version in module name to be the lowest
> version that this module supports
>
> I have some hesitation, because the actual version number can better
> reflect the actual dependency. For example, if the user also knows the
> field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> name, or he may have the wrong expectation for the hive built-in functions.
>
> [1] https://github.com/apache/flink/pull/11304
>
> Best,
> Jingsong Lee
>
> On Thu, Mar 5, 2020 at 2:34 PM Bowen Li  wrote:
>
> > Thanks Jingsong for your explanation! I'm +1 for this initiative.
> >
> > According to your description, I think it makes sense to incorporate
> > support of Hive 2.2 to that of 2.0/2.1 and reducing the number of ranges
> to
> > 4.
> >
> > A couple minor followup questions:
> > 1) will there be a base module like "flink-connector-hive-base" which
> holds
> > all the common logic of these proposed modules and is compiled into the
> > uber jar of "flink-connector-hive-xxx"?
> > 2) according to my observation, it's more common to set the version in
> > module name to be the lowest version that this module supports, e.g. for
> > Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0"
> > rather than "flink-connector-hive-1.2"
> >
> >
> > On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li 
> > wrote:
> >
> > > Thanks Bowen for involving.
> > >
> > > > why you proposed segregating hive versions into the 5 ranges above? &
> > > what different Hive features are supported in the 5 ranges?
> > >
> > > For only higher client dependencies version support lower hive
> metastore
> > > versions:
> > > - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column stats,
> > we
> > > can throw exception for the unsupported feature.
> > > - Hive 2.0 and Hive 2.1, primary key support and alter_partition api
> > > change.
> > > - Hive 2.2 no thrift change.
> > > - Hive 2.3 change many things, lots of thrift change.
> > > - Hive 3+, not null. unique, timestamp, so many things.
> > >
> > > All these things can be found in hive_metastore.thrift.
> > >
> > > I think I can try do more effort in implementation to use Hive 2.2 to
> > > support Hive 2.0. So the range size will be 4.
> > >
> > > > have you tested that whether the proposed corresponding Flink module
> > will
> > > be fully compatible with each Hive version range?
> > >
> > > Yes, I have done some tests, not really for "fully", but it is a
> > technical
> > > judgment.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Thu, Mar 5, 2020 at 1:17 PM Bowen Li  wrote:
> > >
> > > > Thanks, Jingsong, for bringing this up. We've received lots of
> > feedbacks
> > > in
> > > > the past few months that the complexity involved in different Hive
> > > versions
> > > > has been quite painful for users to start with. So it's great to step
> > > > forward and deal with such issue.
> > > >
> > > > Before getting on a decision, can you please explain:
> > > >
> > > > 1) why you proposed segregating hive versions into the 5 ranges
> above?
> > > > 2) what different Hive features are supported in the 5 ranges?
> > > > 3) have you tested that whether the proposed corresponding Flink
> module
> > > > will be fully compatible with each Hive version range?
> > > >
> > > > Thanks,
> > > > Bowen
> > > >
> > > >
> > > >
> > > > On Wed, Mar 4, 2020 at 1:00 AM Jingsong Lee  >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to propose introduce flink-connector-hive-xx modules.
> > > > >
> > > > > We have documented the dependencies detailed information[2]. But
> > still
> > > > has
> > > > > some inconvenient:
> > > > > - Too many versions, users need to pick one version from 8
> versions.
> > > > > - Too many versions, It's not friendly to our developers either,
> > > because
> > > > > there's a problem/exception, we need to look at eight different
> > > versions
> > > > of
> > > > > hive client code, which are often various.
> 

[jira] [Created] (FLINK-16447) Non serializable field on CompressWriterFactory

2020-03-05 Thread Jira
João Boto created FLINK-16447:
-

 Summary: Non serializable field on CompressWriterFactory
 Key: FLINK-16447
 URL: https://issues.apache.org/jira/browse/FLINK-16447
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.10.0
Reporter: João Boto


CompressWriterFactory has a CompressionCodec field that is not serializable.

This makes StreamingFileSink fail with a non-serializable-field error.

Extending the codec and implementing Serializable solves the problem, but
it's odd.
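A common fix, shown here as a generic sketch rather than the actual Flink/Hadoop classes: keep a serializable "recipe" for the codec, mark the codec itself transient, and rebuild it after deserialization. The `Codec` type below is a stand-in for illustration, not Hadoop's CompressionCodec:

```java
import java.io.*;

/**
 * Sketch of the transient-field pattern: the non-serializable member is
 * rebuilt from serializable state on the deserialized side.
 */
class CompressWriterFactory implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Stand-in for a non-serializable dependency. */
    static final class Codec {
        final String name;
        Codec(String name) { this.name = name; }
    }

    private final String codecName;   // serializable recipe
    private transient Codec codec;    // rebuilt after deserialization

    CompressWriterFactory(String codecName) {
        this.codecName = codecName;
        this.codec = new Codec(codecName);
    }

    Codec getCodec() {
        if (codec == null) {
            codec = new Codec(codecName); // lazily rebuild from the recipe
        }
        return codec;
    }

    /** Java-serialization round trip, to show the pattern survives it. */
    static CompressWriterFactory roundTrip(CompressWriterFactory f) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(f);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (CompressWriterFactory) ois.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        CompressWriterFactory copy = roundTrip(new CompressWriterFactory("gzip"));
        System.out.println("codec after round trip: " + copy.getCodec().name);
    }
}
```

The alternative mentioned in the thread — a serializable wrapper like flink-sequence-file's SerializableHadoopConfiguration — applies the same idea via custom writeObject/readObject methods.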



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [PROPOSAL] Reverse the dependency from flink-streaming-java to flink-client

2020-03-05 Thread Kostas Kloudas
Big +1 also from my side.

This will eliminate some work-arounds used so far to bypass the module
structure (like code using reflection to extract a JobGraph from a
Pipeline).

I agree with Stephan that with proper documentation, release notes and
tooling update, it will hopefully not be a big hassle for users to
migrate.
Also I think it should be done as early in the release as possible, so
that we can give it enough exposure and testing. In the past, such
deep changes late in the release have led to longer release-testing
periods and, eventually, longer release cycles.

Cheers,
Kostas

On Thu, Mar 5, 2020 at 3:35 PM Stephan Ewen  wrote:
>
> +1 to this fix, in general.
>
> If the main issue is that users have to now add "flink-clients" explicitly,
> then I think this is okay, if we spell it out prominently in the release
> notes, and make sure quickstarts / etc are updated, and have a good error
> message when client/runtime classes are not found.
>
> On Thu, Mar 5, 2020 at 2:56 PM Aljoscha Krettek  wrote:
>
> > Hi,
> >
> > thanks for starting the discussion, Tison!
> >
> > I'd like to fix this dependency mess rather sooner than later, but we do
> > have to consider the fact that we are breaking the dependency setup of
> > users. If they they only had a dependency on flink-streaming-java before
> > but used classes from flink-clients they would have to explicitly add
> > this dependency now.
> >
> > Let's see what others think.
> >
> > Best,
> > Aljoscha
> >
> > On 05.03.20 02:53, tison wrote:
> > > Hi devs,
> > >
> > > Here is a proposal to reverse the dependency from flink-streaming-java to
> > > flink-client, for a proper
> > > module dependency graph. Since it changes current structure, it should be
> > > discussed publicly.
> > >
> > > The original idea comes from that flink-streaming-java acts as an API
> > only
> > > module just as what
> > > we do in its batch companion flink-java. If a Flink user want to write a
> > > minimum DataStream
> > > program, the only dependency should be flink-streaming java.
> > >
> > > However, currently as it is implemented, flink-client and even
> > > flink-runtime are transitively polluted
> > > in when user depends on flink-streaming-java. These dependencies polluted
> > > in as
> > >
> > > flink-client:
> > >- previously, ClusterClient, which is removed by FLIP-73 Executors
> > >- accidentally, ProgramInvocationException, we just throw in place as
> > it
> > > is accessible.
> > >- transitively, flink-optimizer, for one utility.
> > >- transitively, flink-java, for several utilities.
> > > flink-runtime:
> > >- mainly for JobGraph generating.
> > >
> > > With a previous discussion with @Aljoscha Krettek 
> > our
> > > goal is briefly making flink-streaming-java
> > > an API only module. As a first step we can break the dependency from
> > > flink-streaming-java to
> > > flink-client[1][2].
> > >
> > > With this first step, continuously we factor out common utilities in
> > > flink-java to
> > > flink-core and eventually eliminate dependencies from streaming to batch;
> > > while
> > > orthogonally, we factor out job compilation logic into
> > > flink-streaming-compiler module and
> > > break the dependency to flink-runtime. The final dependency graph will
> > be:
> > >
> > >
> > > flink-client -> flink-streaming-compiler -> flink-runtime
> > >  \-> flink-streaming-java
> > >
> > > Looking forward to your feedback. Basically whether or not it is in a
> > right
> > > direction, and if so,
> > > how the community integrates this proposal.
> > >
> > > Best,
> > > tison.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-15090
> > > [2] https://issues.apache.org/jira/browse/FLINK-16427
> > >
> >
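One concrete consequence for users: a project that previously compiled with only flink-streaming-java on the classpath would have to declare flink-clients explicitly, roughly like this (the artifact name and version properties are placeholders following the usual Flink naming conventions):

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_${scala.binary.version}</artifactId>
  <version>${flink.version}</version>
</dependency>
```

This is the kind of snippet the release notes and quickstarts would need to call out prominently, as suggested above.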


[jira] [Created] (FLINK-16448) add documentation for Hive table sink parallelism setting strategy

2020-03-05 Thread Bowen Li (Jira)
Bowen Li created FLINK-16448:


 Summary: add documentation for Hive table sink parallelism setting 
strategy
 Key: FLINK-16448
 URL: https://issues.apache.org/jira/browse/FLINK-16448
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Jingsong Lee
 Fix For: 1.11.0


per user-zh mailing list question, would be beneficial to add documentation for 
Hive table sink parallelism setting strategy



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: SerializableHadoopConfiguration

2020-03-05 Thread João Boto
Could we merge the two modules into one?
Sequence files are just another way of compressing files.


On 2020/03/05 13:02:46, Sivaprasanna  wrote: 
> Hi Stephen,
> 
> I guess it is a valid point to have something like 'flink-hadoop-utils'.
> Maybe a [DISCUSS] thread can be started to understand what the community
> thinks?
> 
> On Thu, Mar 5, 2020 at 4:22 PM Stephan Ewen  wrote:
> 
> > Do we have more cases of "common Hadoop Utils"?
> >
> > If yes, does it make sense to create a "flink-hadoop-utils" module with
> > exactly such classes? It would have an optional dependency on
> > "flink-shaded-hadoop".
> >
> > On Wed, Mar 4, 2020 at 9:12 AM Till Rohrmann  wrote:
> >
> > > Hi Sivaprasanna,
> > >
> > > we don't upload the source jars for the flink-shaded modules. However you
> > > can build them yourself and install by cloning the flink-shaded
> > repository
> > > [1] and then call `mvn package -Dshade-sources`.
> > >
> > > [1] https://github.com/apache/flink-shaded
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Mar 3, 2020 at 6:29 PM Sivaprasanna 
> > > wrote:
> > >
> > > > BTW, can we leverage flink-shaded-hadoop-2? Reason why I ask, if any
> > > Flink
> > > > module is going to use Hadoop in any way, it will most probably include
> > > > flink-shaded-hadoop-2 as a dependency.
> > > > However, flink-shaded modules don't have any source files. Is that a
> > > strict
> > > > convention that the community follows?
> > > >
> > > > -
> > > > Sivaprasanna
> > > >
> > > > On Tue, Mar 3, 2020 at 10:48 PM Sivaprasanna <
> > sivaprasanna...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Arvid,
> > > > >
> > > > > Thanks for the quick reply. Yes, it actually makes sense to avoid
> > > Hadoop
> > > > > dependencies from getting into Flink's core modules but I also wonder
> > > if
> > > > it
> > > > > will be an overkill to add flink-hadoop-fs as a dependency just
> > because
> > > > we
> > > > > want to use a utility class from that module.
> > > > >
> > > > > -
> > > > > Sivaprasanna
> > > > >
> > > > > On Tue, Mar 3, 2020 at 4:17 PM Arvid Heise 
> > > wrote:
> > > > >
> > > > >> Hi Sivaprasanna,
> > > > >>
> > > > >> we actually want to remove Hadoop from all core modules, so we could
> > > not
> > > > >> place it in some very common place like flink-core.
> > > > >>
> > > > >> But I think the module flink-hadoop-fs could be a fitting place.
> > > > >>
> > > > >> On Tue, Mar 3, 2020 at 11:25 AM Sivaprasanna <
> > > sivaprasanna...@gmail.com
> > > > >
> > > > >> wrote:
> > > > >>
> > > > >> > Hi
> > > > >> >
> > > > >> > The flink-sequence-file module has a class named
> > > > >> > SerializableHadoopConfiguration[1] which is nothing but a wrapper
> > > > class
> > > > >> for
> > > > >> > Hadoop Configuration. I believe this class can be moved to a
> > common
> > > > >> module
> > > > >> > since this is not necessarily tightly coupled with sequence-file
> > > > module,
> > > > >> > and also because it can be used by many other modules, for ex.
> > > > >> > flink-compress. Thoughts?
> > > > >> >
> > > > >> > -
> > > > >> > Sivaprasanna
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
> 


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

Seems, this will be fixed today:

https://twitter.com/natfriedman/status/1235613840659767298?s=19


- -Matthias

On 3/5/20 8:37 AM, Stephan Ewen wrote:
> It looks like this feature still messes up email addresses, for
> example if you do a "git log | grep noreply" in the repo.
>
> Don't most PRs consist anyways of multiple commits where we want
> to preserve "refactor" and "feature" differentiation in the
> history, rather than squash everything?
>
> On Thu, Mar 5, 2020 at 4:54 PM Piotr Nowojski 
> wrote:
>
>> Hi,
>>
>> If it’s really not preserving ownership (I didn’t notice the
>> problem before), +1 for removing “squash and merge”.
>>
>> However -1 for removing “rebase and merge”. I didn’t see any
>> issues with it and I’m using it constantly.
>>
>> Piotrek
>>
>>> On 5 Mar 2020, at 16:40, Jark Wu  wrote:
>>>
>>> Hi all,
>>>
>>> Thanks for the feedbacks. But I want to clarify the motivation
>>> to disable "Squash and merge" is just because of the
>>> regression/bug of the missing author information. If GitHub
>>> fixes this later, I think it makes sense to bring this button
>>> back.
>>>
>>> Hi Stephan & Zhijiang,
>>>
>>> To be honest, I love the "Squash and merge" button and often
>>> use it. It saves me a lot of time to merge PRs, because pulling
>>> and pushing commits
>> in
>>> China is very unstable.
>>>
>>> I don't think the potential problems you mentioned is a
>>> "problem". For "Squash and merge", - "Merge commits": there is
>>> no "merge" commits, because GitHub will
>> squash
>>> commits and rebase the commit and then add to the master
>>> branch. - "This closes #" line to track back: when you
>>> click "Squash and merge", it allows you to edit the title and
>>> description, so you can add "This closes #" message to the
>>> description the same with in the local git. Besides, GitHub
>>> automatically append "(#)" after the
>> title,
>>> which is also helpful to track.
>>>
>>> Best, Jark
>>>
>>> On Thu, 5 Mar 2020 at 23:36, Robert Metzger
>>>  wrote:
>>>
 +1 for disabling this feature for now.

 Thanks a lot for spotting this!

 On Thu, Mar 5, 2020 at 3:54 PM Zhijiang
  wrote:

> +1 for disabling "Squash and merge" if feasible to do
> that.
>
> The possible benefit to use this button is for saving some
> efforts to squash some intermediate "[fixup]" commits
> during PR review. But it would bring more potential
> problems as mentioned below, missing author information and
> message of "This closes #", etc. Even it might cause
> unexpected format of long commit content
>> description
> if not handled carefully in the text box.
>
> Best, Zhijiang
>
>
> --
>
>
From:tison 
> Send Time:2020 Mar. 5 (Thu.) 21:34 To:dev
>  Subject:Re: [DISCUSS] Disable
> "Squash and merge" button for Flink repository on GitHub
>
> Hi Yadong,
>
> Maybe we firstly reach out INFRA team and see the reply
> from their
>> side.
>
> Since the actual operator is INFRA team, in the dev mailing
> list we can focus on motivation and wait for the reply.
>
> Best, tison.
>
>
> Yadong Xie  于2020年3月5日周四 下午9:29写道:
>
>> Hi Jark
>>
>> I think GitHub UI can not disable both the "Squash and
>> merge" button
 and
>> "Rebase and merge" at the same time if there exists any
>> protected
 branch
> in
>> the repository(according to github rules).
>>
>> If we only left "merge and commits" button, it will
>> against requiring
>> a
>> linear commit history rules here
>>
>>
>

>> https://help.github.com/en/github/administering-a-repository/requiring-a-linear-commit-history
>>
>>
>>
tison  于2020年3月5日周四 下午9:04写道:
>>
>>> For implement it, file a JIRA ticket in INFRA [1]
>>>
>>> Best, tison. [1]
>>> https://issues.apache.org/jira/projects/INFRA
>>>
>>>
>>> Stephan Ewen  于2020年3月5日周四 下午8:57写道:
>>>
 Big +1 to disable it.

 I have never been a fan, it has always caused
 problems: - Merge commits - weird alias emails - lost
 author information - commit message misses the "This
 closes #" line to track
 back
 commits to PRs/reviews.

 The button goes against best practice, it should go
 away.

 Best, Stephan


 On Thu, Mar 5, 2020 at 1:51 PM Yadong Xie
 
> wrote:

> Hi Jark There is a conversation about this here:
>
>

>>>
>>
>

>> https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797
>
>>
I think GitHub will fix it soon, it is a bug, not a feature :).
>
> Jingsong Li  于2020年3月5日周四 下
>

[jira] [Created] (FLINK-16449) Deprecated methods in the Table API walkthrough.

2020-03-05 Thread Marta Paes Moreira (Jira)
Marta Paes Moreira created FLINK-16449:
--

 Summary: Deprecated methods in the Table API walkthrough.
 Key: FLINK-16449
 URL: https://issues.apache.org/jira/browse/FLINK-16449
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts, Table SQL / API
Affects Versions: 1.10.0
Reporter: Marta Paes Moreira


The sample code provided for the Table API walkthrough [1] includes methods 
that have been deprecated in Flink 1.10 (FLIP-64): registerTableSource, 
registerTableSink, scan.

This can be confusing to users trying to get started, also because finding how 
to use the alternative methods is not very intuitive.

[1] 
[https://ci.apache.org/projects/flink/flink-docs-release-1.10/getting-started/walkthroughs/table_api.html]
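
As a rough illustration (not part of the original report) of where the deprecated calls moved, here is a sketch assuming the Flink 1.10 Table API with the Blink planner; the table name, schema, and connector properties are made up for the example:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class WalkthroughMigrationSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance()
                        .useBlinkPlanner().inBatchMode().build());

        // Before (deprecated in 1.10 by FLIP-64):
        //   tEnv.registerTableSource("transactions", transactionSource);
        //   tEnv.registerTableSink("spend_report", spendReportSink);
        //   Table transactions = tEnv.scan("transactions");

        // After: register tables via DDL (or the descriptor API) ...
        tEnv.sqlUpdate(
                "CREATE TABLE transactions (account_id BIGINT, amount BIGINT) "
                        + "WITH ('connector.type' = 'filesystem', "
                        + "      'connector.path' = '/tmp/transactions.csv', "
                        + "      'format.type' = 'csv')");

        // ... and read them with from() instead of scan().
        Table transactions = tEnv.from("transactions");

        // A derived Table can be registered under a name with
        // createTemporaryView instead of the deprecated registerTable.
        tEnv.createTemporaryView("spend_report_input", transactions);
    }
}
```

This is only a sketch under the stated assumptions, not the walkthrough's actual code.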



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Jingsong Li
Hi,

I agree with Jark. The tool is useful. If there are some problems, I think
we can reach an agreement on certain conventions.

GitHub provides:
- "rebase and merge", which keeps all commits.
- "squash and merge", which squashes all commits into one commit. Pull
requests often contain multiple commits, like "address comments", "Fix
comments", "Fix checkstyle". I think we can help authors squash these
useless commits.

Best,
Jingsong Lee

On Fri, Mar 6, 2020 at 4:46 AM Matthias J. Sax  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> Seems, this will be fixed today:
>
> https://twitter.com/natfriedman/status/1235613840659767298?s=19
>
>
> - -Matthias
>
> On 3/5/20 8:37 AM, Stephan Ewen wrote:
> > It looks like this feature still messes up email addresses, for
> > example if you do a "git log | grep noreply" in the repo.
> >
> > Don't most PRs consist anyways of multiple commits where we want
> > to preserve "refactor" and "feature" differentiation in the
> > history, rather than squash everything?
> >
> > On Thu, Mar 5, 2020 at 4:54 PM Piotr Nowojski 
> > wrote:
> >
> >> Hi,
> >>
> >> If it’s really not preserving ownership (I didn’t notice the
> >> problem before), +1 for removing “squash and merge”.
> >>
> >> However -1 for removing “rebase and merge”. I didn’t see any
> >> issues with it and I’m using it constantly.
> >>
> >> Piotrek
> >>
> >>> On 5 Mar 2020, at 16:40, Jark Wu  wrote:
> >>>
> >>> Hi all,
> >>>
> >>> Thanks for the feedbacks. But I want to clarify the motivation
> >>> to disable "Squash and merge" is just because of the
> >>> regression/bug of the missing author information. If GitHub
> >>> fixes this later, I think it makes sense to bring this button
> >>> back.
> >>>
> >>> Hi Stephan & Zhijiang,
> >>>
> >>> To be honest, I love the "Squash and merge" button and often
> >>> use it. It saves me a lot of time to merge PRs, because pulling
> >>> and pushing commits
> >> in
> >>> China is very unstable.
> >>>
> >>> I don't think the potential problems you mentioned is a
> >>> "problem". For "Squash and merge", - "Merge commits": there is
> >>> no "merge" commits, because GitHub will
> >> squash
> >>> commits and rebase the commit and then add to the master
> >>> branch. - "This closes #" line to track back: when you
> >>> click "Squash and merge", it allows you to edit the title and
> >>> description, so you can add "This closes #" message to the
> >>> description the same with in the local git. Besides, GitHub
> >>> automatically append "(#)" after the
> >> title,
> >>> which is also helpful to track.
> >>>
> >>> Best, Jark
> >>>
> >>> On Thu, 5 Mar 2020 at 23:36, Robert Metzger
> >>>  wrote:
> >>>
>  +1 for disabling this feature for now.
> 
>  Thanks a lot for spotting this!
> 
>  On Thu, Mar 5, 2020 at 3:54 PM Zhijiang
>   wrote:
> 
> > +1 for disabling "Squash and merge" if feasible to do
> > that.
> >
> > The possible benefit to use this button is for saving some
> > efforts to squash some intermediate "[fixup]" commits
> > during PR review. But it would bring more potential
> > problems as mentioned below, missing author information and
> > message of "This closes #", etc. Even it might cause
> > unexpected format of long commit content
> >> description
> > if not handled carefully in the text box.
> >
> > Best, Zhijiang
> >
> >
> > --
> >
> >
> From:tison 
> > Send Time:2020 Mar. 5 (Thu.) 21:34 To:dev
> >  Subject:Re: [DISCUSS] Disable
> > "Squash and merge" button for Flink repository on GitHub
> >
> > Hi Yadong,
> >
> > Maybe we firstly reach out INFRA team and see the reply
> > from their
> >> side.
> >
> > Since the actual operator is INFRA team, in the dev mailing
> > list we can focus on motivation and wait for the reply.
> >
> > Best, tison.
> >
> >
> > Yadong Xie  于2020年3月5日周四 下午9:29写道:
> >
> >> Hi Jark
> >>
> >> I think GitHub UI can not disable both the "Squash and
> >> merge" button
>  and
> >> "Rebase and merge" at the same time if there exists any
> >> protected
>  branch
> > in
> >> the repository(according to github rules).
> >>
> >> If we only left "merge and commits" button, it will
> >> against requiring
> >> a
> >> linear commit history rules here
> >>
> >>
> >
> 
> >> https://help.github.com/en/github/administering-a-repository/requiring-a-linear-commit-history
> 
> >>
> >>
> >>
> tison  于2020年3月5日周四 下午9:04写道:
> >>
> >>> For implement it, file a JIRA ticket in INFRA [1]
> >>>
> >>> Best, tison. [1]
> >>> https://issues.apache.org/jira/projects/INFRA
> >>>
> >>>
> >>> Stephan Ewen  于2020年3月5日周四 下午8:57写道:
> >>>
>  Big +1 to disable it.

[jira] [Created] (FLINK-16450) Integrate parquet columnar row reader to hive

2020-03-05 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-16450:


 Summary: Integrate parquet columnar row reader to hive
 Key: FLINK-16450
 URL: https://issues.apache.org/jira/browse/FLINK-16450
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: 1.11.0


Use parquet columnar row reader in hive.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[Discuss] IntervalJoin one side sorted cache

2020-03-05 Thread Chen Qin
Hi there,

I would like kick off discussion on
https://issues.apache.org/jira/browse/FLINK-16392 and discuss what is best
way moving forward. Here is problem statement and proposal we have in mind.
Please kindly provide feedback.

The native interval join relies on the state backend (e.g. RocksDB) to
insert/fetch the left and right buffers. This design choice minimizes heap
memory footprint, but it bounds the processing throughput of a single task
manager to the RocksDB access speed. Here at Pinterest, we have some large
use cases where developers join a large, slowly evolving data stream (e.g.
post updates in the last 28 days) with a web-traffic data stream (e.g. post
views up to 28 days after a given update).

This poses some challenges to the current implementation of the interval join:

   - Partitioned RocksDB needs to keep both updates and views for 28 days. The
   large buffer (especially on the view-stream side) causes RocksDB to slow
   down, and overall interval join performance degrades quickly as state
   builds up.


   - The view stream is web scale; even with a large parallelism it can put a
   lot of pressure on each subtask and backpressure the entire job.

In the proposed implementation, we plan to introduce two changes:

   - Support ProcessJoinFunction settings to opt in to an earlier cleanup time
   for the right stream (e.g. the view stream doesn't have to stay in the
   buffer for 28 days waiting for the update stream to join, since related
   post views happen after the update in event-time semantics). This
   optimization reduces state size to improve RocksDB throughput. In extreme
   cases, the user can opt in to an in-flight join and skip writing into the
   right (view) stream buffer to save IOPS budget on each subtask.


   - Support ProcessJoinFunction settings to expedite keyed lookups of the
   slowly changing stream. Instead of every post view pulling post updates
   from RocksDB, the user can opt in to having a one-sided buffer cache in
   memory. For a given post update, the cache loads recent views from the
   right buffer and uses a sorted map to find buckets. For a given post view,
   the cache loads recent updates from the left buffer into memory. When
   another view for that post arrives, Flink saves the cost of a RocksDB
   access.
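
For reference, the existing interval join API that this proposal builds on looks roughly like the sketch below. The Update/View/Joined types and the join method are illustrative placeholders, and none of the proposed opt-in settings appear here since they do not exist yet:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class PostIntervalJoinSketch {
    // Minimal illustrative records (hypothetical, not an actual schema).
    public static class Update { public long postId; }
    public static class View   { public long postId; }
    public static class Joined {
        Joined(Update u, View v) { /* combine fields as needed */ }
    }

    public static DataStream<Joined> join(
            DataStream<Update> updates, DataStream<View> views) {
        return updates
            .keyBy(u -> u.postId)
            .intervalJoin(views.keyBy(v -> v.postId))
            // each update joins views of the same post that arrive up to
            // 28 days later in event time
            .between(Time.days(0), Time.days(28))
            .process(new ProcessJoinFunction<Update, View, Joined>() {
                @Override
                public void processElement(Update u, View v, Context ctx,
                                           Collector<Joined> out) {
                    out.collect(new Joined(u, v));
                }
            });
    }
}
```

Both buffers of this join live in the state backend for the full 28-day bound, which is the throughput problem described above.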

Thanks,
Chen Qin


[jira] [Created] (FLINK-16451) listagg with distinct for over window

2020-03-05 Thread jinfeng (Jira)
jinfeng created FLINK-16451:
---

 Summary: listagg with distinct for over window 
 Key: FLINK-16451
 URL: https://issues.apache.org/jira/browse/FLINK-16451
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.10.0, 1.9.2
Reporter: jinfeng


When I use listagg with distinct over a window:
{code:java}
"select listagg(distinct product, '|') over(partition by user order by proctime 
rows between 200 preceding and current row) as product, user from " + testTable
{code}
I got the following exception:
{code:java}

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 3, Size: 
3 at java.util.ArrayList.rangeCheck(ArrayList.java:657) at 
java.util.ArrayList.get(ArrayList.java:433) at 
java.util.Collections$UnmodifiableList.get(Collections.java:1311) at 
org.apache.flink.table.types.logical.RowType.getTypeAt(RowType.java:174) at 
org.apache.flink.table.planner.codegen.GenerateUtils$.generateFieldAccess(GenerateUtils.scala:635)
 at 
org.apache.flink.table.planner.codegen.GenerateUtils$.generateFieldAccess(GenerateUtils.scala:620)
 at 
org.apache.flink.table.planner.codegen.GenerateUtils$.generateInputAccess(GenerateUtils.scala:524)
 at 
org.apache.flink.table.planner.codegen.agg.DistinctAggCodeGen$$anonfun$10.apply(DistinctAggCodeGen.scala:374)
 at 
org.apache.flink.table.planner.codegen.agg.DistinctAggCodeGen$$anonfun$10.apply(DistinctAggCodeGen.scala:374)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at scala.collection.mutable.ArrayOps$ofInt.foreach(ArrayOps.scala:234) at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
scala.collection.mutable.ArrayOps$ofInt.map(ArrayOps.scala:234) at 
org.apache.flink.table.planner.codegen.agg.DistinctAggCodeGen.generateKeyExpression(DistinctAggCodeGen.scala:374)
 at 
org.apache.flink.table.planner.codegen.agg.DistinctAggCodeGen.accumulate(DistinctAggCodeGen.scala:192)
 at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$12.apply(AggsHandlerCodeGenerator.scala:871)
 at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$12.apply(AggsHandlerCodeGenerator.scala:871)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator.genAccumulate(AggsHandlerCodeGenerator.scala:871)
 at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator.generateAggsHandler(AggsHandlerCodeGenerator.scala:329)
 at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.createBoundedOverProcessFunction(StreamExecOverAggregate.scala:425)
 at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:255)
 at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:56)
 at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58)
 at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlan(StreamExecOverAggregate.scala:56)
 at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:54)
 at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:39)
{code}
But it worked with 
{code:java}
select listagg(distinct product) over(partition by user order by proctime rows 
between 200 preceding and current row) as product, user from " + testTable
{code}
 
{code:java}
private def generateKeyExpression(
ctx: CodeGeneratorContext,
generator: ExprCodeGenerator): GeneratedExpression = {
  val fieldExprs = distinctInfo.argIndexes.map(generateInputAccess(
ctx,
generator.input1Type,
generator.input1Term,
_,
nullableInput = false,
deepCopy = inputFieldCopy))
{code}
The exception is thrown at the code above.

The distinctInfo.argIndexes value is [1, 3]. But index 3 is a logical index: it 
will be replaced by '|', so we should not generate input access for index 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-05 Thread Jingsong Li
Hi Bowen,

My idea is to directly expose the actual dependency version: if we really
depend on Hive 1.2.2, our jar name says 1.2.2, so that users directly and
clearly know the version. As for which metastore versions are supported, we
can cover that in the documentation; otherwise, if we write 1.0 and the
actual version is 1.2.2, users will have wrong expectations.

Also, maybe 2.3.6 can support 2.0-2.2 after some effort.

Best,
Jingsong Lee

On Fri, Mar 6, 2020 at 1:00 AM Bowen Li  wrote:

> > I have some hesitation, because the actual version number can better
> reflect the actual dependency. For example, if the user also knows the
> field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> name, or he may have the wrong expectation for the hive built-in functions.
>
> Sorry, I'm not sure if my proposal is understood correctly.
>
> What I'm saying is, in your original proposal, taking an example, suggested
> naming the module as "flink-connector-hive-1.2" to support hive 1.0.0 -
> 1.2.2, a name including the highest Hive version it supports. I'm
> suggesting to name it "flink-connector-hive-1.0", a name including the
> lowest Hive version it supports.
>
> What do you think?
>
>
>
> On Wed, Mar 4, 2020 at 11:14 PM Jingsong Li 
> wrote:
>
> > Hi Bowen, thanks for your reply.
> >
> > > will there be a base module like "flink-connector-hive-base" which
> holds
> > all the common logic of these proposed modules
> >
> > Maybe we don't need one; the implementation is only a "pom.xml". Different
> > versions have different dependencies.
> >
> > > it's more common to set the version in module name to be the lowest
> > version that this module supports
> >
> > I have some hesitation, because the actual version number can better
> > reflect the actual dependency. For example, if the user also knows the
> > field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> > name, or he may have the wrong expectation for the hive built-in
> functions.
> >
> > [1] https://github.com/apache/flink/pull/11304
> >
> > Best,
> > Jingsong Lee
> >
> > On Thu, Mar 5, 2020 at 2:34 PM Bowen Li  wrote:
> >
> > > Thanks Jingsong for your explanation! I'm +1 for this initiative.
> > >
> > > According to your description, I think it makes sense to incorporate
> > > support of Hive 2.2 to that of 2.0/2.1 and reducing the number of
> ranges
> > to
> > > 4.
> > >
> > > A couple minor followup questions:
> > > 1) will there be a base module like "flink-connector-hive-base" which
> > holds
> > > all the common logic of these proposed modules and is compiled into the
> > > uber jar of "flink-connector-hive-xxx"?
> > > 2) according to my observation, it's more common to set the version in
> > > module name to be the lowest version that this module supports, e.g.
> for
> > > Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0"
> > > rather than "flink-connector-hive-1.2"
> > >
> > >
> > > On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li 
> > > wrote:
> > >
> > > > Thanks Bowen for involving.
> > > >
> > > > > why you proposed segregating hive versions into the 5 ranges
> above? &
> > > > what different Hive features are supported in the 5 ranges?
> > > >
> > > > The grouping reflects that a higher client dependency version can
> > > > support lower Hive metastore
> > > > versions:
> > > > - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column
> stats,
> > > we
> > > > can throw exception for the unsupported feature.
> > > > - Hive 2.0 and Hive 2.1, primary key support and alter_partition api
> > > > change.
> > > > - Hive 2.2 no thrift change.
> > > > - Hive 2.3 change many things, lots of thrift change.
> > > > - Hive 3+, not null. unique, timestamp, so many things.
> > > >
> > > > All these things can be found in hive_metastore.thrift.
> > > >
> > > > I think I can put more effort into the implementation to use Hive 2.2 to
> > > > support Hive 2.0. So the range size will be 4.
> > > >
> > > > > have you tested that whether the proposed corresponding Flink
> module
> > > will
> > > > be fully compatible with each Hive version range?
> > > >
> > > > Yes, I have done some tests, not really for "fully", but it is a
> > > technical
> > > > judgment.
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Thu, Mar 5, 2020 at 1:17 PM Bowen Li  wrote:
> > > >
> > > > > Thanks, Jingsong, for bringing this up. We've received lots of
> > > feedbacks
> > > > in
> > > > > the past few months that the complexity involved in different Hive
> > > > versions
> > > > > has been quite painful for users to start with. So it's great to
> step
> > > > > forward and deal with such issue.
> > > > >
> > > > > Before getting on a decision, can you please explain:
> > > > >
> > > > > 1) why you proposed segregating hive versions into the 5 ranges
> > above?
> > > > > 2) what different Hive features are supported in the 5 ranges?
> > > > > 3) have you tested that whether the proposed corresponding Flink
> > module
> > > > > will be fully compatible with 

[jira] [Created] (FLINK-16452) Insert into static partition doesn't support order by or limit

2020-03-05 Thread Rui Li (Jira)
Rui Li created FLINK-16452:
--

 Summary: Insert into static partition doesn't support order by or 
limit
 Key: FLINK-16452
 URL: https://issues.apache.org/jira/browse/FLINK-16452
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Rui Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Jark Wu
Hi Stephan,

> noreply email address.
I investigated this and found some x...@users.noreply.github.com addresses. I
think that's because they enabled "keep email addresses private" on GitHub
[1].

> Don't most PRs consist anyways of multiple commits where we want to
preserve "refactor" and "feature" differentiation in the history, rather
than squash everything?
For multiple commits, GitHub provides another button called "rebase and
merge", which was mentioned by Piotr. But I usually operate locally if I want
to preserve multiple commits.

It seems that GitHub is fixing it in 24 hours:
https://twitter.com/yadong_xie/status/1235554461256302593

Best,
Jark

[1]:
https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/setting-your-commit-email-address

On Fri, 6 Mar 2020 at 10:05, Jingsong Li  wrote:

> Hi,
>
> I agree with Jark. The tool is useful. If there are some problem, I think
> we can reach an agreement to form certain terms?
>
> Github provides:
> - "rebase and merge" keep all commits.
> - "squash and merge" squash all commits to one commits, pull request
> authors used to be multiple commits, like "address comments", "Fix
> comments", "Fix checkstyle". I think we can help authors to squash these
> useless commits.
>
> Best,
> Jingsong Lee
>
> On Fri, Mar 6, 2020 at 4:46 AM Matthias J. Sax  wrote:
>
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA512
> >
> > Seems, this will be fixed today:
> >
> > https://twitter.com/natfriedman/status/1235613840659767298?s=19
> >
> >
> > - -Matthias
> >
> > On 3/5/20 8:37 AM, Stephan Ewen wrote:
> > > It looks like this feature still messes up email addresses, for
> > > example if you do a "git log | grep noreply" in the repo.
> > >
> > > Don't most PRs consist anyways of multiple commits where we want
> > > to preserve "refactor" and "feature" differentiation in the
> > > history, rather than squash everything?
> > >
> > > On Thu, Mar 5, 2020 at 4:54 PM Piotr Nowojski 
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> If it’s really not preserving ownership (I didn’t notice the
> > >> problem before), +1 for removing “squash and merge”.
> > >>
> > >> However -1 for removing “rebase and merge”. I didn’t see any
> > >> issues with it and I’m using it constantly.
> > >>
> > >> Piotrek
> > >>
> > >>> On 5 Mar 2020, at 16:40, Jark Wu  wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> Thanks for the feedbacks. But I want to clarify the motivation
> > >>> to disable "Squash and merge" is just because of the
> > >>> regression/bug of the missing author information. If GitHub
> > >>> fixes this later, I think it makes sense to bring this button
> > >>> back.
> > >>>
> > >>> Hi Stephan & Zhijiang,
> > >>>
> > >>> To be honest, I love the "Squash and merge" button and often
> > >>> use it. It saves me a lot of time to merge PRs, because pulling
> > >>> and pushing commits
> > >> in
> > >>> China is very unstable.
> > >>>
> > >>> I don't think the potential problems you mentioned is a
> > >>> "problem". For "Squash and merge", - "Merge commits": there is
> > >>> no "merge" commits, because GitHub will
> > >> squash
> > >>> commits and rebase the commit and then add to the master
> > >>> branch. - "This closes #" line to track back: when you
> > >>> click "Squash and merge", it allows you to edit the title and
> > >>> description, so you can add "This closes #" message to the
> > >>> description the same with in the local git. Besides, GitHub
> > >>> automatically append "(#)" after the
> > >> title,
> > >>> which is also helpful to track.
> > >>>
> > >>> Best, Jark
> > >>>
> > >>> On Thu, 5 Mar 2020 at 23:36, Robert Metzger
> > >>>  wrote:
> > >>>
> >  +1 for disabling this feature for now.
> > 
> >  Thanks a lot for spotting this!
> > 
> >  On Thu, Mar 5, 2020 at 3:54 PM Zhijiang
> >   wrote:
> > 
> > > +1 for disabling "Squash and merge" if feasible to do
> > > that.
> > >
> > > The possible benefit to use this button is for saving some
> > > efforts to squash some intermediate "[fixup]" commits
> > > during PR review. But it would bring more potential
> > > problems as mentioned below, missing author information and
> > > message of "This closes #", etc. Even it might cause
> > > unexpected format of long commit content
> > >> description
> > > if not handled carefully in the text box.
> > >
> > > Best, Zhijiang
> > >
> > >
> > > --
> > >
> > >
> > From:tison 
> > > Send Time:2020 Mar. 5 (Thu.) 21:34 To:dev
> > >  Subject:Re: [DISCUSS] Disable
> > > "Squash and merge" button for Flink repository on GitHub
> > >
> > > Hi Yadong,
> > >
> > > Maybe we firstly reach out INFRA team and see the reply
> > > from their
> > >> side.
> > >
> > > Since the actual operator is INFRA team, in the dev mailing
> > > list we can focus on motivation and wait for the reply.
>

Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Dian Fu
Hi Jark,

Thanks for starting this discussion. Personally I also love the "squash and 
merge" button. It's very convenient.

Regarding the "noreply" email address, it seems that there are two cases:
- The email address in the original commit is already "noreply". In this case, 
this issue will still exist even if the PR is merged via command line, e.g. [1].
- The email address in the original commit is correct and it becomes "noreply" 
when merged via the web-page button because the author has not correctly set 
the commit email address[2] in their personal GitHub settings, e.g. [3]. In 
this case, it's indeed a problem. However, I have checked that there are only 
75 such commits out of 5375 commits since Jan 1, 2019. So maybe it's 
acceptable compared to the benefits we could gain.

Regards,
Dian

[1] 
https://github.com/apache/flink/commit/c4db7052c78d6b8204170e17a80a2416fa760523 

[2] 
https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/adding-an-email-address-to-your-github-account
 

[3] 
https://github.com/apache/flink/commit/9b5232d79a945607a83b02b0025b3206b06c27bd 

> 在 2020年3月6日,下午12:18,Jark Wu  写道:
> 
> Hi Stephan,
> 
>> noreply email address.
> I investigated this and found some x...@users.noreply.github.com address. I
> think that's because they enabled "kepp email addresses private" on GitHub
> [1].
> 
>> Don't most PRs consist anyways of multiple commits where we want to
> preserve "refactor" and "feature" differentiation in the history, rather
> than squash everything?
> For multiple commits, GitHub provides another button called "rebase and
> merge" which is mentioned by Piotr. But I usually operate in local if want
> to preserve multiple commits.
> 
> It seems that GitHub is fixing it in 24 hours:
> https://twitter.com/yadong_xie/status/1235554461256302593
> 
> Best,
> Jark
> 
> [1]:
> https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/setting-your-commit-email-address
> 
> On Fri, 6 Mar 2020 at 10:05, Jingsong Li  wrote:
> 
>> Hi,
>> 
>> I agree with Jark. The tool is useful. If there are some problem, I think
>> we can reach an agreement to form certain terms?
>> 
>> Github provides:
>> - "rebase and merge" keep all commits.
>> - "squash and merge" squash all commits to one commits, pull request
>> authors used to be multiple commits, like "address comments", "Fix
>> comments", "Fix checkstyle". I think we can help authors to squash these
>> useless commits.
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Fri, Mar 6, 2020 at 4:46 AM Matthias J. Sax  wrote:
>> 
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA512
>>> 
>>> Seems, this will be fixed today:
>>> 
>>> https://twitter.com/natfriedman/status/1235613840659767298?s=19
>>> 
>>> 
>>> - -Matthias
>>> 
>>> On 3/5/20 8:37 AM, Stephan Ewen wrote:
 It looks like this feature still messes up email addresses, for
 example if you do a "git log | grep noreply" in the repo.
 
 Don't most PRs consist anyways of multiple commits where we want
 to preserve "refactor" and "feature" differentiation in the
 history, rather than squash everything?
 
 On Thu, Mar 5, 2020 at 4:54 PM Piotr Nowojski 
 wrote:
 
> Hi,
> 
> If it’s really not preserving ownership (I didn’t notice the
> problem before), +1 for removing “squash and merge”.
> 
> However -1 for removing “rebase and merge”. I didn’t see any
> issues with it and I’m using it constantly.
> 
> Piotrek
> 
>> On 5 Mar 2020, at 16:40, Jark Wu  wrote:
>> 
>> Hi all,
>> 
>> Thanks for the feedbacks. But I want to clarify the motivation
>> to disable "Squash and merge" is just because of the
>> regression/bug of the missing author information. If GitHub
>> fixes this later, I think it makes sense to bring this button
>> back.
>> 
>> Hi Stephan & Zhijiang,
>> 
>> To be honest, I love the "Squash and merge" button and often
>> use it. It saves me a lot of time to merge PRs, because pulling
>> and pushing commits
> in
>> China is very unstable.
>> 
>> I don't think the potential problems you mentioned is a
>> "problem". For "Squash and merge", - "Merge commits": there is
>> no "merge" commits, because GitHub will
> squash
>> commits and rebase the commit and then add to the master
>> branch. - "This closes #" line to track back: when you
>> click "Squash and merge", it allows you to edit the title and
>> description, so you can add "This closes #" message to the
>> description the same with in the local git. Besides, GitHub
>> automatically append "(#)" after the
> t

[jira] [Created] (FLINK-16453) A test failure in KafkaTest

2020-03-05 Thread cpugputpu (Jira)
cpugputpu created FLINK-16453:
-

 Summary: A test failure in KafkaTest
 Key: FLINK-16453
 URL: https://issues.apache.org/jira/browse/FLINK-16453
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Reporter: cpugputpu


The tests in _org.apache.flink.table.descriptors.KafkaTest#testValidation_ may 
fail. The unexpected behaviour is presented as follows:

testValidation(org.apache.flink.table.descriptors.KafkaTest)
java.lang.AssertionError: 
expected:<{connector.property-version=1, 
connector.startup-mode=specific-offsets, 
connector.properties.kafka.stuff=42, connector.properties.zookeeper.stuff=12, 
connector.type=kafka, 
connector.specific-offsets=partition:0,offset:42;partition:1,offset:300, 
connector.topic=MyTable, connector.version=0.11}> 
but was:<{connector.property-version=1, 
connector.startup-mode=specific-offsets, 
connector.properties.kafka.stuff=42, 
connector.properties.zookeeper.stuff=12, connector.type=kafka, 
connector.specific-offsets=partition:1,
offset:300;partition:0,offset:42, connector.topic=MyTable, 
connector.version=0.11}>

 

The root cause of this ordering issue is the two HashMap variables initialized 
here:

this.specificOffsets = new HashMap<>(); 
(flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/table/descriptors/Kafka.java)

final Map offsets = new 
HashMap<>();(flink-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/table/descriptors/KafkaTest.java)

And the iteration of HashMap is here:

for (Map.Entry specificOffset : specificOffsets.entrySet())  (in the method 
_toConnectorProperties_ in _Kafka.java_)

The specification about HashMap says that "this class makes no guarantees as to 
the order of the map; in particular, it does not guarantee that the order will 
remain constant over time". The documentation is here for your reference: 
https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html

 

If I change the two HashMap variables into LinkedHashMap, the failure goes 
away and the test becomes stable. If you come up with a better fix, please 
feel free to discuss it with me.
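To make the failure mode concrete, here is a minimal, self-contained sketch (not the actual Flink code; the class and method names are illustrative) showing why a string built by iterating a HashMap need not follow insertion order, while a LinkedHashMap's does. The keys mirror the Kafka partitions (1 and 0) from the failing assertion:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Asserting on a string built by iterating a HashMap is fragile:
// iteration follows the hash table's bucket layout, not insertion order.
// LinkedHashMap iterates in insertion order, making the output deterministic.
public class MapOrderDemo {
    static String render(Map<Integer, Long> specificOffsets) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Long> e : specificOffsets.entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append("partition:").append(e.getKey())
              .append(",offset:").append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Long> hash = new HashMap<>();
        hash.put(1, 300L);   // inserted first
        hash.put(0, 42L);

        Map<Integer, Long> linked = new LinkedHashMap<>();
        linked.put(1, 300L); // inserted first
        linked.put(0, 42L);

        // LinkedHashMap reflects insertion order (partition 1 first)...
        System.out.println("linked: " + render(linked));
        // ...while HashMap reflects bucket order (typically partition 0 first
        // here, since small Integer keys land in table slots 0 and 1).
        System.out.println("hash:   " + render(hash));
    }
}
```

Swapping both maps for LinkedHashMap, as suggested above, pins the iteration order to the insertion order and removes the flakiness.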



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-05 Thread Bowen Li
Hi Jingsong,

I think I misunderstood you. So your argument is that, to support Hive
1.0.0 - 1.2.2, we are actually using Hive 1.2.2 and thus we name the Flink
module "flink-connector-hive-1.2", right? It makes sense to me now.

+1 for this change.

Cheers,
Bowen

On Thu, Mar 5, 2020 at 6:53 PM Jingsong Li  wrote:

> Hi Bowen,
>
> My idea is to directly expose the actually depended-on version: if the
> dependency is Hive 1.2.2, our jar is named for Hive 1.2.2, so that users
> directly and clearly know the version. As for which metastore versions are
> supported, we can cover that in the documentation; otherwise, if we write
> 1.0 while the actual version is 1.2.2, users will have wrong expectations.
>
> Another, maybe 2.3.6 can support 2.0-2.2 after some efforts.
>
> Best,
> Jingsong Lee
>
> On Fri, Mar 6, 2020 at 1:00 AM Bowen Li  wrote:
>
> > > I have some hesitation, because the actual version number can better
> > reflect the actual dependency. For example, if the user also knows the
> > field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> > name, or he may have the wrong expectation for the hive built-in
> functions.
> >
> > Sorry, I'm not sure if my proposal is understood correctly.
> >
> > What I'm saying is, in your original proposal, taking an example,
> suggested
> > naming the module as "flink-connector-hive-1.2" to support hive 1.0.0 -
> > 1.2.2, a name including the highest Hive version it supports. I'm
> > suggesting to name it "flink-connector-hive-1.0", a name including the
> > lowest Hive version it supports.
> >
> > What do you think?
> >
> >
> >
> > On Wed, Mar 4, 2020 at 11:14 PM Jingsong Li 
> > wrote:
> >
> > > Hi Bowen, thanks for your reply.
> > >
> > > > will there be a base module like "flink-connector-hive-base" which
> > holds
> > > all the common logic of these proposed modules
> > >
> > > Maybe we don't need one: their implementation is only "pom.xml", and
> > > different versions simply have different dependencies.
> > >
> > > > it's more common to set the version in module name to be the lowest
> > > version that this module supports
> > >
> > > I have some hesitation, because the actual version number can better
> > > reflect the actual dependency. For example, if the user also knows the
> > > field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> > > name, or he may have the wrong expectation for the hive built-in
> > functions.
> > >
> > > [1] https://github.com/apache/flink/pull/11304
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Thu, Mar 5, 2020 at 2:34 PM Bowen Li  wrote:
> > >
> > > > Thanks Jingsong for your explanation! I'm +1 for this initiative.
> > > >
> > > > According to your description, I think it makes sense to incorporate
> > > > support of Hive 2.2 to that of 2.0/2.1 and reducing the number of
> > ranges
> > > to
> > > > 4.
> > > >
> > > > A couple minor followup questions:
> > > > 1) will there be a base module like "flink-connector-hive-base" which
> > > holds
> > > > all the common logic of these proposed modules and is compiled into
> the
> > > > uber jar of "flink-connector-hive-xxx"?
> > > > 2) according to my observation, it's more common to set the version
> in
> > > > module name to be the lowest version that this module supports, e.g.
> > for
> > > > Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0"
> > > > rather than "flink-connector-hive-1.2"
> > > >
> > > >
> > > > On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li 
> > > > wrote:
> > > >
> > > > > Thanks Bowen for involving.
> > > > >
> > > > > > why you proposed segregating hive versions into the 5 ranges
> > above? &
> > > > > what different Hive features are supported in the 5 ranges?
> > > > >
> > > > > These ranges reflect whether a higher client dependency version can
> > > > > support lower Hive metastore versions:
> > > > > - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column
> > stats,
> > > > we
> > > > > can throw exception for the unsupported feature.
> > > > > - Hive 2.0 and Hive 2.1, primary key support and alter_partition
> api
> > > > > change.
> > > > > - Hive 2.2 no thrift change.
> > > > > - Hive 2.3 change many things, lots of thrift change.
> > > > > - Hive 3+, not null. unique, timestamp, so many things.
> > > > >
> > > > > All these things can be found in hive_metastore.thrift.
> > > > >
> > > > > I think I can try do more effort in implementation to use Hive 2.2
> to
> > > > > support Hive 2.0. So the range size will be 4.
> > > > >
> > > > > > have you tested that whether the proposed corresponding Flink
> > module
> > > > will
> > > > > be fully compatible with each Hive version range?
> > > > >
> > > > > Yes, I have done some tests, not really for "fully", but it is a
> > > > technical
> > > > > judgment.
> > > > >
> > > > > Best,
> > > > > Jingsong Lee
> > > > >
> > > > > On Thu, Mar 5, 2020 at 1:17 PM Bowen Li 
> wrote:
> > > > >
> > > > > > Thanks, Jingsong, for bringing this up. We've received lots of
> > > > feedbacks
> > > > > in
> > > > > >

Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Zhijiang
Hi Jark,

Thanks for the further investigation. 

If the bug of the missing author can be solved by GitHub soon, I am generally 
neutral about disabling the "Squash and merge" button, and somewhat prefer to 
keep it because it brings some benefits and some committers rely on it.

The other side effects I mentioned previously are not serious and can still be 
worked around.

Best
Zhijiang


--
From:Dian Fu 
Send Time:2020 Mar. 6 (Fri.) 12:31
To:dev 
Subject:Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on 
GitHub

Hi Jark,

Thanks for starting this discussion. Personally I also love the "squash and 
merge" button. It's very convenient.

Regarding the email address "noreply", it seems that there are two cases:
- The email address in the original commit is already "noreply". In this case, 
this issue will still exist even if the PR is merged via command line, e.g. [1].
- The email address in the original commit is correct and it becomes "noreply" 
when merged via web page button because the author has not correctly set the 
commit email address[2] in his personal GitHub settings, e.g.[3]. In this case, 
it's indeed a problem. However, I have checked that there are only 75 such 
commits out of 5375 commits since Jan 1, 2019. So maybe it's acceptable 
compared to the benefits we could gain.

Regards,
Dian

[1] 
https://github.com/apache/flink/commit/c4db7052c78d6b8204170e17a80a2416fa760523 

[2] 
https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/adding-an-email-address-to-your-github-account
 

[3] 
https://github.com/apache/flink/commit/9b5232d79a945607a83b02b0025b3206b06c27bd 

> On Mar 6, 2020, at 12:18 PM, Jark Wu wrote:
> 
> Hi Stephan,
> 
>> noreply email address.
> I investigated this and found some x...@users.noreply.github.com address. I
> think that's because they enabled "keep email addresses private" on GitHub
> [1].
> 
>> Don't most PRs consist anyways of multiple commits where we want to
> preserve "refactor" and "feature" differentiation in the history, rather
> than squash everything?
> For multiple commits, GitHub provides another button called "rebase and
> merge" which is mentioned by Piotr. But I usually operate in local if want
> to preserve multiple commits.
> 
> It seems that GitHub is fixing it in 24 hours:
> https://twitter.com/yadong_xie/status/1235554461256302593
> 
> Best,
> Jark
> 
> [1]:
> https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/setting-your-commit-email-address
> 
> On Fri, 6 Mar 2020 at 10:05, Jingsong Li  wrote:
> 
>> Hi,
>> 
>> I agree with Jark. The tool is useful. If there are some problems, I think
>> we can reach an agreement on certain conventions.
>> 
>> Github provides:
>> - "rebase and merge" keeps all commits.
>> - "squash and merge" squashes all commits into one commit; pull request
>> authors often have multiple commits, like "address comments", "Fix
>> comments", "Fix checkstyle". I think we can help authors squash these
>> noise commits.
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Fri, Mar 6, 2020 at 4:46 AM Matthias J. Sax  wrote:
>> 
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA512
>>> 
>>> Seems, this will be fixed today:
>>> 
>>> https://twitter.com/natfriedman/status/1235613840659767298?s=19
>>> 
>>> 
>>> - -Matthias
>>> 
>>> On 3/5/20 8:37 AM, Stephan Ewen wrote:
 It looks like this feature still messes up email addresses, for
 example if you do a "git log | grep noreply" in the repo.
 
 Don't most PRs consist anyways of multiple commits where we want
 to preserve "refactor" and "feature" differentiation in the
 history, rather than squash everything?
 
 On Thu, Mar 5, 2020 at 4:54 PM Piotr Nowojski 
 wrote:
 
> Hi,
> 
> If it’s really not preserving ownership (I didn’t notice the
> problem before), +1 for removing “squash and merge”.
> 
> However -1 for removing “rebase and merge”. I didn’t see any
> issues with it and I’m using it constantly.
> 
> Piotrek
> 
>> On 5 Mar 2020, at 16:40, Jark Wu  wrote:
>> 
>> Hi all,
>> 
>> Thanks for the feedbacks. But I want to clarify the motivation
>> to disable "Squash and merge" is just because of the
>> regression/bug of the missing author information. If GitHub
>> fixes this later, I think it makes sense to bring this button
>> back.
>> 
>> Hi Stephan & Zhijiang,
>> 
>> To be honest, I love the "Squash and merge" button and often
>> use it. It saves me a lot of time to merge PRs, because pulling
>> and pushing commits

[jira] [Created] (FLINK-16454) Update the copyright year in NOTICE files

2020-03-05 Thread Zhijiang (Jira)
Zhijiang created FLINK-16454:


 Summary: Update the copyright year in NOTICE files
 Key: FLINK-16454
 URL: https://issues.apache.org/jira/browse/FLINK-16454
 Project: Flink
  Issue Type: Task
  Components: Release System
 Environment: The current copyright year is 2014-2019 in NOTICE files. 
We should change it to 2014-2020.
Reporter: Zhijiang
Assignee: Zhijiang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16455) Introduce flink-sql-connector-hive modules to provide hive uber jars

2020-03-05 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-16455:


 Summary: Introduce flink-sql-connector-hive modules to provide 
hive uber jars
 Key: FLINK-16455
 URL: https://issues.apache.org/jira/browse/FLINK-16455
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: 1.11.0


Discussed in: 
[http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduce-flink-connector-hive-xx-modules-td38440.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: SerializableHadoopConfiguration

2020-03-05 Thread Sivaprasanna
That also makes sense, but that, I believe, would be a breaking/major
change. If we are okay with merging them together, we could name it something
like "flink-hadoop-compress", since SequenceFile is also a Hadoop format and
the existing "flink-compress" module, as of now, deals with Hadoop-based
compression.
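For context, the class under discussion follows a pattern worth sketching: a Serializable wrapper around a configuration object that is not itself Java-serializable. The sketch below is self-contained and uses stand-in class names (FakeConfiguration is not the real Hadoop Configuration; the real wrapper delegates to Configuration's own write()/readFields() instead of copying a map):

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.hadoop.conf.Configuration, which does not
// implement java.io.Serializable (it ships its own write()/readFields()).
class FakeConfiguration {
    final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

// The wrapper pattern: keep the config in a transient field and hand-roll
// writeObject/readObject so Java serialization delegates to a manual
// (de)serialization of the wrapped object's contents.
public class SerializableConfigWrapper implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient FakeConfiguration config;

    public SerializableConfigWrapper(FakeConfiguration config) {
        this.config = config;
    }

    public FakeConfiguration get() {
        return config;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Manually serialize the wrapped config's contents.
        out.writeObject(new HashMap<>(config.props));
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Rebuild the wrapped config on the deserializing side.
        config = new FakeConfiguration();
        config.props.putAll((Map<String, String>) in.readObject());
    }

    public static void main(String[] args) throws Exception {
        FakeConfiguration conf = new FakeConfiguration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        // Round-trip through Java serialization.
        java.io.ByteArrayOutputStream bos = new java.io.ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(new SerializableConfigWrapper(conf));
        oos.close();
        ObjectInputStream in = new ObjectInputStream(
                new java.io.ByteArrayInputStream(bos.toByteArray()));
        SerializableConfigWrapper back = (SerializableConfigWrapper) in.readObject();
        System.out.println(back.get().get("fs.defaultFS")); // prints hdfs://namenode:8020
    }
}
```

Because the pattern has no dependency on Hadoop itself, it would fit naturally into whichever shared module the thread settles on.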

On Fri, Mar 6, 2020 at 1:33 AM João Boto  wrote:

> We could merge the two modules into one?
> sequence-files its another way of compressing files..
>
>
> On 2020/03/05 13:02:46, Sivaprasanna  wrote:
> > Hi Stephan,
> >
> > I guess it is a valid point to have something like 'flink-hadoop-utils'.
> > Maybe a [DISCUSS] thread can be started to understand what the community
> > thinks?
> >
> > On Thu, Mar 5, 2020 at 4:22 PM Stephan Ewen  wrote:
> >
> > > Do we have more cases of "common Hadoop Utils"?
> > >
> > > If yes, does it make sense to create a "flink-hadoop-utils" module with
> > > exactly such classes? It would have an optional dependency on
> > > "flink-shaded-hadoop".
> > >
> > > On Wed, Mar 4, 2020 at 9:12 AM Till Rohrmann 
> wrote:
> > >
> > > > Hi Sivaprasanna,
> > > >
> > > > we don't upload the source jars for the flink-shaded modules.
> However you
> > > > can build them yourself and install by cloning the flink-shaded
> > > repository
> > > > [1] and then call `mvn package -Dshade-sources`.
> > > >
> > > > [1] https://github.com/apache/flink-shaded
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Mar 3, 2020 at 6:29 PM Sivaprasanna <
> sivaprasanna...@gmail.com>
> > > > wrote:
> > > >
> > > > > BTW, can we leverage flink-shaded-hadoop-2? Reason why I ask, if
> any
> > > > Flink
> > > > > module is going to use Hadoop in any way, it will most probably
> include
> > > > > flink-shaded-hadoop-2 as a dependency.
> > > > > However, flink-shaded modules don't have any source files. Is that
> a
> > > > strict
> > > > > convention that the community follows?
> > > > >
> > > > > -
> > > > > Sivaprasanna
> > > > >
> > > > > On Tue, Mar 3, 2020 at 10:48 PM Sivaprasanna <
> > > sivaprasanna...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Arvid,
> > > > > >
> > > > > > Thanks for the quick reply. Yes, it actually makes sense to avoid
> > > > Hadoop
> > > > > > dependencies from getting into Flink's core modules but I also
> wonder
> > > > if
> > > > > it
> > > > > > will be an overkill to add flink-hadoop-fs as a dependency just
> > > because
> > > > > we
> > > > > > want to use a utility class from that module.
> > > > > >
> > > > > > -
> > > > > > Sivaprasanna
> > > > > >
> > > > > > On Tue, Mar 3, 2020 at 4:17 PM Arvid Heise 
> > > > wrote:
> > > > > >
> > > > > >> Hi Sivaprasanna,
> > > > > >>
> > > > > >> we actually want to remove Hadoop from all core modules, so we
> could
> > > > not
> > > > > >> place it in some very common place like flink-core.
> > > > > >>
> > > > > >> But I think the module flink-hadoop-fs could be a fitting place.
> > > > > >>
> > > > > >> On Tue, Mar 3, 2020 at 11:25 AM Sivaprasanna <
> > > > sivaprasanna...@gmail.com
> > > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi
> > > > > >> >
> > > > > >> > The flink-sequence-file module has a class named
> > > > > >> > SerializableHadoopConfiguration[1] which is nothing but a
> wrapper
> > > > > class
> > > > > >> for
> > > > > >> > Hadoop Configuration. I believe this class can be moved to a
> > > common
> > > > > >> module
> > > > > >> > since this is not necessarily tightly coupled with
> sequence-file
> > > > > module,
> > > > > >> > and also because it can be used by many other modules, for ex.
> > > > > >> > flink-compress. Thoughts?
> > > > > >> >
> > > > > >> > -
> > > > > >> > Sivaprasanna
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Kurt Young
These GitHub buttons sometimes help me merge commits when the network
from China to GitHub is unstable. It would take me very long to fetch and
reorganize commits locally, fetch master, do a rebase, and then push. Each
step is time-consuming when the network is bad.

So I would like to keep these buttons and leave the choice to individual
committers.

Best,
Kurt


On Fri, Mar 6, 2020 at 1:15 PM Zhijiang 
wrote:

> Hi Jark,
>
> Thanks for the further investigation.
>
> If the bug of missing author can be solved by Github soon, I am generally
> neutral to disable "Squash and merge" button, even somehow preferring to
> keep it because it could bring a bit benefits sometimes and some committers
> are willing to rely on it.
>
> My previously mentioned other side effects is not serious and can still
> work around.
>
> Best
> Zhijiang
>
>
> --
> From:Dian Fu 
> Send Time:2020 Mar. 6 (Fri.) 12:31
> To:dev 
> Subject:Re: [DISCUSS] Disable "Squash and merge" button for Flink
> repository on GitHub
>
> Hi Jark,
>
> Thanks for starting this discussion. Personally I also love the "squash
> and merge" button. It's very convenient.
>
> Regarding to the email address "noreply", it seems that there are two
> cases:
> - The email address in the original commit is already "noreply". In this
> case, this issue will still exist even if the PR is merged via command
> line, e.g. [1].
> - The email address in the original commit is correct and it becomes
> "noreply" when merged via web page button because the author has not
> correctly set the commit email address[2] in his personal github setting,
> e.g.[3]. In this case, it's indeed a problem. However, I have checked that
> there are only 75 such kind of commits out of 5375 commits since Jan 1,
> 2019. So maybe it's acceptable compared to the benefits we could gain.
>
> Regards,
> Dian
>
> [1]
> https://github.com/apache/flink/commit/c4db7052c78d6b8204170e17a80a2416fa760523
> <
> https://github.com/apache/flink/commit/c4db7052c78d6b8204170e17a80a2416fa760523
> >
> [2]
> https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/adding-an-email-address-to-your-github-account
> <
> https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/adding-an-email-address-to-your-github-account
> >
> [3]
> https://github.com/apache/flink/commit/9b5232d79a945607a83b02b0025b3206b06c27bd
> <
> https://github.com/apache/flink/commit/9b5232d79a945607a83b02b0025b3206b06c27bd
> >
> > On Mar 6, 2020, at 12:18 PM, Jark Wu wrote:
> >
> > Hi Stephan,
> >
> >> noreply email address.
> > I investigated this and found some x...@users.noreply.github.com
> address. I
> > think that's because they enabled "keep email addresses private" on
> GitHub
> > [1].
> >
> >> Don't most PRs consist anyways of multiple commits where we want to
> > preserve "refactor" and "feature" differentiation in the history, rather
> > than squash everything?
> > For multiple commits, GitHub provides another button called "rebase and
> > merge" which is mentioned by Piotr. But I usually operate in local if
> want
> > to preserve multiple commits.
> >
> > It seems that GitHub is fixing it in 24 hours:
> > https://twitter.com/yadong_xie/status/1235554461256302593
> >
> > Best,
> > Jark
> >
> > [1]:
> >
> https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/setting-your-commit-email-address
> >
> > On Fri, 6 Mar 2020 at 10:05, Jingsong Li  wrote:
> >
> >> Hi,
> >>
> >> I agree with Jark. The tool is useful. If there are some problem, I
> think
> >> we can reach an agreement to form certain terms?
> >>
> >> Github provides:
> >> - "rebase and merge" keep all commits.
> >> - "squash and merge" squash all commits to one commits, pull request
> >> authors used to be multiple commits, like "address comments", "Fix
> >> comments", "Fix checkstyle". I think we can help authors to squash these
> >> useless commits.
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >> On Fri, Mar 6, 2020 at 4:46 AM Matthias J. Sax 
> wrote:
> >>
> >>> -BEGIN PGP SIGNED MESSAGE-
> >>> Hash: SHA512
> >>>
> >>> Seems, this will be fixed today:
> >>>
> >>> https://twitter.com/natfriedman/status/1235613840659767298?s=19
> >>>
> >>>
> >>> - -Matthias
> >>>
> >>> On 3/5/20 8:37 AM, Stephan Ewen wrote:
>  It looks like this feature still messes up email addresses, for
>  example if you do a "git log | grep noreply" in the repo.
> 
>  Don't most PRs consist anyways of multiple commits where we want
>  to preserve "refactor" and "feature" differentiation in the
>  history, rather than squash everything?
> 
>  On Thu, Mar 5, 2020 at 4:54 PM Piotr Nowojski 
>  wrote:
> 
> > Hi,
> >
> > If it’s really not preserving ownership (I didn’t notice the
> > problem before), +1 for removing “squash and merge”.
> >
> > However -1 for removing “rebase and merge”. I didn’t s

Re: [PROPOSAL] Reverse the dependency from flink-streaming-java to flink-client

2020-03-05 Thread Hequn Cheng
Hi,

+1 to making flink-streaming-java an API-only module and solving this sooner
rather than later.
It would be clearer to expose only an SDK module for writing jobs.

Another benefit I can see: flink-streaming-java would become Scala-free
if we reverse the dependencies, which would be really nice for the Java
API module.

As for the issue of users' dependency setup, I agree with Stephan that
it's OK to do so if we add corresponding documentation and runtime error
messages about the change.
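To illustrate the user-facing change being discussed: a DataStream project that today declares only flink-streaming-java would, after the reversal, also need an explicit flink-clients dependency to submit jobs. A rough sketch of the POM fragment (the Scala suffix and version property below are placeholders, not a prescribed setup):

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>${flink.version}</version>
</dependency>
<!-- Previously pulled in transitively; must now be declared explicitly. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>${flink.version}</version>
</dependency>
```

This is the kind of snippet the release notes and quickstarts would need to carry.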

Best,
Hequn


On Fri, Mar 6, 2020 at 3:03 AM Kostas Kloudas  wrote:

> Big +1 also from my side.
>
> This will eliminate some work-arounds used so far to bypass the module
> structure (like code using reflection to extract a JobGraph from a
> Pipeline).
>
> I agree with Stephan that with proper documentation, release notes and
> tooling update, it will hopefully not be a big hassle for users to
> migrate.
> Also I think it should be done as early in the release as possible, so
> that we can give it enough exposure and testing. In the past, such
> deep changes late in the release have led to longer release-testing
> periods and, eventually, longer release cycles.
>
> Cheers,
> Kostas
>
> On Thu, Mar 5, 2020 at 3:35 PM Stephan Ewen  wrote:
> >
> > +1 to this fix, in general.
> >
> > If the main issue is that users have to now add "flink-clients"
> explicitly,
> > then I think this is okay, if we spell it out prominently in the release
> > notes, and make sure quickstarts / etc are updated, and have a good error
> > message when client/runtime classes are not found.
> >
> > On Thu, Mar 5, 2020 at 2:56 PM Aljoscha Krettek 
> wrote:
> >
> > > Hi,
> > >
> > > thanks for starting the discussion, Tison!
> > >
> > > I'd like to fix this dependency mess rather sooner than later, but we
> do
> > > have to consider the fact that we are breaking the dependency setup of
> > > users. If they they only had a dependency on flink-streaming-java
> before
> > > but used classes from flink-clients they would have to explicitly add
> > > this dependency now.
> > >
> > > Let's see what others think.
> > >
> > > Best,
> > > Aljoscha
> > >
> > > On 05.03.20 02:53, tison wrote:
> > > > Hi devs,
> > > >
> > > > Here is a proposal to reverse the dependency from
> flink-streaming-java to
> > > > flink-client, for a proper
> > > > module dependency graph. Since it changes current structure, it
> should be
> > > > discussed publicly.
> > > >
> > > > The original idea comes from that flink-streaming-java acts as an API
> > > only
> > > > module just as what
> > > > we do in its batch companion flink-java. If a Flink user want to
> write a
> > > > minimum DataStream
> > > > program, the only dependency should be flink-streaming java.
> > > >
> > > > However, currently as it is implemented, flink-client and even
> > > > flink-runtime are transitively polluted
> > > > in when user depends on flink-streaming-java. These dependencies
> polluted
> > > > in as
> > > >
> > > > flink-client:
> > > >- previously, ClusterClient, which is removed by FLIP-73 Executors
> > > >- accidentally, ProgramInvocationException, we just throw in
> place as
> > > it
> > > > is accessible.
> > > >- transitively, flink-optimizer, for one utility.
> > > >- transitively, flink-java, for several utilities.
> > > > flink-runtime:
> > > >- mainly for JobGraph generating.
> > > >
> > > > With a previous discussion with @Aljoscha Krettek <
> aljos...@apache.org>
> > > our
> > > > goal is briefly making flink-streaming-java
> > > > an API only module. As a first step we can break the dependency from
> > > > flink-streaming-java to
> > > > flink-client[1][2].
> > > >
> > > > With this first step, continuously we factor out common utilities in
> > > > flink-java to
> > > > flink-core and eventually eliminate dependencies from streaming to
> batch;
> > > > while
> > > > orthogonally, we factor out job compilation logic into
> > > > flink-streaming-compiler module and
> > > > break the dependency to flink-runtime. The final dependency graph
> will
> > > be:
> > > >
> > > >
> > > > flink-client -> flink-streaming-compiler -> flink-runtime
> > > >   \->
> > > > flink-streaming-java
> > > >
> > > > Looking forward to your feedback. Basically whether or not it is in a
> > > right
> > > > direction, and if so,
> > > > how the community integrates this proposal.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-15090
> > > > [2] https://issues.apache.org/jira/browse/FLINK-16427
> > > >
> > >
>


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Piotr Nowojski
Hi,

> It looks like this feature still messes up email addresses, for example if
> you do a "git log | grep noreply" in the repo.

I’ve checked my appearances on that list (git log | grep noreply) and they 
happened a couple of times, when I actually used squash and merge (I wanted to 
squash fixup commits from within the UI) instead of rebase and merge. I still 
think rebase and merge works as expected, without altering the author. 
Otherwise there would be no contributions from Roman/Arvid in the log and they 
would be marked as "pnowoj...@users.noreply.github.com 
” as well, and they are not.

So I’m restating my (very strong) -1 for removing rebase and merge.

Piotrek 

> On 6 Mar 2020, at 06:13, Zhijiang  wrote:
> 
> Hi Jark,
> 
> Thanks for the further investigation. 
> 
> If the bug of missing author can be solved by Github soon, I am generally 
> neutral to disable "Squash and merge" button, even somehow preferring to keep 
> it because it could bring a bit benefits sometimes and some committers are 
> willing to rely on it.
> 
> My previously mentioned other side effects is not serious and can still work 
> around.
> 
> Best
> Zhijiang
> 
> 
> --
> From:Dian Fu 
> Send Time:2020 Mar. 6 (Fri.) 12:31
> To:dev 
> Subject:Re: [DISCUSS] Disable "Squash and merge" button for Flink repository 
> on GitHub
> 
> Hi Jark,
> 
> Thanks for starting this discussion. Personally I also love the "squash and 
> merge" button. It's very convenient.
> 
> Regarding to the email address "noreply", it seems that there are two cases:
> - The email address in the original commit is already "noreply". In this 
> case, this issue will still exist even if the PR is merged via command line, 
> e.g. [1].
> - The email address in the original commit is correct and it becomes 
> "noreply" when merged via web page button because the author has not 
> correctly set the commit email address[2] in his personal github setting, 
> e.g.[3]. In this case, it's indeed a problem. However, I have checked that 
> there are only 75 such kind of commits out of 5375 commits since Jan 1, 2019. 
> So maybe it's acceptable compared to the benefits we could gain.
> 
> Regards,
> Dian
> 
> [1] 
> https://github.com/apache/flink/commit/c4db7052c78d6b8204170e17a80a2416fa760523
>  
> 
> [2] 
> https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/adding-an-email-address-to-your-github-account
>  
> 
> [3] 
> https://github.com/apache/flink/commit/9b5232d79a945607a83b02b0025b3206b06c27bd
>  
> 
>> On Mar 6, 2020, at 12:18 PM, Jark Wu wrote:
>> 
>> Hi Stephan,
>> 
>>> noreply email address.
>> I investigated this and found some x...@users.noreply.github.com address. I
>> think that's because they enabled "keep email addresses private" on GitHub
>> [1].
>> 
>>> Don't most PRs consist anyways of multiple commits where we want to
>> preserve "refactor" and "feature" differentiation in the history, rather
>> than squash everything?
>> For multiple commits, GitHub provides another button called "rebase and
>> merge" which is mentioned by Piotr. But I usually operate in local if want
>> to preserve multiple commits.
>> 
>> It seems that GitHub is fixing it in 24 hours:
>> https://twitter.com/yadong_xie/status/1235554461256302593
>> 
>> Best,
>> Jark
>> 
>> [1]:
>> https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/setting-your-commit-email-address
>> 
>> On Fri, 6 Mar 2020 at 10:05, Jingsong Li  wrote:
>> 
>>> Hi,
>>> 
>>> I agree with Jark. The tool is useful. If there are some problem, I think
>>> we can reach an agreement to form certain terms?
>>> 
>>> Github provides:
>>> - "rebase and merge" keep all commits.
>>> - "squash and merge" squash all commits to one commits, pull request
>>> authors used to be multiple commits, like "address comments", "Fix
>>> comments", "Fix checkstyle". I think we can help authors to squash these
>>> useless commits.
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>>> On Fri, Mar 6, 2020 at 4:46 AM Matthias J. Sax  wrote:
>>> 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA512
 
 Seems, this will be fixed today:
 
 https://twitter.com/natfriedman/status/1235613840659767298?s=19
 
 
 - -Matthias
 
 On 3/5/20 8:37 AM, Stephan Ewen wrote:
> It looks like this feature still messes up email addresses, for
> example if you do a "git log | grep noreply" in the repo.
> 
> Don't most PRs consist anyways of multiple commits where we want
> to preserve "refactor" and "feature" differentiation in the
> history, rather th