[jira] [Created] (FLINK-13843) Unify and clean up StreamingFileSink format builders
Gyula Fora created FLINK-13843:
--
Summary: Unify and clean up StreamingFileSink format builders
Key: FLINK-13843
URL: https://issues.apache.org/jira/browse/FLINK-13843
Project: Flink
Issue Type: Improvement
Components: API / DataStream, Connectors / FileSystem
Affects Versions: 1.10.0
Reporter: Gyula Fora

I think the StreamingFileSink contains some problems that will affect us in the long run if we intend this sink to be the main exactly-once FS sink.

*1. Code duplication*

The StreamingFileSink currently has two builders, one for row and one for bulk formats: RowFormatBuilder and BulkFormatBuilder. They both contain almost exactly the same config settings, with a lot of code duplication that should be moved to a common superclass (StreamingFileSink.BucketsBuilder).

*2. Inconsistent config options*

I also noticed some strange/invalid configuration settings on the builders:
- RowFormatBuilder#withBucketAssignerAndPolicy: feels like an internal method that is not used anywhere. It also overwrites the bucket factory.
- BulkFormatBuilder#withBucketAssigner: takes an extra type parameter for the bucket ID type, compared to the row format.
- BulkFormatBuilder#withBucketCheckInterval: does not affect behavior, as the bulk format always uses the OnCheckpointRollingPolicy.

This can probably be solved by fixing the code duplication.

*3. Fragmented configuration*

This is not a big problem and only affects the part-file config options that were introduced recently. We have added two methods: withPartFilePrefix and withPartFileSuffix. I think we should aim to group configs that belong together -> withPartFileConfig (see the sketch below).

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
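For illustration, a minimal sketch of the grouped-configuration idea. All names here (PartFileConfig, withPartFileConfig) are hypothetical, not existing Flink API:

{code:java}
import java.io.Serializable;

/**
 * Hypothetical holder for the part-file naming options, so that a single
 * withPartFileConfig(...) setter on the common builder superclass replaces
 * the separate withPartFilePrefix/withPartFileSuffix methods.
 */
public class PartFileConfig implements Serializable {

    public static final String DEFAULT_PART_PREFIX = "part";
    public static final String DEFAULT_PART_SUFFIX = "";

    private final String partPrefix;
    private final String partSuffix;

    public PartFileConfig() {
        this(DEFAULT_PART_PREFIX, DEFAULT_PART_SUFFIX);
    }

    public PartFileConfig(String partPrefix, String partSuffix) {
        this.partPrefix = partPrefix;
        this.partSuffix = partSuffix;
    }

    public String getPartPrefix() {
        return partPrefix;
    }

    public String getPartSuffix() {
        return partSuffix;
    }
}
{code}

A shared abstract builder could then expose withPartFileConfig(PartFileConfig) once, and the row and bulk builders would only add their format-specific options on top.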
[jira] [Created] (FLINK-13845) Drop all the content of removed "Checkpointed" interface
Yun Tang created FLINK-13845:
Summary: Drop all the content of removed "Checkpointed" interface
Key: FLINK-13845
URL: https://issues.apache.org/jira/browse/FLINK-13845
Project: Flink
Issue Type: Improvement
Components: Documentation
Reporter: Yun Tang
Fix For: 1.10.0

In [FLINK-7461|https://issues.apache.org/jira/browse/FLINK-7461], we already removed the backward compatibility with pre-1.1 Flink, and the deprecated {{Checkpointed}} interface was removed entirely. However, we still have a lot of content, including Javadocs and documentation, that talks about this non-existent interface. I think it's time to remove that content now.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
[jira] [Created] (FLINK-13846) Implement benchmark case on MapState#isEmpty
Yun Tang created FLINK-13846:
Summary: Implement benchmark case on MapState#isEmpty
Key: FLINK-13846
URL: https://issues.apache.org/jira/browse/FLINK-13846
Project: Flink
Issue Type: Improvement
Reporter: Yun Tang

Once FLINK-13034 is merged, we need to implement a benchmark case for {{MapState#isEmpty}} in https://github.com/dataArtisans/flink-benchmarks

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
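As a shape for such a case, here is a self-contained JMH sketch. The real flink-benchmarks case would run {{MapState#isEmpty}} against the actual keyed state backends (heap and RocksDB); the plain HashMap below is only a stand-in to show the benchmark structure, comparing a direct emptiness check against the iterate-and-test fallback:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class MapStateIsEmptyBenchmark {

    // measure both an empty and a populated map, since isEmpty() should be cheap in both cases
    @Param({"0", "1000"})
    public int mapSize;

    private Map<Long, String> map;

    @Setup
    public void setUp() {
        map = new HashMap<>();
        for (long i = 0; i < mapSize; i++) {
            map.put(i, "value-" + i);
        }
    }

    @Benchmark
    public boolean isEmpty() {
        // direct emptiness check, the operation FLINK-13034 adds to MapState
        return map.isEmpty();
    }

    @Benchmark
    public boolean iteratorBased() {
        // the workaround users need today: iterate and test for a first entry
        return !map.entrySet().iterator().hasNext();
    }
}
{code}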
Re: [DISCUSS] Enhance Support for Multicast Communication Pattern
Hi all,

I also think that multicasting is a necessity in Flink, but more details need to be considered.

Currently, the network is tightly coupled with state in Flink to achieve automatic scaling. We can only access keyed states in keyed streams and operator states in all streams. In the concrete example of theta-joins implemented with multicasting, the following questions exist:

- In which type of state will the data be stored? Do we need another type of state that is coupled with multicast streams?
- How do we ensure consistency between the network and the state when jobs scale out or scale in?

Regards,
Xiaogang

Xingcan Cui wrote on Sun, Aug 25, 2019, 10:03:
> Hi all,
>
> Sorry for joining this thread late. Basically, I think enabling the multicast
> pattern could be the right direction, but more detailed implementation
> policies need to be discussed.
>
> Two years ago, I filed an issue [1] about the multicast API. However, due
> to some reasons, it was laid aside. After that, when I tried to cherry-pick
> the change for experimental use, I found the return type of the
> `selectChannels()` method had changed from `int[]` to `int`, which makes
> the old implementation not work anymore.
>
> From my side, multicast has always been used for theta-joins. As far as
> I know, it's an essential requirement for some sophisticated joining
> algorithms. Until now, Flink non-equi joins can still only be executed
> single-threaded. If we'd like to make some improvements on this, we should
> first take some measures to support the multicast pattern.
>
> Best,
> Xingcan
>
> [1] https://issues.apache.org/jira/browse/FLINK-6936
>
>> On Aug 24, 2019, at 5:54 AM, Zhu Zhu wrote:
>>
>> Hi Piotr,
>>
>> Thanks for the explanation.
>> Agreed that broadcastEmit(record) is a better choice for broadcasting
>> for the iterations.
>> As broadcasting for the iterations is the first motivation, let's support
>> it first.
>>
>> Thanks,
>> Zhu Zhu
>>
>> Yun Gao wrote on Fri, Aug 23, 2019, 23:56:
>>> Hi Piotr,
>>>
>>> Many thanks for the suggestions!
>>>
>>> I totally agree that we could first focus on the broadcast
>>> scenarios and expose the broadcastEmit method first, considering the
>>> semantics and performance.
>>>
>>> For the keyed stream, I also agree that broadcasting keyed
>>> records to all the tasks may be confusing, considering the semantics of the
>>> keyed partitioner. However, in the iteration case, supporting broadcast over
>>> the keyed partitioner should be required, since users may create any subgraph
>>> for the iteration body, including operators with keys. I think a possible
>>> solution to this issue is to introduce another data type for
>>> 'broadcastEmit'. For example, an operator Operator<T> may broadcast-emit
>>> another type E instead of T, and the transmitted E will bypass the
>>> partitioner and the setting of the keyed context. This would lead the design
>>> to introduce a customized operator event (option 1 in the document). The cost
>>> of this method is that we need to introduce a new type of StreamElement and
>>> a new interface for this type, but it should be suitable for both keyed and
>>> non-keyed partitioners.
>>>
>>> Best,
>>> Yun
>>>
>>> --
>>> From: Piotr Nowojski
>>> Send Time: Fri, Aug 23, 2019, 22:29
>>> To: Zhu Zhu
>>> Cc: dev; Yun Gao
>>> Subject: Re: [DISCUSS] Enhance Support for Multicast Communication Pattern
>>>
>>> Hi,
>>>
>>> If the primary motivation is broadcasting (for the iterations) and we have
>>> no immediate need for multicast (cross join), I would prefer to first
>>> expose broadcast via the DataStream API and only later, once we finally
>>> need it, support multicast. As I wrote, multicast would be more challenging
>>> to implement, with a more complicated runtime and API. And re-using multicast
>>> just to support broadcast doesn't make much sense:
>>>
>>> 1. It's a bit obfuscated. It's easier to understand
>>> collectBroadcast(record) or broadcastEmit(record) compared to some
>>> multicast channel selector that just happens to return all of the channels.
>>> 2. There are performance benefits to explicitly calling
>>> `RecordWriter#broadcastEmit`.
>>>
>>> On a different note, what would be the semantics of such a broadcast emit
>>> on a KeyedStream? Would it be supported? Or would we limit support only to
>>> non-keyed streams?
>>>
>>> Piotrek
>>>
>>>> On 23 Aug 2019, at 12:48, Zhu Zhu wrote:
>>>>
>>>> Thanks Piotr,
>>>>
>>>> Users asked for this feature some time ago when they were migrating batch
>>>> jobs to Flink (Blink).
>>>> It's not very urgent, as they have taken some workarounds to solve
>>>> it (like partitioning the data set to different job vertices).
>>>> So it's fine to not make it top priority.
>>>>
>>>> Anyway, as a commonly known scenario, I think users can benefit from
Re: [DISCUSS] Add ARM CI build to Flink (information-only)
Thanks, Stephan, for bringing up this topic.

The package build jobs work well now. I have a simple online demo which was built and runs on an ARM VM. Feel free to give it a try [1]. As the first step for ARM support, maybe it's good to add them now.

For the next step, the test part is still broken. It relates to some points we found:

1. Some unit tests fail [2] due to Java coding issues. These kinds of failures can be fixed easily.
2. Some tests fail due to dependencies on third-party libraries [3], including frocksdb, the MapR Client, and Netty, which don't have ARM releases.
  a. frocksdb: I'm testing it locally now with `make check_some` and `make jtest`, similar to its Travis job. There are 3 tests failing in `make check_some`. Please see the ticket for more details. Once the tests pass, frocksdb can release an ARM package.
  b. MapR Client: this belongs to the MapR company. At this moment, maybe we should skip MapR support for Flink on ARM.
  c. Netty: Netty actually runs well on our ARM machine. We will ask the Netty community to release ARM support. If they do not want to, OpenLab will host a Maven repository for some common libraries on ARM.

For Chesnay's concern: firstly, the OpenLab team will keep maintaining and fixing the ARM CI. That means that once a build or test fails, we'll fix it at once. Secondly, OpenLab can provide ARM VMs to everyone for reproducing and testing. You just need to create a Test Request issue in OpenLab [4]. Then we'll create ARM VMs for you, and you can log in and do what you need. Does that make sense?

[1]: http://114.115.168.52:8081/#/overview
[2]: https://issues.apache.org/jira/browse/FLINK-13449 and https://issues.apache.org/jira/browse/FLINK-13450
[3]: https://issues.apache.org/jira/browse/FLINK-13598
[4]: https://github.com/theopenlab/openlab/issues/new/choose

Chesnay Schepler wrote on Sat, Aug 24, 2019, 00:10:
> I'm wondering what we are supposed to do if the build fails?
> We aren't providing any guides on setting up an ARM dev environment, so
> reproducing it locally isn't possible.
>
> On 23/08/2019 17:55, Stephan Ewen wrote:
> > Hi all!
> >
> > As part of the Flink on ARM effort, there is a pull request that triggers a
> > build on OpenLabs CI for each push and runs tests on ARM machines.
> >
> > Currently that build is roughly equivalent to what the "core" and "tests"
> > profiles do on Travis.
> > The result will be posted to the PR comments, similar to the Flink Bot's
> > Travis build result.
> > The build currently passes :-) so Flink seems to be okay on ARM.
> >
> > My suggestion would be to try and add this and gather some experience with
> > it.
> > The Travis build results should be our "ground truth" and the ARM CI
> > (OpenLabs CI) would be "informational only" at the beginning, but helping
> > us understand when we break ARM support.
> >
> > You can see this in the PR that adds the OpenLabs CI config:
> > https://github.com/apache/flink/pull/9416
> >
> > Any objections?
> >
> > Best,
> > Stephan
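Until the third-party libraries above ship ARM artifacts, one pragmatic option (a sketch, not something the Flink build does today) is to skip tests that need native x86 libraries when they run on another architecture:

{code:java}
import org.junit.Assume;

/**
 * Sketch of an architecture guard for tests that depend on native libraries
 * (e.g. frocksdb) that are not yet released for ARM. Calling this at the top
 * of a test turns it into a skipped test on non-x86 machines instead of a failure.
 */
public final class ArchitectureAssumptions {

    private ArchitectureAssumptions() {}

    public static void assumeX86() {
        String arch = System.getProperty("os.arch");
        Assume.assumeTrue(
                "Test requires x86_64 native libraries, but runs on " + arch,
                "amd64".equals(arch) || "x86_64".equals(arch));
    }
}
{code}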
Re: [DISCUSS] Use Java's Duration instead of Flink's Time
+1 to use Java's Duration instead of Flink's Time.

Regarding the Duration parsing, we have mentioned in FLIP-54 [1] that we use `org.apache.flink.util.TimeUtils` for the parsing.

Best,
Jark

[1]: https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit#heading=h.egdwkc93dn1k

On Sat, 24 Aug 2019 at 18:24, Zhu Zhu wrote:
> +1 since Java Duration is more common and powerful than Flink Time.
>
> As for whether to drop Scala's Duration for parsing duration config options, I
> think that's another question and should be discussed in another thread.
>
> Thanks,
> Zhu Zhu
>
> Becket Qin wrote on Sat, Aug 24, 2019, 16:16:
>> +1, makes sense. BTW, we probably need a FLIP, as this is a public API
>> change.
>>
>> On Sat, Aug 24, 2019 at 8:11 AM SHI Xiaogang wrote:
>>> +1 to replace Flink's Time with Java's Duration.
>>>
>>> Besides, I also suggest using Java's Instant for "point-in-time".
>>> It can take care of time units when we calculate the Duration between
>>> different instants.
>>>
>>> Regards,
>>> Xiaogang
>>>
>>> Zili Chen wrote on Sat, Aug 24, 2019, 10:45:
>>>> Hi vino,
>>>>
>>>> I agree that it introduces extra complexity to replace Duration (Scala)
>>>> with Duration (Java) *in Scala code*. We could separate the usage for each
>>>> language and use a bridge when necessary.
>>>>
>>>> As a matter of fact, Scala concurrent APIs (including Duration) are used
>>>> more than necessary, at least in flink-runtime. Also, we even try to make
>>>> flink-runtime Scala-free.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>> vino yang wrote on Sat, Aug 24, 2019, 10:05:
>>>>> +1 to replace the Time class provided by Flink with Java's Duration:
>>>>>
>>>>> - Java's Duration has a better representation than Flink's Time class;
>>>>> - As a built-in Java class, the Duration class has a clear advantage over
>>>>> Flink's Time class when interacting with other Java APIs and third-party
>>>>> libraries;
>>>>>
>>>>> But I have reservations about replacing the Duration and FiniteDuration
>>>>> classes in Scala with the Duration class in Java. Java and Scala have
>>>>> different type systems. Currently, Duration (Scala) and FiniteDuration
>>>>> (Scala) work well. In addition, this work brings additional complexity and
>>>>> cost compared to the gains obtained.
>>>>>
>>>>> Best,
>>>>> Vino
>>>>>
>>>>> Zili Chen wrote on Fri, Aug 23, 2019, 23:14:
>>>>>> Hi Stephan,
>>>>>>
>>>>>> I like the idea of unifying the usage of the time/duration API. We
>>>>>> actually use at least five different classes for this purpose (see below).
>>>>>>
>>>>>> One thing I'd like to pick up is that duration configuration
>>>>>> in Flink is mostly in a pattern like "60 s" that fits the pattern
>>>>>> parsed by scala.concurrent.duration.Duration. AFAIK, Duration
>>>>>> in Java 8 doesn't support this pattern. However, we can solve
>>>>>> it by introducing a DurationUtils.
>>>>>>
>>>>>> Also, to clarify, we now have (correct me if there are any others)
>>>>>>
>>>>>> java.time.Duration
>>>>>> scala.concurrent.duration.Duration
>>>>>> scala.concurrent.duration.FiniteDuration
>>>>>> org.apache.flink.api.common.time.Time
>>>>>> org.apache.flink.streaming.api.windowing.time.Time
>>>>>>
>>>>>> in use.
>>>>>> If we'd prefer java.time.Duration, it is worth considering
>>>>>> whether we unify all of them into Java's Duration, i.e., Java's
>>>>>> Duration becomes the first-class time/duration API, while the others
>>>>>> are converted to and from it.
>>>>>>
>>>>>> Best,
>>>>>> tison.
>>>>>>
>>>>>> Stephan Ewen wrote on Fri, Aug 23, 2019, 22:45:
>>>>>>> Hi all!
>>>>>>>
>>>>>>> Many parts of the code use Flink's "Time" class. The Time really is a
>>>>>>> "time interval" or a "Duration".
>>>>>>>
>>>>>>> Since Java 8, there is a Java class "Duration" that is nice and flexible
>>>>>>> to use.
>>>>>>> I would suggest we start using Java Duration instead and drop Time as
>>>>>>> much as possible in the runtime from now on.
>>>>>>>
>>>>>>> Maybe even drop that class from the API in Flink 2.0.
>>>>>>>
>>>>>>> Best,
>>>>>>> Stephan
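To make the parsing point concrete: java.time.Duration only parses ISO-8601 strings such as "PT60S", so Flink-style strings like "60 s" need a small helper of the kind mentioned above. A minimal sketch (the real utility's name, location, and supported units may differ):

{code:java}
import java.time.Duration;
import java.util.Locale;

public final class DurationUtils {

    /** Parses strings like "60 s", "100 ms", or "3 min" into a java.time.Duration. */
    public static Duration parse(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        int split = 0;
        while (split < trimmed.length() && Character.isDigit(trimmed.charAt(split))) {
            split++;
        }
        long value = Long.parseLong(trimmed.substring(0, split));
        String unit = trimmed.substring(split).trim();
        switch (unit) {
            case "ms":
                return Duration.ofMillis(value);
            case "s":
                return Duration.ofSeconds(value);
            case "min":
                return Duration.ofMinutes(value);
            case "h":
                return Duration.ofHours(value);
            default:
                throw new IllegalArgumentException("Unknown time unit: '" + unit + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("60 s"));            // PT1M
        System.out.println(Duration.parse("PT60S"));  // the ISO-8601 form Java supports natively
    }
}
{code}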
Re: [DISCUSS] Flink Python User-Defined Function for Table API
Thanks for your feedback, Hequn & Dian.

Dian, I am glad to see that you want to help create the FLIP!
Everyone has a first time, and I am very willing to help you complete your first FLIP creation. Here are some tips:

- First, I'll give your account write permission for Confluence.
- Before creating the FLIP, please have a look at the FLIP Template [1]. (It's good to learn more about FLIPs by reading [2].)
- Create the Flink Python UDF related JIRAs after completing the VOTE on the FLIP. (I think you can also bring up the VOTE thread, if you want!)

If you run into any problems during this period, feel free to tell me, and we can solve them together. :)

Best,
Jincheng

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Hequn Cheng wrote on Fri, Aug 23, 2019, 11:54:
> +1 for starting the vote.
>
> Thanks a lot, Jincheng, for the discussion.
>
> Best, Hequn
>
> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu wrote:
>> Hi Jincheng,
>>
>> +1 to start the FLIP creation and VOTE on this feature. I'm willing to help
>> with the FLIP creation if you don't mind. As I haven't created a FLIP before,
>> it would be great if you could help with this. :)
>>
>> Regards,
>> Dian
>>
>>> On Aug 22, 2019, at 11:41 PM, jincheng sun wrote:
>>>
>>> Hi all,
>>>
>>> Thanks a lot for your feedback. If there are no more suggestions and
>>> comments, I think it's better to initiate a vote to create a FLIP for
>>> Apache Flink Python UDFs.
>>> What do you think?
>>>
>>> Best, Jincheng
>>>
>>> jincheng sun wrote on Thu, Aug 15, 2019, 00:54:
>>>> Hi Thomas,
>>>>
>>>> Thanks for your confirmation and the very important reminder about bundle
>>>> processing.
>>>>
>>>> I have added a description of how to perform bundle processing from
>>>> the perspective of checkpoints and watermarks. Feel free to leave comments
>>>> if anything is not described clearly.
>>>>
>>>> Best,
>>>> Jincheng
>>>>
>>>> Dian Fu wrote on Wed, Aug 14, 2019, 10:08:
>>>>> Hi Thomas,
>>>>>
>>>>> Thanks a lot for the suggestions.
>>>>>
>>>>> Regarding bundle processing, there is a section "Checkpoint" [1] in the
>>>>> design doc which talks about how to handle checkpoints.
>>>>> However, I think you are right that we should say more about it, such as
>>>>> what bundle processing is, how it affects checkpoints and watermarks, and
>>>>> how to handle checkpoints and watermarks, etc.
>>>>>
>>>>> [1] https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>>>>>
>>>>> Regards,
>>>>> Dian
>>>>>
>>>>>> On Aug 14, 2019, at 1:01 AM, Thomas Weise wrote:
>>>>>>
>>>>>> Hi Jincheng,
>>>>>>
>>>>>> Thanks for putting this together. The proposal is very detailed, thorough,
>>>>>> and for me as a Beam Flink runner contributor easy to understand :)
>>>>>>
>>>>>> One thing that you should probably detail more is the bundle processing. It
>>>>>> is critically important for performance that multiple elements are
>>>>>> processed in a bundle. The default bundle size in the Flink runner is 1s or
>>>>>> 1000 elements, whichever comes first.
>>>>>> And for streaming, you can find the
>>>>>> logic necessary to align the bundle processing with watermarks and
>>>>>> checkpointing here:
>>>>>> https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> The Python Table API (without Python UDF support) is already supported
>>>>>>> and will be available in the coming release 1.9.
>>>>>>> As Python UDFs are very important for Python users, we'd like to start the
>>>>>>> discussion about Python UDF support in the Python Table API.
>>>>>>> Aljoscha Krettek, Dian Fu and I have discussed offline and have drafted a
>>>>>>> design doc [1]. It includes the following items:
>>>>>>>
>>>>>>> - The user-defined function interfaces.
>>>>>>> - The user-defined function execution architecture.
>>>>>>>
>>>>>>> As mentioned by many people in the previous discussion thread [2], a
>>>>>>> portability framework was introduced in Apache Beam in its latest
>>>>>>> releases. It provides well-defined, language-neutral data structures and
>>>>>>> protocols for language-neutral user-defined function execution. This
>>>>>>> design is based on Beam's portab
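The bundle-alignment idea Thomas points to can be summarized in a small sketch. This is illustrative Java, not the actual Beam or Flink operator code, and all names are made up: buffered elements are flushed to the UDF runner before a checkpoint is taken or a watermark is forwarded, so neither can overtake elements that are still buffered.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative bundle-aligned operator sketch. */
class BundlingOperatorSketch<T> {

    // the Flink runner also bounds bundles by time (1s), omitted here
    private static final int MAX_BUNDLE_SIZE = 1000;

    private final List<T> bundle = new ArrayList<>();
    private final Consumer<List<T>> udfRunner; // stand-in for the remote Python/Beam harness

    BundlingOperatorSketch(Consumer<List<T>> udfRunner) {
        this.udfRunner = udfRunner;
    }

    void processElement(T element) {
        bundle.add(element);
        if (bundle.size() >= MAX_BUNDLE_SIZE) {
            finishBundle();
        }
    }

    void snapshotState() {
        // flush before checkpointing so state reflects every received element
        finishBundle();
    }

    long processWatermark(long watermark) {
        // flush before forwarding so results of buffered elements precede the watermark
        finishBundle();
        return watermark;
    }

    private void finishBundle() {
        if (!bundle.isEmpty()) {
            udfRunner.accept(new ArrayList<>(bundle));
            bundle.clear();
        }
    }
}
{code}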
Re: [DISCUSS] Flink client api enhancement for downstream project
Hi Zili,

It makes sense to me that a dedicated cluster is started for a per-job cluster and will not accept more jobs.
I just have a question about the command line. Currently we can use the following commands to start different clusters.

*per-job cluster*
./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster examples/streaming/WindowJoin.jar

*session cluster*
./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster examples/streaming/WindowJoin.jar

What will they look like after the client enhancement?

Best,
Yang

Zili Chen wrote on Fri, Aug 23, 2019, 22:46:
> Hi Till,
>
> Thanks for your update. Nice to hear :-)
>
> Best,
> tison.
>
> Till Rohrmann wrote on Fri, Aug 23, 2019, 22:39:
>> Hi Tison,
>>
>> Just a quick comment concerning the class loading issues when using the
>> per-job mode. The community wants to change it so that the
>> StandaloneJobClusterEntryPoint actually uses the user code class loader
>> with child-first class loading [1]. Hence, I hope that this problem will be
>> resolved soon.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-13840
>>
>> Cheers,
>> Till
>>
>> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas wrote:
>>> Hi all,
>>>
>>> On the topic of web submission, I agree with Till that it only seems
>>> to complicate things.
>>> It is bad for security, job isolation (anybody can submit/cancel jobs),
>>> and its implementation complicates some parts of the code. So, if we were
>>> to redesign the WebUI, maybe this part could be left out. In addition, I
>>> would say that the ability to cancel jobs could also be left out.
>>>
>>> I would also be in favour of removing the "detached" mode, for the reasons
>>> mentioned above (i.e. because now we will have a future representing the
>>> result on which the user can choose to wait or not).
>>>
>>> Now, for separating job submission and cluster creation, I am in
>>> favour of keeping both.
>>> Once again, the reasons are mentioned above by Stephan, Till and Aljoscha,
>>> and Zili also seems to agree. They mainly have to do with security,
>>> isolation, and ease of resource management for the user, as he knows that
>>> "when my job is done, everything will be cleared up". This is
>>> also the experience you get when launching a process on your local OS.
>>>
>>> On excluding the per-job mode from returning a JobClient or not, I
>>> believe that eventually it would be nice to allow users to get back a
>>> JobClient. The reasons are that 1) I cannot find any objective reason why
>>> the user experience should diverge, and 2) this will be the way that the
>>> user will be able to interact with his running job. Assuming that the
>>> necessary ports are open for the REST API to work, the JobClient can run
>>> against the REST API without problems. If the needed ports are not open,
>>> then we are safe to not return a JobClient, as the user explicitly chose
>>> to close all points of communication to his running job.
>>>
>>> On the topic of not hijacking "env.execute()" in order to get the Plan, I
>>> definitely agree, but for the proposal of having a "compile()" method in
>>> the env, I would like to have a better look at the existing code.
>>> Cheers,
>>> Kostas
>>>
>>> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen wrote:
>>>> Hi Yang,
>>>>
>>>> It would be helpful if you check Stephan's last comment,
>>>> which states that isolation is important.
>>>>
>>>> For per-job mode, we run a dedicated cluster (maybe it
>>>> should have been a couple of JMs and TMs during the FLIP-6
>>>> design) for a specific job. Thus the process is protected
>>>> from other jobs.
>>>>
>>>> In our case, there was a time we suffered from multiple
>>>> jobs submitted by different users that affected
>>>> each other, so that all ran into an error state. Also,
>>>> running the client inside the cluster can save client
>>>> resources at some points.
>>>>
>>>> However, we also face several issues, as you mentioned:
>>>> in per-job mode it always uses the parent classloader,
>>>> so classloading issues occur.
>>>>
>>>> BTW, one can make an analogy between session/per-job mode
>>>> in Flink and client/cluster mode in Spark.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>> Yang Wang wrote on Thu, Aug 22, 2019, 11:25:
>>>>> From the user's perspective, the scope of a per-job cluster is really
>>>>> confusing.
>>>>>
>>>>> If it means a Flink cluster with a single job, we get better isolation.
>>>>>
>>>>> Now it does not matter how we deploy the cluster: directly deploy (mode 1)
>>>>> or start a Flink cluster and then submit the job through the cluster
>>>>> client (mode 2).
>>>>>
>>>>> Otherwise, if
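For reference, a hypothetical shape of the JobClient idea discussed above. This is only a sketch of the direction, not an agreed-upon interface, and every name in it is invented for illustration:

{code:java}
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical client handle returned by job submission. Submission itself
 * would be asynchronous; blocking on the result stays the caller's choice.
 */
public interface JobClient {

    /** The ID of the submitted job. */
    String getJobId();

    /** Requests cancellation of the job. */
    CompletableFuture<Void> cancel();

    /**
     * Future completing with the job's result; callers may wait on it
     * ("attached" behavior) or ignore it ("detached" behavior), which is
     * why a separate detached mode becomes unnecessary.
     */
    CompletableFuture<String> getJobExecutionResult();
}
{code}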
[jira] [Created] (FLINK-13847) Update release scripts to also update docs/_config.yml
Tzu-Li (Gordon) Tai created FLINK-13847:
---
Summary: Update release scripts to also update docs/_config.yml
Key: FLINK-13847
URL: https://issues.apache.org/jira/browse/FLINK-13847
Project: Flink
Issue Type: Improvement
Components: Documentation, Release System
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai

During the 1.9.0 release process, we missed quite a few configuration updates in {{docs/_config.yml}} related to Flink versions. This should be done automatically via the release scripts.

A list of settings in that file that need to be touched on every major release:
* version
* version_title
* github_branch
* baseurl
* stable_baseurl
* javadocs_baseurl
* pythondocs_baseurl
* is_stable
* Add a new link to previous_docs

This can probably be done via the {{tools/releasing/create_release_branch.sh}} script, which is used for every major release. We should also update the release guide in the project wiki to cover checking that file as an item in the checklists.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
Re: [DISCUSS] Flink Python User-Defined Function for Table API
Hi Jincheng,

Thanks for the kind tips and the offer of help. I definitely need it! Could you grant me write permission for Confluence? My id: Dian Fu

Thanks,
Dian

> On Aug 26, 2019, at 9:53 AM, jincheng sun wrote:
>
> Thanks for your feedback, Hequn & Dian.
>
> Dian, I am glad to see that you want to help create the FLIP!
> Everyone has a first time, and I am very willing to help you complete
> your first FLIP creation. Here are some tips:
>
> - First, I'll give your account write permission for Confluence.
> - Before creating the FLIP, please have a look at the FLIP Template [1].
> (It's good to learn more about FLIPs by reading [2].)
> - Create the Flink Python UDF related JIRAs after completing the VOTE on
> the FLIP. (I think you can also bring up the VOTE thread, if you want!)
>
> If you run into any problems during this period, feel free to tell me, and
> we can solve them together. :)
>
> Best,
> Jincheng
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>
> Hequn Cheng wrote on Fri, Aug 23, 2019, 11:54:
>> +1 for starting the vote.
>>
>> Thanks a lot, Jincheng, for the discussion.
>>
>> Best, Hequn
>>
>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu wrote:
>>> Hi Jincheng,
>>>
>>> +1 to start the FLIP creation and VOTE on this feature. I'm willing to
>>> help with the FLIP creation if you don't mind. As I haven't created a FLIP
>>> before, it would be great if you could help with this. :)
>>>
>>> Regards,
>>> Dian
>>>
>>>> On Aug 22, 2019, at 11:41 PM, jincheng sun wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Thanks a lot for your feedback. If there are no more suggestions and
>>>> comments, I think it's better to initiate a vote to create a FLIP for
>>>> Apache Flink Python UDFs.
>>>> What do you think?
>>>>
>>>> Best, Jincheng
>>>>
>>>> jincheng sun wrote on Thu, Aug 15, 2019, 00:54:
>>>>> Hi Thomas,
>>>>>
>>>>> Thanks for your confirmation and the very important reminder about
>>>>> bundle processing.
>>>>>
>>>>> I have added a description of how to perform bundle processing from
>>>>> the perspective of checkpoints and watermarks. Feel free to leave
>>>>> comments if anything is not described clearly.
>>>>>
>>>>> Best,
>>>>> Jincheng
>>>>>
>>>>> Dian Fu wrote on Wed, Aug 14, 2019, 10:08:
>>>>>> Hi Thomas,
>>>>>>
>>>>>> Thanks a lot for the suggestions.
>>>>>>
>>>>>> Regarding bundle processing, there is a section "Checkpoint" [1] in the
>>>>>> design doc which talks about how to handle checkpoints.
>>>>>> However, I think you are right that we should say more about it, such as
>>>>>> what bundle processing is, how it affects checkpoints and watermarks,
>>>>>> and how to handle checkpoints and watermarks, etc.
>>>>>>
>>>>>> [1] https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>>>>>>
>>>>>> Regards,
>>>>>> Dian
>>>>>>
>>>>>>> On Aug 14, 2019, at 1:01 AM, Thomas Weise wrote:
>>>>>>>
>>>>>>> Hi Jincheng,
>>>>>>>
>>>>>>> Thanks for putting this together. The proposal is very detailed,
>>>>>>> thorough, and for me as a Beam Flink runner contributor easy to
>>>>>>> understand :)
>>>>>>>
>>>>>>> One thing that you should probably detail more is the bundle
>>>>>>> processing. It is critically important for performance that multiple
>>>>>>> elements are processed in a bundle. The default bundle size in the
>>>>>>> Flink runner is 1s or 1000 elements, whichever comes first.
>>>>>>> And for streaming, you can find the
>>>>>>> logic necessary to align the bundle processing with watermarks and
>>>>>>> checkpointing here:
>>>>>>> https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> The Python Table API (without Python UDF support) is already supported
>>>>>>>> and will be available in the coming release 1.9.
>>>>>>>> As Python UDFs are very important for Python users, we'd like to start
>>>>>>>> the discussion about Python UDF support in the Python Table API.
>>>>>>>> Aljoscha Krettek, Dian Fu and I have discussed offline and have drafted
>>>>>>>> a design doc [1]. It includes the following items:
>>>>>>>>
>>>>>>>> - The user-defined function interfaces.
>>>>>>>> - The user-defined function execution architecture.
>>>>>>>>
>>>>>>>> As mentioned by many people in the previous discussion thread [2], a
>>>>>>>> portability framework was introduced in Apache Beam in latest
[jira] [Created] (FLINK-13848) Support “scheduleAtFixedRate/scheduleAtFixedDelay” in RpcEndpoint#MainThreadExecutor
Biao Liu created FLINK-13848:
Summary: Support "scheduleAtFixedRate/scheduleAtFixedDelay" in RpcEndpoint#MainThreadExecutor
Key: FLINK-13848
URL: https://issues.apache.org/jira/browse/FLINK-13848
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination
Reporter: Biao Liu
Fix For: 1.10.0

Currently, the methods "scheduleAtFixedRate/scheduleAtFixedDelay" of {{RpcEndpoint#MainThreadExecutor}} are not implemented, because there was no requirement for them before. Now we are planning to implement these methods to support periodic checkpoint triggering.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
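For background, fixed-delay semantics can be built on top of a single-shot scheduler, which is roughly the building block such an implementation needs inside the main-thread executor. A runnable sketch (not Flink's actual code; a real fixed-rate variant would also subtract the task's own execution time from the next delay, and would need error handling so a throwing task does not stop the chain):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedDelaySketch {

    /** Re-schedules the task after each run, giving fixed-delay semantics. */
    static void scheduleWithFixedDelay(
            ScheduledExecutorService executor, Runnable task, long delayMillis) {
        executor.schedule(
                () -> {
                    task.run();
                    scheduleWithFixedDelay(executor, task, delayMillis);
                },
                delayMillis,
                TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        scheduleWithFixedDelay(executor, () -> System.out.println("trigger checkpoint"), 100);
        Thread.sleep(550); // observe a few periodic triggers
        executor.shutdownNow();
    }
}
{code}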
Re: [DISCUSS] Setup a bui...@flink.apache.org mailing list for travis builds
Hi all,

Sorry it took so long to get back. I have some good news.

After some investigation and development, and with Chesnay's help, we finally integrated Travis build notifications with the bui...@flink.apache.org mailing list while retaining the beautiful formatting!

Currently, only failure and failure->success builds are notified, and only builds (including CRON) on apache/flink branches are notified; pull request builds are not. The builds mailing list is also listed on the Flink website's community page [1].

I would encourage devs to subscribe to the builds mailing list and help the community pay more attention to the build status, especially the CRON builds. Feel free to leave your suggestions and feedback here!

# Implementation details:

I implemented a flink-notification-bot [2] to receive the Travis webhook [3] payload, generate an HTML email, and send the email to bui...@flink.apache.org. The flink-notification-bot is deployed on my own VM at DigitalOcean. You can refer to the GitHub page [2] of the project to learn more details about the implementation and deployment. Btw, I'm glad to contribute the project to https://github.com/flink-ci or https://github.com/flinkbot if the community accepts.

With the flink-notification-bot, we can easily integrate with other CI services or our own CI, and we can also integrate it with some other applications (e.g. DingTalk).

# Rejected alternative:

Option #1: sending email notifications via "Travis Email Notification" [4]. Reasons:
- If the email notification is set, Travis CI only sends emails to the addresses specified there, rather than to the committer and author.
- We would lose the beautiful email formatting when Travis sends email to the builds ML.
- The return-path of emails from Travis CI is not constant, which makes it difficult for the mailing list to accept them.

Cheers,
Jark

[1]: https://flink.apache.org/community.html#mailing-lists
[2]: https://github.com/wuchong/flink-notification-bot
[3]: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
[4]: https://docs.travis-ci.com/user/notifications/#configuring-email-notifications

On Tue, 30 Jul 2019 at 18:35, Jark Wu wrote:
> Hi all,
>
> Progress updates:
> 1. The bui...@flink.apache.org list can be subscribed to now (thanks @Robert);
> you can send an email to builds-subscr...@flink.apache.org to subscribe.
> 2. We have a pull request [1] to send only apache/flink build notifications,
> and it works well.
> 3. However, all the notifications are rejected by the builds mailing list
> (the MODERATE mails).
> I added & checked bui...@travis-ci.org on the subscriber/allow list,
> but it still doesn't work. It might be recognized as spam by the mailing list.
> We are still trying to figure it out and will update here if we make
> some progress.
>
> Thanks,
> Jark
>
> [1]: https://github.com/apache/flink/pull/9230
>
> On Thu, 25 Jul 2019 at 22:59, Robert Metzger wrote:
>> The mailing list has been created; you can now subscribe to it.
>>
>> On Wed, Jul 24, 2019 at 1:43 PM Jark Wu wrote:
>>> Thanks, Robert, for helping out with that.
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 24 Jul 2019 at 19:16, Robert Metzger wrote:
>>>> I've requested the creation of the list and made Jark, Chesnay and me
>>>> moderators of it.
>>>>
>>>> On Wed, Jul 24, 2019 at 1:12 PM Robert Metzger wrote:
>>>>> @Jark: Yes, I will request the creation of a mailing list!
>>>>> On Tue, Jul 23, 2019 at 4:48 PM Hugo Louro wrote:
>>>>>> +1
>>>>>>
>>>>>>> On Jul 23, 2019, at 6:15 AM, Till Rohrmann wrote:
>>>>>>>
>>>>>>> Good idea Jark. +1 for the proposal.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>>> On Tue, Jul 23, 2019 at 1:59 PM Hequn Cheng <chenghe...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Jark,
>>>>>>>>
>>>>>>>> Good idea. +1!
>>>>>>>>
>>>>>>>>> On Tue, Jul 23, 2019 at 6:23 PM Jark Wu wrote:
>>>>>>>>>
>>>>>>>>> Thank you all for your positive feedback.
>>>>>>>>>
>>>>>>>>> We have three binding +1s, so I think we can proceed with this.
>>>>>>>>>
>>>>>>>>> Hi @Robert Metzger, could you create a request to INFRA for the
>>>>>>>>> mailing list?
>>>>>>>>> I'm not sure if this needs PMC permission.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jark
>>>>>>>>>
>>>>>>>>> On Tue, 23 Jul 2019 at 16:42, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Robert Metzger wrote on Tue, Jul 23, 2019, 16:01:
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 22, 2019 at 10:27 AM Biao Liu <mmyy1...@gmail.com> wrote:
>>>>>>>>>>>> +1, makes sense to me.
>>>>>>>>>>>> Mailin
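For anyone curious about the moving parts, here is a minimal, self-contained sketch of a webhook receiver of the kind described above. It is not the actual flink-notification-bot code; payload parsing and mail delivery are only indicated in comments:

{code:java}
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class WebhookReceiverSketch {

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/travis", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                String payload = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                // A real bot would: verify the webhook signature, parse the build
                // status from the JSON payload, render an HTML mail, and send it
                // to the builds mailing list only on failure / failure->success.
                System.out.println("received payload of " + payload.length() + " bytes");
            }
            exchange.sendResponseHeaders(204, -1); // no response body
            exchange.close();
        });
        server.start();
        System.out.println("listening on :8080/travis");
    }
}
{code}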
Re: [DISCUSS] Enhance Support for Multicast Communication Pattern
Thanks Yun for bringing up this discussion, and many thanks for all the deep thoughts!

For now, I think this discussion contains two scenarios: one is for iteration library support and the other is for SQL join support. I think both scenarios are useful, but they seem to have different best-suited solutions. To make the discussion clearer, I would suggest splitting it into two threads.

And I agree with Piotr that it is very tricky for a keyed stream to receive a "broadcast element". So we may add some new interfaces which could broadcast or process some special "broadcast event", so that the "broadcast event" is not sent through the normal process.

Best,
Guowei

SHI Xiaogang wrote on Mon, Aug 26, 2019, 09:27:
> Hi all,
>
> I also think that multicasting is a necessity in Flink, but more details
> need to be considered.
>
> Currently, the network is tightly coupled with state in Flink to achieve
> automatic scaling. We can only access keyed states in keyed streams and
> operator states in all streams.
> In the concrete example of theta-joins implemented with multicasting, the
> following questions exist:
>
> - In which type of state will the data be stored? Do we need another
> type of state that is coupled with multicast streams?
> - How do we ensure consistency between the network and the state when jobs
> scale out or scale in?
>
> Regards,
> Xiaogang
>
> Xingcan Cui wrote on Sun, Aug 25, 2019, 10:03:
>> Hi all,
>>
>> Sorry for joining this thread late. Basically, I think enabling the
>> multicast pattern could be the right direction, but more detailed
>> implementation policies need to be discussed.
>>
>> Two years ago, I filed an issue [1] about the multicast API. However, due
>> to some reasons, it was laid aside. After that, when I tried to cherry-pick
>> the change for experimental use, I found the return type of the
>> `selectChannels()` method had changed from `int[]` to `int`, which makes
>> the old implementation not work anymore.
>>
>> From my side, multicast has always been used for theta-joins. As far as
>> I know, it's an essential requirement for some sophisticated joining
>> algorithms. Until now, Flink non-equi joins can still only be executed
>> single-threaded. If we'd like to make some improvements on this, we should
>> first take some measures to support the multicast pattern.
>>
>> Best,
>> Xingcan
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-6936
>>
>>> On Aug 24, 2019, at 5:54 AM, Zhu Zhu wrote:
>>>
>>> Hi Piotr,
>>>
>>> Thanks for the explanation.
>>> Agreed that broadcastEmit(record) is a better choice for broadcasting
>>> for the iterations.
>>> As broadcasting for the iterations is the first motivation, let's support
>>> it first.
>>>
>>> Thanks,
>>> Zhu Zhu
>>>
>>> Yun Gao wrote on Fri, Aug 23, 2019, 23:56:
>>>> Hi Piotr,
>>>>
>>>> Many thanks for the suggestions!
>>>>
>>>> I totally agree that we could first focus on the broadcast
>>>> scenarios and expose the broadcastEmit method first, considering the
>>>> semantics and performance.
>>>>
>>>> For the keyed stream, I also agree that broadcasting keyed
>>>> records to all the tasks may be confusing, considering the semantics of
>>>> the keyed partitioner. However, in the iteration case, supporting
>>>> broadcast over the keyed partitioner should be required, since users may
>>>> create any subgraph for the iteration body, including operators with keys.
>>>> I think a possible
>>>> solution to this issue is to introduce another data type for
>>>> 'broadcastEmit'. For example, an operator Operator<T> may broadcast-emit
>>>> another type E instead of T, and the transmitted E will bypass the
>>>> partitioner and the setting of the keyed context. This would lead the
>>>> design to introduce a customized operator event (option 1 in the
>>>> document). The cost of this method is that we need to introduce a new type
>>>> of StreamElement and a new interface for this type, but it should be
>>>> suitable for both keyed and non-keyed partitioners.
>>>>
>>>> Best,
>>>> Yun
>>>>
>>>> --
>>>> From: Piotr Nowojski
>>>> Send Time: Fri, Aug 23, 2019, 22:29
>>>> To: Zhu Zhu
>>>> Cc: dev; Yun Gao
>>>> Subject: Re: [DISCUSS] Enhance Support for Multicast Communication Pattern
>>>>
>>>> Hi,
>>>>
>>>> If the primary motivation is broadcasting (for the iterations) and we have
>>>> no immediate need for multicast (cross join), I would prefer to first
>>>> expose broadcast via the DataStream API and only later, once we finally
>>>> need it, support multicast. As I wrote, multicast would be more
>>>> challenging to implement, with a more complicated runtime and API. And
>>>> re-using multicast just to support broadcast doesn't make much sense
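To make the "bypassing broadcast" idea concrete, here is a hypothetical shape for such an interface. The names are invented for illustration and are not part of Flink:

{code:java}
/**
 * Hypothetical output for an operator that, besides partitioned records of
 * type T, can broadcast events of a separate type E to all downstream
 * channels, bypassing the (possibly keyed) partitioner entirely.
 */
public interface BroadcastCapableOutput<T, E> {

    /** Normal emission: the record goes through the configured partitioner. */
    void emit(T record);

    /**
     * Broadcast emission: the event is sent to every downstream channel and
     * does not set a keyed context, so it is safe on keyed streams as well.
     */
    void broadcastEmit(E event);
}
{code}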
[jira] [Created] (FLINK-13849) The back-pressure monitoring tab in Web UI may cause errors
Xingcan Cui created FLINK-13849:
---
Summary: The back-pressure monitoring tab in Web UI may cause errors
Key: FLINK-13849
URL: https://issues.apache.org/jira/browse/FLINK-13849
Project: Flink
Issue Type: Bug
Components: Runtime / Web Frontend
Affects Versions: 1.9.0
Reporter: Xingcan Cui

Clicking the back-pressure monitoring tab for a finished job in the Web UI will cause an internal server error. The exceptions are as follows.

{code:java}
2019-08-26 01:23:54,845 ERROR org.apache.flink.runtime.rest.handler.job.JobVertexBackPressureHandler - Unhandled exception.
org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job (09e107685e0b81b443b556062debb443)
    at org.apache.flink.runtime.dispatcher.Dispatcher.getJobMasterGatewayFuture(Dispatcher.java:825)
    at org.apache.flink.runtime.dispatcher.Dispatcher.requestOperatorBackPressureStats(Dispatcher.java:524)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:279)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:194)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
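A plausible shape of the fix (a runnable sketch with stand-in types, not the actual handler code) is to unwrap the future's failure and map a missing job to a 404 response instead of letting it surface as an unhandled 500:

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class NotFoundHandlingSketch {

    /** Stand-in for org.apache.flink.runtime.messages.FlinkJobNotFoundException. */
    static class JobNotFoundException extends RuntimeException {}

    /** Stand-in for Dispatcher#requestOperatorBackPressureStats. */
    static CompletableFuture<String> requestStats(boolean jobGone) {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (jobGone) {
            future.completeExceptionally(new JobNotFoundException());
        } else {
            future.complete("back-pressure stats");
        }
        return future;
    }

    public static void main(String[] args) {
        requestStats(true)
                .handle((stats, error) -> {
                    if (error == null) {
                        return "200 OK: " + stats;
                    } else if (unwrap(error) instanceof JobNotFoundException) {
                        return "404 Not Found: job is not running"; // instead of an unhandled 500
                    } else {
                        return "500 Internal Server Error";
                    }
                })
                .thenAccept(System.out::println);
    }

    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }
}
{code}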
Re: [DISCUSS] Tolerate temporarily suspended ZooKeeper connections
Hi Till,

I'd like to revive this thread since 1.9.0 has been released. IMHO we already reached a consensus on JIRA, and if you can review the pull request, we can hopefully address the issue in the next release.

Best,
tison.

Zili Chen wrote on Mon, Jul 29, 2019, 23:05:
> Hi Till,
>
> Thanks for your explanation. Let's pick up this thread during 1.10 development.
>
> Best,
> tison.
>
> Till Rohrmann wrote on Mon, Jul 29, 2019, 21:12:
>> Hi Tison,
>>
>> I would consider this a new feature, and as such it won't be possible to
>> include it in the 1.9.0 release, since the feature freeze has passed.
>> We might target 1.10, though.
>>
>> Cheers,
>> Till
>>
>> On Mon, Jul 29, 2019 at 3:01 AM Zili Chen wrote:
>>> Hi committers,
>>>
>>> Now that we have an ongoing PR [1] for this JIRA, we need a committer
>>> to push this thread forward. It would be good to see this issue fixed
>>> in 1.9.0.
>>>
>>> Best,
>>> tison.
>>>
>>> [1] https://github.com/apache/flink/pull/9158
>>>
>>> 未来阳光 <2217232...@qq.com> wrote on Tue, Jul 23, 2019, 21:28:
>>>> OK. If you have any suggestions, we can talk about the details under
>>>> FLINK-10052.
>>>>
>>>> Best.
>>>>
>>>> -- Original Message --
>>>> From: "Till Rohrmann"
>>>> Sent: Tue, Jul 23, 2019, 21:19
>>>> To: "dev"
>>>> Subject: Re: [DISCUSS] Tolerate temporarily suspended ZooKeeper connections
>>>>
>>>> Hi Lamber-Ken,
>>>>
>>>> Thanks for starting this discussion. I think there is a benefit in not
>>>> directly losing leadership if the ZooKeeper connection goes into the
>>>> SUSPENDED state. In particular, if we can guarantee that there is only a
>>>> single JobMaster, it might make sense to not give up leadership overly
>>>> eagerly. I would suggest continuing the technical discussion on the
>>>> JIRA issue thread, since it already contains a good amount of detail.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Sat, Jul 20, 2019 at 12:55 PM QQ邮箱 <2217232...@qq.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> Description:
>>>>> We deploy Flink streaming jobs on a Hadoop cluster in per-job mode and
>>>>> use ZooKeeper as the HighAvailabilityService, but we found that Flink
>>>>> jobs restart because the network between the JobManager and ZooKeeper
>>>>> disconnects temporarily. So we analyzed this problem deeply.
>>>>> The Flink JobManager uses Curator's `LeaderLatch` to maintain leadership.
>>>>> When the network disconnects, the `LeaderLatch` changes the leadership to
>>>>> false directly. We think it's too brutal that many long-running Flink
>>>>> jobs restart because of network shake. Instead of directly revoking the
>>>>> leadership upon a SUSPENDED ZooKeeper connection, it would be better to
>>>>> wait until the ZooKeeper connection is LOST.
>>>>>
>>>>> There are two JIRAs about the problem, FLINK-10052 and FLINK-13189; they
>>>>> are duplicates. Thanks to @Elias Levy, who told us about FLINK-13189, so
>>>>> we closed FLINK-13189.
>>>>>
>>>>> Solution:
>>>>> Back to this problem, there are currently two ways to solve it: one is to
>>>>> rewrite the LeaderLatch#handleStateChange method, the other is to upgrade
>>>>> to curator-4.2.0. The first way is hacky but correct; the second way
>>>>> needs to consider compatibility. For more detail, please see FLINK-10052.
>>>>> Hope:
>>>>> FLINK-10052 was reported on 2018-08-03 (about a year ago), so we hope
>>>>> this problem can be fixed as soon as possible.
>>>>> Btw, thanks @TisonKun for talking about this problem and reviewing the PR.
>>>>>
>>>>> Links:
>>>>> FLINK-10052: https://issues.apache.org/jira/browse/FLINK-10052
>>>>> FLINK-13189: https://issues.apache.org/jira/browse/FLINK-13189
>>>>>
>>>>> Any suggestion is welcome. What do you think?
>>>>>
>>>>> Best, lamber-ken.
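The discussed behavior change is small. Roughly (a sketch against Curator's ConnectionState enum; the surrounding class and method names here are illustrative, not Curator's or Flink's actual code):

{code:java}
import org.apache.curator.framework.state.ConnectionState;

public class TolerantLeaderLatchSketch {

    private volatile boolean hasLeadership = true;

    /**
     * Curator's LeaderLatch revokes leadership already on SUSPENDED; the
     * proposal is to tolerate SUSPENDED and only revoke on LOST.
     */
    void handleStateChange(ConnectionState newState) {
        switch (newState) {
            case SUSPENDED:
                // The connection may come back; keep leadership for now.
                break;
            case LOST:
                hasLeadership = false; // the session is gone, leadership must be revoked
                break;
            case RECONNECTED:
                // Connection restored before the session expired; nothing to do.
                break;
            default:
                // CONNECTED / READ_ONLY: no leadership change.
        }
    }
}
{code}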
Re: [DISCUSS] Add ARM CI build to Flink (information-only)
I'm sorry, but if these issues are only fixed later anyway, I see no reason to run these tests on each PR. We're just adding noise to each PR that everyone will ignore.

I'm curious as to the benefit of having this directly in Flink; why aren't the ARM builds run outside of the Flink project, with fixes for them provided? It seems to me like nothing about these ARM builds is actually handled by the Flink project.

On 26/08/2019 03:43, Xiyuan Wang wrote:
> Thanks, Stephan, for bringing up this topic.
>
> The package build jobs work well now. I have a simple online demo which
> was built and runs on an ARM VM. Feel free to give it a try [1]. As the
> first step for ARM support, maybe it's good to add them now.
>
> For the next step, the test part is still broken. It relates to some
> points we found:
> 1. Some unit tests fail [2] due to Java coding issues. These kinds of
> failures can be fixed easily.
> 2. Some tests fail due to dependencies on third-party libraries [3],
> including frocksdb, the MapR Client, and Netty, which don't have ARM releases.
>   a. frocksdb: I'm testing it locally now with `make check_some` and `make
> jtest`, similar to its Travis job. There are 3 tests failing in `make
> check_some`. Please see the ticket for more details. Once the tests pass,
> frocksdb can release an ARM package.
>   b. MapR Client: this belongs to the MapR company. At this moment, maybe
> we should skip MapR support for Flink on ARM.
>   c. Netty: Netty actually runs well on our ARM machine. We will ask the
> Netty community to release ARM support. If they do not want to, OpenLab
> will host a Maven repository for some common libraries on ARM.
>
> For Chesnay's concern: firstly, the OpenLab team will keep maintaining and
> fixing the ARM CI. That means that once a build or test fails, we'll fix it
> at once. Secondly, OpenLab can provide ARM VMs to everyone for reproducing
> and testing. You just need to create a Test Request issue in OpenLab [4].
> Then we'll create ARM VMs for you, and you can log in and do what you need.
> Does that make sense?
>
> [1]: http://114.115.168.52:8081/#/overview
> [2]: https://issues.apache.org/jira/browse/FLINK-13449 and
> https://issues.apache.org/jira/browse/FLINK-13450
> [3]: https://issues.apache.org/jira/browse/FLINK-13598
> [4]: https://github.com/theopenlab/openlab/issues/new/choose
>
> Chesnay Schepler wrote on Sat, Aug 24, 2019, 00:10:
>> I'm wondering what we are supposed to do if the build fails?
>> We aren't providing any guides on setting up an ARM dev environment, so
>> reproducing it locally isn't possible.
>>
>> On 23/08/2019 17:55, Stephan Ewen wrote:
>>> Hi all!
>>>
>>> As part of the Flink on ARM effort, there is a pull request that triggers
>>> a build on OpenLabs CI for each push and runs tests on ARM machines.
>>>
>>> Currently that build is roughly equivalent to what the "core" and "tests"
>>> profiles do on Travis.
>>> The result will be posted to the PR comments, similar to the Flink Bot's
>>> Travis build result.
>>> The build currently passes :-) so Flink seems to be okay on ARM.
>>>
>>> My suggestion would be to try and add this and gather some experience
>>> with it.
>>> The Travis build results should be our "ground truth" and the ARM CI
>>> (OpenLabs CI) would be "informational only" at the beginning, but helping
>>> us understand when we break ARM support.
>>>
>>> You can see this in the PR that adds the OpenLabs CI config:
>>> https://github.com/apache/flink/pull/9416
>>>
>>> Any objections?
>>>
>>> Best,
>>> Stephan