[jira] [Created] (FLINK-13843) Unify and clean up StreamingFileSink format builders
Gyula Fora created FLINK-13843:
--
Summary: Unify and clean up StreamingFileSink format builders
Key: FLINK-13843
URL: https://issues.apache.org/jira/browse/FLINK-13843
Project: Flink
Issue Type: Improvement
Components: API / DataStream, Connectors / FileSystem
Affects Versions: 1.10.0
Reporter: Gyula Fora

I think the StreamingFileSink contains some problems that will affect us in the long run if we intend this sink to be the main exactly-once FS sink.

*1. Code duplication*

The StreamingFileSink currently has two builders, one for row and one for bulk formats: RowFormatBuilder and BulkFormatBuilder. They both contain almost exactly the same config settings, with a lot of code duplication that should be moved to a common superclass (StreamingFileSink.BucketsBuilder).

*2. Inconsistent config options*

I also noticed some strange/invalid configuration settings on the builders:
- RowFormatBuilder#withBucketAssignerAndPolicy: feels like an internal method that is not used anywhere. It also overwrites the bucket factory.
- BulkFormatBuilder#withBucketAssigner: takes an extra type parameter for the bucket ID type, compared to the row format.
- BulkFormatBuilder#withBucketCheckInterval: does not affect behavior, as the bulk format always uses the OnCheckpointRollingPolicy.

This can probably be solved by fixing the code duplication.

*3. Fragmented configuration*

This is not a big problem and only affects the part-file config options that were introduced recently. We have added two methods: withPartFilePrefix and withPartFileSuffix. I think we should aim to group configs that belong together -> withPartFileConfig (see the sketch below).

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
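For illustration, a minimal sketch of the grouped-configuration idea. All names here (PartFileConfig, withPartFileConfig) are hypothetical, not existing Flink API:

{code:java}
import java.io.Serializable;

/**
 * Hypothetical holder for the part-file naming options, so that a single
 * withPartFileConfig(...) setter on the common builder superclass replaces
 * the separate withPartFilePrefix/withPartFileSuffix methods.
 */
public class PartFileConfig implements Serializable {

    public static final String DEFAULT_PART_PREFIX = "part";
    public static final String DEFAULT_PART_SUFFIX = "";

    private final String partPrefix;
    private final String partSuffix;

    public PartFileConfig() {
        this(DEFAULT_PART_PREFIX, DEFAULT_PART_SUFFIX);
    }

    public PartFileConfig(String partPrefix, String partSuffix) {
        this.partPrefix = partPrefix;
        this.partSuffix = partSuffix;
    }

    public String getPartPrefix() {
        return partPrefix;
    }

    public String getPartSuffix() {
        return partSuffix;
    }
}
{code}

A shared abstract builder could then expose withPartFileConfig(PartFileConfig) once, and the row and bulk builders would only add their format-specific options on top.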
[jira] [Created] (FLINK-13845) Drop all the content of removed "Checkpointed" interface
Yun Tang created FLINK-13845:
Summary: Drop all the content of removed "Checkpointed" interface
Key: FLINK-13845
URL: https://issues.apache.org/jira/browse/FLINK-13845
Project: Flink
Issue Type: Improvement
Components: Documentation
Reporter: Yun Tang
Fix For: 1.10.0

In [FLINK-7461|https://issues.apache.org/jira/browse/FLINK-7461], we already removed the backward compatibility with pre-1.1 Flink, and the deprecated {{Checkpointed}} interface was removed entirely. However, we still have a lot of content, including Javadocs and documentation, that talks about this non-existent interface. I think it's time to remove that content now.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
[jira] [Created] (FLINK-13846) Implement benchmark case on MapState#isEmpty
Yun Tang created FLINK-13846:
Summary: Implement benchmark case on MapState#isEmpty
Key: FLINK-13846
URL: https://issues.apache.org/jira/browse/FLINK-13846
Project: Flink
Issue Type: Improvement
Reporter: Yun Tang

Once FLINK-13034 is merged, we need to implement a benchmark case for {{MapState#isEmpty}} in https://github.com/dataArtisans/flink-benchmarks

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
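As a shape for such a case, here is a self-contained JMH sketch. The real flink-benchmarks case would run {{MapState#isEmpty}} against the actual keyed state backends (heap and RocksDB); the plain HashMap below is only a stand-in to show the benchmark structure, comparing a direct emptiness check against the iterate-and-test fallback:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class MapStateIsEmptyBenchmark {

    // measure both an empty and a populated map, since isEmpty() should be cheap in both cases
    @Param({"0", "1000"})
    public int mapSize;

    private Map<Long, String> map;

    @Setup
    public void setUp() {
        map = new HashMap<>();
        for (long i = 0; i < mapSize; i++) {
            map.put(i, "value-" + i);
        }
    }

    @Benchmark
    public boolean isEmpty() {
        // direct emptiness check, the operation FLINK-13034 adds to MapState
        return map.isEmpty();
    }

    @Benchmark
    public boolean iteratorBased() {
        // the workaround users need today: iterate and test for a first entry
        return !map.entrySet().iterator().hasNext();
    }
}
{code}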
Re: [DISCUSS] Enhance Support for Multicast Communication Pattern
Hi all,

I also think that multicasting is a necessity in Flink, but more details need to be considered.

Currently, the network is tightly coupled with state in Flink to achieve automatic scaling. We can only access keyed states in keyed streams and operator states in all streams. In the concrete example of theta-joins implemented with multicasting, the following questions exist:

- In which type of state will the data be stored? Do we need another type of state that is coupled with multicast streams?
- How do we ensure consistency between the network and the state when jobs scale out or scale in?

Regards,
Xiaogang

Xingcan Cui wrote on Sun, Aug 25, 2019, 10:03:
> Hi all,
>
> Sorry for joining this thread late. Basically, I think enabling the multicast
> pattern could be the right direction, but more detailed implementation
> policies need to be discussed.
>
> Two years ago, I filed an issue [1] about the multicast API. However, due
> to some reasons, it was laid aside. After that, when I tried to cherry-pick
> the change for experimental use, I found the return type of the
> `selectChannels()` method had changed from `int[]` to `int`, which makes
> the old implementation not work anymore.
>
> From my side, multicast has always been used for theta-joins. As far as
> I know, it's an essential requirement for some sophisticated joining
> algorithms. Until now, Flink non-equi joins can still only be executed
> single-threaded. If we'd like to make some improvements on this, we should
> first take some measures to support the multicast pattern.
>
> Best,
> Xingcan
>
> [1] https://issues.apache.org/jira/browse/FLINK-6936
>
>> On Aug 24, 2019, at 5:54 AM, Zhu Zhu wrote:
>>
>> Hi Piotr,
>>
>> Thanks for the explanation.
>> Agreed that broadcastEmit(record) is a better choice for broadcasting
>> for the iterations.
>> As broadcasting for the iterations is the first motivation, let's support
>> it first.
>>
>> Thanks,
>> Zhu Zhu
>>
>> Yun Gao wrote on Fri, Aug 23, 2019, 23:56:
>>> Hi Piotr,
>>>
>>> Many thanks for the suggestions!
>>>
>>> I totally agree that we could first focus on the broadcast
>>> scenarios and expose the broadcastEmit method first, considering the
>>> semantics and performance.
>>>
>>> For the keyed stream, I also agree that broadcasting keyed
>>> records to all the tasks may be confusing, considering the semantics of the
>>> keyed partitioner. However, in the iteration case, supporting broadcast over
>>> the keyed partitioner should be required, since users may create any subgraph
>>> for the iteration body, including operators with keys. I think a possible
>>> solution to this issue is to introduce another data type for
>>> 'broadcastEmit'. For example, an operator Operator<T> may broadcast-emit
>>> another type E instead of T, and the transmitted E will bypass the
>>> partitioner and the setting of the keyed context. This would lead the design
>>> to introduce a customized operator event (option 1 in the document). The cost
>>> of this method is that we need to introduce a new type of StreamElement and
>>> a new interface for this type, but it should be suitable for both keyed and
>>> non-keyed partitioners.
>>>
>>> Best,
>>> Yun
>>>
>>> --
>>> From: Piotr Nowojski
>>> Send Time: Fri, Aug 23, 2019, 22:29
>>> To: Zhu Zhu
>>> Cc: dev; Yun Gao
>>> Subject: Re: [DISCUSS] Enhance Support for Multicast Communication Pattern
>>>
>>> Hi,
>>>
>>> If the primary motivation is broadcasting (for the iterations) and we have
>>> no immediate need for multicast (cross join), I would prefer to first
>>> expose broadcast via the DataStream API and only later, once we finally
>>> need it, support multicast. As I wrote, multicast would be more challenging
>>> to implement, with a more complicated runtime and API. And re-using multicast
>>> just to support broadcast doesn't make much sense:
>>>
>>> 1. It's a bit obfuscated. It's easier to understand
>>> collectBroadcast(record) or broadcastEmit(record) compared to some
>>> multicast channel selector that just happens to return all of the channels.
>>> 2. There are performance benefits to explicitly calling
>>> `RecordWriter#broadcastEmit`.
>>>
>>> On a different note, what would be the semantics of such a broadcast emit
>>> on a KeyedStream? Would it be supported? Or would we limit support only to
>>> non-keyed streams?
>>>
>>> Piotrek
>>>
>>>> On 23 Aug 2019, at 12:48, Zhu Zhu wrote:
>>>>
>>>> Thanks Piotr,
>>>>
>>>> Users asked for this feature some time ago when they were migrating batch
>>>> jobs to Flink (Blink).
>>>> It's not very urgent, as they have taken some workarounds to solve
>>>> it (like partitioning the data set to different job vertices).
>>>> So it's fine to not make it top priority.
>>>>
>>>> Anyway, as a commonly known scenario, I think users can benefit from
Re: [DISCUSS] Add ARM CI build to Flink (information-only)
Thanks, Stephan, for bringing up this topic.

The package build jobs work well now. I have a simple online demo which was built and runs on an ARM VM. Feel free to give it a try [1]. As the first step for ARM support, maybe it's good to add them now.

For the next step, the test part is still broken. It relates to some points we found:

1. Some unit tests fail [2] due to Java coding issues. These kinds of failures can be fixed easily.
2. Some tests fail due to dependencies on third-party libraries [3], including frocksdb, the MapR Client, and Netty, which don't have ARM releases.
  a. frocksdb: I'm testing it locally now with `make check_some` and `make jtest`, similar to its Travis job. There are 3 tests failing in `make check_some`. Please see the ticket for more details. Once the tests pass, frocksdb can release an ARM package.
  b. MapR Client: this belongs to the MapR company. At this moment, maybe we should skip MapR support for Flink on ARM.
  c. Netty: Netty actually runs well on our ARM machine. We will ask the Netty community to release ARM support. If they do not want to, OpenLab will host a Maven repository for some common libraries on ARM.

For Chesnay's concern: firstly, the OpenLab team will keep maintaining and fixing the ARM CI. That means that once a build or test fails, we'll fix it at once. Secondly, OpenLab can provide ARM VMs to everyone for reproducing and testing. You just need to create a Test Request issue in OpenLab [4]. Then we'll create ARM VMs for you, and you can log in and do what you need. Does that make sense?

[1]: http://114.115.168.52:8081/#/overview
[2]: https://issues.apache.org/jira/browse/FLINK-13449 and https://issues.apache.org/jira/browse/FLINK-13450
[3]: https://issues.apache.org/jira/browse/FLINK-13598
[4]: https://github.com/theopenlab/openlab/issues/new/choose

Chesnay Schepler wrote on Sat, Aug 24, 2019, 00:10:
> I'm wondering what we are supposed to do if the build fails?
> We aren't providing any guides on setting up an ARM dev environment, so
> reproducing it locally isn't possible.
>
> On 23/08/2019 17:55, Stephan Ewen wrote:
> > Hi all!
> >
> > As part of the Flink on ARM effort, there is a pull request that triggers a
> > build on OpenLabs CI for each push and runs tests on ARM machines.
> >
> > Currently that build is roughly equivalent to what the "core" and "tests"
> > profiles do on Travis.
> > The result will be posted to the PR comments, similar to the Flink Bot's
> > Travis build result.
> > The build currently passes :-) so Flink seems to be okay on ARM.
> >
> > My suggestion would be to try and add this and gather some experience with
> > it.
> > The Travis build results should be our "ground truth" and the ARM CI
> > (OpenLabs CI) would be "informational only" at the beginning, but helping
> > us understand when we break ARM support.
> >
> > You can see this in the PR that adds the OpenLabs CI config:
> > https://github.com/apache/flink/pull/9416
> >
> > Any objections?
> >
> > Best,
> > Stephan
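Until the third-party libraries above ship ARM artifacts, one pragmatic option (a sketch, not something the Flink build does today) is to skip tests that need native x86 libraries when they run on another architecture:

{code:java}
import org.junit.Assume;

/**
 * Sketch of an architecture guard for tests that depend on native libraries
 * (e.g. frocksdb) that are not yet released for ARM. Calling this at the top
 * of a test turns it into a skipped test on non-x86 machines instead of a failure.
 */
public final class ArchitectureAssumptions {

    private ArchitectureAssumptions() {}

    public static void assumeX86() {
        String arch = System.getProperty("os.arch");
        Assume.assumeTrue(
                "Test requires x86_64 native libraries, but runs on " + arch,
                "amd64".equals(arch) || "x86_64".equals(arch));
    }
}
{code}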
Re: [DISCUSS] Use Java's Duration instead of Flink's Time
+1 to use Java's Duration instead of Flink's Time.

Regarding the Duration parsing, we have mentioned in FLIP-54 [1] that we use `org.apache.flink.util.TimeUtils` for the parsing.

Best,
Jark

[1]: https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit#heading=h.egdwkc93dn1k

On Sat, 24 Aug 2019 at 18:24, Zhu Zhu wrote:
> +1 since Java Duration is more common and powerful than Flink Time.
>
> As for whether to drop Scala's Duration for parsing duration config options, I
> think that's another question and should be discussed in another thread.
>
> Thanks,
> Zhu Zhu
>
> Becket Qin wrote on Sat, Aug 24, 2019, 16:16:
>> +1, makes sense. BTW, we probably need a FLIP, as this is a public API
>> change.
>>
>> On Sat, Aug 24, 2019 at 8:11 AM SHI Xiaogang wrote:
>>> +1 to replace Flink's Time with Java's Duration.
>>>
>>> Besides, I also suggest using Java's Instant for "point-in-time".
>>> It can take care of time units when we calculate the Duration between
>>> different instants.
>>>
>>> Regards,
>>> Xiaogang
>>>
>>> Zili Chen wrote on Sat, Aug 24, 2019, 10:45:
>>>> Hi vino,
>>>>
>>>> I agree that it introduces extra complexity to replace Duration (Scala)
>>>> with Duration (Java) *in Scala code*. We could separate the usage for each
>>>> language and use a bridge when necessary.
>>>>
>>>> As a matter of fact, Scala concurrent APIs (including Duration) are used
>>>> more than necessary, at least in flink-runtime. Also, we even try to make
>>>> flink-runtime Scala-free.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>> vino yang wrote on Sat, Aug 24, 2019, 10:05:
>>>>> +1 to replace the Time class provided by Flink with Java's Duration:
>>>>>
>>>>> - Java's Duration has a better representation than Flink's Time class;
>>>>> - As a built-in Java class, the Duration class has a clear advantage over
>>>>> Flink's Time class when interacting with other Java APIs and third-party
>>>>> libraries;
>>>>>
>>>>> But I have reservations about replacing the Duration and FiniteDuration
>>>>> classes in Scala with the Duration class in Java. Java and Scala have
>>>>> different type systems. Currently, Duration (Scala) and FiniteDuration
>>>>> (Scala) work well. In addition, this work brings additional complexity and
>>>>> cost compared to the gains obtained.
>>>>>
>>>>> Best,
>>>>> Vino
>>>>>
>>>>> Zili Chen wrote on Fri, Aug 23, 2019, 23:14:
>>>>>> Hi Stephan,
>>>>>>
>>>>>> I like the idea of unifying the usage of the time/duration API. We
>>>>>> actually use at least five different classes for this purpose (see below).
>>>>>>
>>>>>> One thing I'd like to pick up is that duration configuration
>>>>>> in Flink is mostly in a pattern like "60 s" that fits the pattern
>>>>>> parsed by scala.concurrent.duration.Duration. AFAIK, Duration
>>>>>> in Java 8 doesn't support this pattern. However, we can solve
>>>>>> it by introducing a DurationUtils.
>>>>>>
>>>>>> Also, to clarify, we now have (correct me if there are any others)
>>>>>>
>>>>>> java.time.Duration
>>>>>> scala.concurrent.duration.Duration
>>>>>> scala.concurrent.duration.FiniteDuration
>>>>>> org.apache.flink.api.common.time.Time
>>>>>> org.apache.flink.streaming.api.windowing.time.Time
>>>>>>
>>>>>> in use.
>>>>>> If we'd prefer java.time.Duration, it is worth considering
>>>>>> whether we unify all of them into Java's Duration, i.e., Java's
>>>>>> Duration becomes the first-class time/duration API, while the others
>>>>>> are converted to and from it.
>>>>>>
>>>>>> Best,
>>>>>> tison.
>>>>>>
>>>>>> Stephan Ewen wrote on Fri, Aug 23, 2019, 22:45:
>>>>>>> Hi all!
>>>>>>>
>>>>>>> Many parts of the code use Flink's "Time" class. The Time really is a
>>>>>>> "time interval" or a "Duration".
>>>>>>>
>>>>>>> Since Java 8, there is a Java class "Duration" that is nice and flexible
>>>>>>> to use.
>>>>>>> I would suggest we start using Java Duration instead and drop Time as
>>>>>>> much as possible in the runtime from now on.
>>>>>>>
>>>>>>> Maybe even drop that class from the API in Flink 2.0.
>>>>>>>
>>>>>>> Best,
>>>>>>> Stephan
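To make the parsing point concrete: java.time.Duration only parses ISO-8601 strings such as "PT60S", so Flink-style strings like "60 s" need a small helper of the kind mentioned above. A minimal sketch (the real utility's name, location, and supported units may differ):

{code:java}
import java.time.Duration;
import java.util.Locale;

public final class DurationUtils {

    /** Parses strings like "60 s", "100 ms", or "3 min" into a java.time.Duration. */
    public static Duration parse(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        int split = 0;
        while (split < trimmed.length() && Character.isDigit(trimmed.charAt(split))) {
            split++;
        }
        long value = Long.parseLong(trimmed.substring(0, split));
        String unit = trimmed.substring(split).trim();
        switch (unit) {
            case "ms":
                return Duration.ofMillis(value);
            case "s":
                return Duration.ofSeconds(value);
            case "min":
                return Duration.ofMinutes(value);
            case "h":
                return Duration.ofHours(value);
            default:
                throw new IllegalArgumentException("Unknown time unit: '" + unit + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("60 s"));            // PT1M
        System.out.println(Duration.parse("PT60S"));  // the ISO-8601 form Java supports natively
    }
}
{code}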
Re: [DISCUSS] Flink Python User-Defined Function for Table API
Thanks for your feedback, Hequn & Dian.

Dian, I am glad to see that you want to help create the FLIP!
Everyone has a first time, and I am very willing to help you complete your first FLIP creation. Here are some tips:

- First, I'll give your account write permission for Confluence.
- Before creating the FLIP, please have a look at the FLIP Template [1]. (It's good to learn more about FLIPs by reading [2].)
- Create the Flink Python UDF related JIRAs after completing the VOTE on the FLIP. (I think you can also bring up the VOTE thread, if you want!)

If you run into any problems during this period, feel free to tell me, and we can solve them together. :)

Best,
Jincheng

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Hequn Cheng wrote on Fri, Aug 23, 2019, 11:54:
> +1 for starting the vote.
>
> Thanks a lot, Jincheng, for the discussion.
>
> Best, Hequn
>
> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu wrote:
>> Hi Jincheng,
>>
>> +1 to start the FLIP creation and VOTE on this feature. I'm willing to help
>> with the FLIP creation if you don't mind. As I haven't created a FLIP before,
>> it would be great if you could help with this. :)
>>
>> Regards,
>> Dian
>>
>>> On Aug 22, 2019, at 11:41 PM, jincheng sun wrote:
>>>
>>> Hi all,
>>>
>>> Thanks a lot for your feedback. If there are no more suggestions and
>>> comments, I think it's better to initiate a vote to create a FLIP for
>>> Apache Flink Python UDFs.
>>> What do you think?
>>>
>>> Best, Jincheng
>>>
>>> jincheng sun wrote on Thu, Aug 15, 2019, 00:54:
>>>> Hi Thomas,
>>>>
>>>> Thanks for your confirmation and the very important reminder about bundle
>>>> processing.
>>>>
>>>> I have added a description of how to perform bundle processing from
>>>> the perspective of checkpoints and watermarks. Feel free to leave comments
>>>> if anything is not described clearly.
>>>>
>>>> Best,
>>>> Jincheng
>>>>
>>>> Dian Fu wrote on Wed, Aug 14, 2019, 10:08:
>>>>> Hi Thomas,
>>>>>
>>>>> Thanks a lot for the suggestions.
>>>>>
>>>>> Regarding bundle processing, there is a section "Checkpoint" [1] in the
>>>>> design doc which talks about how to handle checkpoints.
>>>>> However, I think you are right that we should say more about it, such as
>>>>> what bundle processing is, how it affects checkpoints and watermarks, and
>>>>> how to handle checkpoints and watermarks, etc.
>>>>>
>>>>> [1] https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>>>>>
>>>>> Regards,
>>>>> Dian
>>>>>
>>>>>> On Aug 14, 2019, at 1:01 AM, Thomas Weise wrote:
>>>>>>
>>>>>> Hi Jincheng,
>>>>>>
>>>>>> Thanks for putting this together. The proposal is very detailed, thorough,
>>>>>> and for me as a Beam Flink runner contributor easy to understand :)
>>>>>>
>>>>>> One thing that you should probably detail more is the bundle processing. It
>>>>>> is critically important for performance that multiple elements are
>>>>>> processed in a bundle. The default bundle size in the Flink runner is 1s or
>>>>>> 1000 elements, whichever comes first.
>>>>>> And for streaming, you can find the
>>>>>> logic necessary to align the bundle processing with watermarks and
>>>>>> checkpointing here:
>>>>>> https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> The Python Table API (without Python UDF support) is already supported
>>>>>>> and will be available in the coming release 1.9.
>>>>>>> As Python UDFs are very important for Python users, we'd like to start the
>>>>>>> discussion about Python UDF support in the Python Table API.
>>>>>>> Aljoscha Krettek, Dian Fu and I have discussed offline and have drafted a
>>>>>>> design doc [1]. It includes the following items:
>>>>>>>
>>>>>>> - The user-defined function interfaces.
>>>>>>> - The user-defined function execution architecture.
>>>>>>>
>>>>>>> As mentioned by many people in the previous discussion thread [2], a
>>>>>>> portability framework was introduced in Apache Beam in its latest
>>>>>>> releases. It provides well-defined, language-neutral data structures and
>>>>>>> protocols for language-neutral user-defined function execution. This
>>>>>>> design is based on Beam's portab
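The bundle-alignment idea Thomas points to can be summarized in a small sketch. This is illustrative Java, not the actual Beam or Flink operator code, and all names are made up: buffered elements are flushed to the UDF runner before a checkpoint is taken or a watermark is forwarded, so neither can overtake elements that are still buffered.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative bundle-aligned operator sketch. */
class BundlingOperatorSketch<T> {

    // the Flink runner also bounds bundles by time (1s), omitted here
    private static final int MAX_BUNDLE_SIZE = 1000;

    private final List<T> bundle = new ArrayList<>();
    private final Consumer<List<T>> udfRunner; // stand-in for the remote Python/Beam harness

    BundlingOperatorSketch(Consumer<List<T>> udfRunner) {
        this.udfRunner = udfRunner;
    }

    void processElement(T element) {
        bundle.add(element);
        if (bundle.size() >= MAX_BUNDLE_SIZE) {
            finishBundle();
        }
    }

    void snapshotState() {
        // flush before checkpointing so state reflects every received element
        finishBundle();
    }

    long processWatermark(long watermark) {
        // flush before forwarding so results of buffered elements precede the watermark
        finishBundle();
        return watermark;
    }

    private void finishBundle() {
        if (!bundle.isEmpty()) {
            udfRunner.accept(new ArrayList<>(bundle));
            bundle.clear();
        }
    }
}
{code}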
Re: [DISCUSS] Flink client api enhancement for downstream project
Hi Zili,

It makes sense to me that a dedicated cluster is started for a per-job cluster and will not accept more jobs.
I just have a question about the command line. Currently we can use the following commands to start different clusters.

*per-job cluster*
./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster examples/streaming/WindowJoin.jar

*session cluster*
./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster examples/streaming/WindowJoin.jar

What will they look like after the client enhancement?

Best,
Yang

Zili Chen wrote on Fri, Aug 23, 2019, 22:46:
> Hi Till,
>
> Thanks for your update. Nice to hear :-)
>
> Best,
> tison.
>
> Till Rohrmann wrote on Fri, Aug 23, 2019, 22:39:
>> Hi Tison,
>>
>> Just a quick comment concerning the class loading issues when using the
>> per-job mode. The community wants to change it so that the
>> StandaloneJobClusterEntryPoint actually uses the user code class loader
>> with child-first class loading [1]. Hence, I hope that this problem will be
>> resolved soon.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-13840
>>
>> Cheers,
>> Till
>>
>> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas wrote:
>>> Hi all,
>>>
>>> On the topic of web submission, I agree with Till that it only seems
>>> to complicate things.
>>> It is bad for security, job isolation (anybody can submit/cancel jobs),
>>> and its implementation complicates some parts of the code. So, if we were
>>> to redesign the WebUI, maybe this part could be left out. In addition, I
>>> would say that the ability to cancel jobs could also be left out.
>>>
>>> I would also be in favour of removing the "detached" mode, for the reasons
>>> mentioned above (i.e. because now we will have a future representing the
>>> result on which the user can choose to wait or not).
>>>
>>> Now, for separating job submission and cluster creation, I am in
>>> favour of keeping both.
>>> Once again, the reasons are mentioned above by Stephan, Till and Aljoscha,
>>> and Zili also seems to agree. They mainly have to do with security,
>>> isolation, and ease of resource management for the user, as he knows that
>>> "when my job is done, everything will be cleared up". This is
>>> also the experience you get when launching a process on your local OS.
>>>
>>> On excluding the per-job mode from returning a JobClient or not, I
>>> believe that eventually it would be nice to allow users to get back a
>>> JobClient. The reasons are that 1) I cannot find any objective reason why
>>> the user experience should diverge, and 2) this will be the way that the
>>> user will be able to interact with his running job. Assuming that the
>>> necessary ports are open for the REST API to work, the JobClient can run
>>> against the REST API without problems. If the needed ports are not open,
>>> then we are safe to not return a JobClient, as the user explicitly chose
>>> to close all points of communication to his running job.
>>>
>>> On the topic of not hijacking "env.execute()" in order to get the Plan, I
>>> definitely agree, but for the proposal of having a "compile()" method in
>>> the env, I would like to have a better look at the existing code.
>>> Cheers,
>>> Kostas
>>>
>>> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen wrote:
>>>> Hi Yang,
>>>>
>>>> It would be helpful if you check Stephan's last comment,
>>>> which states that isolation is important.
>>>>
>>>> For per-job mode, we run a dedicated cluster (maybe it
>>>> should have been a couple of JMs and TMs during the FLIP-6
>>>> design) for a specific job. Thus the process is protected
>>>> from other jobs.
>>>>
>>>> In our case, there was a time we suffered from multiple
>>>> jobs submitted by different users that affected
>>>> each other, so that all ran into an error state. Also,
>>>> running the client inside the cluster can save client
>>>> resources at some points.
>>>>
>>>> However, we also face several issues, as you mentioned:
>>>> in per-job mode it always uses the parent classloader,
>>>> so classloading issues occur.
>>>>
>>>> BTW, one can make an analogy between session/per-job mode
>>>> in Flink and client/cluster mode in Spark.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>> Yang Wang wrote on Thu, Aug 22, 2019, 11:25:
>>>>> From the user's perspective, the scope of a per-job cluster is really
>>>>> confusing.
>>>>>
>>>>> If it means a Flink cluster with a single job, we get better isolation.
>>>>>
>>>>> Now it does not matter how we deploy the cluster: directly deploy (mode 1)
>>>>> or start a Flink cluster and then submit the job through the cluster
>>>>> client (mode 2).
>>>>>
>>>>> Otherwise, if
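For reference, a hypothetical shape of the JobClient idea discussed above. This is only a sketch of the direction, not an agreed-upon interface, and every name in it is invented for illustration:

{code:java}
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical client handle returned by job submission. Submission itself
 * would be asynchronous; blocking on the result stays the caller's choice.
 */
public interface JobClient {

    /** The ID of the submitted job. */
    String getJobId();

    /** Requests cancellation of the job. */
    CompletableFuture<Void> cancel();

    /**
     * Future completing with the job's result; callers may wait on it
     * ("attached" behavior) or ignore it ("detached" behavior), which is
     * why a separate detached mode becomes unnecessary.
     */
    CompletableFuture<String> getJobExecutionResult();
}
{code}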
[jira] [Created] (FLINK-13847) Update release scripts to also update docs/_config.yml
Tzu-Li (Gordon) Tai created FLINK-13847:
---
Summary: Update release scripts to also update docs/_config.yml
Key: FLINK-13847
URL: https://issues.apache.org/jira/browse/FLINK-13847
Project: Flink
Issue Type: Improvement
Components: Documentation, Release System
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai

During the 1.9.0 release process, we missed quite a few configuration updates in {{docs/_config.yml}} related to Flink versions. This should be done automatically via the release scripts.

A list of settings in that file that need to be touched on every major release:
* version
* version_title
* github_branch
* baseurl
* stable_baseurl
* javadocs_baseurl
* pythondocs_baseurl
* is_stable
* Add a new link to previous_docs

This can probably be done via the {{tools/releasing/create_release_branch.sh}} script, which is used for every major release. We should also update the release guide in the project wiki to cover checking that file as an item in the checklists.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
Re: [DISCUSS] Flink Python User-Defined Function for Table API
Hi Jincheng,

Thanks for the kind tips and the offer of help. I definitely need it! Could you grant me write permission for Confluence? My id: Dian Fu

Thanks,
Dian

> On Aug 26, 2019, at 9:53 AM, jincheng sun wrote:
>
> Thanks for your feedback, Hequn & Dian.
>
> Dian, I am glad to see that you want to help create the FLIP!
> Everyone has a first time, and I am very willing to help you complete
> your first FLIP creation. Here are some tips:
>
> - First, I'll give your account write permission for Confluence.
> - Before creating the FLIP, please have a look at the FLIP Template [1].
> (It's good to learn more about FLIPs by reading [2].)
> - Create the Flink Python UDF related JIRAs after completing the VOTE on
> the FLIP. (I think you can also bring up the VOTE thread, if you want!)
>
> If you run into any problems during this period, feel free to tell me, and
> we can solve them together. :)
>
> Best,
> Jincheng
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>
> Hequn Cheng wrote on Fri, Aug 23, 2019, 11:54:
>> +1 for starting the vote.
>>
>> Thanks a lot, Jincheng, for the discussion.
>>
>> Best, Hequn
>>
>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu wrote:
>>> Hi Jincheng,
>>>
>>> +1 to start the FLIP creation and VOTE on this feature. I'm willing to
>>> help with the FLIP creation if you don't mind. As I haven't created a FLIP
>>> before, it would be great if you could help with this. :)
>>>
>>> Regards,
>>> Dian
>>>
>>>> On Aug 22, 2019, at 11:41 PM, jincheng sun wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Thanks a lot for your feedback. If there are no more suggestions and
>>>> comments, I think it's better to initiate a vote to create a FLIP for
>>>> Apache Flink Python UDFs.
>>>> What do you think?
>>>>
>>>> Best, Jincheng
>>>>
>>>> jincheng sun wrote on Thu, Aug 15, 2019, 00:54:
>>>>> Hi Thomas,
>>>>>
>>>>> Thanks for your confirmation and the very important reminder about
>>>>> bundle processing.
>>>>>
>>>>> I have added a description of how to perform bundle processing from
>>>>> the perspective of checkpoints and watermarks. Feel free to leave
>>>>> comments if anything is not described clearly.
>>>>>
>>>>> Best,
>>>>> Jincheng
>>>>>
>>>>> Dian Fu wrote on Wed, Aug 14, 2019, 10:08:
>>>>>> Hi Thomas,
>>>>>>
>>>>>> Thanks a lot for the suggestions.
>>>>>>
>>>>>> Regarding bundle processing, there is a section "Checkpoint" [1] in the
>>>>>> design doc which talks about how to handle checkpoints.
>>>>>> However, I think you are right that we should say more about it, such as
>>>>>> what bundle processing is, how it affects checkpoints and watermarks,
>>>>>> and how to handle checkpoints and watermarks, etc.
>>>>>>
>>>>>> [1] https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>>>>>>
>>>>>> Regards,
>>>>>> Dian
>>>>>>
>>>>>>> On Aug 14, 2019, at 1:01 AM, Thomas Weise wrote:
>>>>>>>
>>>>>>> Hi Jincheng,
>>>>>>>
>>>>>>> Thanks for putting this together. The proposal is very detailed,
>>>>>>> thorough, and for me as a Beam Flink runner contributor easy to
>>>>>>> understand :)
>>>>>>>
>>>>>>> One thing that you should probably detail more is the bundle
>>>>>>> processing. It is critically important for performance that multiple
>>>>>>> elements are processed in a bundle. The default bundle size in the
>>>>>>> Flink runner is 1s or 1000 elements, whichever comes first.
>>>>>>> And for streaming, you can find the
>>>>>>> logic necessary to align the bundle processing with watermarks and
>>>>>>> checkpointing here:
>>>>>>> https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> The Python Table API (without Python UDF support) is already supported
>>>>>>>> and will be available in the coming release 1.9.
>>>>>>>> As Python UDFs are very important for Python users, we'd like to start
>>>>>>>> the discussion about Python UDF support in the Python Table API.
>>>>>>>> Aljoscha Krettek, Dian Fu and I have discussed offline and have drafted
>>>>>>>> a design doc [1]. It includes the following items:
>>>>>>>>
>>>>>>>> - The user-defined function interfaces.
>>>>>>>> - The user-defined function execution architecture.
>>>>>>>>
>>>>>>>> As mentioned by many people in the previous discussion thread [2], a
>>>>>>>> portability framework was introduced in Apache Beam in latest
[jira] [Created] (FLINK-13848) Support “scheduleAtFixedRate/scheduleAtFixedDelay” in RpcEndpoint#MainThreadExecutor
Biao Liu created FLINK-13848:
Summary: Support "scheduleAtFixedRate/scheduleAtFixedDelay" in RpcEndpoint#MainThreadExecutor
Key: FLINK-13848
URL: https://issues.apache.org/jira/browse/FLINK-13848
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination
Reporter: Biao Liu
Fix For: 1.10.0

Currently, the methods "scheduleAtFixedRate/scheduleAtFixedDelay" of {{RpcEndpoint#MainThreadExecutor}} are not implemented, because there was no requirement for them before. Now we are planning to implement these methods to support periodic checkpoint triggering.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
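For background, fixed-delay semantics can be built on top of a single-shot scheduler, which is roughly the building block such an implementation needs inside the main-thread executor. A runnable sketch (not Flink's actual code; a real fixed-rate variant would also subtract the task's own execution time from the next delay, and would need error handling so a throwing task does not stop the chain):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedDelaySketch {

    /** Re-schedules the task after each run, giving fixed-delay semantics. */
    static void scheduleWithFixedDelay(
            ScheduledExecutorService executor, Runnable task, long delayMillis) {
        executor.schedule(
                () -> {
                    task.run();
                    scheduleWithFixedDelay(executor, task, delayMillis);
                },
                delayMillis,
                TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        scheduleWithFixedDelay(executor, () -> System.out.println("trigger checkpoint"), 100);
        Thread.sleep(550); // observe a few periodic triggers
        executor.shutdownNow();
    }
}
{code}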
Re: [DISCUSS] Setup a bui...@flink.apache.org mailing list for travis builds
Hi all,

Sorry it took so long to get back. I have some good news.

After some investigation and development, and with Chesnay's help, we finally integrated Travis build notifications with the bui...@flink.apache.org mailing list while retaining the beautiful formatting!

Currently, only failure and failure->success builds are notified, and only builds (including CRON) on apache/flink branches are notified; pull request builds are not. The builds mailing list is also listed on the Flink website's community page [1].

I would encourage devs to subscribe to the builds mailing list and help the community pay more attention to the build status, especially the CRON builds. Feel free to leave your suggestions and feedback here!

# Implementation details:

I implemented a flink-notification-bot [2] to receive the Travis webhook [3] payload, generate an HTML email, and send the email to bui...@flink.apache.org. The flink-notification-bot is deployed on my own VM at DigitalOcean. You can refer to the GitHub page [2] of the project to learn more details about the implementation and deployment. Btw, I'm glad to contribute the project to https://github.com/flink-ci or https://github.com/flinkbot if the community accepts.

With the flink-notification-bot, we can easily integrate with other CI services or our own CI, and we can also integrate it with some other applications (e.g. DingTalk).

# Rejected alternative:

Option #1: sending email notifications via "Travis Email Notification" [4]. Reasons:
- If the email notification is set, Travis CI only sends emails to the addresses specified there, rather than to the committer and author.
- We would lose the beautiful email formatting when Travis sends email to the builds ML.
- The return-path of emails from Travis CI is not constant, which makes it difficult for the mailing list to accept them.

Cheers,
Jark

[1]: https://flink.apache.org/community.html#mailing-lists
[2]: https://github.com/wuchong/flink-notification-bot
[3]: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
[4]: https://docs.travis-ci.com/user/notifications/#configuring-email-notifications

On Tue, 30 Jul 2019 at 18:35, Jark Wu wrote:
> Hi all,
>
> Progress updates:
> 1. The bui...@flink.apache.org list can be subscribed to now (thanks @Robert);
> you can send an email to builds-subscr...@flink.apache.org to subscribe.
> 2. We have a pull request [1] to send only apache/flink build notifications,
> and it works well.
> 3. However, all the notifications are rejected by the builds mailing list
> (the MODERATE mails).
> I added & checked bui...@travis-ci.org on the subscriber/allow list,
> but it still doesn't work. It might be recognized as spam by the mailing list.
> We are still trying to figure it out and will update here if we make
> some progress.
>
> Thanks,
> Jark
>
> [1]: https://github.com/apache/flink/pull/9230
>
> On Thu, 25 Jul 2019 at 22:59, Robert Metzger wrote:
>> The mailing list has been created; you can now subscribe to it.
>>
>> On Wed, Jul 24, 2019 at 1:43 PM Jark Wu wrote:
>>> Thanks, Robert, for helping out with that.
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 24 Jul 2019 at 19:16, Robert Metzger wrote:
>>>> I've requested the creation of the list and made Jark, Chesnay and me
>>>> moderators of it.
>>>>
>>>> On Wed, Jul 24, 2019 at 1:12 PM Robert Metzger wrote:
>>>>> @Jark: Yes, I will request the creation of a mailing list!
>>>>> On Tue, Jul 23, 2019 at 4:48 PM Hugo Louro wrote:
>>>>>> +1
>>>>>>
>>>>>>> On Jul 23, 2019, at 6:15 AM, Till Rohrmann wrote:
>>>>>>>
>>>>>>> Good idea Jark. +1 for the proposal.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>>> On Tue, Jul 23, 2019 at 1:59 PM Hequn Cheng <chenghe...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Jark,
>>>>>>>>
>>>>>>>> Good idea. +1!
>>>>>>>>
>>>>>>>>> On Tue, Jul 23, 2019 at 6:23 PM Jark Wu wrote:
>>>>>>>>>
>>>>>>>>> Thank you all for your positive feedback.
>>>>>>>>>
>>>>>>>>> We have three binding +1s, so I think we can proceed with this.
>>>>>>>>>
>>>>>>>>> Hi @Robert Metzger, could you create a request to INFRA for the
>>>>>>>>> mailing list?
>>>>>>>>> I'm not sure if this needs PMC permission.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jark
>>>>>>>>>
>>>>>>>>> On Tue, 23 Jul 2019 at 16:42, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Robert Metzger wrote on Tue, Jul 23, 2019, 16:01:
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 22, 2019 at 10:27 AM Biao Liu <mmyy1...@gmail.com> wrote:
>>>>>>>>>>>> +1, makes sense to me.
>>>>>>>>>>>> Mailin
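For anyone curious about the moving parts, here is a minimal, self-contained sketch of a webhook receiver of the kind described above. It is not the actual flink-notification-bot code; payload parsing and mail delivery are only indicated in comments:

{code:java}
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class WebhookReceiverSketch {

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/travis", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                String payload = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                // A real bot would: verify the webhook signature, parse the build
                // status from the JSON payload, render an HTML mail, and send it
                // to the builds mailing list only on failure / failure->success.
                System.out.println("received payload of " + payload.length() + " bytes");
            }
            exchange.sendResponseHeaders(204, -1); // no response body
            exchange.close();
        });
        server.start();
        System.out.println("listening on :8080/travis");
    }
}
{code}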
Re: [DISCUSS] Enhance Support for Multicast Communication Pattern
Thanks Yun for bringing up this discussion, and many thanks for all the deep thoughts!

For now, I think this discussion contains two scenarios: one is for iteration library support and the other is for SQL join support. I think both scenarios are useful, but they seem to have different best-suited solutions. To make the discussion clearer, I would suggest splitting it into two threads.

And I agree with Piotr that it is very tricky for a keyed stream to receive a "broadcast element". So we may add some new interfaces which could broadcast or process some special "broadcast event", so that the "broadcast event" is not sent through the normal process.

Best,
Guowei

SHI Xiaogang wrote on Mon, Aug 26, 2019, 09:27:
> Hi all,
>
> I also think that multicasting is a necessity in Flink, but more details
> need to be considered.
>
> Currently, the network is tightly coupled with state in Flink to achieve
> automatic scaling. We can only access keyed states in keyed streams and
> operator states in all streams.
> In the concrete example of theta-joins implemented with multicasting, the
> following questions exist:
>
> - In which type of state will the data be stored? Do we need another
> type of state that is coupled with multicast streams?
> - How do we ensure consistency between the network and the state when jobs
> scale out or scale in?
>
> Regards,
> Xiaogang
>
> Xingcan Cui wrote on Sun, Aug 25, 2019, 10:03:
>> Hi all,
>>
>> Sorry for joining this thread late. Basically, I think enabling the
>> multicast pattern could be the right direction, but more detailed
>> implementation policies need to be discussed.
>>
>> Two years ago, I filed an issue [1] about the multicast API. However, due
>> to some reasons, it was laid aside. After that, when I tried to cherry-pick
>> the change for experimental use, I found the return type of the
>> `selectChannels()` method had changed from `int[]` to `int`, which makes
>> the old implementation not work anymore.
>>
>> From my side, multicast has always been used for theta-joins. As far as
>> I know, it's an essential requirement for some sophisticated joining
>> algorithms. Until now, Flink non-equi joins can still only be executed
>> single-threaded. If we'd like to make some improvements on this, we should
>> first take some measures to support the multicast pattern.
>>
>> Best,
>> Xingcan
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-6936
>>
>>> On Aug 24, 2019, at 5:54 AM, Zhu Zhu wrote:
>>>
>>> Hi Piotr,
>>>
>>> Thanks for the explanation.
>>> Agreed that broadcastEmit(record) is a better choice for broadcasting
>>> for the iterations.
>>> As broadcasting for the iterations is the first motivation, let's support
>>> it first.
>>>
>>> Thanks,
>>> Zhu Zhu
>>>
>>> Yun Gao wrote on Fri, Aug 23, 2019, 23:56:
>>>> Hi Piotr,
>>>>
>>>> Many thanks for the suggestions!
>>>>
>>>> I totally agree that we could first focus on the broadcast
>>>> scenarios and expose the broadcastEmit method first, considering the
>>>> semantics and performance.
>>>>
>>>> For the keyed stream, I also agree that broadcasting keyed
>>>> records to all the tasks may be confusing, considering the semantics of
>>>> the keyed partitioner. However, in the iteration case, supporting
>>>> broadcast over the keyed partitioner should be required, since users may
>>>> create any subgraph for the iteration body, including operators with keys.
>>>> I think a possible
>>>> solution to this issue is to introduce another data type for
>>>> 'broadcastEmit'. For example, an operator Operator<T> may broadcast-emit
>>>> another type E instead of T, and the transmitted E will bypass the
>>>> partitioner and the setting of the keyed context. This would lead the
>>>> design to introduce a customized operator event (option 1 in the
>>>> document). The cost of this method is that we need to introduce a new type
>>>> of StreamElement and a new interface for this type, but it should be
>>>> suitable for both keyed and non-keyed partitioners.
>>>>
>>>> Best,
>>>> Yun
>>>>
>>>> --
>>>> From: Piotr Nowojski
>>>> Send Time: Fri, Aug 23, 2019, 22:29
>>>> To: Zhu Zhu
>>>> Cc: dev; Yun Gao
>>>> Subject: Re: [DISCUSS] Enhance Support for Multicast Communication Pattern
>>>>
>>>> Hi,
>>>>
>>>> If the primary motivation is broadcasting (for the iterations) and we have
>>>> no immediate need for multicast (cross join), I would prefer to first
>>>> expose broadcast via the DataStream API and only later, once we finally
>>>> need it, support multicast. As I wrote, multicast would be more
>>>> challenging to implement, with a more complicated runtime and API. And
>>>> re-using multicast just to support broadcast doesn't make much sense
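To make the "bypassing broadcast" idea concrete, here is a hypothetical shape for such an interface. The names are invented for illustration and are not part of Flink:

{code:java}
/**
 * Hypothetical output for an operator that, besides partitioned records of
 * type T, can broadcast events of a separate type E to all downstream
 * channels, bypassing the (possibly keyed) partitioner entirely.
 */
public interface BroadcastCapableOutput<T, E> {

    /** Normal emission: the record goes through the configured partitioner. */
    void emit(T record);

    /**
     * Broadcast emission: the event is sent to every downstream channel and
     * does not set a keyed context, so it is safe on keyed streams as well.
     */
    void broadcastEmit(E event);
}
{code}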
[jira] [Created] (FLINK-13849) The back-pressure monitoring tab in Web UI may cause errors
Xingcan Cui created FLINK-13849:
---
Summary: The back-pressure monitoring tab in Web UI may cause errors
Key: FLINK-13849
URL: https://issues.apache.org/jira/browse/FLINK-13849
Project: Flink
Issue Type: Bug
Components: Runtime / Web Frontend
Affects Versions: 1.9.0
Reporter: Xingcan Cui

Clicking the back-pressure monitoring tab for a finished job in the Web UI will cause an internal server error. The exceptions are as follows.

{code:java}
2019-08-26 01:23:54,845 ERROR org.apache.flink.runtime.rest.handler.job.JobVertexBackPressureHandler - Unhandled exception.
org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job (09e107685e0b81b443b556062debb443)
    at org.apache.flink.runtime.dispatcher.Dispatcher.getJobMasterGatewayFuture(Dispatcher.java:825)
    at org.apache.flink.runtime.dispatcher.Dispatcher.requestOperatorBackPressureStats(Dispatcher.java:524)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:279)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:194)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
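A plausible shape of the fix (a runnable sketch with stand-in types, not the actual handler code) is to unwrap the future's failure and map a missing job to a 404 response instead of letting it surface as an unhandled 500:

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class NotFoundHandlingSketch {

    /** Stand-in for org.apache.flink.runtime.messages.FlinkJobNotFoundException. */
    static class JobNotFoundException extends RuntimeException {}

    /** Stand-in for Dispatcher#requestOperatorBackPressureStats. */
    static CompletableFuture<String> requestStats(boolean jobGone) {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (jobGone) {
            future.completeExceptionally(new JobNotFoundException());
        } else {
            future.complete("back-pressure stats");
        }
        return future;
    }

    public static void main(String[] args) {
        requestStats(true)
                .handle((stats, error) -> {
                    if (error == null) {
                        return "200 OK: " + stats;
                    } else if (unwrap(error) instanceof JobNotFoundException) {
                        return "404 Not Found: job is not running"; // instead of an unhandled 500
                    } else {
                        return "500 Internal Server Error";
                    }
                })
                .thenAccept(System.out::println);
    }

    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }
}
{code}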
Re: [DISCUSS] Tolerate temporarily suspended ZooKeeper connections
Hi Till,

I'd like to revive this thread since 1.9.0 has been released. IMHO we already reached a consensus on JIRA, and if you can review the pull request, we can hopefully address the issue in the next release.

Best,
tison.

Zili Chen wrote on Mon, Jul 29, 2019, 23:05:
> Hi Till,
>
> Thanks for your explanation. Let's pick up this thread during 1.10 development.
>
> Best,
> tison.
>
> Till Rohrmann wrote on Mon, Jul 29, 2019, 21:12:
>> Hi Tison,
>>
>> I would consider this a new feature, and as such it won't be possible to
>> include it in the 1.9.0 release, since the feature freeze has passed.
>> We might target 1.10, though.
>>
>> Cheers,
>> Till
>>
>> On Mon, Jul 29, 2019 at 3:01 AM Zili Chen wrote:
>>> Hi committers,
>>>
>>> Now that we have an ongoing PR [1] for this JIRA, we need a committer
>>> to push this thread forward. It would be good to see this issue fixed
>>> in 1.9.0.
>>>
>>> Best,
>>> tison.
>>>
>>> [1] https://github.com/apache/flink/pull/9158
>>>
>>> 未来阳光 <2217232...@qq.com> wrote on Tue, Jul 23, 2019, 21:28:
>>>> OK. If you have any suggestions, we can talk about the details under
>>>> FLINK-10052.
>>>>
>>>> Best.
>>>>
>>>> -- Original Message --
>>>> From: "Till Rohrmann"
>>>> Sent: Tue, Jul 23, 2019, 21:19
>>>> To: "dev"
>>>> Subject: Re: [DISCUSS] Tolerate temporarily suspended ZooKeeper connections
>>>>
>>>> Hi Lamber-Ken,
>>>>
>>>> Thanks for starting this discussion. I think there is a benefit in not
>>>> directly losing leadership if the ZooKeeper connection goes into the
>>>> SUSPENDED state. In particular, if we can guarantee that there is only a
>>>> single JobMaster, it might make sense to not give up leadership overly
>>>> eagerly. I would suggest continuing the technical discussion on the
>>>> JIRA issue thread, since it already contains a good amount of detail.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Sat, Jul 20, 2019 at 12:55 PM QQ邮箱 <2217232...@qq.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> Description:
>>>>> We deploy Flink streaming jobs on a Hadoop cluster in per-job mode and
>>>>> use ZooKeeper as the HighAvailabilityService, but we found that Flink
>>>>> jobs restart because the network between the JobManager and ZooKeeper
>>>>> disconnects temporarily. So we analyzed this problem deeply.
>>>>> The Flink JobManager uses Curator's `LeaderLatch` to maintain leadership.
>>>>> When the network disconnects, the `LeaderLatch` changes the leadership to
>>>>> false directly. We think it's too brutal that many long-running Flink
>>>>> jobs restart because of network shake. Instead of directly revoking the
>>>>> leadership upon a SUSPENDED ZooKeeper connection, it would be better to
>>>>> wait until the ZooKeeper connection is LOST.
>>>>>
>>>>> There are two JIRAs about the problem, FLINK-10052 and FLINK-13189; they
>>>>> are duplicates. Thanks to @Elias Levy, who told us about FLINK-13189, so
>>>>> we closed FLINK-13189.
>>>>>
>>>>> Solution:
>>>>> Back to this problem, there are currently two ways to solve it: one is to
>>>>> rewrite the LeaderLatch#handleStateChange method, the other is to upgrade
>>>>> to curator-4.2.0. The first way is hacky but correct; the second way
>>>>> needs to consider compatibility. For more detail, please see FLINK-10052.
>>>>> Hope:
>>>>> FLINK-10052 was reported on 2018-08-03 (about a year ago), so we hope
>>>>> this problem can be fixed as soon as possible.
>>>>> Btw, thanks @TisonKun for talking about this problem and reviewing the PR.
>>>>>
>>>>> Links:
>>>>> FLINK-10052: https://issues.apache.org/jira/browse/FLINK-10052
>>>>> FLINK-13189: https://issues.apache.org/jira/browse/FLINK-13189
>>>>>
>>>>> Any suggestion is welcome. What do you think?
>>>>>
>>>>> Best, lamber-ken.
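The discussed behavior change is small. Roughly (a sketch against Curator's ConnectionState enum; the surrounding class and method names here are illustrative, not Curator's or Flink's actual code):

{code:java}
import org.apache.curator.framework.state.ConnectionState;

public class TolerantLeaderLatchSketch {

    private volatile boolean hasLeadership = true;

    /**
     * Curator's LeaderLatch revokes leadership already on SUSPENDED; the
     * proposal is to tolerate SUSPENDED and only revoke on LOST.
     */
    void handleStateChange(ConnectionState newState) {
        switch (newState) {
            case SUSPENDED:
                // The connection may come back; keep leadership for now.
                break;
            case LOST:
                hasLeadership = false; // the session is gone, leadership must be revoked
                break;
            case RECONNECTED:
                // Connection restored before the session expired; nothing to do.
                break;
            default:
                // CONNECTED / READ_ONLY: no leadership change.
        }
    }
}
{code}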
Re: [DISCUSS] Add ARM CI build to Flink (information-only)
I'm sorry, but if these issues are only fixed later anyway, I see no reason to run these tests on each PR. We're just adding noise to each PR that everyone will ignore.

I'm curious as to the benefit of having this directly in Flink; why aren't the ARM builds run outside of the Flink project, with fixes for them provided? It seems to me like nothing about these ARM builds is actually handled by the Flink project.

On 26/08/2019 03:43, Xiyuan Wang wrote:
> Thanks, Stephan, for bringing up this topic.
>
> The package build jobs work well now. I have a simple online demo which
> was built and runs on an ARM VM. Feel free to give it a try [1]. As the
> first step for ARM support, maybe it's good to add them now.
>
> For the next step, the test part is still broken. It relates to some
> points we found:
> 1. Some unit tests fail [2] due to Java coding issues. These kinds of
> failures can be fixed easily.
> 2. Some tests fail due to dependencies on third-party libraries [3],
> including frocksdb, the MapR Client, and Netty, which don't have ARM releases.
>   a. frocksdb: I'm testing it locally now with `make check_some` and `make
> jtest`, similar to its Travis job. There are 3 tests failing in `make
> check_some`. Please see the ticket for more details. Once the tests pass,
> frocksdb can release an ARM package.
>   b. MapR Client: this belongs to the MapR company. At this moment, maybe
> we should skip MapR support for Flink on ARM.
>   c. Netty: Netty actually runs well on our ARM machine. We will ask the
> Netty community to release ARM support. If they do not want to, OpenLab
> will host a Maven repository for some common libraries on ARM.
>
> For Chesnay's concern: firstly, the OpenLab team will keep maintaining and
> fixing the ARM CI. That means that once a build or test fails, we'll fix it
> at once. Secondly, OpenLab can provide ARM VMs to everyone for reproducing
> and testing. You just need to create a Test Request issue in OpenLab [4].
> Then we'll create ARM VMs for you, and you can log in and do what you need.
> Does that make sense?
>
> [1]: http://114.115.168.52:8081/#/overview
> [2]: https://issues.apache.org/jira/browse/FLINK-13449 and
> https://issues.apache.org/jira/browse/FLINK-13450
> [3]: https://issues.apache.org/jira/browse/FLINK-13598
> [4]: https://github.com/theopenlab/openlab/issues/new/choose
>
> Chesnay Schepler wrote on Sat, Aug 24, 2019, 00:10:
>> I'm wondering what we are supposed to do if the build fails?
>> We aren't providing any guides on setting up an ARM dev environment, so
>> reproducing it locally isn't possible.
>>
>> On 23/08/2019 17:55, Stephan Ewen wrote:
>>> Hi all!
>>>
>>> As part of the Flink on ARM effort, there is a pull request that triggers
>>> a build on OpenLabs CI for each push and runs tests on ARM machines.
>>>
>>> Currently that build is roughly equivalent to what the "core" and "tests"
>>> profiles do on Travis.
>>> The result will be posted to the PR comments, similar to the Flink Bot's
>>> Travis build result.
>>> The build currently passes :-) so Flink seems to be okay on ARM.
>>>
>>> My suggestion would be to try and add this and gather some experience
>>> with it.
>>> The Travis build results should be our "ground truth" and the ARM CI
>>> (OpenLabs CI) would be "informational only" at the beginning, but helping
>>> us understand when we break ARM support.
>>>
>>> You can see this in the PR that adds the OpenLabs CI config:
>>> https://github.com/apache/flink/pull/9416
>>>
>>> Any objections?
>>>
>>> Best,
>>> Stephan