Re: [DISCUSS] Embracing Table API in Flink ML

2018-11-20 Thread Chen Qin
Hi Yun, Very excited to see Flink ML move forward! Your document touches on many points. I couldn't agree more about the value that a (unified) Table API could bring to the Flink ecosystem for running ML workloads. Most ML pipelines we have observed start from single-box Python scripts or adhoc

Re: [DISCUSS] Support Interactive Programming in Flink Table API

2018-11-20 Thread Becket Qin
Hi Weihua, Thanks for the comments. These are great questions! To answer question 1, I think it depends on what we want from the cache service. At this point, it is not quite clear to me whether Flink needs different caching levels. For example, in Spark, memory-level caching is mostly us

Re: [DISCUSS] Support Interactive Programming in Flink Table API

2018-11-20 Thread Shaoxuan Wang
Hi Xingcan, Thanks for the comments. Yes, "cache/persistent the intermediate data" is useful. It can bring benefit to many scenarios. But different scenarios may have different ways to solve it. For instance, as I replied to http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Em

Re: [DISCUSS] Support Interactive Programming in Flink Table API

2018-11-20 Thread Becket Qin
Hi Xingcan, Thanks for the feedback. Adding the cache to DataSet is useful. In fact, the current proposal does not assume the "PersistService" can only be used by the Table. We can always add DataSet.cache() and let it benefit from the underlying persistency support. So it seems more of a wording

Re: [DISCUSS] Support Interactive Programming in Flink Table API

2018-11-20 Thread Weihua Jiang
Hi Becket, The design is quite interesting and useful. I have several questions about your design: 1. Shall we add some persistence level hint to cache() function for different temperature data? E.g. IN_MEM, IN_DISK, etc, or HOTTEST, HOT, WARM, COLD? 2. When will the corresponding cached data be

Re: [DISCUSS] Embracing Table API in Flink ML

2018-11-20 Thread Weihua Jiang
Hi Yun, Can't wait to see your design. Thanks Weihua On Wed, Nov 21, 2018 at 12:43 AM, Yun Gao wrote: > Hi Weihua, > > Thanks for the exciting proposal! > > I have quickly read through it, and I really appreciate the idea of > providing the ML Pipeline API similar to the commonly used library > sci

[jira] [Created] (FLINK-10956) Reuse same rexCall during codegen

2018-11-20 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-10956: --- Summary: Reuse same rexCall during codegen Key: FLINK-10956 URL: https://issues.apache.org/jira/browse/FLINK-10956 Project: Flink Issue Type: Improvement

Re: [DISCUSS] Embracing Table API in Flink ML

2018-11-20 Thread Weihua Jiang
Hi Shaoxuan, You are perfectly right. What I want to achieve is a combination of all your 3 points. Let me rephrase here: 1. Define a Table based ML Pipeline interface to have the same functionality as current DataSet based implementations. 2. Support new features like online learning, streaming i

Re: [DISCUSS] Embracing Table API in Flink ML

2018-11-20 Thread Weihua Jiang
Hi Becket, Thanks a lot for the Table API enhancement design doc. I am working on some simple ML algorithm using this new ML pipeline. I will let you know if any Table enhancement is needed. Thanks Weihua On Tue, Nov 20, 2018 at 10:43 PM, Becket Qin wrote: > Hi Weihua, > > Thanks for the well written

Re: [DISCUSS] Embracing Table API in Flink ML

2018-11-20 Thread Weihua Jiang
Hi Jincheng, Thanks a lot for the warm feedback. I've already read your Table API enhancement Google doc. Those enhancements are essential to implement any ML/DL algorithm on Table API. Our two designs are perfectly complementary to each other. :) Will add a section in my Google doc for the impl

Question: Flink JIRA issues for simple code comment improvements

2018-11-20 Thread Miguel Coimbra
Hello, I want to start contributing to the Apache Flink code base and have a question. While reading the code, I found small inconsistencies in the way comments are written (highly irrelevant but still noticeable) such as some function argument comments ending in a period while others do not, am

[jira] [Created] (FLINK-10955) Extend release notes for Flink 1.7

2018-11-20 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10955: - Summary: Extend release notes for Flink 1.7 Key: FLINK-10955 URL: https://issues.apache.org/jira/browse/FLINK-10955 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-10954) Hardlink from files of previous local stored state might cross devices

2018-11-20 Thread Yun Tang (JIRA)
Yun Tang created FLINK-10954: Summary: Hardlink from files of previous local stored state might cross devices Key: FLINK-10954 URL: https://issues.apache.org/jira/browse/FLINK-10954 Project: Flink

Re: [DISCUSS] Support Interactive Programming in Flink Table API

2018-11-20 Thread Xingcan Cui
Hi Becket, Thanks for bringing this up! For a long time, the intermediate cache problem has been a pain point of the Flink streaming model. As far as I know, it’s quite a blocker for iterative operations in batch-related libs such as Gelly and FlinkML. Actually, there’s an old JIRA[1], aim

[ANNOUNCE] Flink Forward San Francisco Call for Presentations closes soon

2018-11-20 Thread Fabian Hueske
Hi Everyone, Flink Forward San Francisco will *take place on April 1st and 2nd 2019*. Flink Forward is a community conference organized by data Artisans and gathers many members of the Flink community, including users, contributors, and committers. It is the perfect event to get in touch and conne

[jira] [Created] (FLINK-10953) InterruptedException on KafkaProducer

2018-11-20 Thread Avi Levi (JIRA)
Avi Levi created FLINK-10953: Summary: InterruptedException on KafkaProducer Key: FLINK-10953 URL: https://issues.apache.org/jira/browse/FLINK-10953 Project: Flink Issue Type: Bug Compo

Re: [DISCUSS] Embracing Table API in Flink ML

2018-11-20 Thread Yun Gao
Hi Weihua, Thanks for the exciting proposal! I have quickly read through it, and I really appreciate the idea of providing the ML Pipeline API similar to the commonly used library scikit-learn, since it greatly reduces the learning cost for the AI engineers to transfer to the Flink p

[jira] [Created] (FLINK-10952) How to stream MySQl data and Store in Flink Dataset

2018-11-20 Thread vinoth (JIRA)
vinoth created FLINK-10952: -- Summary: How to stream MySQl data and Store in Flink Dataset Key: FLINK-10952 URL: https://issues.apache.org/jira/browse/FLINK-10952 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-10951) Disable enforcing of YARN container virtual memory limits in tests

2018-11-20 Thread Gary Yao (JIRA)
Gary Yao created FLINK-10951: Summary: Disable enforcing of YARN container virtual memory limits in tests Key: FLINK-10951 URL: https://issues.apache.org/jira/browse/FLINK-10951 Project: Flink I

Re: [DISCUSS] Table API Enhancement Outline

2018-11-20 Thread Shaoxuan Wang
+1. I agree that we should open the JIRAs to start the work. We may have better ideas on the flavor of the interface when we implement/review the code. Regards, shaoxuan On 11/20/18, jincheng sun wrote: > Hi all, > > Thanks all for the feedback. > > @Piotr About not using abbreviations naming, +1

Re: [DISCUSS] Embracing Table API in Flink ML

2018-11-20 Thread Shaoxuan Wang
Hi Weihua, Thanks for the proposal. I have quickly read through it. It looks great. A quick question. Do you consider changing the ML Lib (implementation of Estimator/Predictor/Transformer) also on top of the Table API? I will be very happy if this is also included in the scope. It is not easy and

[ANNOUNCE] Weekly community update #47

2018-11-20 Thread Till Rohrmann
Dear community, this is the weekly community update thread #47. Please post any news and updates you want to share with the community to this thread. # Updates on sharing state between subtasks Jamie opened a first PR to add a first version of sharing state between tasks. It works by using the J

Re: [DISCUSS] Embracing Table API in Flink ML

2018-11-20 Thread Becket Qin
Hi Weihua, Thanks for the well-written design doc! The abstraction of ML pipeline is pretty handy to the AI engineers. As Jincheng mentioned, there is an ongoing effort to enhance the Table API for ML. But it would still be helpful to understand what is missing in Table API to fully support th

[VOTE] Release 1.7.0, release candidate #2

2018-11-20 Thread Till Rohrmann
Hi everyone, Please review and vote on the release candidate #2 for the version 1.7.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], *

[jira] [Created] (FLINK-10950) RocksDB backend does not work on Windows due to path issue

2018-11-20 Thread Adam Laczynski (JIRA)
Adam Laczynski created FLINK-10950: -- Summary: RocksDB backend does not work on Windows due to path issue Key: FLINK-10950 URL: https://issues.apache.org/jira/browse/FLINK-10950 Project: Flink

[DISCUSS] Support Interactive Programming in Flink Table API

2018-11-20 Thread Becket Qin
Hi all, As a few recent email threads have pointed out, it is a promising opportunity to enhance Flink Table API in various aspects, including functionality and ease of use among others. One of the scenarios where we feel Flink could improve is interactive programming. To explain the issues and fa

[jira] [Created] (FLINK-10949) When use flink-1.6.2's intervalJoin funtion, the thread is stucked in rockdb's seek for too long time

2018-11-20 Thread Liu (JIRA)
Liu created FLINK-10949: --- Summary: When use flink-1.6.2's intervalJoin funtion, the thread is stucked in rockdb's seek for too long time Key: FLINK-10949 URL: https://issues.apache.org/jira/browse/FLINK-10949 P

Re: [DISCUSS] Embracing Table API in Flink ML

2018-11-20 Thread jincheng sun
Hi Weihua, Thanks for bringing up this discussion! I quickly read the Google doc, and I fully agree that ML can be well supported on TableAPI (at some stage in the future). In fact, Xiaowei and I have already brought up a discussion on enhancing the Table API. In the first phase, we will add support for

[jira] [Created] (FLINK-10948) Add option to write out termination message with application status

2018-11-20 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-10948: --- Summary: Add option to write out termination message with application status Key: FLINK-10948 URL: https://issues.apache.org/jira/browse/FLINK-10948 Project: Flink

[DISCUSS] Embracing Table API in Flink ML

2018-11-20 Thread Weihua Jiang
ML Pipeline is an idea introduced by scikit-learn. Both Spark and Flink have borrowed this idea and made their own implementations [Spark ML Pipeline, Flink ML Pipeline

Re: Apply for flink contributor permission

2018-11-20 Thread Zhu Zhu
Thanks Till! On Tue, Nov 20, 2018 at 6:17 PM, Till Rohrmann wrote: > Welcome to the community Zhu. I've given you contributor permissions. > > Cheers, > Till > > On Tue, Nov 20, 2018 at 11:10 AM Zhu Zhu wrote: > > > Hi there, > > > > Could anyone kindly give me the contributor permission? > > My JIRA id is zh

[jira] [Created] (FLINK-10947) Document handling of null keys in the data types documentation

2018-11-20 Thread Flavio Pompermaier (JIRA)
Flavio Pompermaier created FLINK-10947: -- Summary: Document handling of null keys in the data types documentation Key: FLINK-10947 URL: https://issues.apache.org/jira/browse/FLINK-10947 Project: F

[jira] [Created] (FLINK-10946) Resuming Externalized Checkpoint (rocks, incremental, scale up) end-to-end test failed on Travis

2018-11-20 Thread Andrey Zagrebin (JIRA)
Andrey Zagrebin created FLINK-10946: --- Summary: Resuming Externalized Checkpoint (rocks, incremental, scale up) end-to-end test failed on Travis Key: FLINK-10946 URL: https://issues.apache.org/jira/browse/FLINK-1

Re: Apply for flink contributor permission

2018-11-20 Thread Till Rohrmann
Welcome to the community Zhu. I've given you contributor permissions. Cheers, Till On Tue, Nov 20, 2018 at 11:10 AM Zhu Zhu wrote: > Hi there, > > Could anyone kindly give me the contributor permission? > My JIRA id is zhuzh. > > Thanks, > Zhu >

[jira] [Created] (FLINK-10945) Avoid resource deadlocks for finite stream jobs when resources are limited

2018-11-20 Thread Zhu Zhu (JIRA)
Zhu Zhu created FLINK-10945: --- Summary: Avoid resource deadlocks for finite stream jobs when resources are limited Key: FLINK-10945 URL: https://issues.apache.org/jira/browse/FLINK-10945 Project: Flink

Apply for flink contributor permission

2018-11-20 Thread Zhu Zhu
Hi there, Could anyone kindly give me the contributor permission? My JIRA id is zhuzh. Thanks, Zhu

[jira] [Created] (FLINK-10944) EventTimeWindowCheckpointingITCase.testTumblingTimeWindow failed on Travis

2018-11-20 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10944: - Summary: EventTimeWindowCheckpointingITCase.testTumblingTimeWindow failed on Travis Key: FLINK-10944 URL: https://issues.apache.org/jira/browse/FLINK-10944 Project:

[jira] [Created] (FLINK-10943) Flink runtime test failed caused by curator dependency conflicts

2018-11-20 Thread Paul Lin (JIRA)
Paul Lin created FLINK-10943: Summary: Flink runtime test failed caused by curator dependency conflicts Key: FLINK-10943 URL: https://issues.apache.org/jira/browse/FLINK-10943 Project: Flink Iss