[jira] [Created] (FLINK-5903) taskmanager.numberOfTaskSlots and yarn.containers.vcores did not work well in YARN mode

2017-02-23 Thread Tao Wang (JIRA)
Tao Wang created FLINK-5903: Summary: taskmanager.numberOfTaskSlots and yarn.containers.vcores did not work well in YARN mode Key: FLINK-5903 URL: https://issues.apache.org/jira/browse/FLINK-5903 Project:

[jira] [Created] (FLINK-5902) Some images can not show in IE

2017-02-23 Thread Tao Wang (JIRA)
Tao Wang created FLINK-5902: Summary: Some images can not show in IE Key: FLINK-5902 URL: https://issues.apache.org/jira/browse/FLINK-5902 Project: Flink Issue Type: Bug Components: Webf

[jira] [Created] (FLINK-5901) DAG can not show properly in IE

2017-02-23 Thread Tao Wang (JIRA)
Tao Wang created FLINK-5901: Summary: DAG can not show properly in IE Key: FLINK-5901 URL: https://issues.apache.org/jira/browse/FLINK-5901 Project: Flink Issue Type: Bug Components: Web

[jira] [Created] (FLINK-5900) Add non-partial merge Aggregates and unit tests

2017-02-23 Thread Shaoxuan Wang (JIRA)
Shaoxuan Wang created FLINK-5900: Summary: Add non-partial merge Aggregates and unit tests Key: FLINK-5900 URL: https://issues.apache.org/jira/browse/FLINK-5900 Project: Flink Issue Type: Imp

[jira] [Created] (FLINK-5899) Fix the bug in initializing the DataSetTumbleTimeWindowAggReduceGroupFunction

2017-02-23 Thread Shaoxuan Wang (JIRA)
Shaoxuan Wang created FLINK-5899: Summary: Fix the bug in initializing the DataSetTumbleTimeWindowAggReduceGroupFunction Key: FLINK-5899 URL: https://issues.apache.org/jira/browse/FLINK-5899 Project:

[jira] [Created] (FLINK-5898) Race-Condition with Amazon Kinesis KPL

2017-02-23 Thread Scott Kidder (JIRA)
Scott Kidder created FLINK-5898: Summary: Race-Condition with Amazon Kinesis KPL Key: FLINK-5898 URL: https://issues.apache.org/jira/browse/FLINK-5898 Project: Flink Issue Type: Bug

Re: Visualizing topologies

2017-02-23 Thread Ken Krugler
Hi Ufuk,
> On Feb 22, 2017, at 2:18am, Ufuk Celebi wrote:
>
> Hey Ken!
>
> This looks really good. +1 to make this available publicly.
>
> We can link it from the Flink website and the viz tool Pat linked to.
> The visualizer currently has some open issues; it is not up to date
> with the one

Re: Visualizing topologies

2017-02-23 Thread Ken Krugler
Hi Pat,
> On Feb 21, 2017, at 6:01pm, Pattarawat Chormai wrote:
>
> Hi Ken,
>
> Maybe you can look into this one: http://flink.apache.org/visualizer/.
Thanks, that’s interesting and convenient. Though I’d probably keep using OmniGraffle with a dot file, as that gives me the ability to edit/a

[jira] [Created] (FLINK-5897) Untie Checkpoint Externalization from FileSystems

2017-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-5897: Summary: Untie Checkpoint Externalization from FileSystems Key: FLINK-5897 URL: https://issues.apache.org/jira/browse/FLINK-5897 Project: Flink Issue Type: Sub

Re: [DISCUSS] Code style / checkstyle

2017-02-23 Thread Jinkui Shi
Thanks for discussing this problem again. 1. Google checkstyle is good for Java. 2. The Scala checkstyle is here [1]. 3. We can make a Flink plan containing issues, one sub-issue per rule, and resolve this in a short time. The current code style may be the result of historical accumulation. If we don’t normalize the code step by s

Re: FLINK-4565 Support for SQL IN operator

2017-02-23 Thread Fabian Hueske
Hi Dmytro, done. Looking forward to your contribution! Cheers, Fabian
2017-02-23 17:25 GMT+01:00 Dmytro Shkvyra :
> Hello,
>
> I would like to start contributing to Flink.
>
> Could anyone assign issue https://issues.apache.org/jira/browse/FLINK-4565
> to me (dshkvyra) in JIRA?
>
> Sincerel

FLINK-4565 Support for SQL IN operator

2017-02-23 Thread Dmytro Shkvyra
Hello, I would like to start contributing to Flink. Could anyone assign issue https://issues.apache.org/jira/browse/FLINK-4565 to me (dshkvyra) in JIRA? Sincerely, Dmytro Shkvyra Senior Software Engineer Office: +380 44 390 5457 x 65346 Cell: +380 50 357 6828 Email: dmytro_shkv...@epam

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
@Theodore, thanks for taking the lead in the coordination :) Let's see what we can do, and then decide what should start out as an independent project and what should live strictly inside Flink. I agree that something experimental like batch ML on streaming would probably benefit more from an independent repo first. O

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Theodore Vasiloudis
Sure, having a deadline of March 3rd is fine. I can act as coordinator, trying to guide the discussion toward concrete results. For committers, it's up to their discretion and time whether they want to participate. I don't think it's necessary to have one, but it would be most welcome. @Katherin I would su

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
Okay, let's just aim for around the end of next week, but we can take more time to discuss if there's still a lot of ongoing activity. Keep the topic hot! Thanks all for the enthusiasm :) On 2017-02-23 16:17, Stavros Kontopoulos wrote: @Gabor 3rd March is ok for me. But maybe giving a bit mo

Re: [DISCUSS] Code style / checkstyle

2017-02-23 Thread Aljoscha Krettek
If we go for a codestyle/checkstyle, I would suggest using the Google style. It already has a checkstyle configuration, IntelliJ and Eclipse styles, and a code formatting tool, and it is well established. However, some people will not like this style. In general, I think we will never manage to find a style that al

[jira] [Created] (FLINK-5896) Improve readability of the event time docs

2017-02-23 Thread David Anderson (JIRA)
David Anderson created FLINK-5896: Summary: Improve readability of the event time docs Key: FLINK-5896 URL: https://issues.apache.org/jira/browse/FLINK-5896 Project: Flink Issue Type: Improv

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Stavros Kontopoulos
@Gabor 3rd March is OK for me. But giving it a bit more time, like a week, may suit more people. What do you think, all? I will contribute to the doc. +100 for having a coordinator + committer. Thank you all for joining the discussion. Cheers, Stavros On Thu, Feb 23, 2017 at 4:48 PM, Gábo

[jira] [Created] (FLINK-5895) Reduce logging aggressiveness of FileSystemSafetyNet

2017-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-5895: Summary: Reduce logging aggressiveness of FileSystemSafetyNet Key: FLINK-5895 URL: https://issues.apache.org/jira/browse/FLINK-5895 Project: Flink Issue Type:

Re: [DISCUSS] Side Outputs and Split/Select

2017-02-23 Thread Gyula Fóra
Hi, Thanks for the nice proposal. I like the idea of side outputs, and it would make a lot of topologies much simpler. Regarding the API, I think we should come up with a way of making side outputs accessible from all sorts of operators in a similar way, for instance through the RichFunction interf

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
Okay, I've created a skeleton of the design doc for choosing a direction: https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/edit?usp=sharing Many of the pros/cons have already been discussed here, so I'll try to put all the arguments mentioned in this thread there.

[DISCUSS] Side Outputs and Split/Select

2017-02-23 Thread Aljoscha Krettek
Hi Folks, Chen and I have been working for a while now on making FLIP-13 (side outputs) [1] a reality. We think we have a pretty good internal implementation and also a proposal for an API but now we need to discuss how we want to go forward with this, especially how we should deal with split/selec

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Katherin Eri
I have already asked some teams for use cases, but all of them need time to think. Something will come out of the analysis eventually. Maybe we can ask Flink's partners for use cases? Data Artisans got results from a customer survey [1]: better ML support is wanted, so we could ask what exactly is necess

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Stavros Kontopoulos
+100 for a design doc. Could we also set a roadmap after some time-boxed investigation captured in that document? We need action. Looking forward to working on this (whatever that might be) ;) Also, is there any data supporting one direction or the other from a customer perspective? It would help to

Re: [DISCUSS] Project build time and possible restructuring

2017-02-23 Thread Stephan Ewen
If we can get incremental builds to work, that would actually be the preferred solution in my opinion. Many companies have invested heavily in making a "single repository" code base work, because it has the advantage of not having to update/publish several repositories first. However, the strong

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Katherin Eri
Yes, OK. Let's start a design document and write down the ideas already mentioned there: parameter server, Clipper, and others. It would be nice if we also mapped these approaches to use cases. We will work on it collaboratively, topic by topic; maybe we will finally form a picture that could

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
I agree that it's better to go in one direction first, but I think online and offline with the streaming API can proceed somewhat in parallel later. We could set a short-term goal, concentrate initially on one direction, and showcase that direction (e.g. in a blog post). But first, we should list the pros/

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Katherin Eri
I'm not sure that this is feasible; doing everything at the same time could mean doing nothing. I'm just afraid that the words "we will work on streaming, not on batching; we have no committer time for this" mean that yes, we started work on FLINK-1730, but nobody will commit this work in the end, as it a

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
@Theodore: Great to hear you think the "batch on streaming" approach is possible! Of course, we need to pay attention to all the pitfalls there if we go that way. +1 for a design doc! I would add that it's possible to make efforts in all three directions (i.e. batch, online, batch on stream

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Theodore Vasiloudis
Hello all, @Gabor, we have discussed the idea of using the streaming API to write all of our ML algorithms with a couple of people offline, and I think it might be possible and is generally worth a shot. The approach we would take would be close to Vowpal Wabbit, not exactly "online", but rather

[jira] [Created] (FLINK-5894) HA docs are misleading re: state backends

2017-02-23 Thread David Anderson (JIRA)
David Anderson created FLINK-5894: Summary: HA docs are misleading re: state backends Key: FLINK-5894 URL: https://issues.apache.org/jira/browse/FLINK-5894 Project: Flink Issue Type: Improve

Re: [DISCUSS] Per-key event time

2017-02-23 Thread Gábor Hermann
Hey all, Let me share some ideas about this. @Paris: The local-only progress tracking indeed seems easier, since we do not need to broadcast anything. Implementation-wise it is easier, but performance-wise probably not. If one key can come from multiple sources, there could be a lot more network ov