Re: Tuple

2015-07-31 Thread Chesnay Schepler
also, I'm not sure if I ever sent a Tuple0 through a program, it could be that the system freaks out. On 31.07.2015 22:40, Chesnay Schepler wrote: there's no specific reason. it was added fairly recently by me (mid of april), and you're most likely the second person to use it. i didn't integr

Re: Tuple

2015-07-31 Thread Chesnay Schepler
there's no specific reason. it was added fairly recently by me (mid of april), and you're most likely the second person to use it. i didn't integrate into all our tuple related stuff because, well, i never thought anyone would actually need it, so i saved myself the trouble. Hi, is there an

Re: Question about DataStream class hierarchy

2015-07-31 Thread Gyula Fóra
Hi Matthias, I think Aljoscha is preparing a nice PR that completely reworks the DataStream classes and the information they actually contain. I don't think it's a good idea to mess things up before he gets a chance to open the PR. Also I don't see a well supported reason for moving the setParall

Re: Question about DataStream class hierarchy

2015-07-31 Thread Matthias J. Sax
Hi, I would like to apply the following changes to DataStream class hierarchy: https://github.com/mjsax/flink/tree/flink-2306-storm-namedStreams Please give some feedback if those changes are reasonable to you. I need those change to get a clean design for https://issues.apache.org/jira/browse/F

Tuple

2015-07-31 Thread Matthias J. Sax
Hi, is there any specific reason, why Tuple.getTupleClass(int arity) does not support arity zero? There is a class Tuple0, but it cannot be generator by Tuple.getTupleClass(...). Is it a missing feature (I would like to have it). -Matthias signature.asc Description: OpenPGP digital signature

[jira] [Created] (FLINK-2455) Misleading I/O manager error log messages

2015-07-31 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2455: -- Summary: Misleading I/O manager error log messages Key: FLINK-2455 URL: https://issues.apache.org/jira/browse/FLINK-2455 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-2453) Update POM to use Java7 as the source and target version

2015-07-31 Thread Henry Saputra (JIRA)
Henry Saputra created FLINK-2453: Summary: Update POM to use Java7 as the source and target version Key: FLINK-2453 URL: https://issues.apache.org/jira/browse/FLINK-2453 Project: Flink Issue

[jira] [Created] (FLINK-2454) Update Travis file to run build using Java7

2015-07-31 Thread Henry Saputra (JIRA)
Henry Saputra created FLINK-2454: Summary: Update Travis file to run build using Java7 Key: FLINK-2454 URL: https://issues.apache.org/jira/browse/FLINK-2454 Project: Flink Issue Type: Sub-tas

Re: Question About "Preserve Partitioning" in Stream Iteration

2015-07-31 Thread Gyula Fóra
Maybe you can reuse some of the logic that is currently there on the StreamGraph, with building StreamLoops first which will be used to generate the sources and sinks right before building the JobGraph. This avoids the need of knowing everything beforehand. I actually added this to avoid the compl

[jira] [Created] (FLINK-2452) Add a playcount threshold to the MusicProfiles examples

2015-07-31 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-2452: Summary: Add a playcount threshold to the MusicProfiles examples Key: FLINK-2452 URL: https://issues.apache.org/jira/browse/FLINK-2452 Project: Flink Issue T

[jira] [Created] (FLINK-2451) Cleanup Gelly examples

2015-07-31 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-2451: Summary: Cleanup Gelly examples Key: FLINK-2451 URL: https://issues.apache.org/jira/browse/FLINK-2451 Project: Flink Issue Type: Improvement Compon

Re: Question About "Preserve Partitioning" in Stream Iteration

2015-07-31 Thread Aljoscha Krettek
Sure it can be done, it's just more complex if you try to do it in a sane way without having the code that builds the StreamGraph all over the place. :D I'll try to come up with something. This is my current work in progress, by the way: https://github.com/aljoscha/flink/tree/stream-api-rework I

Re: Question About "Preserve Partitioning" in Stream Iteration

2015-07-31 Thread Gyula Fóra
There might be reasons why a user would want different parallelism at the head operators (depending on what else that head operator might process) so restricting them to the same parallelism is a little bit weird don't you think? It kind of goes against the whole opeartors-parallelism idea. I don'

Re: Question About "Preserve Partitioning" in Stream Iteration

2015-07-31 Thread Aljoscha Krettek
Yes, I'm not saying that it makes sense to do it, I'm just saying that it does translate and run. Your observation is true. :D I'm wondering whether it makes sense to allow users to have iteration heads with differing parallelism, in fact. On Fri, 31 Jul 2015 at 16:40 Gyula Fóra wrote: > I stil

Re: Question About "Preserve Partitioning" in Stream Iteration

2015-07-31 Thread Gyula Fóra
I still don't get how it could possibly work, let me tell you how I see and correct me in my logic :) You have this program: ids.map1().setParallelism(2) ids.map2().setParallelism(4) //... ids.closeWith(feedback.groupBy(0)) You are suggesting that we only have one iteration source/sink pair wit

RE: Off-heap memory in Flink?

2015-07-31 Thread Radu Tudoran
Hi, Is there some info, description about how this off-heap memory is managed and its goals? Thanks Dr. Radu Tudoran Research Engineer IT R&D Division HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research Center Riesstrasse 25, 80992 München E-mail: radu.tudo...@huawei.com Mobile: +49 152090

Re: Off-heap memory in Flink?

2015-07-31 Thread Stephan Ewen
The Pull Request is basically ready. I would like to benchmark a bit before merging it. The on-heap Flink-managed memory classes are highly optimized to be JIT friendly. Just want to make sure that we don't loose that. I have worked a lot on streaming issues lately, so this is still in my backlog

Re: Question About "Preserve Partitioning" in Stream Iteration

2015-07-31 Thread Aljoscha Krettek
Yes, this would still work. For example, I have this crazy graph: http://postimg.org/image/xtv8ay8hv/full/ That results from this program: https://gist.github.com/aljoscha/45aaf62b2a7957cfafd5 It works, and the implementation is very simple, actually. On Fri, 31 Jul 2015 at 14:30 Gyula Fóra wrot

Re: Question About "Preserve Partitioning" in Stream Iteration

2015-07-31 Thread Gyula Fóra
I mean that the head operators have different parallelism: IterativeDataStream ids = ... ids.map().setParallelism(2) ids.map().setParallelism(4) //... ids.closeWith(feedback) Aljoscha Krettek ezt írta (időpont: 2015. júl. 31., P, 14:23): > I thought about having some tighter restrictions her

Re: Question About "Preserve Partitioning" in Stream Iteration

2015-07-31 Thread Aljoscha Krettek
I thought about having some tighter restrictions here. My idea was to enforce that the feedback edges must have the same parallelism as the original input stream, otherwise shipping strategies such as "keyBy", "shuffle", "rebalance" don't seem to make sense because they would differ from the distri

Re: Off-heap memory in Flink?

2015-07-31 Thread Maximilian Michels
Hi Slim, Off-heap memory has been postponed because it's not a pressing but rather a nice-to-have feature. I know that Stephan continued to work on the off-heap memory. I think we can get it in sometime this year. Best, Max On Fri, Jul 31, 2015 at 11:57 AM, Slim Baltagi wrote: > Hi > > I remem

Re: Types in the Python API

2015-07-31 Thread Gyula Fóra
In any case, thank you guys for the exhaustive discussion :D Aljoscha Krettek ezt írta (időpont: 2015. júl. 31., P, 13:52): > Yes, I wouldn't deal with that now, that's orthogonal to the Types issue. > > On Fri, 31 Jul 2015 at 12:09 Chesnay Schepler wrote: > > > I feel like we drifted away from

Re: Question About "Preserve Partitioning" in Stream Iteration

2015-07-31 Thread Gyula Fóra
Hey, I am not sure what is the intuitive behaviour here. As you are not applying a transformation on the feedback stream but pass it to a closeWith method, I thought it was somehow nature that it gets the partitioning of the iteration input, but maybe its not intuitive. If others also think that

Re: Types in the Python API

2015-07-31 Thread Aljoscha Krettek
Yes, I wouldn't deal with that now, that's orthogonal to the Types issue. On Fri, 31 Jul 2015 at 12:09 Chesnay Schepler wrote: > I feel like we drifted away from the original topic a bit, but alright. > > I don't consider it a pity we created a proprietary protocol. we know > exactly how it work

Re: Bug fix release 0.9.1

2015-07-31 Thread Fabian Hueske
Thanks Ufuk for starting this discussion. We should also go through the commit logs of the master branch and see if we forgot to cherry-pick some fixes over to the release-0.9 branch. I can do that and compile a list of potential fixes. Cheers, Fabian 2015-07-31 11:34 GMT+02:00 Ufuk Celebi : >

Re: Types in the Python API

2015-07-31 Thread Chesnay Schepler
I feel like we drifted away from the original topic a bit, but alright. I don't consider it a pity we created a proprietary protocol. we know exactly how it works and what it is capable of. It is also made exactly for our use case, in contrast to general purpose libraries. If we ever decide th

Off-heap memory in Flink?

2015-07-31 Thread Slim Baltagi
Hi I remember seeing that using off-heap memory was on Flink’s roadmap as well as a related pull request https://github.com/apache/flink/pull/290 Any update on such effort? Thanks Slim Baltagi -- View this message in context: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Of

Question About "Preserve Partitioning" in Stream Iteration

2015-07-31 Thread Aljoscha Krettek
Hi, I'm currently working on making the StreamGraph generation more centralized (i.e. not spread across the different API classes). The question is now why we need to switch to preserve partitioning? Could we not make "preserve" partitioning the default and if users want to have shuffle partitionin

Bug fix release 0.9.1

2015-07-31 Thread Ufuk Celebi
Hey all! I want to start a discussion about the next 0.9 release. Since 0.9.0 there have been 19 commits addressing 16 issues. Of these 16 issues, two were critical issues regarding the runtime, which require a new release urgently. What's do you think about this? --- If we agree about doing

Re: Types in the Python API

2015-07-31 Thread Maximilian Michels
py4j looks really nice and the communication works in both ways. There is also another Python to Java communication library called javabridge. I think it is a pity we chose to implement a proprietary protocol for the network communication of the Python API. This could have been abstracted more nice

Re: Types in the Python API

2015-07-31 Thread Till Rohrmann
Zeppelin uses py4j [1] to transfer data between a Python process and a JVM. That way they can run a Python interpreter and Java interpreter and easily share state between them. Spark also uses py4j as a bridge between Java and Python. However, I don't know for what exactly. And I also don't know wh

[jira] [Created] (FLINK-2450) IndexOutOfBoundsException in KryoSerializer

2015-07-31 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2450: -- Summary: IndexOutOfBoundsException in KryoSerializer Key: FLINK-2450 URL: https://issues.apache.org/jira/browse/FLINK-2450 Project: Flink Issue Type: Bug

Re: Types in the Python API

2015-07-31 Thread Stephan Ewen
I think in short: Spark never worried about types. It is just something arbitrary. Flink worries about types, for memory management. Aljoscha's suggestion is a good one: have a PythonTypeInfo that is dynamic. Till' also found a pretty nice way to connect Python and Java in his Zeppelin-based dem

Re: Types in the Python API

2015-07-31 Thread Aljoscha Krettek
I don't know yet. :D Maybe the sorting will have to be delegated to python. I don't think it's possible to always get a meaningful order when only sorting on the serialized bytes. It should however work for grouping. On Fri, 31 Jul 2015 at 10:31 Chesnay Schepler wrote: > if its just a single ar

Re: Types in the Python API

2015-07-31 Thread Chesnay Schepler
if its just a single array, how would you define group/sort keys? On 31.07.2015 07:03, Aljoscha Krettek wrote: I think then the Python part would just serialize all the tuple fields to a big byte array. And all the key fields to another array, so that the java side can to comparisons on the whol

[jira] [Created] (FLINK-2449) DistCache doesn't work with JavaProgram Collection tests

2015-07-31 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-2449: --- Summary: DistCache doesn't work with JavaProgram Collection tests Key: FLINK-2449 URL: https://issues.apache.org/jira/browse/FLINK-2449 Project: Flink

[jira] [Created] (FLINK-2448) registerCacheFile fails with MultipleProgramsTestbase

2015-07-31 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-2448: --- Summary: registerCacheFile fails with MultipleProgramsTestbase Key: FLINK-2448 URL: https://issues.apache.org/jira/browse/FLINK-2448 Project: Flink Iss

[jira] [Created] (FLINK-2447) TypeExtractor returns wrong type info when a Tuple has two fields of the same POJO type

2015-07-31 Thread Gabor Gevay (JIRA)
Gabor Gevay created FLINK-2447: -- Summary: TypeExtractor returns wrong type info when a Tuple has two fields of the same POJO type Key: FLINK-2447 URL: https://issues.apache.org/jira/browse/FLINK-2447 Pro