Re: Types in the Python API

2015-07-31 Thread Gyula Fóra
In any case, thank you guys for the exhaustive discussion :D Aljoscha Krettek wrote (time: Fri, 31 Jul 2015, 13:52): > Yes, I wouldn't deal with that now, that's orthogonal to the Types issue. > > On Fri, 31 Jul 2015 at 12:09 Chesnay Schepler wrote: > > > I feel like we drifted away from

Re: Types in the Python API

2015-07-31 Thread Aljoscha Krettek
Yes, I wouldn't deal with that now; that's orthogonal to the Types issue. On Fri, 31 Jul 2015 at 12:09 Chesnay Schepler wrote: > I feel like we drifted away from the original topic a bit, but alright. > > I don't consider it a pity we created a proprietary protocol. We know > exactly how it work

Re: Types in the Python API

2015-07-31 Thread Chesnay Schepler
I feel like we drifted away from the original topic a bit, but alright. I don't consider it a pity we created a proprietary protocol. We know exactly how it works and what it is capable of. It is also made exactly for our use case, in contrast to general-purpose libraries. If we ever decide th

Re: Types in the Python API

2015-07-31 Thread Maximilian Michels
py4j looks really nice and the communication works in both directions. There is also another Python to Java communication library called javabridge. I think it is a pity we chose to implement a proprietary protocol for the network communication of the Python API. This could have been abstracted more nice

Re: Types in the Python API

2015-07-31 Thread Till Rohrmann
Zeppelin uses py4j [1] to transfer data between a Python process and a JVM. That way they can run a Python interpreter and a Java interpreter and easily share state between them. Spark also uses py4j as a bridge between Java and Python. However, I don't know exactly what for. And I also don't know wh

Re: Types in the Python API

2015-07-31 Thread Stephan Ewen
I think in short: Spark never worried about types. It is just something arbitrary. Flink worries about types, for memory management. Aljoscha's suggestion is a good one: have a PythonTypeInfo that is dynamic. Till also found a pretty nice way to connect Python and Java in his Zeppelin-based dem

Re: Types in the Python API

2015-07-31 Thread Aljoscha Krettek
I don't know yet. :D Maybe the sorting will have to be delegated to Python. I don't think it's possible to always get a meaningful order when only sorting on the serialized bytes. It should, however, work for grouping. On Fri, 31 Jul 2015 at 10:31 Chesnay Schepler wrote: > if it's just a single ar
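A small illustration of the sorting concern, assuming (hypothetically) that the Python side encodes integers as 8-byte big-endian two's complement via `struct`. Equal values always produce equal bytes, so grouping on the bytes is safe, but byte-wise order need not match value order.

```python
import struct

def encode_int(value: int) -> bytes:
    # Assumed encoding for illustration only: 8-byte big-endian two's complement.
    return struct.pack(">q", value)

values = [3, -1, 256, 1]

# Ordering by value, as a Python user would expect.
by_value = sorted(values)  # [-1, 1, 3, 256]

# Ordering by the serialized bytes only, as an opaque byte comparison would do.
# Negative numbers start with 0xff, so they sort after all non-negative ones.
by_bytes = sorted(values, key=encode_int)  # [1, 3, 256, -1]

# Grouping still works: equal values always serialize to identical bytes.
assert encode_int(256) == encode_int(256)
assert encode_int(1) != encode_int(256)
```

For this particular encoding, flipping the sign bit would restore numeric order, but there is no obvious equivalent trick for arbitrary pickled Python objects, which is why delegating the sort to Python may be needed.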

Re: Types in the Python API

2015-07-31 Thread Chesnay Schepler
If it's just a single array, how would you define group/sort keys? On 31.07.2015 07:03, Aljoscha Krettek wrote: I think then the Python part would just serialize all the tuple fields to a big byte array. And all the key fields to another array, so that the Java side can do comparisons on the whol

Re: Types in the Python API

2015-07-30 Thread Aljoscha Krettek
I think then the Python part would just serialize all the tuple fields to a big byte array, and all the key fields to another array, so that the Java side can do comparisons on the whole "key blob". Maybe it's overly simplistic, but it might work. :D On Thu, 30 Jul 2015 at 23:35 Chesnay Schepler
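A minimal sketch of the "key blob" idea, assuming a hypothetical tagged, length-prefixed encoding (the thread does not specify one) and hypothetical helper names `encode_field`, `to_blobs`, and `group_by_key_blob`. The Python side produces two byte arrays per tuple; the grouping step only ever compares key blobs as bytes and never decodes them. A hand-rolled encoding is used instead of pickle because pickle does not guarantee byte-identical output for equal values (it records shared object references, for example).

```python
import struct
from collections import defaultdict

def encode_field(value) -> bytes:
    # Hypothetical encoding: a type tag plus a fixed-width or length-prefixed
    # payload, so concatenated fields can never be confused with one another.
    if isinstance(value, int):
        return b"i" + struct.pack(">q", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"s" + struct.pack(">I", len(data)) + data
    raise TypeError(f"unsupported field type: {type(value).__name__}")

def to_blobs(record: tuple, key_positions: list) -> tuple:
    """Return (key_blob, value_blob) for one tuple record."""
    value_blob = b"".join(encode_field(v) for v in record)
    key_blob = b"".join(encode_field(record[i]) for i in key_positions)
    return key_blob, value_blob

def group_by_key_blob(pairs) -> dict:
    """Java-side view: group opaque value blobs by comparing key blobs byte for byte."""
    groups = defaultdict(list)
    for key_blob, value_blob in pairs:
        groups[key_blob].append(value_blob)
    return groups

# Group three (name, count) tuples on field 0.
records = [("apple", 3), ("pear", 1), ("apple", 5)]
groups = group_by_key_blob(to_blobs(r, [0]) for r in records)
```

Because the key fields are serialized separately from the full record, the Java side never needs to know the tuple's arity or field types to group on it.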

Re: Types in the Python API

2015-07-30 Thread Chesnay Schepler
I can see this working for basic types, but am unsure how it would work with Tuples. Wouldn't the Java API still need to know the arity to set up serializers? On 30.07.2015 23:02, Aljoscha Krettek wrote: I believe it should be possible to create a special PythonTypeInfo where the Python side is

Re: Types in the Python API

2015-07-30 Thread Aljoscha Krettek
I believe it should be possible to create a special PythonTypeInfo where the Python side is responsible for serializing data to a byte array; to the Java side it is just a byte array, and all the comparisons are also performed on these byte arrays. I think partitioning and sorting should still work,
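To make the partitioning claim concrete, here is a rough Python model of what the Java side would do with opaque byte arrays. The function names and the choice of CRC-32 as the hash are assumptions for illustration, not Flink's actual partitioner. Equal key bytes always land in the same partition, and each partition can then be sorted by plain byte comparison; as noted elsewhere in the thread, that order is reliable for grouping but not necessarily a meaningful value order.

```python
import zlib

def partition_for(key_blob: bytes, num_partitions: int) -> int:
    # crc32 gives the same result in every process, unlike Python's built-in
    # hash() on bytes, which is randomized per interpreter run.
    return zlib.crc32(key_blob) % num_partitions

def partition_and_sort(pairs, num_partitions: int) -> list:
    """Hash-partition (key_blob, value_blob) pairs, then sort each partition on key bytes."""
    partitions = [[] for _ in range(num_partitions)]
    for key_blob, value_blob in pairs:
        partitions[partition_for(key_blob, num_partitions)].append((key_blob, value_blob))
    for part in partitions:
        # Byte-wise comparison of key blobs only; list.sort is stable, so
        # records with equal keys keep their arrival order.
        part.sort(key=lambda pair: pair[0])
    return partitions

pairs = [(b"b", b"2"), (b"a", b"1"), (b"b", b"3")]
parts = partition_and_sort(pairs, 3)
```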

Re: Types in the Python API

2015-07-30 Thread Chesnay Schepler
To be perfectly honest, I never really managed to work my way through Spark's Python API; it's a whole bunch of magic to me, and not even the general structure is understandable. By "pure Python", do you mean doing everything in Python, as in just having serialized data on the Java side? I belie

Re: Types in the Python API

2015-07-30 Thread Gyula Fóra
That I understand, but could you please tell me how this is done differently in Spark, for instance? What would we need to change to make this work with pure Python (as it seems to be possible)? This probably has large performance implications, though. Gyula Chesnay Schepler wrote (time: 2

Re: Types in the Python API

2015-07-30 Thread Chesnay Schepler
Because it still goes through the Java API, which requires some kind of type information. Imagine a Java API program where you omit all generic types; it just wouldn't work as of now. On 30.07.2015 21:17, Gyula Fóra wrote: Hey! Could anyone briefly tell me what exactly is the reason why we forc

Types in the Python API

2015-07-30 Thread Gyula Fóra
Hey! Could anyone briefly tell me what exactly is the reason why we force the users in the Python API to declare types for operators? I don't really understand how this works in different systems, but I am just curious why Flink has types and why Spark doesn't, for instance. If you give me some po