I believe it should be possible to create a special PythonTypeInfo where the Python side is responsible for serializing data to a byte array, so that to the Java side it is just a byte array and all comparisons are performed on those bytes. I think partitioning and sorting should still work, since sorting is (in most cases) only used to group the elements for a groupBy(), which only needs equal keys to end up next to each other. If a proper sort order were required, it would have to be done on the Python side.
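
To make the idea a bit more concrete, here is a rough sketch in plain Python. It is not Flink code: serialize_key, deserialize_key and group_by_serialized_key are hypothetical names, and the list sort only stands in for whatever the Java side would do with the raw bytes. It assumes that equal keys always serialize to identical bytes.

```python
import struct
from itertools import groupby


def serialize_key(key: int) -> bytes:
    # Hypothetical Python-side serializer. The "Java side" would only
    # ever see these opaque bytes, never the int itself.
    return struct.pack("<i", key)  # little-endian 32-bit int


def deserialize_key(raw: bytes) -> int:
    # Inverse of serialize_key, again only used on the Python side.
    return struct.unpack("<i", raw)[0]


def group_by_serialized_key(records):
    # records: iterable of (key, value) pairs.
    serialized = [(serialize_key(k), v) for k, v in records]

    # Stand-in for the Java-side sort: it compares raw bytes only and
    # knows nothing about the original Python types.
    serialized.sort(key=lambda kv: kv[0])

    # Equal keys have equal bytes, so they are adjacent after the sort
    # and can be grouped in a single pass.
    return [
        (deserialize_key(raw), [v for _, v in group])
        for raw, group in groupby(serialized, key=lambda kv: kv[0])
    ]


records = [(1, "a"), (256, "b"), (1, "c"), (2, "d"), (256, "e")]
groups = group_by_serialized_key(records)
print(groups)
```

With the little-endian encoding, the key 256 sorts before 1, so the groups are correct but the order of the groups is not the numeric order. That is the case where a proper sort would have to happen on the Python side.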
On Thu, 30 Jul 2015 at 22:21 Chesnay Schepler <c.schep...@web.de> wrote:

> To be perfectly honest I never really managed to work my way through
> Spark's Python API, it's a whole bunch of magic to me; not even the
> general structure is understandable.
>
> With "pure python" do you mean doing everything in Python? As in just
> having serialized data on the Java side?
>
> I believe the way to do this with Flink is to add a switch that
> a) disables all type checks
> b) creates serializers dynamically at runtime.
>
> a) should be fairly straightforward, b) on the other hand....
>
> btw., the Python API itself doesn't require the type information, it
> already does the b part.
>
> On 30.07.2015 22:11, Gyula Fóra wrote:
> > That I understand, but could you please tell me how this is done
> > differently in Spark, for instance?
> >
> > What would we need to change to make this work with pure Python (as it
> > seems to be possible)? This probably has large performance implications
> > though.
> >
> > Gyula
> >
> > Chesnay Schepler <c.schep...@web.de> wrote (on Thu, 30 Jul 2015,
> > 22:04):
> >
> >> because it still goes through the Java API that requires some kind of
> >> type information. Imagine a Java API program where you omit all generic
> >> types, it just wouldn't work as of now.
> >>
> >> On 30.07.2015 21:17, Gyula Fóra wrote:
> >>> Hey!
> >>>
> >>> Could anyone briefly tell me what exactly is the reason why we force
> >>> the users in the Python API to declare types for operators?
> >>>
> >>> I don't really understand how this works in different systems but I am
> >>> just curious why Flink has types and why Spark doesn't, for instance.
> >>>
> >>> If you give me some pointers to read that would also be fine :)
> >>>
> >>> Thank you,
> >>> Gyula