I believe it should be possible to create a special PythonTypeInfo where
the Python side is responsible for serializing data to a byte array. To
the Java side it is just a byte array, and all comparisons are also
performed on these byte arrays. I think partitioning and sorting should
still work, since sorting is (in most cases) only used to group the
elements for a groupBy(). If a proper sort order were required, the
sorting would have to be done on the Python side.
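
To make the idea concrete, here is a rough sketch in plain Python (not Flink
code). The names to_bytes and group_by_key_bytes are made up for
illustration, and JSON stands in for whatever serializer the Python side
would really use; the only assumption is that equal keys always serialize
to identical bytes. The "Java side" is imitated by sorting and grouping on
the raw bytes alone:

```python
import json
from itertools import groupby


def to_bytes(obj):
    """Hypothetical Python-side serializer: equal objects yield identical bytes."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def group_by_key_bytes(records):
    """Imitate the Java side: sort and group on raw bytes, never on types."""
    # Each record becomes (key_bytes, value_bytes); nothing below inspects types.
    serialized = [(to_bytes(k), to_bytes(v)) for k, v in records]
    # bytes objects compare lexicographically, byte by byte (stable sort).
    serialized.sort(key=lambda kv: kv[0])
    return [
        (key_bytes, [value_bytes for _, value_bytes in group])
        for key_bytes, group in groupby(serialized, key=lambda kv: kv[0])
    ]


records = [(10, "a"), (9, "b"), (10, "c"), (9, "d")]
groups = group_by_key_bytes(records)

# Deserialization happens back on the Python side.
decoded = [(json.loads(k), [json.loads(v) for v in vs]) for k, vs in groups]
print(decoded)
```

The groups come out right, but key 10 is emitted before key 9 because
b"10" sorts before b"9" byte-wise. That is the case where a proper sort
order would have to be restored on the Python side.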

On Thu, 30 Jul 2015 at 22:21 Chesnay Schepler <c.schep...@web.de> wrote:

> To be perfectly honest, I never really managed to work my way through
> Spark's Python API; it's a whole bunch of magic to me, and not even the
> general structure is understandable.
>
> With "pure python", do you mean doing everything in Python, as in just
> having serialized data on the Java side?
>
> I believe the way to do this with Flink is to add a switch that
> a) disables all type checks
> b) creates serializers dynamically at runtime.
>
> a) should be fairly straightforward; b), on the other hand...
>
> By the way, the Python API itself doesn't require the type information;
> it already does part b).
>
> On 30.07.2015 22:11, Gyula Fóra wrote:
> > That I understand, but could you please tell me how this is done
> > differently in Spark, for instance?
> >
> > What would we need to change to make this work with pure Python (as it
> > seems to be possible)? This probably has large performance implications,
> > though.
> >
> > Gyula
> >
> > Chesnay Schepler <c.schep...@web.de> wrote (on Thu, 30 Jul 2015,
> > 22:04):
> >
> >> Because it still goes through the Java API, which requires some kind of
> >> type information. Imagine a Java API program where you omit all generic
> >> types; it just wouldn't work as of now.
> >>
> >> On 30.07.2015 21:17, Gyula Fóra wrote:
> >>> Hey!
> >>>
> >>> Could anyone briefly tell me what exactly is the reason why we force
> >>> the users in the Python API to declare types for operators?
> >>>
> >>> I don't really understand how this works in different systems but I am
> >>> just curious why Flink has types and why Spark doesn't for instance.
> >>>
> >>> If you give me some pointers to read that would also be fine :)
> >>>
> >>> Thank you,
> >>> Gyula
> >>>
> >>
>
>
