Re: [DISCUSS] Make FieldAccessor logic consistent with remaining API

Márton Balassi Thu, 03 Nov 2016 08:45:29 -0700

Thanks for bringing up the issue, Fabian.

I am also for unifying the access patterns between the FieldAccessor and
ExpressionKeys logic.


In terms of Fabian's suggestion to refactor the ExpressionKeys parsing
logic and rely on it in the FieldAccessors I think that is nice first step.
It would be great if we could provide even tighter integration though.

On Fri, Oct 28, 2016 at 6:17 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> IMO, FieldAccessors are certainly valuable and should exist.
> I only want to make sure that there is a common syntax for field
> references.
>
> Maybe we can refactor the parsing code for field references in
> ExpressionKeys and expose it in static public methods that the
> FieldAccessors can call.
>
> Best,
> Fabian
>
>
>
> 2016-10-27 14:52 GMT+02:00 Gábor Gévay <gga...@gmail.com>:
>
> > Hello,
> >
> > Thanks Fabian for starting this discussion.
> >
> > I would just like to add a few thougths about why does the
> > FieldAccessors even exist. One could say that we should instead just
> > re-use ExpressionKeys for the aggregations, and we are done. As far as
> > I can see, the way ExpressionKeys is often used is roughly to call
> > computeLogicalKeyPositions on it, then call
> > CompositeType.createComparator with these key positions, and then we
> > can call extractKeys on the comparator to get the values of the fields
> > of a record.
> >
> > However, the problem with this is that in the aggregations we also
> > need to write fields and not just read them. So with the above
> > procedure, we would need an "intractKey" method on the comparators
> > that would be the opposite of extractKeys, i.e. which would set the
> > key fields.
> >
> > Another difference between the FieldAccessor world and the
> > ExpressionKeys world is that in the aggregations we need to deal with
> > only one field. This means that enhancing the comparators with the
> > exact opposite of extractKeys would be overkill, because we don't need
> > to set multiple fields at a time.
> >
> > It certainly seems to be a good idea to unify these at least
> > externally (from the side of the users), but I'm not sure what would
> > be a neat way to also internally unify them.
> >
> > Best,
> > Gábor
> >
> >
> > 2016-10-26 20:54 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:
> > > Hi everybody,
> > >
> > > while reviewing PR #2094 [1] I noticed that the field reference syntax
> > for
> > > FieldAccessors is not compatible with the syntax supported for key
> > > definitions (ExpressionKeys) used in groupBy(), keyBy(),
> > > join().where().equalTo(), etc.
> > >
> > > FieldAccessors are only used for build-in aggregations in the
> DataStream
> > > API (sum(), min(), max(), ...).
> > >
> > > In particular I identified the following inconsistencies:
> > >
> > > - FieldAccessors allow to address array cells. ExpressionKeys treat
> > arrays
> > > as AtomicTypes (Array TypeInfos do not extend CompositeType). Hence, it
> > is
> > > not possible to address array cells.
> > > - ExpressionKeys do only support Integer keys for tuples. An atomic
> type
> > > can only be addressed with "*". FieldAccessors allow to address
> > AtomicTypes
> > > with 0 in addtion to "*".
> > > - ExpressionKeys support to address fields of Java tuples with "f2" and
> > > Scala tuple fields with "_3". FieldAccessors do not support the "f" or
> > "_"
> > > prefix.
> > >
> > > I would like to propose to adapt the syntax of both mechanisms
> (ideally,
> > > both should use the same code for validation). IMO, the ExpressionKey
> > > syntax much more widely used and is well designed. Therefore, I would
> > adopt
> > > it for FieldAccessors as well. However, that would mean to restrict the
> > > syntax of the FieldAccessors and might break existing code.
> > >
> > > What do others think?
> > >
> > > [1] https://github.com/apache/flink/pull/2094
> >
>

Re: [DISCUSS] Make FieldAccessor logic consistent with remaining API

Reply via email to