IMO, FieldAccessors are certainly valuable and should exist.
I only want to make sure that there is a common syntax for field references.

Maybe we can refactor the parsing code for field references in
ExpressionKeys and expose it in static public methods that the
FieldAccessors can call.

Best,
Fabian



2016-10-27 14:52 GMT+02:00 Gábor Gévay <gga...@gmail.com>:

> Hello,
>
> Thanks Fabian for starting this discussion.
>
> I would just like to add a few thougths about why does the
> FieldAccessors even exist. One could say that we should instead just
> re-use ExpressionKeys for the aggregations, and we are done. As far as
> I can see, the way ExpressionKeys is often used is roughly to call
> computeLogicalKeyPositions on it, then call
> CompositeType.createComparator with these key positions, and then we
> can call extractKeys on the comparator to get the values of the fields
> of a record.
>
> However, the problem with this is that in the aggregations we also
> need to write fields and not just read them. So with the above
> procedure, we would need an "intractKey" method on the comparators
> that would be the opposite of extractKeys, i.e. which would set the
> key fields.
>
> Another difference between the FieldAccessor world and the
> ExpressionKeys world is that in the aggregations we need to deal with
> only one field. This means that enhancing the comparators with the
> exact opposite of extractKeys would be overkill, because we don't need
> to set multiple fields at a time.
>
> It certainly seems to be a good idea to unify these at least
> externally (from the side of the users), but I'm not sure what would
> be a neat way to also internally unify them.
>
> Best,
> Gábor
>
>
> 2016-10-26 20:54 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:
> > Hi everybody,
> >
> > while reviewing PR #2094 [1] I noticed that the field reference syntax
> for
> > FieldAccessors is not compatible with the syntax supported for key
> > definitions (ExpressionKeys) used in groupBy(), keyBy(),
> > join().where().equalTo(), etc.
> >
> > FieldAccessors are only used for build-in aggregations in the DataStream
> > API (sum(), min(), max(), ...).
> >
> > In particular I identified the following inconsistencies:
> >
> > - FieldAccessors allow to address array cells. ExpressionKeys treat
> arrays
> > as AtomicTypes (Array TypeInfos do not extend CompositeType). Hence, it
> is
> > not possible to address array cells.
> > - ExpressionKeys do only support Integer keys for tuples. An atomic type
> > can only be addressed with "*". FieldAccessors allow to address
> AtomicTypes
> > with 0 in addtion to "*".
> > - ExpressionKeys support to address fields of Java tuples with "f2" and
> > Scala tuple fields with "_3". FieldAccessors do not support the "f" or
> "_"
> > prefix.
> >
> > I would like to propose to adapt the syntax of both mechanisms (ideally,
> > both should use the same code for validation). IMO, the ExpressionKey
> > syntax much more widely used and is well designed. Therefore, I would
> adopt
> > it for FieldAccessors as well. However, that would mean to restrict the
> > syntax of the FieldAccessors and might break existing code.
> >
> > What do others think?
> >
> > [1] https://github.com/apache/flink/pull/2094
>

Reply via email to