Thanks for bringing up the issue, Fabian. I am also for unifying the access patterns between the FieldAccessor and ExpressionKeys logic.
In terms of Fabian's suggestion to refactor the ExpressionKeys parsing logic and rely on it in the FieldAccessors I think that is nice first step. It would be great if we could provide even tighter integration though. On Fri, Oct 28, 2016 at 6:17 PM, Fabian Hueske <fhue...@gmail.com> wrote: > IMO, FieldAccessors are certainly valuable and should exist. > I only want to make sure that there is a common syntax for field > references. > > Maybe we can refactor the parsing code for field references in > ExpressionKeys and expose it in static public methods that the > FieldAccessors can call. > > Best, > Fabian > > > > 2016-10-27 14:52 GMT+02:00 Gábor Gévay <gga...@gmail.com>: > > > Hello, > > > > Thanks Fabian for starting this discussion. > > > > I would just like to add a few thougths about why does the > > FieldAccessors even exist. One could say that we should instead just > > re-use ExpressionKeys for the aggregations, and we are done. As far as > > I can see, the way ExpressionKeys is often used is roughly to call > > computeLogicalKeyPositions on it, then call > > CompositeType.createComparator with these key positions, and then we > > can call extractKeys on the comparator to get the values of the fields > > of a record. > > > > However, the problem with this is that in the aggregations we also > > need to write fields and not just read them. So with the above > > procedure, we would need an "intractKey" method on the comparators > > that would be the opposite of extractKeys, i.e. which would set the > > key fields. > > > > Another difference between the FieldAccessor world and the > > ExpressionKeys world is that in the aggregations we need to deal with > > only one field. This means that enhancing the comparators with the > > exact opposite of extractKeys would be overkill, because we don't need > > to set multiple fields at a time. > > > > It certainly seems to be a good idea to unify these at least > > externally (from the side of the users), but I'm not sure what would > > be a neat way to also internally unify them. > > > > Best, > > Gábor > > > > > > 2016-10-26 20:54 GMT+02:00 Fabian Hueske <fhue...@gmail.com>: > > > Hi everybody, > > > > > > while reviewing PR #2094 [1] I noticed that the field reference syntax > > for > > > FieldAccessors is not compatible with the syntax supported for key > > > definitions (ExpressionKeys) used in groupBy(), keyBy(), > > > join().where().equalTo(), etc. > > > > > > FieldAccessors are only used for build-in aggregations in the > DataStream > > > API (sum(), min(), max(), ...). > > > > > > In particular I identified the following inconsistencies: > > > > > > - FieldAccessors allow to address array cells. ExpressionKeys treat > > arrays > > > as AtomicTypes (Array TypeInfos do not extend CompositeType). Hence, it > > is > > > not possible to address array cells. > > > - ExpressionKeys do only support Integer keys for tuples. An atomic > type > > > can only be addressed with "*". FieldAccessors allow to address > > AtomicTypes > > > with 0 in addtion to "*". > > > - ExpressionKeys support to address fields of Java tuples with "f2" and > > > Scala tuple fields with "_3". FieldAccessors do not support the "f" or > > "_" > > > prefix. > > > > > > I would like to propose to adapt the syntax of both mechanisms > (ideally, > > > both should use the same code for validation). IMO, the ExpressionKey > > > syntax much more widely used and is well designed. Therefore, I would > > adopt > > > it for FieldAccessors as well. However, that would mean to restrict the > > > syntax of the FieldAccessors and might break existing code. > > > > > > What do others think? > > > > > > [1] https://github.com/apache/flink/pull/2094 > > >