Automatically dereferencing, basically. It is nice. Especially for many-to-many relationships like the example. I don't know if the aggregation is any different though, is it?
Kenn

On Sun, Oct 29, 2023 at 1:12 PM Robert Burke <rob...@frantil.com> wrote:

> I came across Edge DB, and it has a novel syntax moving away from SQL with
> their EdgeQL.
>
> https://www.edgedb.com/
>
> E.g. here are two equivalent "nested" queries.
>
>
> # EdgeQL
>
> select Movie {
>   title,
>   actors: {
>     name
>   },
>   rating := math::mean(.reviews.score)
> } filter "Zendaya" in .actors.name;
>
>
> # SQL
>
> SELECT
>   title,
>   Actors.name AS actor_name,
>   (SELECT avg(score)
>     FROM Movie_Reviews
>     WHERE movie_id = Movie.id) AS rating
> FROM
>   Movie
>   LEFT JOIN Movie_Actors ON
>     Movie.id = Movie_Actors.movie_id
>   LEFT JOIN Person AS Actors ON
>     Movie_Actors.person_id = Actors.id
> WHERE
>   'Zendaya' IN (
>     SELECT Person.name
>     FROM
>       Movie_Actors
>       INNER JOIN Person
>         ON Movie_Actors.person_id = Person.id
>     WHERE
>       Movie_Actors.movie_id = Movie.id)
>
>
> The key observation here is that specifics around join kinds and stuff
> don't often need to be directly expressed in the query.
>
> I'd need to dig deeper around it (such as do they share... ) but it does
> make a nice first impression in demos.
>
>
> On Mon, Oct 23, 2023, 7:00 AM XQ Hu via dev <dev@beam.apache.org> wrote:
>
>> +1 on your proposal.
>>
>> On Fri, Oct 20, 2023 at 4:59 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> On Fri, Oct 20, 2023 at 11:35 AM Kenneth Knowles <k...@apache.org>
>>> wrote:
>>> >
>>> > A couple other bits on having an expression language:
>>> >
>>> > - You already have Python lambdas at places, right? So that's quite a
>>> > lot more complex than SQL project/aggregate expressions
>>> > - It really does save a lot of pain for users (at the cost of
>>> > implementation complexity) when you need to "SUM(col1*col2)" where
>>> > otherwise you have to Map first. This could be viewed as desirable as
>>> > well, of course.
>>> >
>>> > Anyhow I'm pretty much in agreement with all your reasoning as to why
>>> > *not* to use SQL-like expressions in strings.
>>> > But it does seem odd when juxtaposed with Python snippets.
>>>
>>> Well, we say "here's a Python expression" when we're using a Python
>>> string. But "SUM(col1*col2)" isn't as transparent. (Agree about the
>>> niceties of being able to provide an expression rather than a column.)
>>>
>>> > On Thu, Oct 19, 2023 at 4:00 PM Robert Bradshaw via dev <
>>> > dev@beam.apache.org> wrote:
>>> >>
>>> >> On Thu, Oct 19, 2023 at 12:53 PM Reuven Lax <re...@google.com> wrote:
>>> >> >
>>> >> > Is the schema Group transform (in Java) something along these lines?
>>> >>
>>> >> Yes, for sure it is. It (and Python's and Typescript's equivalents) are
>>> >> linked in the original post. The open question is how to best express
>>> >> this in YAML.
>>> >>
>>> >> > On Wed, Oct 18, 2023 at 1:11 PM Robert Bradshaw via dev <
>>> >> > dev@beam.apache.org> wrote:
>>> >> >>
>>> >> >> Beam Yaml has good support for IOs and mappings, but one key missing
>>> >> >> feature for even writing a WordCount is the ability to do Aggregations
>>> >> >> [1]. While the traditional Beam primitive is GroupByKey (and
>>> >> >> CombineValues), we're eschewing KVs in favor of more schema'd
>>> >> >> data (which has some precedent in our other languages, see the links
>>> >> >> below). The key components the user needs to specify are (1) the key
>>> >> >> fields on which the grouping will take place, (2) the fields
>>> >> >> (expressions?) involved in the aggregation, and (3) what aggregating
>>> >> >> fn to use.
>>> >> >>
>>> >> >> A straw-man example could be something like
>>> >> >>
>>> >> >> type: Aggregating
>>> >> >> config:
>>> >> >>   key: [field1, field2]
>>> >> >>   aggregating:
>>> >> >>     total_cost:
>>> >> >>       fn: sum
>>> >> >>       value: cost
>>> >> >>     max_cost:
>>> >> >>       fn: max
>>> >> >>       value: cost
>>> >> >>
>>> >> >> This would basically correspond to the SQL expression
>>> >> >>
>>> >> >> "SELECT field1, field2, sum(cost) as total_cost, max(cost) as max_cost
>>> >> >> from table GROUP BY field1, field2"
>>> >> >>
>>> >> >> (though I'm not requiring that we use this as an implementation
>>> >> >> strategy). I do not think we need a separate (non aggregating)
>>> >> >> Grouping operation, this can be accomplished by having a concat-style
>>> >> >> combiner.
>>> >> >>
>>> >> >> There are still some open questions here, notably around how to
>>> >> >> specify the aggregation fns themselves. We could of course provide a
>>> >> >> number of built-ins (like SQL does). This gets into the question of
>>> >> >> how and where to document this complete set, but some basics should
>>> >> >> take us pretty far. Many aggregators, however, are parameterized (e.g.
>>> >> >> quantiles); where do we put the parameters? We could go with something
>>> >> >> like
>>> >> >>
>>> >> >> fn:
>>> >> >>   type: ApproximateQuantiles
>>> >> >>   config:
>>> >> >>     n: 10
>>> >> >>
>>> >> >> but others are even configured by functions themselves (e.g. LargestN
>>> >> >> that wants a comparator Fn). Maybe we decide not to support these
>>> >> >> (yet?)
>>> >> >>
>>> >> >> One thing I think we should support, however, is referencing custom
>>> >> >> CombineFns. We have some precedent for this with our Fns from
>>> >> >> MapToFields, where we accept things like inline lambdas and external
>>> >> >> references.
>>> >> >> Again the topic of how to configure them comes up, as
>>> >> >> these custom Fns are more likely to be parameterized than Map Fns
>>> >> >> (though, to be clear, perhaps it'd be good to allow parameterization of
>>> >> >> MapFns as well). Maybe we allow
>>> >> >>
>>> >> >> language: python  # like MapToFields (and here it'd be harder to mix
>>> >> >>                   # and match per Fn)
>>> >> >> fn:
>>> >> >>   type: ???
>>> >> >>   # should these be nested as config?
>>> >> >>   name: fully.qualified.name
>>> >> >>   path: /path/to/defining/file
>>> >> >>   args: [...]
>>> >> >>   kwargs: {...}
>>> >> >>
>>> >> >> which would invoke the constructor.
>>> >> >>
>>> >> >> I'm also open to other ways of naming/structuring these essential
>>> >> >> parameters if it makes things more clear.
>>> >> >>
>>> >> >> - Robert
>>> >> >>
>>> >> >>
>>> >> >> Java: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/Group.html
>>> >> >> Python: https://beam.apache.org/documentation/transforms/python/aggregation/groupby
>>> >> >> Typescript: https://beam.apache.org/releases/typedoc/current/classes/transforms_group_and_combine.GroupBy.html
>>> >> >>
>>> >> >> [1] One can of course use SqlTransform for this, but I'm leaning
>>> >> >> towards offering something more native.
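
For anyone skimming the thread, here is one rough way to read the straw-man Aggregating config quoted above, written as a plain-Python sketch. It is not Beam code and not a proposed implementation: the `aggregate` helper, the `BUILTIN_FNS` table, and the sample rows are all made up for illustration. It only shows the three pieces the original post lists (key fields, the value field to aggregate, and the aggregating fn) and how they map onto the equivalent GROUP BY.

```python
from collections import defaultdict

# Hypothetical table of built-in aggregation fns, keyed by the name used
# under `fn:` in the straw-man config. Not an actual Beam YAML registry.
BUILTIN_FNS = {"sum": sum, "max": max, "min": min, "count": len}


def aggregate(rows, key, aggregating):
    """Group dict rows by the `key` fields and apply each named aggregation."""
    # (1) Bucket rows by the tuple of key-field values.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in key)].append(row)

    results = []
    for key_values, members in groups.items():
        out = dict(zip(key, key_values))
        for out_name, spec in aggregating.items():
            # (2) Pull the field being aggregated, (3) apply the chosen fn.
            values = [m[spec["value"]] for m in members]
            out[out_name] = BUILTIN_FNS[spec["fn"]](values)
        results.append(out)
    return results


# The straw-man config from the thread, as a Python dict.
config = {
    "key": ["field1", "field2"],
    "aggregating": {
        "total_cost": {"fn": "sum", "value": "cost"},
        "max_cost": {"fn": "max", "value": "cost"},
    },
}

# Made-up sample data.
rows = [
    {"field1": "a", "field2": "x", "cost": 3},
    {"field1": "a", "field2": "x", "cost": 5},
    {"field1": "b", "field2": "y", "cost": 2},
]

result = aggregate(rows, config["key"], config["aggregating"])
print(result)
```

Each output row carries the key fields plus one field per entry under `aggregating`, which mirrors the "SELECT field1, field2, sum(cost) as total_cost, max(cost) as max_cost ... GROUP BY field1, field2" reading. Parameterized or custom fns would need more than a name lookup, which is exactly the open question in the thread.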