Some parts of schemas need a spec that is standardized across languages (in particular, the definition of what a schema is). Other things can be language-specific. The next step with schemas will need to be formalizing those answers.

On Mon, Jan 7, 2019 at 2:13 PM Robert Burke <[email protected]> wrote:

> In the eventual future where the Go SDK supports schemas, it should be
> possible to use struct field tags to specify paths for extraction from
> schema data, for usage similar to how Java uses parameter annotations.
>
> e.g.
>
> type MyKey struct {
>     K string `jsonpath:"userid"`
> }
> type MyValue struct {
>     K   string     `jsonpath:"userid"`
>     Loc []Location `jsonpath:"action.location.*"`
> }
>
> func MyDoFn(k MyKey, v MyValue) (...) {...}
>
> One could likely access any number of schema fields this way, and it would
> be statically analyzable, so fast extraction would be possible at runtime,
> rather than the default reflection paths.
>
> This would be agnostic to whichever path approach is decided on as the
> Beam standard approach.
>
> On Mon, Jan 7, 2019, 11:59 AM Reuven Lax <[email protected]> wrote:
>
>> I'll take a look.
>>
>> Honestly though, if we leave out features such as array slices this is a
>> dirt-simple path syntax that pretty much matches what SQL does. It's
>> basically just field1.field2, or field1.*.
>>
>> JMESPath, along with JsonPath, also supports various aggregations, which I
>> think is beyond the scope of what we want here; all that's needed here is
>> a selector expression. AFAICT what I have is already a strict subset of
>> JMESPath, though I'll take a closer look to make sure there are no
>> semantic incompatibilities.
>>
>> Reuven
>>
>> On Mon, Jan 7, 2019 at 10:21 AM Jeff Klukas <[email protected]> wrote:
>>
>>> There is also JMESPath (http://jmespath.org/) which is quite similar to
>>> JsonPath, but does have a spec and lacks the leading $ character. The
>>> AWS CLI uses JMESPath for defining queries.
>>>
>>> On Mon, Jan 7, 2019 at 1:05 PM Reuven Lax <[email protected]> wrote:
>>>
>>>> On Mon, Jan 7, 2019 at 1:44 AM Robert Bradshaw <[email protected]>
>>>> wrote:
>>>>
>>>>> On Sun, Jan 6, 2019 at 12:46 PM Reuven Lax <[email protected]> wrote:
>>>>> >
>>>>> > Some time ago, @Jean-Baptiste Onofré made the excellent suggestion
>>>>> > that we look into using JsonPath as a selector format for schema
>>>>> > fields. This provides a simple and natural way for users to select
>>>>> > nested schema fields, as well as wildcards. This would allow users
>>>>> > to more simply select nested fields using the Select transform, e.g.:
>>>>> >
>>>>> > p.apply(Select.fields("event.userid", "event.location.*"));
>>>>> >
>>>>> > It would also fit into NewDoFn (Java) like this:
>>>>> >
>>>>> > @ProcessElement
>>>>> > public void process(@Field("userid") String userId,
>>>>> >                     @Field("action.location.*") Location location) {
>>>>> > }
>>>>> >
>>>>> > After some investigation, I believe that we're better off with
>>>>> > something very close to a subset of JsonPath, but not precisely
>>>>> > JsonPath.
>>>>>
>>>>> I am very wary of creating something that's very close to, but not
>>>>> quite, a (subset of) a well established standard. Is there
>>>>> disadvantage to not being a strict actual subset? If we go this route,
>>>>> we should at least ensure that any divergence is illegal JsonPath
>>>>> rather than having different semantic meaning.
>>>>>
>>>>
>>>> As far as I can tell, JsonPath isn't much of a "standard." There
>>>> doesn't seem to be much of a spec other than implementation.
>>>>
>>>> For the most part, I am speaking of a strict subset of JsonPath. The
>>>> only incompatibility is that JsonPath expressions all start with a '$'
>>>> (which represents the root node). So in the above expression you would
>>>> write "$.action.location.*" instead. I think staying closer to BeamSql
>>>> syntax makes more sense here, and I would like to dispense with the
>>>> need to begin with a $ character. JsonPath also assumes that each
>>>> object is also a JavaScript object (which makes no sense here), and
>>>> some of the JsonPath features are based on that.
>>>>
>>>>> > JsonPath has many features that are JavaScript-specific (e.g. the
>>>>> > ability to embed JavaScript expressions). JsonPath also includes the
>>>>> > ability to do complex filtering and aggregation, which I don't think
>>>>> > we want here; Beam already provides the ability to do such filtering
>>>>> > and aggregation, and it's not needed here. One example of a change:
>>>>> > JsonPath queries always begin with $ (representing the root node),
>>>>> > and I think we're better off not requiring that so that these
>>>>> > queries look more like BeamSql queries.
>>>>> >
>>>>> > I've created a small ANTLR grammar (which has the advantage that
>>>>> > it's easy to extend) for these expressions and have everything
>>>>> > working in a branch. However there are a few more features of
>>>>> > JsonPath that might be useful here, and I wanted community feedback
>>>>> > to see whether it's worth implementing them.
>>>>> >
>>>>> > The first are array/map slices and selectors. Currently if a schema
>>>>> > contains an array (or map) field, you can only select all elements
>>>>> > of the array or map. JsonPath however supports selecting and slicing
>>>>> > the array. For example, consider the following:
>>>>> >
>>>>> > @DefaultSchema(JavaFieldSchema.class)
>>>>> > public class Event {
>>>>> >   public final String userId;
>>>>> >   public final List<Action> actions;
>>>>> > }
>>>>> >
>>>>> > Currently you can apply Select.fields("actions.location"), and that
>>>>> > will return a schema containing a list of Locations, one for every
>>>>> > action in the original event. If we allowed slicing, you could
>>>>> > instead write Select.fields("actions[0:9].location"), which would do
>>>>> > the same but only for the first 10 elements of the array.
>>>>> >
>>>>> > Is this useful in Beam? It would not be hard to implement, but I
>>>>> > want to see what folks think first.
>>>>> >
>>>>> > The second feature is recursive field selection. The example often
>>>>> > given in JsonPath is a JSON document containing the inventory for a
>>>>> > store. There are lists of subobjects representing books, bicycles,
>>>>> > tables, chairs, etc. The JsonPath query "$..price" recursively finds
>>>>> > every object that has a field named price, and returns those prices;
>>>>> > in this case it returns the price of every element in the store.
>>>>> >
>>>>> > I'm a bit less convinced that recursive field selection is useful in
>>>>> > Beam. The usual example for JSON involves a document that represents
>>>>> > an entire corpus, e.g. a store inventory. In Beam, the schemas are
>>>>> > applied to individual records, and I don't know how often there will
>>>>> > be a use for this sort of recursive selection. However I could be
>>>>> > wrong here, so if anyone has a good use case for this sort of
>>>>> > selection, please let me know.
>>>>>
>>>>> Records often contain lists, e.g. the record could be an order, and it
>>>>> could be useful to select on the price of the items (just to throw it
>>>>> out there).
>>>>>
>>>>
>>>> BTW, that already works. The .. operator in JsonPath is a recursive
>>>> field search, across any lists or records that are lower in the tree.
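
For reference, the basic selector form discussed in this thread (field1.field2, field1.*, with a path step applied to every element of a list) can be sketched in a few lines of Java. This is only an illustration under assumptions: records are modeled as java.util.Map and arrays as java.util.List rather than Beam rows and schemas, and the PathSelector class, its select method, and the record helper are hypothetical names, not part of the Beam API or of the ANTLR-based branch mentioned in the thread. Slices and the recursive ".." operator are deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of dotted-path field selection over nested maps and lists. */
public class PathSelector {

  /** Resolves a path such as "actions.location" or "event.location.*" against root. */
  public static Object select(Object root, String path) {
    return select(root, path.split("\\."), 0);
  }

  private static Object select(Object node, String[] segments, int index) {
    if (index == segments.length) {
      return node; // Path fully consumed: this is the selected value.
    }
    if (node instanceof List) {
      // A step applied to a list is applied to every element of that list.
      List<Object> results = new ArrayList<>();
      for (Object element : (List<?>) node) {
        results.add(select(element, segments, index));
      }
      return results;
    }
    String segment = segments[index];
    if (!(node instanceof Map)) {
      throw new IllegalArgumentException("Cannot select '" + segment + "' from a non-record value");
    }
    Map<?, ?> record = (Map<?, ?>) node;
    if (segment.equals("*")) {
      if (index != segments.length - 1) {
        throw new IllegalArgumentException("Wildcard is only supported as the last step");
      }
      // Assumption: "*" means "all fields of this record", returned as a copy.
      return new LinkedHashMap<Object, Object>(record);
    }
    if (!record.containsKey(segment)) {
      throw new IllegalArgumentException("No such field: " + segment);
    }
    return select(record.get(segment), segments, index + 1);
  }

  /** Test-data helper (hypothetical): builds a record from alternating names and values. */
  public static Map<String, Object> record(Object... namesAndValues) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (int i = 0; i < namesAndValues.length; i += 2) {
      result.put((String) namesAndValues[i], namesAndValues[i + 1]);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Object> actions = new ArrayList<>();
    actions.add(record("location", record("city", "Paris")));
    actions.add(record("location", record("city", "Oslo")));
    Map<String, Object> event = record("userId", "u1", "actions", actions);

    // One location per action, mirroring the "actions.location" example.
    System.out.println(select(event, "actions.location"));
    // A nested scalar field.
    System.out.println(select(event, "userId"));
  }
}
```

Because the path is split once and walked step by step, the same walk could in principle be precomputed per schema instead of per element, which is the kind of static analysis suggested for the Go struct tags.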
