Hi,

One use case would be when dealing with windowing functions, for example:

SELECT
  f_int,
  COUNT(*),
  TUMBLE_START(f_timestamp, INTERVAL '1' HOUR) AS tumble_start
FROM PCOLLECTION
GROUP BY
  f_int,
  TUMBLE(f_timestamp, INTERVAL '1' HOUR)

For an element whose event time comes from metadata rather than from data
within the element itself, I would need to create a new schema which adds
the timestamp as a field. Other examples which may be interesting are
getting the value of the row with the max/min timestamp. None of this would
be difficult, but it does feel a little on the verbose side and also makes
the pipeline a little harder to read.
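To make that concrete, here is a rough stdlib-only Java sketch (no actual Beam dependency; the class, method, and field names are all illustrative, not Beam API) of the manual step I mean: promoting the metadata timestamp into the row, either as an extra flat field or as the nested { event_timestamp, value } shape discussed in the quoted thread below:

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: simulates widening an element's schema with its
// metadata timestamp, the step one currently writes as a ParDo before
// SqlTransform. Rows are modeled as ordered maps for simplicity.
public class WidenWithTimestamp {

    // Flat approach: new "schema" = old fields plus an event_timestamp field.
    static Map<String, Object> widen(Map<String, Object> row, Instant eventTime) {
        Map<String, Object> widened = new LinkedHashMap<>();
        widened.put("event_timestamp", eventTime); // metadata promoted to a field
        widened.putAll(row);                       // original fields unchanged
        return widened;
    }

    // Nested approach: metadata at the top level, original row under "value".
    static Map<String, Object> reifyNested(Map<String, Object> row, Instant eventTime) {
        Map<String, Object> derived = new LinkedHashMap<>();
        derived.put("event_timestamp", eventTime);
        derived.put("value", row);
        return derived;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("f_int", 42);
        Instant ts = Instant.parse("2019-05-14T00:00:00Z");
        System.out.println(widen(row, ts));
        System.out.println(reifyNested(row, ts));
    }
}
```

Neither variant is hard to write, but having to define and maintain the widened schema by hand for every such pipeline is the verbosity I am referring to.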

Cheers
Reza

*From: *Kenneth Knowles <[email protected]>
*Date: *Wed, 15 May 2019 at 01:15
*To: *dev

We have support for nested rows so this should be easy. The .withMetadata
> would reify the struct, moving from Row to WindowedValue<Row> if I
> understand it...
>
> SqlTransform.query("SELECT field1 from PCOLLECTION"):
>
>     Schema = {
>       field1: type1,
>       field2: type2
>     }
>
>     SqlTransform.query(...)
>
> SqlTransform.withMetadata().query("SELECT event_timestamp, value.field1
> FROM PCOLLECTION")
>
>     Derived schema = {
>       event_timestamp: TIMESTAMP,
>       pane_info: { ... }
>       value: {
>         field1: type1,
>         field2: type2,
>         ...
>       }
>     }
>
> SqlTransform would expand into a different composite, and it would be a
> straightforward ParDo to adjust the data, possibly automatic via the new
> schema conversions.
>
> Embedding the window would be a bit wonky, something like { end_of_window:
> TIMESTAMP, encoded_window: bytes } which would be expensive due to
> encoding. But timestamp and pane info not so bad.
>
> Kenn
>
> *From: *Anton Kedin <[email protected]>
> *Date: *Tue, May 14, 2019 at 9:17 AM
> *To: * <[email protected]>
>
> Reza, can you share more thoughts on how you think this can work
>> end-to-end?
>>
>> Currently the approach is that populating the rows with the data happens
>> before the SqlTransform, and within the query you can only use the
>> things that are already in the rows or in the catalog/schema (or built-in
>> things). In general case populating the rows with any data can be solved
>> via a ParDo before SqlTransform. Do you think this approach lacks something
>> or is maybe too verbose?
>>
>> My thoughts on this, lacking more info or concrete examples: in order to
>> access a timestamp value from within a query there has to be a syntax for
>> it. Field access expressions or function calls are the only things that
>> come to mind among existing syntax features that would allow that. Making
>> timestamp a field of the data row makes more sense to me here because in
>> Beam it is already a part of the element. It's not a result of a function
>> call and it's already easily accessible, doesn't make sense to build extra
>> functions here. One of the problems with both approaches however is the
>> potential conflicts with the existing schema of the data elements (or the
>> schema/catalog of the data source in general). E.g. if we add a magical
>> "event_timestamp" column or "event_timestamp()" function there may
>> potentially already exist a field or a function in the schema with this
>> name. This can be solved in a couple of ways, but we will probably want to
>> provide a configuration mechanism to assign different field/function
>> names in case of conflicts.
>>
>> Given that, it may make sense to allow users to attach the whole pane
>> info or some subset of it to the row (e.g. only the timestamp), and make
>> that configurable. However I am not sure whether exposing something like
>> pane info is enough and will cover a lot of useful cases. Plus adding
>> methods like `attachTimestamp("fieldname")` or
>> `attachWindowInfo("fieldname")` might open a portal to ever-increasing
>> collection of these `attachX()`, `attachY()` that can make the API less
>> usable. If, on the other hand, we made it more generic, it would
>> probably have to look a lot like a ParDo or MapElements.via() anyway. And
>> at that point the question would be whether it makes sense to build
>> something extra that probably looks and functions like an existing feature.
>>
>> Regards,
>> Anton
>>
>>
>>
>> *From: *Andrew Pilloud <[email protected]>
>> *Date: *Tue, May 14, 2019 at 7:29 AM
>> *To: *dev
>>
>> Hi Reza,
>>>
>>> Where will this metadata be coming from? Beam SQL is tightly coupled
>>> with the schema of the PCollection, so adding fields not in the data would
>>> be difficult.
>>>
>>> If what you want is the timestamp out of the DoFn.ProcessContext we
>>> might be able to add a SQL function to fetch that.
>>>
>>> Andrew
>>>
>>> *From: *Reza Rokni <[email protected]>
>>> *Date: *Tue, May 14, 2019, 1:08 AM
>>> *To: * <[email protected]>
>>>
>>> Hi,
>>>>
>>>> What are folks' thoughts about adding something like
>>>> SqlTransform.withMetadata().query(...) to enable users to
>>>> access things like timestamp information from within the query without
>>>> having to reify the information into the element itself?
>>>>
>>>> Cheers
>>>> Reza
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> This email may be confidential and privileged. If you received this
>>>> communication by mistake, please don't forward it to anyone else, please
>>>> erase all copies and attachments, and please let me know that it has gone
>>>> to the wrong person.
>>>>
>>>> The above terms reflect a potential business arrangement, are provided
>>>> solely as a basis for further discussion, and are not intended to be and do
>>>> not constitute a legally binding obligation. No legally binding obligations
>>>> will be created, implied, or inferred until an agreement in final form is
>>>> executed in writing by all parties involved.
>>>>
>>>
