Re: Apache Beam SQL and UDF

Talat Uyarer Wed, 10 Feb 2021 14:14:30 -0800

Thanks, Rui, for reminding me about the UDF lifecycle. It looks like there is
no lifecycle at all. I checked the code, and it looks like we create a new UDF
instance for each message:


org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.SqlFunctions.isTrue(new
com.paloaltonetworks.cortex.streamcompute.functions.MyUDFFunction().apply(current.getString(2))


Do you think we should create the UDF instance in the setup function and call
it in processElement? Do you see anything else besides this? Or how can I
achieve my goal in a different way? I thought I could use the Table Provider
approach, but it has no mechanism for updating the data.
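
For reference, here is a minimal sketch of one possible workaround, assuming
the generated code keeps constructing the UDF per message as shown above: keep
the state in a static, process-wide holder and leave the UDF itself stateless.
AllowListState, loadAllowList() and the sample keys are hypothetical names
made up for illustration. In a real pipeline MyUDFFunction would presumably
implement Beam's SerializableFunction<String, Boolean>; that part is left out
here so the sketch compiles with the JDK alone.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical process-wide holder for the filter state (not a Beam class). */
final class AllowListState {
    // Immutable snapshot swapped atomically, so readers never take a lock.
    private static final AtomicReference<Set<String>> SNAPSHOT =
        new AtomicReference<>(Collections.<String>emptySet());
    private static ScheduledExecutorService refresher; // guarded by the class lock

    private AllowListState() {}

    /** Starts the background refresh once per JVM; later calls are no-ops. */
    static synchronized void startOnce(
            Supplier<Set<String>> loader, long period, TimeUnit unit) {
        if (refresher != null) {
            return;
        }
        // Load synchronously first so the very first row sees real data.
        SNAPSHOT.set(Collections.unmodifiableSet(new HashSet<>(loader.get())));
        refresher = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "allow-list-refresher");
            t.setDaemon(true); // must not keep the worker JVM alive
            return t;
        });
        refresher.scheduleAtFixedRate(() -> {
            try {
                SNAPSHOT.set(
                    Collections.unmodifiableSet(new HashSet<>(loader.get())));
            } catch (RuntimeException e) {
                // Keep the previous snapshot; an uncaught exception would
                // cancel all future runs of this scheduled task.
            }
        }, period, period, unit);
    }

    static boolean contains(String key) {
        return key != null && SNAPSHOT.get().contains(key);
    }
}

/** Stateless UDF body: constructing it for every row costs almost nothing. */
final class MyUDFFunction {
    static {
        // Runs once when the class is first loaded on a worker.
        AllowListState.startOnce(MyUDFFunction::loadAllowList, 1, TimeUnit.HOURS);
    }

    /** Placeholder loader; a real one would query the external source. */
    static Set<String> loadAllowList() {
        return new HashSet<>(Arrays.asList("tenant-a", "tenant-b"));
    }

    public Boolean apply(String input) {
        return AllowListState.contains(input);
    }
}
```

With this layout, new MyUDFFunction().apply("tenant-a") returns true no matter
how many instances are created, because every instance reads the same
snapshot. The trade-off is that the state lives per worker JVM, so each worker
refreshes on its own schedule and workers can briefly disagree.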

Thanks


On Wed, Feb 10, 2021 at 12:41 PM Talat Uyarer <tuya...@paloaltonetworks.com>
wrote:

> Does Beam create the UDF instance for every bundle, or once in the setup of
> the pipeline?
>
> I will keep internal state in memory. The async thread will update that
> in-memory state on an interval, such as every hour. If Beam keeps a UDF
> instance for more than one bundle, that is OK for me.
>
>
> On Wed, Feb 10, 2021, 12:37 PM Rui Wang <ruw...@google.com> wrote:
>
>> The problem I can think of is that, before the async call is completed,
>> the UDF life cycle may already have reached its end.
>>
>>
>> -Rui
>>
>> On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer <
>> tuya...@paloaltonetworks.com> wrote:
>>
>>> Hi,
>>>
>>> We plan to use a UDF in our SQL. We want to achieve some kind of
>>> filtering based on internal state. We want to update that internal state
>>> with a separate async thread in the UDF. Before implementing this, I want
>>> to get your opinions. Is there any limitation on a UDF having a
>>> multi-threaded implementation? Our UDF is a scalar function. It will take
>>> 1 or 2 inputs and return a boolean.
>>>
>>> Thanks in advance for your comments.
>>>
>>> Thanks
>>>
>>
