Error when trying to read from S3

2021-02-10 Thread Nir Gazit
Hey,
I'm getting this error:
apache_beam.io.filesystem.BeamIOError: Match operation failed with
exceptions {'s3://fiverr-data-science-de
v/beam_poc/beam/wc/input.txt': BeamIOError("exists() operation failed with
exceptions {'s3://fiverr-data-sc
ience-dev/beam_poc/beam/wc/input.txt': ValueError('Must provide one of
client or options')}")} [while runni
ng
'ReadFromText/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/PairWithRestriction0']
with except
ions None

When trying to run a simple pipeline over Flink with an external
environment.

What could be the problem?

Thanks!


Apache Beam UX Research Findings

2021-02-10 Thread Carlos Camacho
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//ical.marudot.com//iCal Event Maker
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:America/Chicago
TZURL:http://tzurl.org/zoneinfo-outlook/America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T02
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T02
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210208T204137Z
UID:20210208t204137z-772146...@marudot.com
DTSTART;TZID=America/Chicago:20210211T11
DTEND;TZID=America/Chicago:20210211T12
SUMMARY:Apache Beam UX Research Readout
URL:https://meet.google.com/xfc-majk-byk
DESCRIPTION:This event has a video call.\nJoin: https://meet.google.com/xfc-majk-byk\n\nSome relevant information about the session:\n-It will be 60 minutes long\n-It will be held via Meet\n-The session will be recorded for those who are not able to attend
BEGIN:VALARM
ACTION:DISPLAY
DESCRIPTION:Apache Beam UX Research Readout
TRIGGER:-PT5M
END:VALARM
END:VEVENT
END:VCALENDAR
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//ical.marudot.com//iCal Event Maker
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:America/Chicago
TZURL:http://tzurl.org/zoneinfo-outlook/America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T02
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T02
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210208T204137Z
UID:20210208t204137z-772146...@marudot.com
DTSTART;TZID=America/Chicago:20210211T11
DTEND;TZID=America/Chicago:20210211T12
SUMMARY:Apache Beam UX Research Readout
URL:https://meet.google.com/xfc-majk-byk
DESCRIPTION:This event has a video call.\nJoin: https://meet.google.com/xfc-majk-byk\n\nSome relevant information about the session:\n-It will be 60 minutes long\n-It will be held via Meet\n-The session will be recorded for those who are not able to attend
BEGIN:VALARM
ACTION:DISPLAY
DESCRIPTION:Apache Beam UX Research Readout
TRIGGER:-PT5M
END:VALARM
END:VEVENT
END:VCALENDAR

Re: Error when trying to read from S3

2021-02-10 Thread Nir Gazit
+d...@beam.apache.org 

Digging more into the code it looks more like a bug that was introduced in
a recent PR (https://github.com/apache/beam/pull/13180). It seems that
pipeline_options is absent when using external workers and it causes the
S3IO to fail in an assertion in its constructor.

I wonder what is the reason this assertion was added as it seems that it
hasn't been there in the past.

Any assistance would be appreciated!

Nir

On Wed, Feb 10, 2021 at 6:49 PM Nir Gazit  wrote:

> Hey,
> I'm getting this error:
> apache_beam.io.filesystem.BeamIOError: Match operation failed with
> exceptions {'s3://fiverr-data-science-de
> v/beam_poc/beam/wc/input.txt': BeamIOError("exists() operation failed with
> exceptions {'s3://fiverr-data-sc
> ience-dev/beam_poc/beam/wc/input.txt': ValueError('Must provide one of
> client or options')}")} [while runni
> ng
> 'ReadFromText/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/PairWithRestriction0']
> with except
> ions None
>
> When trying to run a simple pipeline over Flink with an external
> environment.
>
> What could be the problem?
>
> Thanks!
>


Apache Beam SQL and UDF

2021-02-10 Thread Talat Uyarer
Hi,

We plan to use UDF on our sql. We want to achieve some kind of
filtering based on internal states. We want to update that internal state
with a separate async thread in UDF. Before implementing that thing I want
to get your options. Is there any limitation for UDF to have multi-thread
implementation ?  Our UDF is a scalar function. It will get 1 or 2 input
and return boolean.

I will appreciate your comments in advance.

Thanks


Re: Apache Beam SQL and UDF

2021-02-10 Thread Rui Wang
The problem that I can think of is maybe before the async call is
completed, the UDF life cycle has reached to the end.


-Rui

On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer 
wrote:

> Hi,
>
> We plan to use UDF on our sql. We want to achieve some kind of
> filtering based on internal states. We want to update that internal state
> with a separate async thread in UDF. Before implementing that thing I want
> to get your options. Is there any limitation for UDF to have multi-thread
> implementation ?  Our UDF is a scalar function. It will get 1 or 2 input
> and return boolean.
>
> I will appreciate your comments in advance.
>
> Thanks
>


Re: Apache Beam SQL and UDF

2021-02-10 Thread Talat Uyarer
Does beam create UDF function for every bundle or in setup of pipeline ?

I will keep internal state in memory. The Async thread will update that in
memory state based on an interval such as every hour etc. If beam keeps UDF
instance more than one bundle it is ok for me.


On Wed, Feb 10, 2021, 12:37 PM Rui Wang  wrote:

> The problem that I can think of is maybe before the async call is
> completed, the UDF life cycle has reached to the end.
>
>
> -Rui
>
> On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer <
> tuya...@paloaltonetworks.com> wrote:
>
>> Hi,
>>
>> We plan to use UDF on our sql. We want to achieve some kind of
>> filtering based on internal states. We want to update that internal state
>> with a separate async thread in UDF. Before implementing that thing I want
>> to get your options. Is there any limitation for UDF to have multi-thread
>> implementation ?  Our UDF is a scalar function. It will get 1 or 2 input
>> and return boolean.
>>
>> I will appreciate your comments in advance.
>>
>> Thanks
>>
>


Re: Apache Beam UX Research Findings

2021-02-10 Thread Ramesh Mathikumar
I am still awaiting my gift voucher :)


On Wed, 10 Feb 2021 at 16:51, Carlos Camacho 
wrote:

> Hi everyone,
> *Thank you for helping us choose a date and time for our User Experience
> Research Findings Readout for Apache Beam.*
>
> The winner option is *Thursday, February 11th at 11:00 AMCST / 6:00 PM
> CEST. *
>
> Some relevant information about the session:
>
>- It will be 60 minutes long
>- It will be held via Meet
>- The session will be recorded for those who are not able to attend
>
>
>
> *Don't forget to add the event to your calendar! *
> Thank you all, and we will see you soon!
>
> --
>
> Carlos Camacho | WIZELINE
>
> UX Designer
>
> carlos.cama...@wizeline.com
>
> Amado Nervo 2200, Esfera P6, Col. Jardines del Sol, 45050 Zapopan, Jal.
>
>
>
>
>
>
>
>
> *This email and its contents (including any attachments) are being sent
> toyou on the condition of confidentiality and may be protected by
> legalprivilege. Access to this email by anyone other than the intended
> recipientis unauthorized. If you are not the intended recipient, please
> immediatelynotify the sender by replying to this message and delete the
> materialimmediately from your system. Any further use, dissemination,
> distributionor reproduction of this email is strictly prohibited. Further,
> norepresentation is made with respect to any content contained in this
> email.*


Re: Apache Beam SQL and UDF

2021-02-10 Thread Talat Uyarer
Thanks Rui to remind me lifecycle of UDF. LOoks liek there is no any
lifecycle. I checked the code looks like we create UDF's instance for each
message:

org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.SqlFunctions.isTrue(new
> com.paloaltonetworks.cortex.streamcompute.functions.MyUDFFunction().apply(current.getString(2))


Do you think we should put the UDF instance in the setup function and call
in processlement ? Do you see anything besides this ? Or How can i achieve
my goal in a different way ? I thought I could use the Table Provider
approach. But it does not update data with any mechanisim.

Thanks


On Wed, Feb 10, 2021 at 12:41 PM Talat Uyarer 
wrote:

> Does beam create UDF function for every bundle or in setup of pipeline ?
>
> I will keep internal state in memory. The Async thread will update that in
> memory state based on an interval such as every hour etc. If beam keeps UDF
> instance more than one bundle it is ok for me.
>
>
> On Wed, Feb 10, 2021, 12:37 PM Rui Wang  wrote:
>
>> The problem that I can think of is maybe before the async call is
>> completed, the UDF life cycle has reached to the end.
>>
>>
>> -Rui
>>
>> On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer <
>> tuya...@paloaltonetworks.com> wrote:
>>
>>> Hi,
>>>
>>> We plan to use UDF on our sql. We want to achieve some kind of
>>> filtering based on internal states. We want to update that internal state
>>> with a separate async thread in UDF. Before implementing that thing I want
>>> to get your options. Is there any limitation for UDF to have multi-thread
>>> implementation ?  Our UDF is a scalar function. It will get 1 or 2 input
>>> and return boolean.
>>>
>>> I will appreciate your comments in advance.
>>>
>>> Thanks
>>>
>>