Error when trying to read from S3
Hey,

I'm getting this error:

apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions {'s3://fiverr-data-science-dev/beam_poc/beam/wc/input.txt': BeamIOError("exists() operation failed with exceptions {'s3://fiverr-data-science-dev/beam_poc/beam/wc/input.txt': ValueError('Must provide one of client or options')}")} [while running 'ReadFromText/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/PairWithRestriction0'] with exceptions None

This happens when I try to run a simple pipeline on Flink with an external environment.

What could be the problem?

Thanks!
Apache Beam UX Research Findings
BEGIN:VCALENDAR VERSION:2.0 PRODID:-//ical.marudot.com//iCal Event Maker CALSCALE:GREGORIAN BEGIN:VTIMEZONE TZID:America/Chicago TZURL:http://tzurl.org/zoneinfo-outlook/America/Chicago X-LIC-LOCATION:America/Chicago BEGIN:DAYLIGHT TZOFFSETFROM:-0600 TZOFFSETTO:-0500 TZNAME:CDT DTSTART:19700308T02 RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU END:DAYLIGHT BEGIN:STANDARD TZOFFSETFROM:-0500 TZOFFSETTO:-0600 TZNAME:CST DTSTART:19701101T02 RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU END:STANDARD END:VTIMEZONE BEGIN:VEVENT DTSTAMP:20210208T204137Z UID:20210208t204137z-772146...@marudot.com DTSTART;TZID=America/Chicago:20210211T11 DTEND;TZID=America/Chicago:20210211T12 SUMMARY:Apache Beam UX Research Readout URL:https://meet.google.com/xfc-majk-byk DESCRIPTION:This event has a video call.\nJoin: https://meet.google.com/xfc-majk-byk\n\nSome relevant information about the session:\n-It will be 60 minutes long\n-It will be held via Meet\n-The session will be recorded for those who are not able to attend BEGIN:VALARM ACTION:DISPLAY DESCRIPTION:Apache Beam UX Research Readout TRIGGER:-PT5M END:VALARM END:VEVENT END:VCALENDAR
Re: Error when trying to read from S3
+d...@beam.apache.org

Digging further into the code, this looks like a bug introduced in a recent PR (https://github.com/apache/beam/pull/13180). It seems that pipeline_options is absent when using external workers, which causes S3IO to fail an assertion in its constructor. I wonder why this assertion was added, since it does not seem to have been there in the past.

Any assistance would be appreciated!

Nir

On Wed, Feb 10, 2021 at 6:49 PM Nir Gazit wrote:
> Hey,
>
> I'm getting this error:
>
> apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions {'s3://fiverr-data-science-dev/beam_poc/beam/wc/input.txt': BeamIOError("exists() operation failed with exceptions {'s3://fiverr-data-science-dev/beam_poc/beam/wc/input.txt': ValueError('Must provide one of client or options')}")} [while running 'ReadFromText/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/PairWithRestriction0'] with exceptions None
>
> When trying to run a simple pipeline over Flink with an external environment.
>
> What could be the problem?
>
> Thanks!
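The failure mode described in this message can be sketched in a few lines. This is a minimal, hypothetical stand-in and not Beam's actual S3IO code: FakeS3IO, open_filesystem, and the "region" key are names assumed purely for illustration. It only shows why a worker that never receives pipeline options ends up with the reported ValueError.

```python
class FakeS3IO:
    """Hypothetical stand-in for an S3 IO wrapper (not Beam's real class)."""

    def __init__(self, client=None, options=None):
        # Same message as in the reported error: without a ready-made client
        # or some options to build one from, construction cannot proceed.
        if client is None and options is None:
            raise ValueError('Must provide one of client or options')
        # Assumption for this sketch: a "client" is just a dict that records
        # which options it was built from.
        self.client = client if client is not None else {'built_from': options}


def open_filesystem(pipeline_options):
    """Hypothetical worker-side helper that forwards whatever it was given."""
    return FakeS3IO(options=pipeline_options)
```

Calling open_filesystem(None), which is what a worker without pipeline options would effectively do in this sketch, raises the ValueError, while open_filesystem({'region': 'placeholder'}) succeeds. Whether the right fix is to propagate the options to external workers or to relax the check is a question for the Beam maintainers.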
Apache Beam SQL and UDF
Hi,

We plan to use a UDF in our SQL. We want to do some filtering based on internal state, and we want to update that internal state from a separate async thread inside the UDF. Before implementing this, I would like to get your opinions. Is there any limitation on a UDF having a multi-threaded implementation? Our UDF is a scalar function. It takes one or two inputs and returns a boolean.

Thanks in advance for your comments.

Thanks
Re: Apache Beam SQL and UDF
The one problem I can think of is that the UDF's life cycle may have ended before the async call completes.

-Rui

On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer wrote:
> Hi,
>
> We plan to use a UDF in our SQL. We want to do some filtering based on internal state, and we want to update that internal state from a separate async thread inside the UDF. Before implementing this, I would like to get your opinions. Is there any limitation on a UDF having a multi-threaded implementation? Our UDF is a scalar function. It takes one or two inputs and returns a boolean.
>
> Thanks in advance for your comments.
>
> Thanks
Re: Apache Beam SQL and UDF
Does Beam create the UDF instance for every bundle, or once during pipeline setup?

I will keep the internal state in memory. The async thread will update that in-memory state at a fixed interval, for example every hour. If Beam keeps a UDF instance alive across more than one bundle, that is fine for me.

On Wed, Feb 10, 2021, 12:37 PM Rui Wang wrote:
> The one problem I can think of is that the UDF's life cycle may have ended before the async call completes.
>
> -Rui
>
> On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer <tuya...@paloaltonetworks.com> wrote:
>
>> Hi,
>>
>> We plan to use a UDF in our SQL. We want to do some filtering based on internal state, and we want to update that internal state from a separate async thread inside the UDF. Before implementing this, I would like to get your opinions. Is there any limitation on a UDF having a multi-threaded implementation? Our UDF is a scalar function. It takes one or two inputs and returns a boolean.
>>
>> Thanks in advance for your comments.
>>
>> Thanks
Re: Apache Beam UX Research Findings
I am still awaiting my gift voucher :)

On Wed, 10 Feb 2021 at 16:51, Carlos Camacho wrote:
> Hi everyone,
>
> *Thank you for helping us choose a date and time for our User Experience Research Findings Readout for Apache Beam.*
>
> The winner option is *Thursday, February 11th at 11:00 AM CST / 6:00 PM CEST.*
>
> Some relevant information about the session:
>
> - It will be 60 minutes long
> - It will be held via Meet
> - The session will be recorded for those who are not able to attend
>
> *Don't forget to add the event to your calendar!*
>
> Thank you all, and we will see you soon!
>
> --
> Carlos Camacho | WIZELINE
> UX Designer
> carlos.cama...@wizeline.com
> Amado Nervo 2200, Esfera P6, Col. Jardines del Sol, 45050 Zapopan, Jal.
>
> *This email and its contents (including any attachments) are being sent to you on the condition of confidentiality and may be protected by legal privilege. Access to this email by anyone other than the intended recipient is unauthorized. If you are not the intended recipient, please immediately notify the sender by replying to this message and delete the material immediately from your system. Any further use, dissemination, distribution or reproduction of this email is strictly prohibited. Further, no representation is made with respect to any content contained in this email.*
Re: Apache Beam SQL and UDF
Thanks, Rui, for reminding me about the UDF lifecycle. It looks like there is no lifecycle at all. I checked the generated code, and it looks like we create a new UDF instance for each message:

org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.SqlFunctions.isTrue(new com.paloaltonetworks.cortex.streamcompute.functions.MyUDFFunction().apply(current.getString(2))

Do you think we should create the UDF instance in the setup function and call it in processElement? Do you see any other option, or a different way to achieve my goal?

I considered the Table Provider approach, but it has no mechanism for updating the data.

Thanks

On Wed, Feb 10, 2021 at 12:41 PM Talat Uyarer wrote:
> Does Beam create the UDF instance for every bundle, or once during pipeline setup?
>
> I will keep the internal state in memory. The async thread will update that in-memory state at a fixed interval, for example every hour. If Beam keeps a UDF instance alive across more than one bundle, that is fine for me.
>
> On Wed, Feb 10, 2021, 12:37 PM Rui Wang wrote:
>
>> The one problem I can think of is that the UDF's life cycle may have ended before the async call completes.
>>
>> -Rui
>>
>> On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer <tuya...@paloaltonetworks.com> wrote:
>>
>>> Hi,
>>>
>>> We plan to use a UDF in our SQL. We want to do some filtering based on internal state, and we want to update that internal state from a separate async thread inside the UDF. Before implementing this, I would like to get your opinions. Is there any limitation on a UDF having a multi-threaded implementation? Our UDF is a scalar function. It takes one or two inputs and returns a boolean.
>>>
>>> Thanks in advance for your comments.
>>>
>>> Thanks
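One way to reason about the per-message instantiation described in this message is to keep the state out of the UDF object entirely. The sketch below is language-agnostic in intent and is written in Python only for brevity; it is not Beam SQL API code, and SharedState, MyFilter, loader, and interval_seconds are hypothetical names. In Java, a static field would play the role of the class-level attributes. It assumes the state fits in memory and that slightly stale reads are acceptable.

```python
import threading
import time


class SharedState:
    """Process-wide state kept outside the per-call function object."""

    _lock = threading.Lock()
    _values = frozenset()
    _thread = None

    @classmethod
    def start(cls, loader, interval_seconds):
        """Load once synchronously, then refresh in a daemon thread.

        Returns True if this call started the refresher, False if one
        was already running in this process.
        """
        with cls._lock:
            if cls._thread is not None:
                return False
            cls._values = frozenset(loader())

            def refresh_forever():
                while True:
                    time.sleep(interval_seconds)
                    try:
                        # Swap in a new immutable snapshot in one assignment.
                        cls._values = frozenset(loader())
                    except Exception:
                        pass  # keep serving the last good snapshot

            cls._thread = threading.Thread(target=refresh_forever, daemon=True)
            cls._thread.start()
            return True

    @classmethod
    def current(cls):
        return cls._values


class MyFilter:
    """Cheap to construct, so creating one per message costs little."""

    def apply(self, key):
        return key in SharedState.current()
```

With this shape, it no longer matters how often the function object is created, because every instance reads the same snapshot. The thread is a daemon so it does not keep the worker process alive, which also means a refresh can be cut off at shutdown.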