Re: Question about unbounded in-memory PCollection

2019-05-07 Thread Rui Wang
Does TestStream.java [1] satisfy your need? -Rui [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestStream.java On Tue,

Re: SQL massively more resource-intensive? Memory leak?

2019-06-03 Thread Rui Wang
BeamSQL actually only converts SELECT COUNT(*) query to the Java pipeline that calls Java's builtin Count[1] transform. Could you implement your pipeline by Count transform to see whether this memory issue still exists? By doing so we could narrow down problem a bit. If using Java directly without

Re: SQL massively more resource-intensive? Memory leak?

2019-06-03 Thread Rui Wang
Ha sorry I was only reading screenshots but ignored your other comments. So count fn indeed worked. Can I ask if your sql pipeline works on direct runner? -Rui On Mon, Jun 3, 2019 at 10:39 AM Rui Wang wrote: > BeamSQL actually only converts SELECT COUNT(*) query to the Java pipeline >

Re: SQL massively more resource-intensive? Memory leak?

2019-06-04 Thread Rui Wang
rs of running it, but by the end the memory usage was high and > the CPU about 100% so it seems to be the same problem. > > Worth noting perhaps that when I use the DirectRunner I have to turn > enforceImmutability off because of > https://issues.apache.org/jira/browse/BEAM-1714 >

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-06-19 Thread Rui Wang
PR/8376 is merged and it should be in 2.14.0 release. -Rui On Mon, Apr 22, 2019 at 10:49 AM Rui Wang wrote: > I see. I created this PR [1] to ask feedback from the reviewer who knows > better on Avro in Beam. > > -Rui > > > [1]: https://github.com/apache/beam/pull/8376

Re:

2019-06-20 Thread Rui Wang
I wrote some tests on nested row selection in BeamSQL[1]. Those test cases test some behaviors of nested row selection that BeamSQL supports(but it's not a complete list). You could check what are tested so that are supported. Also it's welcome to extend those tests to cover more behaviors. [1]:

Re:

2019-06-20 Thread Rui Wang
/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java -Rui On Thu, Jun 20, 2019 at 9:58 AM Rui Wang wrote: > I wrote some tests on nested row selection in BeamSQL[1]. Those test cases > test some behaviors of nested row selection that BeamSQL supports(but it's >

Re:

2019-06-24 Thread Rui Wang
of > primitive types. In my case the value is of type Row. > So can you let me know whether this is supported or this is a bug ? > > > *Thanks & Regards,* > > *Vishwas * > > > On Thu, Jun 20, 2019 at 11:25 PM Rui Wang wrote: > >> Oops I made a mistake, I di

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
It could be just straightforward to create a SortedMapCoder for TreeMap. Just add checks on map instances and then change verifyDeterministic. If this is a common need we could just submit it into Beam repo. [1]: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/b

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
ether I > might be able to figure that out. I can duplicate MapCoder and try to make > changes, but how will beam know to pick up that coder for a groupByKey? > > Thanks! > Shannon > > On Thu, Jul 11, 2019 at 4:42 PM Rui Wang wrote: > >> It could be just straightforward

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
:55 PM Shannon Duncan wrote: > So ArrayList doesn't work either, so just a standard List? > > On Thu, Jul 11, 2019 at 4:53 PM Rui Wang wrote: > >> Shannon, I agree with Mike on List is a good workaround if your element >> within list is deterministic and you are eager t

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
coders. > > - Shannon > > On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan > wrote: > >> Will do. Thanks. A new coder for deterministic Maps would be great in the >> future. Thank you! >> >> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang wrote: >> >>>

Re: pubsub -> IO

2019-07-15 Thread Rui Wang
+user@beam.apache.org -Rui On Mon, Jul 15, 2019 at 6:55 AM Chaim Turkel wrote: > Hi, > I am looking to write a pipeline that read from a mongo collection. > I would like to listen to a pubsub that will have a object that will > tell me which collection and which time frame. > Is there a

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Rui Wang
Another approach is to let BeamSQL support it natively, as the title of this thread says: "as a Table in BeamSQL". We might be able to define a table with properties that says this table return a PCollectionView. By doing so we will have a trigger based PCollectionView available in SQL rel nodes,

Re: applying keyed state on top of stream from co-groupByKey output

2019-07-25 Thread Rui Wang
I also added an alternatively solution to your example to the SO question. -Rui On Thu, Jul 25, 2019 at 9:02 AM Kenneth Knowles wrote: > Thanks for the very detailed question! I have written up an answer and I > suggest we continue discussion there. > > Kenn > > On Tue, Jul 23, 2019 at 9:11 PM

Re: apache beam 2.16.0 ?

2019-09-18 Thread Rui Wang
Hi, You can search(or ask) in dev@ for the progress of 2.16.0. The answer is the release of 2.16.0 is ongoing and it will be released once blockers are solved. -Rui On Wed, Sep 18, 2019 at 9:34 PM Yu Watanabe wrote: > Hello. > > I would like to use 2.16.0 to diagnose container problem, howev

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Rui Wang
Thank you Udi for taking care of Beam 2.18.0 release! -Rui On Tue, Jan 28, 2020 at 10:59 AM Udi Meiri wrote: > The Apache Beam team is pleased to announce the release of version 2.18.0. > > Apache Beam is an open source unified programming model to define and > execute data processing pipelin

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-14 Thread Rui Wang
Calcite has improved to reconstruct ROW back in the output. See [1]. Beam need to update Calcite dependency to > 1.21 to adopt that. [1]: https://jira.apache.org/jira/browse/CALCITE-3138 -Rui On Thu, Feb 13, 2020 at 9:05 PM Talat Uyarer wrote: > Hi, > > I am trying to Beam SQL. But somethin

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-14 Thread Rui Wang
che/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L167 > > On Fri, Feb 14, 2020 at 10:33 AM Rui Wang wrote: > >> Calcite has improved to reconstruct ROW back in the output. See [1]. Beam >> need to update Calcite dependency to > 1.21 to adopt that. >> >> >&g

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-15 Thread Rui Wang
; Do you mean they were flattened before by calcite or Does beam flatten > them too ? > > > > On Fri, Feb 14, 2020 at 1:21 PM Rui Wang wrote: > >> Nested row types might be less well supported (r.g. Row) because they >> were flattened before anyway. >> >&

Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread Rui Wang
SQL does not support such joins with your requirement: write to sink after every 1 min after window closes. You might can use state and timer API to achieve your goal. -Rui On Mon, Feb 24, 2020 at 9:50 AM shanta chakpram wrote: > Hi, > > I am trying to join inputs from Unbounded Sources then

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-24 Thread Rui Wang
for it. > > Talat > > On Sat, Feb 15, 2020 at 6:26 PM Rui Wang wrote: > >> Because Calcite flattened Row so BeamSQL didn't need to deal with nested >> Row structure (as they were flattened in LogicalPlan). >> >> Depends on how that patch works. Neste

Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread Rui Wang
Sorry please remove " .apply(Window.into(FixedWindows.of(1 minute))" from the query above. -Rui On Mon, Feb 24, 2020 at 5:26 PM Rui Wang wrote: > I see. So I guess I wasn't fully understand the requirement: > > Do you want to have a 1-min window join on two unbounde

Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread Rui Wang
e waiting. Today the stream-to-stream > join will do a CoGroupByKey so it will wait. But SQL may in the future > adopt a better join for this case that can output records with lower > latency. > > It may be a bigger question whether HCatalogIO.write() has all the knobs > you woul

[ANNOUNCE] Beam 2.20.0 Released

2020-04-16 Thread Rui Wang
. -- Rui Wang, on behalf of The Apache Beam team

Re: [ANNOUNCE] Beam 2.20.0 Released

2020-04-16 Thread Rui Wang
Note that due to a bug on infrastructure, the website change failed to publish. But 2.20.0 artifacts are available to use right now. -Rui On Thu, Apr 16, 2020 at 11:45 AM Rui Wang wrote: > The Apache Beam team is pleased to announce the release of version 2.20.0. > > Apache Beam i

Re: Error when using @DefaultCoder(AvroCoder.class)

2020-07-08 Thread Rui Wang
Tried some code search in Beam repo but I didn't find the exact line of code that throws your exception. However, I believe for Java Classes you used in primitives (ParDo, CombineFn) and coders, it's very likely you need to make them serializable (i.e. implements Serializable). -Rui On Wed, Jul

Re: Out-of-orderness of window results when testing stateful operators with TextIO

2020-08-23 Thread Rui Wang
Current Beam model does not guarantee an ordering after a GBK (i.e. Combine.perKey() in your). So you cannot expect that the C step sees elements in a specific order. As I recall on Dataflow runner, there is very limited ordering support. Hi +Reuven Lax can share your insights about it? -Rui

Re: How to integrate Beam SQL windowing query with KafkaIO?

2020-08-24 Thread Rui Wang
Hi, I checked the query in your SO question and I think the SQL usage is correct. My current guess is that the problem is how does watermark generate and advance in KafkaIO. It could be either the watermark didn't pass the end of your SQL window for aggregation or the data was lagging behind the

Re: How to integrate Beam SQL windowing query with KafkaIO?

2020-08-27 Thread Rui Wang
it can correctly materialize the window output as > expected. Thank you so much for your help! > > Thanks & Regards, > Minreng > > > On Mon, Aug 24, 2020 at 1:58 PM Rui Wang wrote: > >> Hi, >> >> I checked the query in your SO question and I thi

Re: Count based triggers and latency

2020-10-12 Thread Rui Wang
On Mon, Oct 12, 2020 at 9:23 PM KV 59 wrote: > Thanks for your responses. > > I have a follow-up question, when you say > >> elementCountAtLeast means that the runner can buffer as many as it wants >> and can decide to offer a low latency pipeline by triggering often or >> better throughput throu

Re: [ANNOUNCE] Beam 2.25.0 Released

2020-10-26 Thread Rui Wang
Thank you Robin! -Rui On Mon, Oct 26, 2020 at 11:44 AM Pablo Estrada wrote: > Thanks Robin! > > On Mon, Oct 26, 2020 at 11:06 AM Robin Qiu wrote: > >> The Apache Beam team is pleased to announce the release of version 2.25.0. >> >> Apache Beam is an open source unified programming model to de

Re: Beam SQL UDF with variable arguments list?

2021-01-26 Thread Rui Wang
Yes I think Calcite does not support varargs in for scalar function (so in UDF). Please check this JIRA: https://issues.apache.org/jira/browse/CALCITE-2772 -Rui On Tue, Jan 26, 2021 at 2:04 AM Niels Basjes wrote: > Hi, > > I want to define a Beam SQL user defined function that accepts a variab

Re: Apache Beam SQL and UDF

2021-02-10 Thread Rui Wang
The problem that I can think of is maybe before the async call is completed, the UDF life cycle has reached to the end. -Rui On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer wrote: > Hi, > > We plan to use UDF on our sql. We want to achieve some kind of > filtering based on internal states. We wa

Re: Regarding the over window query support from Beam SQL

2021-03-02 Thread Rui Wang
On Tue, Mar 2, 2021 at 9:24 AM Tao Li wrote: > + Rui Wang. Looks like Rui has been working on this jira. > > > > > > *From: *Tao Li > *Date: *Monday, March 1, 2021 at 9:51 PM > *To: *"user@beam.apache.org" > *Subject: *Regarding the over window que

Re: Regarding the over window query support from Beam SQL

2021-03-05 Thread Rui Wang
*Tao Li > *Reply-To: *"user@beam.apache.org" > *Date: *Tuesday, March 2, 2021 at 3:37 PM > *To: *Rui Wang > *Cc: *"user@beam.apache.org" > *Subject: *Re: Regarding the over window query support from Beam SQL > > > > Hi Rui, > > > > Tha

Re: coder issue?

2018-08-16 Thread Rui Wang
Hi Mahesh, I think I had the same NPE when I explored self defined combineFn. I think your combineFn might still need to define a coder to help Beam run it in distributed environment. Beam tries to invoke coder somewhere and then throw a NPE as there is no one defined. Here is a PR I wrote that d

Re: coder issue?

2018-08-16 Thread Rui Wang
Sorry I forgot to attach the PR link: https://github.com/apache/beam/pull/6154/files#diff-7358f3f0511940ea565e6584f652ed02R342 -Rui On Thu, Aug 16, 2018 at 12:13 PM Rui Wang wrote: > Hi Mahesh, > > I think I had the same NPE when I explored self defined combineFn. I think > yo

Re: coder issue?

2018-08-17 Thread Rui Wang
accessed, you got the >> exception. >> >> By initializing the Accum class you should be able to fix this problem. >> (e.g. String line = "";, instead of only String line;) >> >> Hope this helps, >> Robin >> >> On Thu, Aug 16, 2018 at 12

Fwd: [external-thread] at-least once with job changes on Beam KinesisIO

2018-11-08 Thread Rui Wang
to user@beam. -- Forwarded message - From: Pramod Rao Date: Wed, Nov 7, 2018 at 5:59 PM Subject: Re: [external-thread] at-least once with job changes on Beam KinesisIO To: Fei Xue Cc: , , Parviz Deyhim < dey...@google.com>, Ryan McDowell I see that the KinesisIO is checkpoin

Re: [Call for items] November Beam Newsletter

2018-11-13 Thread Rui Wang
Hi, I just added some thing related to BeamSQL. -Rui On Tue, Nov 13, 2018 at 3:26 AM Etienne Chauchot wrote: > Hi, > I just added some things that were done. > > Etienne > > Le lundi 12 novembre 2018 à 12:22 +, Matthias Baetens a écrit : > > Looks great, thanks for the effort and for inclu

Re: BigQueryIO failure handling for writes

2018-11-16 Thread Rui Wang
To the first issue your are facing: In BeamSQL, we tried to solve the similar requirement. BeamSQL supports reading JSON format message from Pubsub, writing to Bigquery and writing messages that fail to parse in another Pubsub topic. BeamSQL uses the pre-processing transform to parse JSON payload

Re: Suggestion or Alternative simples to read file from FTP

2019-01-03 Thread Rui Wang
For the calling external service, it's described in [1] as a pattern which has a small sample of code instruction. However, why not write a script to prepare the data first and then write a pipeline to process it? 1. https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-ca

How to call Oracle stored proc in JdbcIO

2019-01-25 Thread Rui Wang
Hi Community, There is a stackoverflow question [1] asking how to call Oracle stored proc in Beam via JdbcIO. I know very less on JdbcIO and Oracle, so just help ask here to say if anyone know: does JdbcIO support call stored proc? If there is no such support, I will create a JIRA for it and repl

Re: First steps: Beam SQL over ProtoBuf

2019-02-01 Thread Rui Wang
It's awesome! Proto to/from Row will be a great utility! -Rui On Fri, Feb 1, 2019 at 9:59 AM Alex Van Boxel wrote: > Hi all, > > I got my first test pipeline running with *Beam SQL over ProtoBuf*. I was > so excited I need to shout this to the world ;-) > > > https://github.com/anemos-io/proto-

Re: How to call Oracle stored proc in JdbcIO

2019-02-05 Thread Rui Wang
Assuming this is a missing feature. Created https://jira.apache.org/jira/browse/BEAM-6525 to track it. -Rui On Fri, Jan 25, 2019 at 10:35 AM Rui Wang wrote: > Hi Community, > > There is a stackoverflow question [1] asking how to call Oracle stored > proc in Beam via JdbcIO. I know

Re: How to call Oracle stored proc in JdbcIO

2019-02-05 Thread Rui Wang
omething like this: > > {call procedure_name(?, ?, ?)} > > But then question is what do you expect from it? > > BTW JdbcIO is just a very simple ParDo which you can create your own when > dealing with anything special from oracle. > > Best regards > > JC > > > > Am D

Re: Pipeline manager/scheduler frameworks

2019-02-08 Thread Rui Wang
Apache Airflow is a scheduling system that can help manage data pipelines. I have seen Airflow is used to manage a few thousand hive/spark/presto pipelines. -Rui On Fri, Feb 8, 2019 at 4:08 PM Sridevi Nookala < snook...@parallelwireless.com> wrote: > Hi, > > > Our analytics app has many data pi

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-04-15 Thread Rui Wang
Read from the code and seems like as the logical type "timestamp-millis" means, it's expecting millis in Long as values under this logical type. So if you can convert joda-time to millis before calling "AvroUtils.toBeamRowStrict(genericRecord, this.beamSchema)", your exception will gone. -Rui O

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-04-15 Thread Rui Wang
;connect.version": 1, > "connect.name": > "org.apache.kafka.connect.data.Timestamp" > } > } > > *Attribute type in generated class:* > private org.joda.time.DateTime timeOfRelease; > > > So not sure why this typ

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-04-22 Thread Rui Wang
aTime to > Long* > } > > private static Object convertDateTimeStrict (Long value, Schema.FieldType > fieldType) { > checkTypeName(fieldType.getTypeName(), TypeName.DATETIME, " > dateTime"); > return new Instant(value); <-- *Creates a JodaTime &g