For a SELECT COUNT(*) query, BeamSQL actually just converts it into a Java pipeline that calls Beam's built-in Count[1] transform.
Could you implement your pipeline with the Count transform directly, to see whether this memory issue still exists? That would help narrow the problem down a bit: if the pure-Java path works without going through the SQL code path, we will know that BeamSQL does not generate a working pipeline for your SELECT COUNT(*) query.

[1]: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Count.java#L54

On Sat, Jun 1, 2019 at 9:30 AM dekl...@gmail.com <dekl...@gmail.com> wrote:

> Is using Beam SQL just massively more resource-intensive, or is there a bug
> somewhere here (in my code or elsewhere)? (I'm using the Flink runner.)
>
> Here is my code:
> https://pastebin.com/nNtc9ZaG
>
> Here is the error I get (truncated at the end because it's so long and
> seemingly repetitive) when I run the SQL transform and my memory/CPU usage
> skyrockets:
> https://pastebin.com/mywmkCQi
>
> For example:
>
> After several early firing triggers, 13-15% CPU, 1.5-2 GB RAM, everything
> working fine:
>
> `rowStream.apply(Combine.globally(Count.<Row>combineFn()).withoutDefaults()).apply(myPrint());`
>
> After a single early firing trigger, CPU usage shoots to 90%+, 4.7 GB+ RAM,
> and it soon crashes:
>
> `rowStream.apply(SqlTransform.query("SELECT COUNT(*) FROM PCOLLECTION")).apply(myPrint());`
>
> I can't imagine this is expected behavior, but maybe I'm just ignorant of
> how SQL is implemented.
>
> Most of this code is just getting Schema Registry Avro Kafka messages into
> a Row stream. There have been discussions on the mailing list recently
> about how to do that. This is the best I could do. If that part is
> incorrect, I'd be glad to know.
>
> Any help appreciated. Thank you.
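For reference, here is a minimal, self-contained sketch of the pure-Java Count path being suggested. It swaps the Kafka-backed Row stream for a small bounded input built with Create.of (so it runs standalone on the DirectRunner); the class name and input values are just placeholders, but the Combine.globally(Count.combineFn()) call is the same one the SQL planner targets:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class CountSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    // Bounded stand-in for the Kafka-backed Row stream in the original code.
    PCollection<String> rows = p.apply(Create.of("a", "b", "c"));

    // Same Combine.globally(Count.combineFn()) call that BeamSQL plans
    // for SELECT COUNT(*); withoutDefaults() skips emitting a default
    // value for empty windows, matching the snippet from the question.
    PCollection<Long> counted =
        rows.apply(Combine.globally(Count.<String>combineFn()).withoutDefaults());

    // Verify the count of the three input elements.
    PAssert.that(counted).containsInAnyOrder(3L);
    p.run().waitUntilFinish();
  }
}
```

If this version stays within normal memory bounds on your Flink cluster while the SqlTransform version does not, that points at the SQL-generated pipeline rather than your Kafka/Row conversion code.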