Hi Shlomi,

If you intend to use GPUs for machine learning inference, the following resources may also be of interest to you:
RunInference transform information:
https://beam.apache.org/documentation/sdks/python-machine-learning/

You may also want to have a look at:
https://cloud.google.com/dataflow/docs/machine-learning

Cheers
Reza

On Mon, 6 Feb 2023 at 13:24, Bruno Volpato via dev <dev@beam.apache.org> wrote:

> Hi Shlomi,
>
> Unfortunately, those cited references are about as much as we have
> available. I acknowledge that they are not very comprehensive, so I'll
> try to share some insight.
>
> Regarding your sample, I believe some relevant pieces are missing: I am
> not sure what the input looks like (bounded or unbounded, and what the
> triggering looks like if unbounded) or how the KVs became Rows.
> ResourceHints, however, are applicable to any PTransform, so in your
> example you can set them directly when composing
> AvroIO.parseFilesGenericRecords:
>
> .apply("Match file names", FileIO.matchAll())
> .apply("Read Avro files", FileIO.readMatches())
> .apply("Parse Avro files into GenericRecord",
>     AvroIO.parseFilesGenericRecords(new CustomerTransformFn())
>         .withCoder(KvCoder.of(Customer.keyCoder(), Customer.valueCoder()))
>         .setResourceHints(ResourceHints.create().withMinRam("50GB")))
> .apply("Chunk customer", GroupIntoBatches.<Row, Row>ofSize(size)
>     .withMaxBufferingDuration(Duration.standardSeconds(duration)))
>
> Accelerators are mostly related to the use of GPUs
> (https://cloud.google.com/dataflow/docs/guides/using-gpus), which may
> outperform CPUs in certain scenarios (such as graphics or ML workloads
> that require a high degree of parallelization/vectorization), but I
> don't think the transforms mentioned here are ready to leverage them.
>
> Besides providing good resource hints so the workers are sized
> accordingly, I'd suggest analyzing which steps are being fused together
> (please check
> https://cloud.google.com/dataflow/docs/guides/right-fitting#right_fitting_and_fusion),
> as it may be the case that you could separate file discovery / matching
> (again, without analyzing the missing parts of the graph, it may be hard
> to make good suggestions).
>
> Best,
> Bruno
>
> On Mon, Feb 6, 2023 at 2:50 PM Ahmet Altay <al...@google.com> wrote:
>
>> Adding @John Casey <johnjca...@google.com> @Bruno Volpato
>> <bvolp...@google.com> - who might be able to point to relevant docs.
>>
>> On Sat, Feb 4, 2023 at 11:59 AM Shlomi Elbaz <shlom...@optimove.com>
>> wrote:
>>
>>> Hello All,
>>>
>>> We developed a service with Apache Beam that reads Avro files
>>> located in a GCP bucket.
>>>
>>> During load and benchmark tests, the pipeline hit a bottleneck and
>>> out-of-memory issues in the stage where the service reads the Avro
>>> files with AvroIO.parseFilesGenericRecords.
>>>
>>> The issue happened in the "Parse Avro files into GenericRecord" step
>>> (highlighted in the original message):
>>>
>>> .apply("Match file names", FileIO.matchAll())
>>> .apply("Read Avro files", FileIO.readMatches())
>>> .apply("Parse Avro files into GenericRecord",
>>>     AvroIO.parseFilesGenericRecords(new CustomerTransformFn())
>>>         .withCoder(KvCoder.of(Customer.keyCoder(), Customer.valueCoder())))
>>> .apply("Chunk customer", GroupIntoBatches.<Row, Row>ofSize(size)
>>>     .withMaxBufferingDuration(Duration.standardSeconds(duration)))
>>>
>>> We saw a tutorial regarding resource hints on the Apache Beam
>>> website, but there are no examples or information on how to use them
>>> with AvroIO.parseFilesGenericRecords.
>>>
>>> https://beam.apache.org/documentation/runtime/resource-hints/
>>>
>>> Is there more information, or are there examples, where we can read
>>> about ResourceHints and accelerators?
>>>
>>> Also, could you please recommend optimal settings for using
>>> ResourceHints?
>>>
>>> The additional tutorials that we rely on:
>>>
>>> https://www.youtube.com/watch?v=9fc2MNQHQ2s
>>> https://cloud.google.com/dataflow/docs/guides/right-fitting
>>> https://cloud.google.com/blog/products/data-analytics/introducing-vertical-autoscaling-in-dataflow-prime
>>>
>>> Thanks,
>>>
>>> Shlomi Elbaz
>>> Fullstack Developer
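A note on the withMinRam("50GB") value used in Bruno's reply: the hint takes a human-readable size string, and when choosing a value it helps to be clear about how many bytes a given string stands for. The following is a minimal plain-Java sketch of such a conversion. It is not Beam's implementation. The class name MinRamSketch, the method toBytes, and the unit table (decimal KB/MB/GB, binary KiB/MiB/GiB) are assumptions made for illustration and should be checked against the Beam resource hints documentation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts a size string such as "50GB" into bytes.
// Assumption: "GB" is decimal (10^9) and "GiB" is binary (2^30).
public class MinRamSketch {
    private static final Map<String, Long> UNITS = new HashMap<>();
    static {
        UNITS.put("B", 1L);
        UNITS.put("KB", 1_000L);
        UNITS.put("MB", 1_000_000L);
        UNITS.put("GB", 1_000_000_000L);
        UNITS.put("KiB", 1L << 10);
        UNITS.put("MiB", 1L << 20);
        UNITS.put("GiB", 1L << 30);
    }

    // A whole number, optional whitespace, then a unit suffix.
    private static final Pattern SIZE = Pattern.compile("(\\d+)\\s*([A-Za-z]+)");

    public static long toBytes(String size) {
        Matcher m = SIZE.matcher(size.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized size: " + size);
        }
        Long factor = UNITS.get(m.group(2));
        if (factor == null) {
            throw new IllegalArgumentException("Unknown unit: " + m.group(2));
        }
        // multiplyExact throws on overflow instead of wrapping silently.
        return Math.multiplyExact(Long.parseLong(m.group(1)), factor);
    }

    public static void main(String[] args) {
        System.out.println(toBytes("50GB"));  // 50000000000
        System.out.println(toBytes("50GiB")); // 53687091200
    }
}
```

Under these assumptions, "50GB" and "50GiB" differ by roughly 3.7 GB, which can matter when the hint is compared against the memory actually available on a given worker machine type.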