Hi Shlomi,

Unfortunately, the references you cited are about all we have
available. I acknowledge that they are not very comprehensive, so I'll
try to share some insight.
Regarding your sample, I believe some relevant pieces are missing: I am
not sure what the input looks like (bounded or unbounded, and what the
triggering looks like if unbounded) or how the KVs became Rows.
As for ResourceHints, they are applicable to any PTransform, so in your
example you can set them directly on the transform returned by
AvroIO.parseFilesGenericRecords:

.apply("Match file names", FileIO.matchAll())
.apply("Read Avro files", FileIO.readMatches())
.apply("Parse Avro files into GenericRecord",
    AvroIO.parseFilesGenericRecords(new CustomerTransformFn())
        .withCoder(KvCoder.of(Customer.keyCoder(), Customer.valueCoder()))
        .setResourceHints(ResourceHints.create().withMinRam("50GB")))
.apply("Chunk customer", GroupIntoBatches.<Row, Row>ofSize(size)
        .withMaxBufferingDuration(Duration.standardSeconds(duration)))
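
A note on the size string: the hint above is written as "50GB". The
sketch below is only an illustration of how such a string maps to bytes,
assuming the common convention that GB is decimal (10^9 bytes) and GiB is
binary (2^30 bytes). SizeHint and toBytes are hypothetical names of mine,
not part of Beam; the SDK does its own parsing, so please check the
ResourceHints documentation for the exact formats it accepts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Apache Beam.
final class SizeHint {
  // A number, optional whitespace, then "B" or a K/M/G/T prefix
  // with an optional "i" marking a binary unit (e.g. "GiB").
  private static final Pattern SIZE =
      Pattern.compile("\\s*(\\d+(?:\\.\\d+)?)\\s*(?:([KMGT])(i?))?B\\s*");

  // Converts strings such as "50GB" or "50GiB" to a byte count.
  // Assumption: decimal prefixes use base 1000, binary prefixes base 1024.
  static long toBytes(String text) {
    Matcher m = SIZE.matcher(text);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unrecognized size: " + text);
    }
    double value = Double.parseDouble(m.group(1));
    String prefix = m.group(2); // null when the unit is plain "B"
    boolean binary = "i".equals(m.group(3));
    int power = prefix == null ? 0 : "KMGT".indexOf(prefix) + 1;
    long base = binary ? 1024L : 1000L;
    return Math.round(value * Math.pow(base, power));
  }
}
```

Under that assumption, SizeHint.toBytes("50GB") gives 50000000000 and
SizeHint.toBytes("50GiB") gives 53687091200, so the two spellings differ
by roughly 7% when you compare the hint against a worker's memory.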


Accelerators mostly relate to the use of GPUs (
https://cloud.google.com/dataflow/docs/guides/using-gpus), which may
outperform CPUs in certain scenarios (such as graphics or ML workloads
that require a high degree of parallelization/vectorization), but I
don't think the transforms mentioned here are ready to leverage them.

Besides providing good resource hints so the workers are sized accordingly,
I'd suggest analyzing which steps are being fused together (please check
https://cloud.google.com/dataflow/docs/guides/right-fitting#right_fitting_and_fusion),
as it may be possible to separate file discovery / matching from the
later steps (again, without seeing the missing parts of the graph, it is
hard to make good suggestions).


Best,
Bruno

On Mon, Feb 6, 2023 at 2:50 PM Ahmet Altay <al...@google.com> wrote:

> Adding @John Casey <johnjca...@google.com> @Bruno Volpato
> <bvolp...@google.com> - who might be able to point to relevant docs.
>
> On Sat, Feb 4, 2023 at 11:59 AM Shlomi Elbaz <shlom...@optimove.com>
> wrote:
>
>> Hello All,
>>
>>
>>
>> We developed a service with Apache Beam in which we read Avro files
>> located in a GCP bucket.
>>
>> We ran load and benchmark tests, and during the pipeline we hit a
>> bottleneck and out-of-memory issues in the stage where the service
>> reads the Avro files with AvroIO.parseFilesGenericRecords.
>>
>>
>>
>> The issue happened in the "Parse Avro files into GenericRecord" step:
>>
>> .apply("Match file names", FileIO.matchAll())
>> .apply("Read Avro files", FileIO.readMatches())
>> .apply("Parse Avro files into GenericRecord",
>>     AvroIO.parseFilesGenericRecords(new CustomerTransformFn())
>>         .withCoder(KvCoder.of(Customer.keyCoder(), Customer.valueCoder())))
>> .apply("Chunk customer", GroupIntoBatches.<Row, Row>ofSize(size)
>>         .withMaxBufferingDuration(Duration.standardSeconds(duration)))
>>
>>
>>
>> We saw a tutorial on resource hints on the Apache Beam website, but
>> there are no examples or information on how to use them with
>> AvroIO.parseFilesGenericRecords.
>>
>> https://beam.apache.org/documentation/runtime/resource-hints/
>>
>>
>>
>> Is there more information, or are there examples, where we can read
>> about ResourceHints and accelerators?
>>
>>
>>
>> Also, could you please recommend optimal settings for ResourceHints?
>>
>>
>>
>> The additional tutorials that we rely on:
>>
>> https://www.youtube.com/watch?v=9fc2MNQHQ2s
>>
>> https://cloud.google.com/dataflow/docs/guides/right-fitting
>>
>>
>> https://cloud.google.com/blog/products/data-analytics/introducing-vertical-autoscaling-in-dataflow-prime
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Shlomi Elbaz,
>>
>>
>>
>>
>>
>> *Shlomi Elbaz*
>> Fullstack Developer
