Hi Pavel,

Thanks for your response! I took a look at running Beam on Kinesis Data
Analytics, since it's the AWS-recommended way to run Beam jobs, but it
doesn't seem to work with the portable runner model. Our project is a
daemon running in a Kubernetes cluster that executes Beam code as part of
certain tasks, so I'm not sure how that would fit with Kinesis: I don't see
a way to grab a master URL or job endpoint to submit to (roughly what the
sketch below needs), and I'm not entirely sure the Flink image Kinesis runs
would work for Beam anyway. I'd really like to avoid using any of the
non-portable runners if possible.
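
For concreteness, this is roughly what the daemon-side submission would
have to look like under the portable model, as I understand it. It's just
a sketch: the job endpoint is a placeholder, not real infrastructure, and
I may be misreading how this is supposed to work.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PortablePipelineOptions;

public class PortableSubmitSketch {
  public static void main(String[] args) {
    PortablePipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(PortablePipelineOptions.class);
    options.setRunner(org.apache.beam.runners.portability.PortableRunner.class);
    // The Beam job server endpoint the daemon would submit to. On Kinesis
    // Data Analytics I don't see anything I could point this at, which is
    // what I meant by not being able to grab the master URL.
    options.setJobEndpoint("beam-job-server.example.internal:8099");
    // Run the SDK harness in a Docker environment (placeholder choice).
    options.setDefaultEnvironmentType("DOCKER");

    Pipeline pipeline = Pipeline.create(options);
    // ... same transforms we currently run on Dataflow ...
    pipeline.run().waitUntilFinish();
  }
}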

That's part of why I'm looking at Spark (although Flink looks fairly
similar): EKS supports autoscaling and other features that Dataflow does,
and I'd like to avoid a big divergence between the GCP and AWS behaviour if
possible. It seems doable, but the docs for the other runners are a bit
ambiguous about exactly how much of the job submission is handled by the
runner versus by me; I've tried to make that ambiguity concrete below.
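
These are the two workflows I can read out of the Spark runner page (again
just a sketch: the host name, the jar path, and whether
outputExecutablePath is even the right knob for the second reading are all
guesses on my part):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PortablePipelineOptions;

public class SparkPortableSubmitSketch {
  public static void main(String[] args) {
    PortablePipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(PortablePipelineOptions.class);
    options.setRunner(org.apache.beam.runners.portability.PortableRunner.class);

    // Reading 1: the Spark job server (started with --spark-master-url
    // pointing at the Spark master on EKS) submits the job itself, and all
    // the client has to set is the job endpoint:
    options.setJobEndpoint("beam-spark-job-server.default.svc:8099");

    // Reading 2: the job server only builds artifacts, and I additionally
    // have to produce an executable jar and run spark-submit on it myself
    // (not sure this is even the right option for that):
    // options.setOutputExecutablePath("/tmp/beam-spark-pipeline.jar");

    Pipeline pipeline = Pipeline.create(options);
    // ... same transforms as on Dataflow ...
    pipeline.run().waitUntilFinish();
  }
}

Which of those is the intended workflow, or is it neither?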

On Wed, Jun 21, 2023 at 12:28 PM Pavel Solomin <p.o.solo...@gmail.com>
wrote:

> Hello!
>
> > to also run on AWS
>
> > A spark cluster on EKS seems the closest analog
>
> There's another way of running Beam apps in AWS -
> https://aws.amazon.com/kinesis/data-analytics/ - which is basically
> "serverless" Flink. It says Kinesis, but you can run any Flink / Beam job
> there; you don't have to use Kinesis streams. I've used KDA in multiple
> projects so far, and it works OK. FlinkRunner also seems to have more
> docs, as far as I can see.
>
> Here's a pom.xml example:
> https://github.com/aws-samples/amazon-kinesis-data-analytics-examples/blob/master/Beam/pom.xml
>
> Best Regards,
> Pavel Solomin
>
> Tel: +351 962 950 692 | Skype: pavel_solomin |
> Linkedin <https://www.linkedin.com/in/pavelsolomin>
>
>
>
>
>
> On Wed, 21 Jun 2023 at 16:31, Jon Molle via user <user@beam.apache.org>
> wrote:
>
>> Hi,
>>
>> I've been looking at the Spark Portable Runner docs, specifically the
>> Java ones where possible, and I'm a little confused about the
>> organization. The docs seem to say that the JobService both submits the
>> code to the linked Spark cluster (the one specified by the master URL)
>> and requires you to run a spark-submit command afterwards on whatever
>> artifacts it builds.
>>
>> Unfortunately I'm not that familiar with Spark generally, so I'm probably
>> misunderstanding something here, but the job server images either totally
>> lack documentation or just repeat the Spark runner page in the main docs.
>>
>> For context, I'm trying to port some code that we're currently running
>> on the Dataflow runner (on GCP) to also run on AWS. A Spark cluster on
>> EKS (either self-managed or potentially through EMR, though likely not,
>> based on what I'm reading in the docs and some brief testing) seems the
>> closest analog.
>>
>> The new Tour of Beam does the same thing, in addition to only really
>> having examples for Python, plus a few more typos. I haven't found any
>> existing questions like this elsewhere, so I assume that I'm just missing
>> something that should be obvious.
>>
>> Thanks for your time.
>>
>
