Hi Matyas,

Again, thank you very much for the information.  I'm a beginner and all
the help is really appreciated.  After some diving into the script
behind s3-artifact-fetcher I think I figured it out: it syncs a folder
into the task manager's pod container, so we should then be able to
find the files locally.
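
For what it's worth, here's roughly the shape I ended up with.  Treat it as
an untested sketch: the bucket, paths and the aws-cli init container are
just placeholders standing in for the fetcher script.

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: hop-pipeline
spec:
  image: flink:1.15            # whatever Flink image/version you run
  flinkVersion: v1_15
  serviceAccount: flink
  jobManager:
    resource: { memory: "2048m", cpu: 1 }
  taskManager:
    resource: { memory: "2048m", cpu: 1 }
  podTemplate:
    spec:
      initContainers:
        # Stand-in for the s3-artifact-fetcher script: copy the fat jar
        # onto a shared volume before the Flink containers start.
        # Assumes AWS credentials are available (e.g. via IRSA).
        - name: fetch-jar
          image: amazon/aws-cli
          args: ["s3", "cp", "s3://hop-eks/hop/hop-2.1.0-fat.jar",
                 "/opt/flink/usrlib/hop-2.1.0-fat.jar"]
          volumeMounts:
            - name: artifacts
              mountPath: /opt/flink/usrlib
      containers:
        # Must be called flink-main-container so the operator merges it.
        - name: flink-main-container
          volumeMounts:
            - name: artifacts
              mountPath: /opt/flink/usrlib
      volumes:
        - name: artifacts
          emptyDir: {}
  job:
    # From the JobManager's point of view the jar is now local.
    jarURI: local:///opt/flink/usrlib/hop-2.1.0-fat.jar
    parallelism: 2
    upgradeMode: stateless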

At its core what we're trying to do with a project like Apache Hop is sit
on the side of the organizations that use the software since we want to
lower complexity, maintenance costs, learning curves and so on.  Every time
I see a cryptic, scarcely documented YAML file or a complicated k8s setup I
have to ask myself whether I'm sending our users on a week-long
mission.

In a way it makes me appreciate the work Google did with Dataflow a bit
more because they looked at this problem in a holistic way and considered
the platform (GCP), the engine (Dataflow cluster on GCP k8s) and the
executing pipeline (Beam API Jar files) to be different facets of the same
problem.  Jar files get uploaded automatically, the cluster is instantiated
automatically, the pipeline is run, monitored and scaled automatically, and
at the end everything is shut down properly.

I want to figure out a way to do this with Flink as well since I believe
that, especially on AWS (even with the Spark-centric options on EMR and EMR
Serverless), running a pipeline is just too complicated.  Your work really
helps!

All the best,
Matt

On Tue, Jun 21, 2022 at 4:53 PM Őrhidi Mátyás <matyas.orh...@gmail.com>
wrote:

> Hi Matt,
>
> I believe an artifact fetcher (e.g
> https://hub.docker.com/r/agiledigital/s3-artifact-fetcher ) + the pod
> template (
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/pod-template/#pod-template)
> is an elegant way to solve your problem.
>
> The operator uses K8s native integration under the hood:
> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/native_kubernetes/#application-mode
> In application mode, the main() method of the application is executed on
> the JobManager, hence we need the jar locally.
>
> You can launch a session cluster (without job spec) on the operator that
> allows submitting jars if you would like to avoid dealing with
> authentication, but the recommended and safe approach is to use
> sessionjobs for this purpose.
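>
> A minimal FlinkSessionJob could look something like the sketch below
> (names and the jar location are placeholders; for an s3:// jarURI the
> operator pod itself needs the matching filesystem support and
> credentials):
>
> apiVersion: flink.apache.org/v1beta1
> kind: FlinkSessionJob
> metadata:
>   name: hop-session-job
> spec:
>   deploymentName: hop-session-cluster  # FlinkDeployment without a job spec
>   job:
>     jarURI: https://example.com/artifacts/hop-2.1.0-fat.jar
>     parallelism: 2
>     upgradeMode: stateless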
>
>
> Cheers,
> Matyas
>
> On Tue, Jun 21, 2022 at 4:03 PM Matt Casters <
> matt.cast...@neotechnology.com> wrote:
>
>> Thank you very much for the help Matyas and Gyula!
>>
>> I just saw a video today where you were presenting the FKO.  Really nice
>> stuff!
>>
>> So I'm guessing we're executing "flink run" at some point on the master
>> and that this is when we need the jar file to be local?
>> Am I right in assuming that this happens after the flink cluster in
>> question was started, as part of the job execution?
>>
>> On the one hand I agree with the underlying idea that authentication and
>> security should not be a responsibility of the operator.   On the other
>> hand I could add a flink-s3 driver but then I'd also have to configure it
>> and so on and it's just hard to get that configuration to be really clean.
>>
>> Do we have some service running on the flink cluster which would allow us
>> to post/copy files from the client (running kubectl) to the master?  If so,
>> could we add an option to the job specification to that effect?  Just
>> brainstorming ;-) (and forking apache/flink-kubernetes-operator)
>>
>> All the best,
>> Matt
>>
>> On Tue, Jun 21, 2022 at 2:52 PM Őrhidi Mátyás <matyas.orh...@gmail.com>
>> wrote:
>>
>>> Hi Matt,
>>>
>>> - In FlinkDeployments you can utilize an init container to download your
>>> artifact onto a shared volume, then you can refer to it as local:/.. from
>>> the main container. FlinkDeployments come with pod template support:
>>> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/pod-template/#pod-template
>>>
>>> - FlinkSessionJobs come with an artifact fetcher, but it may need some
>>> tweaking to make it work in your environment:
>>>
>>> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/overview/#flinksessionjob-spec-overview
>>>
>>> I hope it helps, let us know if you have further questions.
>>>
>>> Cheers,
>>> Matyas
>>>
>>>
>>>
>>> On Tue, Jun 21, 2022 at 2:35 PM Matt Casters <
>>> matt.cast...@neotechnology.com> wrote:
>>>
>>>> Hi Flink team!
>>>>
>>>> I'm interested in getting the new Flink Kubernetes Operator to work on
>>>> AWS EKS.  Following the documentation I got pretty far.  However, when
>>>> trying to run a job I got the following error:
>>>>
>>>> Only "local" is supported as schema for application mode. This assumes t
>>>>> hat the jar is located in the image, not the Flink client. An example
>>>>> of such path is: local:///opt/flink/examples/streaming/WindowJoin.jar
>>>>
>>>>
>>>> I have an Apache Hop/Beam fat jar capable of running the pipeline on
>>>> Flink, and my yml file refers to it with:
>>>>
>>>> jarURI: s3://hop-eks/hop/hop-2.1.0-fat.jar
>>>>
>>>> So how could I go about getting the fat jar in a desired location for
>>>> the operator?
>>>>
>>>> Getting this to work would be really cool for both short and long-lived
>>>> pipelines in the service of all sorts of data integration work.  It would
>>>> do away with the complexity of setting up and maintaining your own Flink
>>>> cluster.
>>>>
>>>> Thanks in advance!
>>>>
>>>> All the best,
>>>>
>>>> Matt (mcasters, Apache Hop PMC)
>>>>
>>>>
