As I also mentioned in the email, this is on our roadmap for the operator
but we have not implemented it yet because this feature only became
available as of Flink 1.16.

Ideally, once this is added, users should be able to use env vars directly in
the operator's FlinkDeployment spec.flinkConfiguration section.
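
For the record, once FLINK-27491 lands, this could look roughly like the
following (hypothetical sketch: the ${...} substitution syntax is an
assumption, not the implemented feature; only the Datadog reporter key is a
real Flink option):

```yaml
# Hypothetical: env var substitution in flinkConfiguration is not
# implemented yet (FLINK-27491); the ${DD_API_KEY} syntax is an assumption.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: example
spec:
  flinkConfiguration:
    metrics.reporter.dghttp.apikey: ${DD_API_KEY}
```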

Gyula

On Thu, Dec 1, 2022 at 5:18 PM Andrew Otto <o...@wikimedia.org> wrote:

> > Andrew please see my previous response, that covers the secrets case.
> > kubernetes.jobmanager.entrypoint.args: -D
> datadog.secret.conf=$MY_SECRET_ENV
>
> This way^?  Ya, that makes sense.  It'd be nice if there were a way to get
> Secrets into the values used for rendering flink-conf.yaml too, so the
> confs would all be in the same place.
>
> On Thu, Dec 1, 2022 at 9:30 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> Andrew please see my previous response, that covers the secrets case.
>>
>> Gyula
>>
>> On Thu, Dec 1, 2022 at 2:54 PM Andrew Otto <o...@wikimedia.org> wrote:
>>
>>> > several failures to write into $FLINK_HOME/conf/.
>>> I'm working on
>>> <https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/858356/>
>>> building Flink and flink-kubernetes-operator images for the Wikimedia
>>> Foundation, and I found this strange as well.  It makes sense in a docker /
>>> docker-compose-only environment, but in k8s, where a ConfigMap is
>>> responsible for flink-conf.yaml (and logs all go to the console, not
>>> FLINK_HOME/log), I'd prefer that the image not be modified by the
>>> ENTRYPOINT.
>>>
>>> I believe that for flink-kubernetes-operator, the docker-entrypoint.sh
>>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/docker-entrypoint.sh>
>>> provided by flink-docker is not really needed.  It seems to be written more
>>> for deployments outside of Kubernetes.
>>> flink-kubernetes-operator never calls the built-in subcommands (e.g.
>>> standalone-job), and always runs in 'pass-through' mode, just exec'ing the
>>> args passed to it.  At WMF we build
>>> <https://doc.wikimedia.org/docker-pkg/> our own images, so I'm planning
>>> on removing all of the stuff in ENTRYPOINTs that mangles the image.
>>> Anything that I might want to keep from docker-entrypoint.sh (like enabling
>>> jemalloc
>>> <https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/858356/6/images/flink/Dockerfile.template#73>)
>>> I should be able to do in the Dockerfile at image creation time.
>>>
>>> >  want to set an API key as part of the flink-conf.yaml file, but we
>>> don't want it to be persisted in Kubernetes or in our version control
>>> I personally am still pretty green at k8s, but would using Kubernetes
>>> Secrets
>>> <https://kubernetes.io/docs/concepts/configuration/secret/#use-case-secret-visible-to-one-container-in-a-pod>
>>> work for your use case? I know we use them at WMF. From a quick glance
>>> I'm not sure how to combine them with flink-kubernetes-operator's ConfigMap
>>> that renders flink-conf.yaml, but I feel like there should be a way.
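>>>
>>> Something like this might be a starting point (sketch only, names made
>>> up): the operator's pod template can expose a Secret as an env var on the
>>> main container, which the entrypoint-args trick could then reference:

```yaml
# Sketch, untested: the Secret/key names (datadog-credentials, api-key) are
# made up; flink-main-container is the operator's expected container name.
spec:
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
            - name: MY_SECRET_ENV
              valueFrom:
                secretKeyRef:
                  name: datadog-credentials
                  key: api-key
```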
>>>
>>> On Wed, Nov 30, 2022 at 4:59 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>
>>>> Hi Lucas!
>>>>
>>>> The Flink Kubernetes integration itself, not the operator, is responsible
>>>> for mounting the ConfigMap and overwriting the entrypoint. Therefore this
>>>> is not something we can easily change from the operator side. However, I
>>>> think we are looking at the problem from the wrong side, and there may be
>>>> a solution already :)
>>>>
>>>> Ideally what you want is ENV replacement in the Flink configuration. This
>>>> is not something that the Flink community has added yet, unfortunately,
>>>> but we have it on our radar for the operator at least (
>>>> https://issues.apache.org/jira/browse/FLINK-27491). It will probably
>>>> be added in the next 1.4.0 version.
>>>>
>>>> This will be possible from Flink 1.16, which introduced a small feature
>>>> that allows us to inject parameters into the Kubernetes entrypoints:
>>>> https://issues.apache.org/jira/browse/FLINK-29123
>>>>
>>>> https://github.com/apache/flink/commit/c37643031dca2e6d4c299c0d704081a8bffece1d
>>>>
>>>> While it's not implemented in the operator yet, you could try setting
>>>> the following config in Flink 1.16.0:
>>>> kubernetes.jobmanager.entrypoint.args: -D
>>>> datadog.secret.conf=$MY_SECRET_ENV
>>>> kubernetes.taskmanager.entrypoint.args: -D
>>>> datadog.secret.conf=$MY_SECRET_ENV
>>>>
>>>> If you use this configuration together with the default native mode in
>>>> the operator, it should work, I believe.
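>>>>
>>>> In FlinkDeployment terms, the whole thing would sit in
>>>> spec.flinkConfiguration, something like this (untested sketch; it assumes
>>>> Flink 1.16 and that MY_SECRET_ENV is already set in the pod environment,
>>>> e.g. from a Secret):

```yaml
# Untested sketch: Flink 1.16+, native mode. MY_SECRET_ENV must already be
# set in the pod environment; the shell entrypoint expands it at startup.
spec:
  flinkConfiguration:
    kubernetes.jobmanager.entrypoint.args: -D datadog.secret.conf=$MY_SECRET_ENV
    kubernetes.taskmanager.entrypoint.args: -D datadog.secret.conf=$MY_SECRET_ENV
```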
>>>>
>>>> Please try and let me know!
>>>> Gyula
>>>>
>>>> On Wed, Nov 30, 2022 at 10:36 PM Lucas Caparelli <
>>>> lucas.capare...@gympass.com> wrote:
>>>>
>>>>> Hello folks,
>>>>>
>>>>> Not sure if this is the best list for this, sorry if it isn't. I'd
>>>>> appreciate some pointers :-)
>>>>>
>>>>> When using flink-kubernetes-operator [1], docker-entrypoint.sh [2]
>>>>> fails repeatedly when trying to write into $FLINK_HOME/conf/. We believe
>>>>> this is because that volume is mounted from a ConfigMap, which means it's
>>>>> read-only.
>>>>>
>>>>> This has been reported in the past in GCP's operator, but I was unable
>>>>> to find any kind of resolution for it:
>>>>> https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/213
>>>>>
>>>>> In our use case, we want to set an API key as part of the
>>>>> flink-conf.yaml file, but we don't want it persisted in Kubernetes or in
>>>>> our version control, since it's sensitive data. This API key is used by
>>>>> Flink to report metrics to Datadog [3].
>>>>>
>>>>> We have automation in place which allows us to accomplish this by
>>>>> setting environment variables pointing to a path in our secret manager,
>>>>> which only gets injected during runtime. That part is working fine.
>>>>>
>>>>> However, we're trying to inject this secret using the FLINK_PROPERTIES
>>>>> variable, which is appended [4] to the flink-conf.yaml file in the
>>>>> docker-entrypoint script, and this fails because the filesystem the file
>>>>> lives on is read-only.
>>>>>
>>>>> We attempted working around this in 2 different ways:
>>>>>
>>>>>   - providing our own .spec.containers[0].command, where we copied
>>>>> over /opt/flink to /tmp/flink and set FLINK_HOME=/tmp/flink. This did not
>>>>> work because the operator overwrote it and replaced it with its original
>>>>> command/args;
>>>>>   - providing an initContainer sharing the volumes so it could make
>>>>> the copy without being overridden by the operator's command/args. This
>>>>> did not work because the initContainer present in the spec never makes it
>>>>> into the resulting Deployment; it seems the operator ignores it.
>>>>>
>>>>> We have some questions:
>>>>>
>>>>> 1. Is this overriding of the pod template present in FlinkDeployment
>>>>> intentional? That is, should our custom command/args and initContainers
>>>>> have been overwritten? If so, I find it a bit confusing that these fields
>>>>> are present and available for use at all.
>>>>> 2. Since the ConfigMap volume will always be mounted as read-only, it
>>>>> seems to me there are some adjustments to be made in order for this
>>>>> script to work correctly. Do you think it would make sense for the script
>>>>> to copy the contents of the ConfigMap volume to a writable directory
>>>>> during initialization, and then use this copy for any subsequent
>>>>> operation? Perhaps copying to $FLINK_HOME, which the user could set
>>>>> themselves, maybe even with a sane default that wouldn't fail on writes
>>>>> (e.g. /tmp/flink).
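>>>>>
>>>>> To illustrate question 2, the copy-on-init idea might look roughly like
>>>>> this inside the entrypoint (sketch only; the mktemp dirs below stand in
>>>>> for the real read-only ConfigMap mount and a writable target like
>>>>> /tmp/flink):

```shell
# Illustrative sketch of the "copy conf to a writable dir" idea; the mktemp
# dirs stand in for the real read-only ConfigMap mount and /tmp/flink.
set -eu

# Stand-in for the read-only mount at $FLINK_HOME/conf
FLINK_HOME="$(mktemp -d)"
mkdir -p "$FLINK_HOME/conf"
printf 'rest.port: 8081\n' > "$FLINK_HOME/conf/flink-conf.yaml"

# Copy the conf to a writable location before touching anything
CONF_DIR="$(mktemp -d)"
cp "$FLINK_HOME/conf/"* "$CONF_DIR/"

# Appends (e.g. FLINK_PROPERTIES) now hit the writable copy, not the mount
FLINK_PROPERTIES="${FLINK_PROPERTIES:-metrics.reporter.dghttp.apikey: dummy}"
printf '%s\n' "$FLINK_PROPERTIES" >> "$CONF_DIR/flink-conf.yaml"

# Point Flink at the copy
export FLINK_CONF_DIR="$CONF_DIR"
```

>>>>> The real entrypoint would of course keep its existing FLINK_PROPERTIES
>>>>> handling; only the target directory would change.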
>>>>>
>>>>> Thanks in advance for your attention and hard work on the project!
>>>>>
>>>>> [1]: https://github.com/apache/flink-kubernetes-operator
>>>>> [2]:
>>>>> https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/docker-entrypoint.sh
>>>>> [3]: https://docs.datadoghq.com/integrations/flink/
>>>>> [4]:
>>>>> https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/docker-entrypoint.sh#L86-L88
>>>>>
>>>>
