Hi Dongwoo,

Thank you very much for your response. It has been very helpful to me.

Your email covered configuring the keytab and krb5.conf files so that Flink can
write to Kerberos-secured HDFS.

However, the pod also needs to know the location of the HDFS namenode: the
Hadoop configuration files (core-site.xml and hdfs-site.xml) have to be loaded
into the Flink environment so that Flink can find the namenode and write data
to HDFS.

In Flink on YARN mode, we can export the HADOOP_CONF_DIR environment variable
and keep core-site.xml and hdfs-site.xml in that directory, and Flink detects
the namenode automatically. My main question is how to load these Hadoop
configuration files in the Flink Kubernetes operator; I imagine something like
the sketch below might work, but I'm not sure. If you know how to do that,
please let me know.
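
Here is the kind of configuration I have in mind (just my guess; the
hadoop-conf ConfigMap name and the /opt/hadoop/conf mount path are made up by
me, not taken from any docs):

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
spec:
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
            - name: HADOOP_CONF_DIR      # point Flink's Hadoop client at the mounted config
              value: /opt/hadoop/conf
          volumeMounts:
            - name: hadoop-conf
              mountPath: /opt/hadoop/conf
      volumes:
        - name: hadoop-conf
          configMap:
            name: hadoop-conf            # holds core-site.xml and hdfs-site.xml

Would something like this be the right direction?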

I look forward to your reply. Thank you!

________________________________
From: Dongwoo Kim <dongwoo7....@gmail.com>
Sent: Wednesday, June 21, 2023 7:56:52 PM
To: 李 琳 <leili...@outlook.com>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: Re: How to set hdfs configuration in flink kubernetes operator?

Hi leilinee,

I'm not sure whether this is best practice, but I'd like to share our
experience configuring HDFS as checkpoint storage while using the Flink
Kubernetes operator.
There are two steps.

Step 1) Mount krb5-conf & keytab file to flink kubernetes operator pod

You have to create a ConfigMap and a Secret for krb5.conf and the keytab
respectively (see the manifests sketched after this config), and apply the
configs below to the Flink Kubernetes operator's values.yaml:

operatorVolumeMounts:
  create: true
  data:
    - mountPath: /opt/flink/krb5.conf
      name: krb5-conf
      subPath: krb5.conf
    - mountPath: /opt/flink/{keytab_file}
      name: custom-keytab
      subPath: {keytab_file}
operatorVolumes:
  create: true
  data:
    - configMap:
        name: krb5-configmap
      name: krb5-conf
    - name: custom-keytab
      secret:
        secretName: custom-keytab
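
In case it helps, the ConfigMap and Secret referenced above could look roughly
like this (a sketch; {keytab_file} and the base64 data are placeholders you
have to fill in yourself):

apiVersion: v1
kind: ConfigMap
metadata:
  name: krb5-configmap
data:
  krb5.conf: |
    # paste the contents of your krb5.conf here
---
apiVersion: v1
kind: Secret
metadata:
  name: custom-keytab
data:
  {keytab_file}: {base64_encoded_keytab}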

Step 2) Configure the FlinkDeployment like below in your application

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
spec:
  flinkConfiguration:
    state.checkpoint-storage: "filesystem"
    state.checkpoints.dir: "hdfs://{path_for_checkpoint}"
    security.kerberos.login.keytab: "/opt/flink/{keytab_file}"   # absolute path in the flink k8s operator pod
    security.kerberos.login.principal: "{principal_name}"
    security.kerberos.relogin.period: "5m"
    security.kerberos.krb5-conf.path: "/opt/flink/krb5.conf"     # absolute path in the flink k8s operator pod

I hope this helps with your work.

Best regards
dongwoo



On Wed, Jun 21, 2023 at 7:36 PM, 李 琳 <leili...@outlook.com> wrote:
Hi all,

Recently, I have been testing the Flink Kubernetes Operator. In the official
example, the checkpoint/savepoint paths are configured on a local file system:


state.savepoints.dir: file:///flink-data/savepoints
state.checkpoints.dir: file:///flink-data/checkpoints
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: file:///flink-data/ha
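
If I understand correctly, this example also assumes a volume mounted at
/flink-data through the pod template, roughly like this (my reading of the
docs example; the hostPath is only for illustration):

podTemplate:
  spec:
    containers:
      - name: flink-main-container
        volumeMounts:
          - mountPath: /flink-data
            name: flink-volume
    volumes:
      - name: flink-volume
        hostPath:
          path: /tmp/flink     # directory on the host node
          type: Directory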

However, in our production environment, we use HDFS to store checkpoint data.
I'm wondering whether it's possible to store checkpoint data in HDFS with the
Flink Kubernetes Operator as well. If so, could you please guide me on how to
set up the HDFS configuration in the Flink Kubernetes Operator?

I would greatly appreciate any assistance you can provide. Thank you!
