Here are more details on how I was able to use Azure Key Vault (AKV) with CSI. 
Also, I looked at the ADLS FileSystem factory in the Flink source, and I think 
that, despite what the docs say, configuration options prefixed with 
flink.hadoop won’t get forwarded.

You can sync the key vault secrets to Kubernetes secrets, which can then be 
exposed as environment variables (see the docs I previously sent). Then you can 
provide a KeyProvider class (org.apache.hadoop.fs.azure.KeyProvider) that reads 
the key from those environment variables. In flink-conf.yaml you can configure 
Hadoop to use the KeyProvider (all keys with the fs.azure prefix are forwarded 
to Hadoop). A jar containing the KeyProvider should be placed in the same 
directory as the ADLS plugin.

```flink-conf.yaml
fs.azure.account.keyprovider.<StorageAccount>.dfs.core.windows.net: <fully qualified KeyProvider class name>
fs.azure.account.keyprovider.<StorageAccount>.blob.core.windows.net: <fully qualified KeyProvider class name>
```
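
As a rough illustration of the environment-variable lookup such a KeyProvider could do, here is a minimal sketch. The class name `EnvKeyLookup`, the `AZURE_STORAGE_KEY_` prefix, and the naming scheme are all hypothetical, not something from Flink or Hadoop. The Hadoop types are left out so the snippet compiles on its own. In a real provider you would implement the Hadoop KeyProvider interface and, as far as I know, call this from its `getStorageAccountKey(String accountName, Configuration conf)` method.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: resolves a storage account key from environment variables. */
final class EnvKeyLookup {
    // Hypothetical prefix; use whatever names your Kubernetes secret mapping defines.
    static final String PREFIX = "AZURE_STORAGE_KEY_";

    private EnvKeyLookup() {}

    /** Maps e.g. "myacct.dfs.core.windows.net" to "AZURE_STORAGE_KEY_MYACCT". */
    static String envVarName(String accountName) {
        int dot = accountName.indexOf('.');
        String shortName = dot < 0 ? accountName : accountName.substring(0, dot);
        return PREFIX + shortName.toUpperCase(Locale.ROOT);
    }

    /** Looks the key up in the given environment map; fails loudly if it is absent. */
    static String lookup(String accountName, Map<String, String> env) {
        String name = envVarName(accountName);
        String key = env.get(name);
        if (key == null || key.isEmpty()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return key;
    }

    /** Convenience overload that reads the real process environment. */
    static String lookup(String accountName) {
        return lookup(accountName, System.getenv());
    }
}
```

Passing the environment in as a map keeps the lookup easy to test without touching the real pod environment.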

Keep in mind that all currently available Flink releases have one of two bugs 
that break reading from and/or writing to ADLS, so you will need to rebuild 
the ADLS plugin from source: check out the release commit (probably 1.17.0) 
and cherry-pick the bug fix. Alternatively, wait for 1.17.1 or 1.18.0, which 
will have the fixes.

I’m new to using Flink, and it took me a while to figure this out, but 
hopefully it is helpful to you. I get the sense that few people are using ADLS 
with newer Flink versions, because the docs and support seem half-baked.

Let me know if you make progress using MSI.

Best of luck,

Ivan

From: DEROCCO, CHRISTOPHER <[email protected]>
Sent: Wednesday, May 17, 2023 6:20 AM
To: Ivan Webber <[email protected]>
Cc: [email protected]; Shammon FY <[email protected]>
Subject: RE: MSI Auth to Azure Storage Account with Flink Apache Operator not 
working

Ivan,

How did you use Azure Key Vault with CSI, given that the Flink operator uses a 
ConfigMap and not a Kubernetes secret to create the flink-conf file? I have 
also tried pod identities as well as the new workload identity 
(https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview), to no 
avail. The issue seems to be in configuring flink-azure-fs-hadoop-1.16.0.jar 
when using the Flink operator.

From: Ivan Webber <[email protected]>
Sent: Tuesday, May 16, 2023 8:01 PM
To: DEROCCO, CHRISTOPHER <[email protected]>; Shammon FY <[email protected]>
Cc: [email protected]
Subject: RE: MSI Auth to Azure Storage Account with Flink Apache Operator not 
working

When you create your cluster, you probably need to ensure the following 
settings are set. I briefly looked into MSI but ended up using Azure Key Vault 
with the Secrets Store CSI driver for my initial prototype 
(https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/aks/csi-secrets-store-driver.md#upgrade-an-existing-aks-cluster-with-azure-key-vault-provider-for-secrets-store-csi-driver-support).

For me it helped to think about it as Hadoop configuration.

If you do get MSI working I would be interested in hearing what made it work 
for you, so be sure to update the docs or put it on this thread.

#### To create from scratch
Create an AKS cluster with the required settings.
```bash
# create an AKS cluster with pod-managed identity and Azure CNI
az aks create --resource-group "$RESOURCE_GROUP" --name "$CLUSTER" \
  --enable-managed-identity --network-plugin azure --enable-pod-identity
```

I hope that is somehow helpful.

Best of luck,

Ivan

From: DEROCCO, CHRISTOPHER <[email protected]>
Sent: Monday, May 8, 2023 3:40 PM
To: Shammon FY <[email protected]>
Cc: [email protected]
Subject: RE: MSI Auth to Azure Storage Account with Flink Apache Operator not 
working


Shammon,



I’m still having trouble setting up the package in my cluster environment. I 
have these lines added to my Dockerfile

mkdir ./plugins/azure-fs-hadoop

cp ./opt/flink-azure-fs-hadoop-1.16.0.jar ./plugins/azure-fs-hadoop/

according to the Flink docs here 
(https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/filesystems/azure/).
This should enable the flink-azure-fs-hadoop jar in the environment, which has 
the classes needed for ADLS Gen2 MSI authentication.
I also have the following dependency in my pom to add it to the fat jar.

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-azure-fs-hadoop</artifactId>
    <version>${flink.version}</version>
</dependency>

However, I still get the class-not-found error, and the Flink job is not able 
to authenticate to the Azure storage account to store its checkpoints. I’m not 
sure what other configuration pieces I’m missing. Has anyone had success 
writing checkpoints to Azure ADLS Gen2 storage with managed service identity 
(MSI) authentication?



From: Shammon FY <[email protected]>
Sent: Friday, May 5, 2023 8:38 PM
To: DEROCCO, CHRISTOPHER <[email protected]>
Cc: [email protected]
Subject: Re: MSI Auth to Azure Storage Account with Flink Apache Operator not 
working

Hi DEROCCO,

I think you can check the startup command of the job on k8s to see if the jar 
file is in the classpath.
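
One way to do that check from inside the JVM is sketched below, assuming you can run a small piece of code in the job or in a debug container. The class name `ClasspathCheck` is hypothetical. One caveat, as far as I understand Flink's plugin mechanism: jars placed under plugins/ are loaded by separate plugin classloaders, so they would not appear in `java.class.path` even when the plugin is installed correctly.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: finds classpath entries whose file name contains a fragment. */
final class ClasspathCheck {
    private ClasspathCheck() {}

    static List<String> findEntries(String classpath, String fragment) {
        List<String> hits = new ArrayList<>();
        // Split on the platform separator (":" on Linux, ";" on Windows).
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            // Match on the file name only, not on parent directory names.
            if (new File(entry).getName().contains(fragment)) {
                hits.add(entry);
            }
        }
        return hits;
    }
}
```

For example, `ClasspathCheck.findEntries(System.getProperty("java.class.path", ""), "flink-azure-fs-hadoop")` returns the matching entries of the running JVM's classpath, or an empty list if there are none.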

If your job is a DataStream job, you need to add the hadoop-azure dependency 
to your project; if it is a SQL job, you need to include this jar file in your 
Flink release package. Alternatively, you can add this package to your cluster 
environment.

Best,
Shammon FY


On Fri, May 5, 2023 at 10:21 PM DEROCCO, CHRISTOPHER <[email protected]> wrote:
How can I add the package to the flink job or check if it is there?

From: Shammon FY <[email protected]>
Sent: Thursday, May 4, 2023 9:59 PM
To: DEROCCO, CHRISTOPHER <[email protected]>
Cc: [email protected]
Subject: Re: MSI Auth to Azure Storage Account with Flink Apache Operator not 
working

Hi DEROCCO,

I think you need to check whether there is a hadoop-azure jar file in the 
classpath of your Flink job. Judging from the error message 'Caused by: 
java.lang.ClassNotFoundException: Class 
org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider not found.', your Flink 
job may be missing this package.

Best,
Shammon FY


On Fri, May 5, 2023 at 4:40 AM DEROCCO, CHRISTOPHER <[email protected]> wrote:

I receive the error: Caused by: java.lang.ClassNotFoundException: Class 
org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider not found.
I’m using Flink 1.16 running in Azure Kubernetes using the Flink Apache 
Kubernetes Operator.
I have the following specified under spec.flinkConfiguration, as per the 
Apache Kubernetes operator documentation.

    fs.azure.createRemoteFileSystemDuringInitialization: "true"
    fs.azure.account.auth.type.storageaccountname.dfs.core.windows.net: OAuth
    fs.azure.account.oauth.provider.type.<storageaccountname>.dfs.core.windows.net: org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider
    fs.azure.account.oauth2.msi.tenant.<storageaccountname>.dfs.core.windows.net: <MY TENANT ID>
    fs.azure.account.oauth2.client.id.<storageaccountname>.dfs.core.windows.net: <MY CLIENT ID of VM>
    fs.azure.account.oauth2.client.endpoint.<storageaccountname>.dfs.core.windows.net: https://login.microsoftonline.com/<MY TENANT ID>/oauth2/token

I also have this specified in the container environment variables.
- name: ENABLE_BUILT_IN_PLUGINS
  value: flink-azure-fs-hadoop-1.16.1.jar

I think I’m missing a configuration step because the MsiTokenProvider class is 
not found based on the logs. Any help would be appreciated.


Chris deRocco
Senior – Cybersecurity
Chief Security Office | STORM Threat Analytics

AT&T
Middletown, NJ
Phone: 732-639-9342
Email: [email protected]


