Yes, hadoop-azure and azure-storage are both on the classpath. hadoop-azure is 
declared as a dependency in my build.sbt file and I’m using assembly to copy 
all of the dependencies into a single jar which is submitted to Flink. I 
suspect the wasb format needs to be explicitly registered with Hadoop. I think 
that’s accomplished by inserting the following into core-site.xml (I’m not that 
familiar with Hadoop):


<property>
  <name>fs.AbstractFileSystem.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasb</value>
</property>

However, I’m wondering if it’s possible to achieve the same result from within 
the job since it’s difficult to modify files on the task manager in our 
configuration.
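
A minimal sketch of the settings that in-job registration would probably need to carry, assuming the property names used by hadoop-azure (fs.wasb.impl, fs.AbstractFileSystem.wasb.impl, and a per-account fs.azure.account.key.* entry). The class name WasbSettings, the method wasbProperties, and the sample account values are hypothetical. The thread does not confirm that Flink's FileOutputFormat picks up settings supplied this way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WasbSettings {

    /**
     * Collects the Hadoop property names and values for one storage account.
     * Hypothetical helper: it only builds the map and does not touch Hadoop.
     */
    public static Map<String, String> wasbProperties(String accountName, String accountKey) {
        Map<String, String> props = new LinkedHashMap<>();
        // FileSystem implementation for the wasb:// scheme (class shipped in hadoop-azure).
        props.put("fs.wasb.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
        // AbstractFileSystem binding; Hadoop looks this key up with a lower-case "impl".
        props.put("fs.AbstractFileSystem.wasb.impl", "org.apache.hadoop.fs.azure.Wasb");
        // Per-account credential key, as described in the hadoop-azure documentation.
        props.put("fs.azure.account.key." + accountName + ".blob.core.windows.net", accountKey);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        wasbProperties("myaccount", "<access-key>")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Each entry could then be applied with Hadoop's Configuration.set(name, value), for example on the configuration object handed to a Hadoop OutputFormat, which would keep the credentials out of a core-site.xml file on the task manager.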

On Aug 29, 2017, at 5:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:

Was the hadoop-azure jar on the classpath?

Please also see the following from 
https://hadoop.apache.org/docs/current/hadoop-azure/index.html:

The built jar file, named hadoop-azure.jar, also declares transitive 
dependencies on the additional artifacts it requires, notably the Azure Storage 
SDK for Java.

On Tue, Aug 29, 2017 at 3:24 PM, Joshua Griffith <jgriff...@campuslabs.com> wrote:
I’m attempting to write to Azure Blob Storage using Flink's FileOutputFormat. 
I’ve included hadoop-azure 
(https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Configuring_Credentials) 
within the jar I submit to Flink and configured the paths to be prefixed with 
wasb://{CONTAINERNAME}@{ACCOUNTNAME}.blob.core.windows.net/.

When the file output format initializes, I get the following error: ERROR ROOT 
- Run 4bfb099a-8d07-11e7-8d3a-fb4d07562cc0 failed with error: 
'org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: Cannot initialize task 'DataSink (/out/data)': No file system 
found with scheme wasb, referenced in file URI 
'wasb://blob@{ACCOUNTNAME}.blob.core.windows.net/out/data'.

Can I register the format programmatically from within the job (without putting 
credentials into a core-site.xml file on the task manager)? Can I still use 
Flink’s FileOutputFormat or should I be using a Hadoop OutputFormat?

Thanks,

Joshua

