Yes, hadoop-azure and azure-storage are both on the classpath. hadoop-azure is declared as a dependency in my build.sbt file and I’m using assembly to copy all of the dependencies into a single jar which is submitted to Flink. I suspect the wasb format needs to be explicitly registered with Hadoop. I think that’s accomplished by inserting the following into core-site.xml (I’m not that familiar with Hadoop):
<property>
  <name>fs.AbstractFileSystem.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasb</value>
</property>

However, I’m wondering if it’s possible to achieve the same result from within the job, since it’s difficult to modify files on the task manager in our configuration.

On Aug 29, 2017, at 5:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:

Was the hadoop-azure jar on the classpath?

Please also see the following from https://hadoop.apache.org/docs/current/hadoop-azure/index.html :

The built jar file, named hadoop-azure.jar, also declares transitive dependencies on the additional artifacts it requires, notably the Azure Storage SDK for Java.

On Tue, Aug 29, 2017 at 3:24 PM, Joshua Griffith <jgriff...@campuslabs.com> wrote:

I’m attempting to write to Azure Blob Storage using Flink's FileOutputFormat.
I’ve included hadoop-azure (https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Configuring_Credentials) within the jar I submit to Flink and configured the paths to be prefixed with wasb://{CONTAINERNAME}@{ACCOUNTNAME}.blob.core.windows.net/. When the file output format initializes, I get the following error:

ERROR ROOT - Run 4bfb099a-8d07-11e7-8d3a-fb4d07562cc0 failed with error: 'org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Cannot initialize task 'DataSink (/out/data)': No file system found with scheme wasb, referenced in file URI 'wasb://blob@{ACCOUNTNAME}.blob.core.windows.net/out/data'.

Can I register the format programmatically from within the job (without putting credentials into a core-site.xml file on the task manager)? Can I still use Flink’s FileOutputFormat or should I be using a Hadoop OutputFormat?

Thanks,
Joshua
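
On the question at the top of the thread (supplying the wasb settings from within the job instead of through core-site.xml), one possible direction is to build the Hadoop Configuration in code. The following is only a sketch under stated assumptions, not a confirmed fix: the object name WasbConfigSketch, the helper wasbConfiguration, and the environment variable names are hypothetical, and it assumes hadoop-common and hadoop-azure are on the classpath as described above. The property names are the ones used in the hadoop-azure documentation.

```scala
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object WasbConfigSketch {

  // Hypothetical helper (not from the thread): builds a Hadoop Configuration
  // that carries the wasb registration and the storage account key, so that
  // nothing has to be written to core-site.xml.
  def wasbConfiguration(account: String, accessKey: String): Configuration = {
    val conf = new Configuration()
    // Map the wasb:// scheme to the hadoop-azure FileSystem implementation.
    conf.set("fs.wasb.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
    // Same registration for the AbstractFileSystem (FileContext) API.
    conf.set("fs.AbstractFileSystem.wasb.impl", "org.apache.hadoop.fs.azure.Wasb")
    // Account key property, as described under "Configuring Credentials".
    conf.set(s"fs.azure.account.key.${account}.blob.core.windows.net", accessKey)
    conf
  }

  def main(args: Array[String]): Unit = {
    // Assumed environment variables; sys.env(...) throws if one is missing.
    val account   = sys.env("AZURE_ACCOUNT")
    val container = sys.env("AZURE_CONTAINER")
    val key       = sys.env("AZURE_ACCOUNT_KEY")

    val conf = wasbConfiguration(account, key)
    val root = new URI(s"wasb://${container}@${account}.blob.core.windows.net/")

    // Resolve the file system directly through Hadoop to check the settings.
    val fs = FileSystem.get(root, conf)
    println(s"/out exists: ${fs.exists(new Path("/out"))}")
  }
}
```

This only shows that Hadoop itself can resolve the wasb scheme from a Configuration built in code. Whether Flink's own FileOutputFormat picks up such a Configuration is not established in this thread; passing it to a Hadoop OutputFormat wrapped through Flink's Hadoop compatibility layer may be the more direct route, and that too is an assumption to verify.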