Hello, I have a Flink 1.10 job which runs in AWS EMR, checkpointing to S3a as well as writing output to S3a using StreamingFileSink. The job runs well until I add the Java Hadoop properties: /-Dfs.s3a.acl.default= BucketOwnerFullControl/. Since after that, the checkpoint process fails to complete.
/Caused by: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found/ I tried to add a jar file with that class (https://mvnrepository.com/artifact/xerces/xercesImpl/2.12.0) to my flink/lib/ directory, then got the same error but different stacktrace: /Caused by: org.apache.flink.util.SerializedThrowable: SAX2 driver class org.apache.xerces.parsers.SAXParser not found/ This seems to be a dependencies conflict, but I couldn't track its root. In my IDE I didn't have any dependencies issue, while I couldn't find SAXParser in the dependencies tree. *Here is the stacktrace when the jar file is not there:* /Caused by: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://mybucket/checkpoint/a9502b1c81ced10dfcbb21ac43f03e61/chk-2/41f51c24-60fd-474b-9f89-3d65d87037c7: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader: Couldn't initialize a SAX driver to create an XMLReader at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177) at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145) at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251) at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149) at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088) at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:749) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038) at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:141) at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:37) at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:164) at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:126) at org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:61) at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:356) ... 17 more Caused by: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118) at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87) at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77) at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62) at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31) at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266) at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876) at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262) at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317) at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280) at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255) at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223) ... 29 more Caused by: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230) at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191) at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115) ... 52 more/ *And here is the stacktrace when that jar file added to /lib/ folder* /Could not materialize checkpoint 1 for operator Source: <my_operators_chain> (1/2). at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:1238) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1180) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: Could not open output stream for state backend at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:461) at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1143) ... 3 common frames omitted Caused by: org.apache.flink.util.SerializedThrowable: Could not open output stream for state backend at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:367) at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.flush(FsCheckpointStreamFactory.java:234) at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:209) at java.io.DataOutputStream.write(DataOutputStream.java:107) at java.io.FilterOutputStream.write(FilterOutputStream.java:97) at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.serialize(BytePrimitiveArraySerializer.java:78) at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.serialize(BytePrimitiveArraySerializer.java:33) at org.apache.flink.runtime.state.PartitionableListState.write(PartitionableListState.java:116) at org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:155) at org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:108) at org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:75) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:458) ... 5 common frames omitted Caused by: org.apache.flink.util.SerializedThrowable: getFileStatus on s3a://mybucket/checkpoint/d8ed6d1524169c942bbc455d2c519a39/chk-1/7f2d8fd6-4f3f-4da7-9ffd-5a7e3ea8e7e3: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader: Couldn't initialize a SAX driver to create an XMLReader at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177) at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145) at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251) at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149) at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088) at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:749) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038) at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:141) at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:37) at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:164) at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:126) at org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:61) at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:356) ... 17 common frames omitted Caused by: org.apache.flink.util.SerializedThrowable: Couldn't initialize a SAX driver to create an XMLReader at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118) at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87) at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77) at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62) at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31) at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266) at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876) at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262) at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317) at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280) at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255) at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223) ... 29 common frames omitted Caused by: org.apache.flink.util.SerializedThrowable: SAX2 driver class org.apache.xerces.parsers.SAXParser not found at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230) at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191) at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115) ... 52 common frames omitted Caused by: org.apache.flink.util.SerializedThrowable: org.apache.xerces.parsers.SAXParser at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at org.apache.flink.core.plugin.PluginLoader$PluginClassLoader.loadClass(PluginLoader.java:149) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at org.xml.sax.helpers.NewInstance.newInstance(NewInstance.java:82) at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:228) ... 54 common frames omitted / -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/