Hi Averell,

This is a known bug [1], caused by the AWS S3 library not respecting the classloader [2].
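
To see which classloader can actually resolve the SAX driver, a small diagnostic along these lines may help. This is only a sketch for illustration, not code from Flink or the AWS SDK: the class name `SaxDriverCheck` and the helper `isVisible` are made up, and it only reports visibility from wherever it is run (it does not change how the SDK looks the class up).

```java
public class SaxDriverCheck {

    /**
     * Returns true if the named class can be resolved through the given
     * class loader (a null loader means the bootstrap loader).
     * The class is not initialized, only located.
     */
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Driver class named in the exception from the original report.
        String driver = "org.apache.xerces.parsers.SAXParser";

        // The thread context loader and the loader of this class can differ,
        // for example when code runs inside a plugin class loader.
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader own = SaxDriverCheck.class.getClassLoader();

        System.out.println("context loader " + context + " sees driver: " + isVisible(driver, context));
        System.out.println("own loader " + own + " sees driver: " + isVisible(driver, own));
    }
}
```

If the two loaders disagree, a jar placed in one location (such as flink/lib/) is not necessarily visible to code that resolves classes through the other.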

The best solution is to upgrade to 1.10.1 (or take the s3-hadoop jar from 1.10.1). Don't try to put Xerces manually anywhere.

[1] https://issues.apache.org/jira/browse/FLINK-16014
[2] https://github.com/aws/aws-sdk-java/issues/2242

On Thu, Aug 27, 2020 at 4:34 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi,
> I guess you've loaded the S3 filesystem using the s3 FS plugin.
>
> You need to put the right jar file containing the SAX2 driver class into
> the plugin directory where you've also put the S3 filesystem plugin.
> You can probably find out the name of the right sax2 jar file from your
> local setup where everything is working.
>
> I hope that helps!
>
> Best,
> Robert
>
> On Thu, Aug 27, 2020 at 1:38 PM Averell <lvhu...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a Flink 1.10 job which runs in AWS EMR, checkpointing to S3a as well
>> as writing output to S3a using StreamingFileSink. The job runs well until I
>> add the Java Hadoop properties: /-Dfs.s3a.acl.default=BucketOwnerFullControl/.
>> Since after that, the checkpoint process fails to complete.
>>
>> /Caused by: org.xml.sax.SAXException: SAX2 driver class
>> org.apache.xerces.parsers.SAXParser not found/
>> I tried to add a jar file with that class
>> (https://mvnrepository.com/artifact/xerces/xercesImpl/2.12.0) to my
>> flink/lib/ directory, then got the same error but different stacktrace:
>> /Caused by: org.apache.flink.util.SerializedThrowable: SAX2 driver class
>> org.apache.xerces.parsers.SAXParser not found/
>>
>> This seems to be a dependencies conflict, but I couldn't track its root.
>> In my IDE I didn't have any dependencies issue, while I couldn't find
>> SAXParser in the dependencies tree.
>>
>> *Here is the stacktrace when the jar file is not there:*
>> /Caused by: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://mybucket/checkpoint/a9502b1c81ced10dfcbb21ac43f03e61/chk-2/41f51c24-60fd-474b-9f89-3d65d87037c7: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader: Couldn't initialize a SAX driver to create an XMLReader
>>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
>>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:749)
>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
>>     at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:141)
>>     at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:37)
>>     at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:164)
>>     at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:126)
>>     at org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:61)
>>     at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:356)
>>     ... 17 more
>> Caused by: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader
>>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118)
>>     at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87)
>>     at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
>>     at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
>>     at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
>>     at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>>     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
>>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
>>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
>>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
>>     at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262)
>>     at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
>>     at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223)
>>     ... 29 more
>> Caused by: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
>> java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser
>>     at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230)
>>     at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191)
>>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115)
>>     ... 52 more/
>>
>> *And here is the stacktrace when that jar file added to /lib/ folder*
>>
>> /Could not materialize checkpoint 1 for operator Source: <my_operators_chain> (1/2).
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:1238)
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1180)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: Could not open output stream for state backend
>>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>>     at org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:461)
>>     at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1143)
>>     ... 3 common frames omitted
>> Caused by: org.apache.flink.util.SerializedThrowable: Could not open output stream for state backend
>>     at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:367)
>>     at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.flush(FsCheckpointStreamFactory.java:234)
>>     at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:209)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>     at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.serialize(BytePrimitiveArraySerializer.java:78)
>>     at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.serialize(BytePrimitiveArraySerializer.java:33)
>>     at org.apache.flink.runtime.state.PartitionableListState.write(PartitionableListState.java:116)
>>     at org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:155)
>>     at org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:108)
>>     at org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:75)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:458)
>>     ... 5 common frames omitted
>> Caused by: org.apache.flink.util.SerializedThrowable: getFileStatus on s3a://mybucket/checkpoint/d8ed6d1524169c942bbc455d2c519a39/chk-1/7f2d8fd6-4f3f-4da7-9ffd-5a7e3ea8e7e3: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader: Couldn't initialize a SAX driver to create an XMLReader
>>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
>>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:749)
>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
>>     at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:141)
>>     at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:37)
>>     at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:164)
>>     at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:126)
>>     at org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:61)
>>     at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:356)
>>     ... 17 common frames omitted
>> Caused by: org.apache.flink.util.SerializedThrowable: Couldn't initialize a SAX driver to create an XMLReader
>>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118)
>>     at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87)
>>     at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
>>     at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
>>     at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
>>     at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>>     at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>>     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
>>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
>>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
>>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
>>     at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262)
>>     at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
>>     at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223)
>>     ... 29 common frames omitted
>> Caused by: org.apache.flink.util.SerializedThrowable: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
>>     at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230)
>>     at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191)
>>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115)
>>     ... 52 common frames omitted
>> Caused by: org.apache.flink.util.SerializedThrowable: org.apache.xerces.parsers.SAXParser
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>>     at org.apache.flink.core.plugin.PluginLoader$PluginClassLoader.loadClass(PluginLoader.java:149)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>     at org.xml.sax.helpers.NewInstance.newInstance(NewInstance.java:82)
>>     at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:228)
>>     ... 54 common frames omitted
>> /
>>
>> --
>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>

--
Arvid Heise | Senior Java Developer
<https://www.ververica.com/>

Follow us @VervericaData
--
Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng