Hi Averell,

This is a known bug [1] caused by the AWS S3 library that Flink uses not
respecting the classloader [2].
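To make the mechanism a bit more concrete, here is a small standalone sketch (the class `SaxDriverDemo` is hypothetical; it is not code from Flink or the AWS SDK). Judging from the trace below, the SDK's `XmlResponsesSaxParser` obtains its reader through `XMLReaderFactory.createXMLReader()`. That factory takes a driver class name (from the `org.xml.sax.driver` system property or a `META-INF/services/org.xml.sax.driver` entry) and, in the JDK's implementation, loads that class by name through the thread's context class loader. If that loader cannot see the class, you get exactly "SAX2 driver class ... not found". Obtaining a reader through JAXP's `SAXParserFactory` does not depend on such a driver class name and falls back to the JDK's built-in parser when nothing else is configured.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class SaxDriverDemo {

    /** Asks XMLReaderFactory to load a SAX driver by class name, similar to what createXMLReader() does internally. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9 but still shipped
    static String loadByName(String driverClassName) {
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader(driverClassName);
            return "loaded " + reader.getClass().getName();
        } catch (SAXException e) {
            // If the context class loader cannot see the class, the message reads
            // "SAX2 driver class <name> not found", as in the traces below.
            return e.getMessage();
        }
    }

    /** Obtains a reader through JAXP; without extra configuration this is the JDK's built-in parser. */
    static XMLReader viaSaxParserFactory() throws ParserConfigurationException, SAXException {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        // Prints either "loaded ..." or the not-found message, depending on the classpath.
        System.out.println(loadByName("org.apache.xerces.parsers.SAXParser"));
        System.out.println("JAXP reader: " + viaSaxParserFactory().getClass().getName());
    }
}
```

Run on a classpath without Xerces, the first line should show the same "not found" message as the job, while the JAXP reader is still created.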

The best solution is to upgrade to Flink 1.10.1 (or to take the s3-hadoop
jar from 1.10.1). Don't try to add Xerces manually anywhere.

[1] https://issues.apache.org/jira/browse/FLINK-16014
[2] https://github.com/aws/aws-sdk-java/issues/2242
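If you want to see where the driver name `org.apache.xerces.parsers.SAXParser` comes from in a given setup (for diagnosis only, not as a workaround), a sketch like the following prints the `org.xml.sax.driver` system property, every `META-INF/services/org.xml.sax.driver` entry visible to a class loader, and the location a class was loaded from. The class name `SaxDriverSources` is made up for illustration, and which loader you inspect (here the context class loader of a plain `main`) is an assumption; inside Flink, the plugin class loader may see a different picture.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SaxDriverSources {
    static final String SERVICE_FILE = "META-INF/services/org.xml.sax.driver";

    /** Returns "url -> driver class" for each SAX driver service file visible to the loader. */
    static List<String> serviceEntries(ClassLoader loader) throws IOException {
        List<String> result = new ArrayList<>();
        for (URL url : Collections.list(loader.getResources(SERVICE_FILE))) {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line = in.readLine(); // the first line names the driver class
                result.add(url + " -> " + (line == null ? "" : line.trim()));
            }
        }
        return result;
    }

    /** Returns the jar/directory a class came from, "<JDK runtime>" for bootstrap classes, or "not found". */
    static String locate(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader); // load without initializing
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return (src == null || src.getLocation() == null) ? "<JDK runtime>" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println("org.xml.sax.driver = " + System.getProperty("org.xml.sax.driver"));
        for (String entry : serviceEntries(loader)) {
            System.out.println(entry);
        }
        System.out.println("org.apache.xerces.parsers.SAXParser: "
                + locate("org.apache.xerces.parsers.SAXParser", loader));
    }
}
```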

On Thu, Aug 27, 2020 at 4:34 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi,
> I guess you've loaded the S3 filesystem using the s3 FS plugin.
>
> You need to put the right jar file containing the SAX2 driver class into
> the plugin directory where you've also put the S3 filesystem plugin.
> You can probably find out the name of the right SAX2 jar file from your
> local setup, where everything is working.
>
> I hope that helps!
>
> Best,
> Robert
>
> On Thu, Aug 27, 2020 at 1:38 PM Averell <lvhu...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a Flink 1.10 job which runs in AWS EMR, checkpointing to S3A as well
>> as writing output to S3A using StreamingFileSink. The job runs well until I
>> add the Java Hadoop property -Dfs.s3a.acl.default=BucketOwnerFullControl.
>> From then on, the checkpoint process fails to complete.
>>
>> Caused by: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
>>
>> I tried to add a jar file containing that class
>> (https://mvnrepository.com/artifact/xerces/xercesImpl/2.12.0) to my
>> flink/lib/ directory, but then got the same error with a different
>> stacktrace:
>> Caused by: org.apache.flink.util.SerializedThrowable: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
>>
>> This looks like a dependency conflict, but I couldn't track down its root
>> cause. In my IDE I don't have any dependency issues, yet I couldn't find
>> SAXParser in the dependency tree either.
>>
>> *Here is the stacktrace when the jar file is not there:*
>> Caused by: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://mybucket/checkpoint/a9502b1c81ced10dfcbb21ac43f03e61/chk-2/41f51c24-60fd-474b-9f89-3d65d87037c7: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader: Couldn't initialize a SAX driver to create an XMLReader
>>         at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
>>         at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:749)
>>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
>>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
>>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
>>         at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:141)
>>         at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:37)
>>         at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:164)
>>         at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:126)
>>         at org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:61)
>>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:356)
>>         ... 17 more
>> Caused by: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader
>>         at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118)
>>         at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87)
>>         at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
>>         at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
>>         at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
>>         at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>>         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
>>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
>>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
>>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
>>         at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262)
>>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
>>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223)
>>         ... 29 more
>> Caused by: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
>> java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser
>>         at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230)
>>         at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191)
>>         at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115)
>>         ... 52 more
>>
>> *And here is the stacktrace when that jar file added to /lib/ folder*
>>
>> Could not materialize checkpoint 1 for operator Source: <my_operators_chain> (1/2).
>>         at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:1238)
>>         at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1180)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>         at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: Could not open output stream for state backend
>>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>>         at org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:461)
>>         at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
>>         at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1143)
>>         ... 3 common frames omitted
>> Caused by: org.apache.flink.util.SerializedThrowable: Could not open output stream for state backend
>>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:367)
>>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.flush(FsCheckpointStreamFactory.java:234)
>>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:209)
>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>         at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>         at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.serialize(BytePrimitiveArraySerializer.java:78)
>>         at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.serialize(BytePrimitiveArraySerializer.java:33)
>>         at org.apache.flink.runtime.state.PartitionableListState.write(PartitionableListState.java:116)
>>         at org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:155)
>>         at org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:108)
>>         at org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:75)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>         at org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:458)
>>         ... 5 common frames omitted
>> Caused by: org.apache.flink.util.SerializedThrowable: getFileStatus on s3a://mybucket/checkpoint/d8ed6d1524169c942bbc455d2c519a39/chk-1/7f2d8fd6-4f3f-4da7-9ffd-5a7e3ea8e7e3: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader: Couldn't initialize a SAX driver to create an XMLReader
>>         at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
>>         at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:749)
>>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
>>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
>>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
>>         at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:141)
>>         at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:37)
>>         at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:164)
>>         at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:126)
>>         at org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:61)
>>         at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:356)
>>         ... 17 common frames omitted
>> Caused by: org.apache.flink.util.SerializedThrowable: Couldn't initialize a SAX driver to create an XMLReader
>>         at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118)
>>         at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87)
>>         at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
>>         at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
>>         at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
>>         at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>>         at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>>         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
>>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
>>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
>>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
>>         at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262)
>>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
>>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255)
>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223)
>>         ... 29 common frames omitted
>> Caused by: org.apache.flink.util.SerializedThrowable: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
>>         at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230)
>>         at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191)
>>         at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115)
>>         ... 52 common frames omitted
>> Caused by: org.apache.flink.util.SerializedThrowable: org.apache.xerces.parsers.SAXParser
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>>         at org.apache.flink.core.plugin.PluginLoader$PluginClassLoader.loadClass(PluginLoader.java:149)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>         at org.xml.sax.helpers.NewInstance.newInstance(NewInstance.java:82)
>>         at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:228)
>>         ... 54 common frames omitted
>>
>

-- 

Arvid Heise | Senior Java Developer

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

