[ https://issues.apache.org/jira/browse/SPARK-49872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-49872.
---------------------------------
    Resolution: Duplicate

> Spark History UI -- StreamConstraintsException: String length (20054016) 
> exceeds the maximum length (20000000)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-49872
>                 URL: https://issues.apache.org/jira/browse/SPARK-49872
>             Project: Spark
>          Issue Type: Bug
>          Components: UI
>    Affects Versions: 3.5.3, 3.5.4
>            Reporter: Anthony Sgro
>            Priority: Major
>              Labels: pull-request-available
>
> The Spark History UI fails to load applications that have large event logs.
> The root cause is a breaking change in Jackson that (in the name of 
> "safety") introduced JSON size limits; see 
> [https://github.com/FasterXML/jackson-core/issues/1014].
> {{JSONOptions}} in Spark already [supports configuring this 
> limit|https://github.com/apache/spark/blob/c2dbb6d04bc9c781fb4a7673e5acf2c67b99c203/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala#L55-L58],
> but there seems to be no way to set it globally.
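> For reference, newer jackson-core versions do expose a process-wide 
> override that the History Server could in principle apply before replaying 
> event logs. A minimal sketch, not something Spark currently calls, with an 
> illustrative limit:
> {code:java}
> import com.fasterxml.jackson.core.StreamReadConstraints;
> 
> // Overrides the default constraints used by every JsonFactory created
> // after this call in the JVM. Sketch only; Spark does not invoke this.
> StreamReadConstraints.overrideDefaultStreamReadConstraints(
>     StreamReadConstraints.builder()
>         .maxStringLength(100000000)
>         .build());
> {code}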
> Spark should be able to handle strings of arbitrary length. I have tried 
> configuring rolling event logs, pruning event logs, etc., but that either 
> does not fix the issue or loses so much data that the Spark History UI is 
> completely useless.
> Perhaps a solution could be to add a config like:
> {code:java}
> spark.history.server.jsonStreamReadConstraints.maxStringLength=<new_value> 
> {code}
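> A patch implementing that config would presumably thread the value into the 
> {{JsonFactory}} behind the replay-side {{ObjectMapper}}, roughly like this 
> (a sketch assuming the limit arrives as an int from the proposed key):
> {code:java}
> import com.fasterxml.jackson.core.JsonFactory;
> import com.fasterxml.jackson.core.StreamReadConstraints;
> import com.fasterxml.jackson.databind.ObjectMapper;
> 
> // maxStringLength would be read from the proposed History Server config;
> // the value here is illustrative.
> int maxStringLength = 100000000;
> JsonFactory factory = JsonFactory.builder()
>     .streamReadConstraints(StreamReadConstraints.builder()
>         .maxStringLength(maxStringLength)
>         .build())
>     .build();
> ObjectMapper mapper = new ObjectMapper(factory);
> {code}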
> There is a workaround for reading JSON inside an application:
> {code:java}
> spark.read.option("maxStringLen", 100000000).json(path)
> {code}
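> (Per the {{JSONOptions}} link above, {{maxStringLen}} is a per-read option, 
> which is presumably why it only covers {{spark.read}} paths.)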
> But this is not an option for accessing the Spark History UI. Here is the 
> full stack trace:
> {code:java}
> HTTP ERROR 500 com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
> URI:/history/application_1728009195451_0002/1/jobs/
> STATUS:500
> MESSAGE:com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
> SERVLET:org.apache.spark.deploy.history.HistoryServer$$anon$1-582a764a
> CAUSED BY:com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
> com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
>       at com.fasterxml.jackson.core.StreamReadConstraints.validateStringLength(StreamReadConstraints.java:324)
>       at com.fasterxml.jackson.core.util.ReadConstrainedTextBuffer.validateStringLength(ReadConstrainedTextBuffer.java:27)
>       at com.fasterxml.jackson.core.util.TextBuffer.finishCurrentSegment(TextBuffer.java:939)
>       at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2240)
>       at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2206)
>       at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:323)
>       at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer._deserializeContainerNoRecursion(JsonNodeDeserializer.java:572)
>       at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:100)
>       at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:25)
>       at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
>       at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4867)
>       at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3219)
>       at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:927)
>       at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:88)
>       at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:59)
>       at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$3(FsHistoryProvider.scala:1143)
>       at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$3$adapted(FsHistoryProvider.scala:1141)
>       at org.apache.spark.util.SparkErrorUtils.tryWithResource(SparkErrorUtils.scala:48)
>       at org.apache.spark.util.SparkErrorUtils.tryWithResource$(SparkErrorUtils.scala:46)
>       at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:95)
>       at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1(FsHistoryProvider.scala:1141)
>       at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1$adapted(FsHistoryProvider.scala:1139)
>       at scala.collection.immutable.List.foreach(List.scala:431)
>       at org.apache.spark.deploy.history.FsHistoryProvider.parseAppEventLogs(FsHistoryProvider.scala:1139)
>       at org.apache.spark.deploy.history.FsHistoryProvider.rebuildAppStore(FsHistoryProvider.scala:1120)
>       at org.apache.spark.deploy.history.FsHistoryProvider.createInMemoryStore(FsHistoryProvider.scala:1358)
>       at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:347)
>       at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:199)
>       at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>       at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:134)
>       at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>       at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:55)
>       at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:51)
>       at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>       at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>       at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>       at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>       at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
>       at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>       at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>       at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:88)
>       at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:100)
>       at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:256)
>       at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:104)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
>       at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
>       at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
>       at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>       at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>       at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
>       at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
>       at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>       at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
>       at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>       at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
>       at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>       at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
>       at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>       at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
>       at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
>       at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>       at org.apache.spark.ui.ProxyRedirectHandler.handle(JettyUtils.scala:582)
>       at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>       at org.sparkproject.jetty.server.Server.handle(Server.java:516)
>       at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
>       at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
>       at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479)
>       at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
>       at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>       at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
>       at org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
>       at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
>       at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
>       at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
>       at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
>       at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
>       at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
>       at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
>       at java.lang.Thread.run(Thread.java:750)
> {code}


