So it turns out the value after PT is an ISO-8601 duration, and the number before the M is minutes (60-second increments). I changed the timeout to 960000 ms, and now the error reports PT16M (960000 / 60000 = 16). Since I'm still getting timeouts, something else must be wrong.
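The PT prefix is just how java.time.Duration renders values, so the mapping can be sanity-checked the same way Scott does further down the thread:
```
jshell> java.time.Duration.ofMillis(960000).toString()
$1 ==> "PT16M"
```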
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 306 in stage 0.0 failed 4 times, most recent failure: Lost task 306.3 in stage 0.0 (TID 1180) (172.16.100.39 executor 0): com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT16M
    at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:206)
    at com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2454)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2403)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2402)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2402)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1160)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1160)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1160)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2642)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2584)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2573)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT16M
    at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:206)
    at com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
-Joe
On 2/3/2022 3:30 PM, Joe Obernberger wrote:
I did find this:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md
"spark.cassandra.read.timeoutMS" is set to 120000, i.e. 2 minutes, which matches the PT2M in the error.
Running a test now, and I think that is it. Thank you Scott.
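For anyone else hitting this, raising that timeout through the builder would look roughly like the sketch below (the option name is the one from the reference.md above; the value is in milliseconds, so 960000 would show up as PT16M in the error):
```java
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession
        .builder()
        .appName("SparkCassandraApp")
        .config("spark.cassandra.connection.host", "chaos")
        .config("spark.cassandra.connection.port", "9042")
        // connector read timeout in ms (default 120000 = PT2M)
        .config("spark.cassandra.read.timeoutMS", "960000")
        .master("spark://aether.querymasters.com:8181")
        .getOrCreate();
```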
-Joe
On 2/3/2022 3:19 PM, Joe Obernberger wrote:
Thank you Scott!
I am using the spark cassandra connector. Code:
SparkSession spark = SparkSession
        .builder()
        .appName("SparkCassandraApp")
        .config("spark.cassandra.connection.host", "chaos")
        .config("spark.cassandra.connection.port", "9042")
        .master("spark://aether.querymasters.com:8181")
        .getOrCreate();
Would I set PT2M in there? Like .config("pt2m","300") ?
I'm not familiar with jshell, so I'm not sure where you're getting
that duration from.
Right now, I'm just doing a count:
Dataset<Row> dataset = spark.read()
        .format("org.apache.spark.sql.cassandra")
        .options(new HashMap<String, String>() {
            {
                put("keyspace", "doc");
                put("table", "doc");
            }
        })
        .load();
dataset.count();
Thank you!
-Joe
On 2/3/2022 3:01 PM, C. Scott Andreas wrote:
Hi Joe, it looks like "PT2M" may refer to a timeout value that could
be set by your Spark job's initialization of the client. I don't see
a string matching this in the Cassandra codebase itself, but I do
see that this is parseable as a Duration.
```
jshell> java.time.Duration.parse("PT2M").getSeconds()
$7 ==> 120
```
The server-side log you see is likely an indicator of the timeout from the server's perspective. You might consider checking logs from the replicas for dropped reads, query aborts due to scanning more tombstones than the configured max, or other conditions indicating overload or an inability to serve a response.
If you're running a Spark job, I'd recommend using the DataStax Spark Cassandra Connector, which distributes your query to executors as slices of the token range that land on replica sets, avoiding the scatter-gather behavior that can occur when using the Java driver alone.
Cheers,
– Scott
On Feb 3, 2022, at 11:42 AM, Joe Obernberger
<joseph.obernber...@gmail.com> wrote:
Hi all - using Cassandra 4.0.1 and a Spark job running against a large table (~8 billion rows), I'm getting this error on the client side:
Query timed out after PT2M
On the server side I see a lot of messages like:
DEBUG [Native-Transport-Requests-39] 2022-02-03 14:39:56,647 ReadCallback.java:119 - Timed out; received 0 of 1 responses
The same code works on another table in the same Cassandra cluster that is about 300 million rows and completes in about 2 minutes. The cluster is 13 nodes.
I can't find what PT2M means. Perhaps the table needs a repair? Other
ideas?
Thank you!
-Joe