Hello Eric,

This happens when the data being fetched from Cassandra in a single split is
larger than the maximum frame size allowed by Thrift (yes, it still uses
Thrift underneath, until the next release, when we will switch to the native
CQL protocol).

Generally, when using Cassandra with Spark/Hadoop, we set the Thrift frame
size in Cassandra to 32MB or larger, depending on the data model and row
size.
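
If you do go that route, the knob is thrift_framed_transport_size_in_mb in
cassandra.yaml on every node (restart required). As far as I remember the
default is 15, which is exactly the 15728640-byte limit in your error. For
example:

    # cassandra.yaml, on every node in the cluster
    # default is 15 (MB); raise it so a single page/split fits in one frame
    thrift_framed_transport_size_in_mb: 32

(If your Cassandra version still has thrift_max_message_length_in_mb, keep it
at least as large as the frame size.)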

If you don't want to touch the Cassandra configuration, you will have to
reduce the page size in use; the default is 1000 CQL rows per page. Going by
the sizes in the error message (about 20MB fetched against a 15MB limit),
1000 rows works out to roughly 20KB per row, so staying under 15MB means
somewhere around 750 rows per page. I would suggest setting the page size to
700 or lower to leave some headroom.

This can be done with the pageSize method on the CasBuilder:

cqlCas.pageSize(700)
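
For example, roughly like this (the builder construction and the RDD call
below are placeholders from memory, so map them onto whatever you already
have; the only change that matters is chaining pageSize before building the
RDD):

    // cqlCas is the CQL3 CasBuilder you already create for your table;
    // keyspace/table names here are just placeholders.
    val cqlCas = CasBuilder.cql3
      .withColumnFamily("your_keyspace", "your_table")
      .pageSize(700)   // 700 CQL rows per page instead of the default 1000

    // then build the RDD from it exactly as you do today, e.g.
    val rdd = sc.cql3Cassandra[YourRowType](cqlCas)
    println(rdd.count())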


Cheers,
Rohit



Founder & CEO, Tuplejump, Inc.
____________________________
www.tuplejump.com
The Data Engineering Platform


On Sat, Apr 19, 2014 at 3:02 AM, ericjohnston1989 <
ericjohnston1...@gmail.com> wrote:

> Hey all,
>
> I'm working with Calliope to run jobs on a Cassandra cluster in standalone
> mode. On some larger jobs I run into the following error:
>
> java.lang.RuntimeException: Frame size (20667866) larger than max length (15728640)!
>         at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
>         at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:322)
>         at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:289)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>         at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.nextKeyValue(CqlPagingRecordReader.java:205)
>         at com.tuplejump.calliope.cql3.Cql3CassandraRDD$$anon$1.hasNext(Cql3CassandraRDD.scala:73)
>         at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:724)
>         at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:720)
>         at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
>         at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
>         at org.apache.spark.scheduler.Task.run(Task.scala:53)
>         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
>
>
> The max frame size (15728640) is 15MB, which is the default frame size
> Cassandra uses. Has anyone seen this before? Are there common workarounds?
> Also, I'd much rather not have to poke around changing Cassandra settings,
> but I can change Spark settings as much as I like.
>
> My program itself is extremely simple since I'm testing. I'm just using
> count() on the RDD I created with casbuilder.
>
> Thanks,
>
> Eric
>
>
