Hello,
I am trying to run a Hadoop job that pulls data out of Cassandra via
ColumnFamilyInputFormat, and I am getting a "frame size" exception. To remedy
that, I set both thrift_framed_transport_size_in_mb and
thrift_max_message_length_in_mb to an effectively "infinite" value of 100000 MB
on all nodes. I then restarted the cluster, so the updated cassandra.yaml files
have been reloaded.
However, I am still getting:
12/11/09 21:39:52 INFO mapred.JobClient: map 62% reduce 0%
12/11/09 21:40:09 INFO mapred.JobClient: Task Id : attempt_201211082011_0015_m_000479_2, Status : FAILED
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (30046945) larger than max length (16384000)!
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:400)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:406)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:324)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:189)
Question: Why is the reported max length 16384000 (bytes, I assume) and not the
100000 MB that I configured?
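To make the mismatch concrete, here is a minimal sketch of the unit arithmetic. It assumes that "mb" in the setting names means 1024 * 1024 bytes and that the numbers in the exception are bytes; the class and method names (FrameSizeCheck, mbToBytes, bytesToMb) are made up for illustration and are not part of Cassandra or Thrift.

```java
public class FrameSizeCheck {
    // Assumption: one "mb" in the *_in_mb settings is 1024 * 1024 bytes.
    static final long BYTES_PER_MB = 1024L * 1024L;

    // Convert a size in MB to bytes, using long to avoid 32-bit overflow.
    public static long mbToBytes(long mb) {
        return mb * BYTES_PER_MB;
    }

    // Convert a size in bytes to (fractional) MB.
    public static double bytesToMb(long bytes) {
        return (double) bytes / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        long reportedMax = 16384000L;  // "max length" from the exception
        long frameSize = 30046945L;    // "Frame size" from the exception
        long configuredMb = 100000L;   // value set in cassandra.yaml

        System.out.printf("reported max length: %.3f MB%n", bytesToMb(reportedMax));
        System.out.printf("rejected frame size: %.3f MB%n", bytesToMb(frameSize));
        System.out.println("configured limit in bytes: " + mbToBytes(configuredMb));
        // True when the configured limit, in bytes, is representable as a Java int.
        System.out.println("fits in a 32-bit int: "
                + (mbToBytes(configuredMb) <= Integer.MAX_VALUE));
    }
}
```

Under these assumptions the reported limit is 15.625 MB, nowhere near 100000 MB, so the limit being enforced does not appear to come from the values I set. As a side note, 100000 MB is 104857600000 bytes, which does not fit in a 32-bit int; I have not checked whether that matters for how the setting is converted internally.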
Next, as a last Hail Mary attempt, I set this parameter to true:
cassandra.input.widerows=true
...still with no luck.
Does someone know what I might be missing?
Thank you very much for your time,
Marko.
http://markorodriguez.com