Re: newer Cassandra + Hadoop = TimedOutException()

Florent Lefillâtre Tue, 06 Mar 2012 03:19:14 -0800

Sorry, I had misunderstood.
So, as I see it, the problem comes from the change in the ColumnFamilyInputFormat
class between 0.8.7 and 0.8.10 in how the splits are created (0.8.7 uses
endpoints and 0.8.10 uses rpc_endpoints).
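To make the endpoints/rpc_endpoints distinction concrete, here is a minimal Python sketch. It assumes, as a simplification, that each token range in the ring description carries two parallel address lists, `endpoints` (the nodes' listen addresses) and `rpc_endpoints` (their configured rpc_address), and that split computation contacts the first address of whichever list it uses. `TokenRange`, `split_host` and `is_routable` are hypothetical illustrations, not the ColumnFamilyInputFormat API, and the 10.0.0.x addresses are made up. With `rpc_address: 0.0.0.0`, the rpc_endpoints entry is the wildcard address, which names no particular node.

```python
from dataclasses import dataclass
from typing import List

# What a node advertises when rpc_address binds all interfaces.
WILDCARD = "0.0.0.0"


@dataclass
class TokenRange:
    """Hypothetical stand-in for one token range from a ring description."""
    start_token: str
    end_token: str
    endpoints: List[str]      # listen addresses (used by 0.8.7, per this thread)
    rpc_endpoints: List[str]  # rpc_address values (used by 0.8.10, per this thread)


def split_host(rng: TokenRange, use_rpc_endpoints: bool) -> str:
    """Return the address a split computation would contact for this range."""
    addresses = rng.rpc_endpoints if use_rpc_endpoints else rng.endpoints
    return addresses[0]


def is_routable(address: str) -> bool:
    """The wildcard address identifies no specific replica."""
    return address != WILDCARD


if __name__ == "__main__":
    # Hypothetical two-replica range on nodes with rpc_address: 0.0.0.0.
    rng = TokenRange("0", "85070591730234615865843651857942052864",
                     endpoints=["10.0.0.1", "10.0.0.2"],
                     rpc_endpoints=[WILDCARD, WILDCARD])
    for use_rpc in (False, True):
        host = split_host(rng, use_rpc)
        print(f"use_rpc_endpoints={use_rpc}: contact {host}, routable={is_routable(host)}")
```

In this model, commenting out rpc_address (as Florent did) corresponds to rpc_endpoints holding an address derived from the node's hostname instead of the wildcard.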
With your configuration, splitting fails, so Hadoop doesn't run a map task on
approximately 16384 rows (your cassandra.input.split.size) but on all the
rows of a node (certainly many more than 16384).
However, Hadoop still estimates the task progress against 16384 inputs, which
is why you see values like 9076.81%.
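The percentages in the task list follow from this. A minimal sketch, assuming progress is reported simply as rows read divided by the configured split size (a simplification of the record reader's estimate); `reported_progress` and `implied_rows` are hypothetical helpers:

```python
SPLIT_SIZE = 16384  # cassandra.input.split.size from the job configuration


def reported_progress(rows_read: int, split_size: int = SPLIT_SIZE) -> float:
    """Progress in percent, assuming it is estimated as rows read / split size."""
    return 100.0 * rows_read / split_size


def implied_rows(progress_percent: float, split_size: int = SPLIT_SIZE) -> int:
    """Invert the estimate: roughly how many rows a task read at a given percentage."""
    return round(progress_percent / 100.0 * split_size)


if __name__ == "__main__":
    # A correctly sized split ends at 100%.
    print(reported_progress(16384))
    # Percentages reported for the looping tasks in this thread.
    for pct in (9076.81, 9639.04, 10538.60, 9364.17):
        print(f"{pct}% -> about {implied_rows(pct):,} rows")
```

Under this assumption, a task at 9076.81% has read roughly 1.49 million rows, far more than one 16384-row split.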


If you can't change the rpc_address configuration, I don't know how you can
solve your problem :/ Sorry.

On 6 March 2012 11:53, Patrik Modesto <patrik.mode...@gmail.com> wrote:

> Hi Florent,
>
> I'm not changing the server version; it is Cassandra 0.8.10. I only
> change the version of cassandra-all in the pom.xml of the MapReduce
> job.
>
> I have 'rpc_address: 0.0.0.0' in cassandra.yaml because I want
> Cassandra to bind RPC to all interfaces.
>
> Regards,
> P.
>
> On Tue, Mar 6, 2012 at 09:44, Florent Lefillâtre <flefi...@gmail.com>
> wrote:
> > Hi, I had the same problem on Hadoop 0.20.2 and Cassandra 1.0.5.
> > In my case, splitting the token range failed.
> > I commented out the line 'rpc_address: 0.0.0.0' in cassandra.yaml.
> > Maybe check whether you have any configuration changes between 0.8.7 and 0.8.10.
> >
> >
> > On 6 March 2012 09:32, Patrik Modesto <patrik.mode...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I was recently trying the Hadoop job + cassandra-all 0.8.10 again, and the
> >> timeouts I get are not because Cassandra can't handle the requests.
> >> I've noticed there are several tasks that show progress of several
> >> thousand percent. It seems like they are looping over their range of
> >> keys. I've run the job with debug enabled and the ranges look OK, see
> >> http://pastebin.com/stVsFzLM
> >>
> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
> >> number of mappers the job creates:
> >> 0.8.7: 4680
> >> 0.8.10: 595
> >>
> >> Task                             Complete
> >> task_201202281457_2027_m_000041   9076.81%
> >> task_201202281457_2027_m_000073   9639.04%
> >> task_201202281457_2027_m_000105  10538.60%
> >> task_201202281457_2027_m_000108   9364.17%
> >>
> >> None of this happens with cassandra-all 0.8.7.
> >>
> >> Regards,
> >> P.
> >>
> >>
> >>
> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto <patrik.mode...@gmail.com> wrote:
> >> > I'll alter these settings and will let you know.
> >> >
> >> > Regards,
> >> > P.
> >> >
> >> > On Tue, Feb 28, 2012 at 09:23, aaron morton <aa...@thelastpickle.com>
> >> > wrote:
> >> >> Have you tried lowering the batch size and increasing the timeout?
> >> >> Even just to get it to work.
> >> >>
> >> >> If you get a TimedOutException, it means that the number of servers
> >> >> required by the consistency level (CL) did not respond in time.
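For reference, the number of replica responses a consistency level waits for can be sketched as below. This is a minimal sketch, assuming a single data center and covering only ONE, QUORUM and ALL for a replication factor `rf`; `required_responses` is a hypothetical helper, not Cassandra's implementation.

```python
def required_responses(consistency_level: str, rf: int) -> int:
    """Replica responses the coordinator needs before rpc_timeout_in_ms expires.

    Simplified sketch: single data center, only ONE, QUORUM and ALL.
    """
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    levels = {
        "ONE": 1,
        "QUORUM": rf // 2 + 1,  # a strict majority of the replicas
        "ALL": rf,
    }
    try:
        return levels[consistency_level.upper()]
    except KeyError:
        raise ValueError(f"unsupported consistency level: {consistency_level}") from None


if __name__ == "__main__":
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, required_responses(cl, rf=3))
```

If fewer than this many replicas answer within rpc_timeout_in_ms, the request fails with a TimedOutException.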
> >> >>
> >> >> Cheers
> >> >>
> >> >> -----------------
> >> >> Aaron Morton
> >> >> Freelance Developer
> >> >> @aaronmorton
> >> >> http://www.thelastpickle.com
> >> >>
> >> >> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
> >> >>
> >> >> Hi Aaron,
> >> >>
> >> >> These are our current settings:
> >> >>
> >> >>      <property>
> >> >>          <name>cassandra.range.batch.size</name>
> >> >>          <value>1024</value>
> >> >>      </property>
> >> >>
> >> >>      <property>
> >> >>          <name>cassandra.input.split.size</name>
> >> >>          <value>16384</value>
> >> >>      </property>
> >> >>
> >> >> rpc_timeout_in_ms: 30000
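With these settings, each map task pages through its split with range-slice requests of at most cassandra.range.batch.size rows, and each request must finish within rpc_timeout_in_ms. A back-of-the-envelope sketch, assuming the reader pages in fixed-size batches (a simplification); `requests_per_split` is a hypothetical helper:

```python
import math

RANGE_BATCH_SIZE = 1024   # cassandra.range.batch.size
INPUT_SPLIT_SIZE = 16384  # cassandra.input.split.size
RPC_TIMEOUT_MS = 30000    # rpc_timeout_in_ms


def requests_per_split(split_size: int = INPUT_SPLIT_SIZE,
                       batch_size: int = RANGE_BATCH_SIZE) -> int:
    """Range-slice requests needed to read one split, assuming fixed-size pages."""
    return math.ceil(split_size / batch_size)


if __name__ == "__main__":
    n = requests_per_split()
    print(f"{n} requests per split, each allowed up to {RPC_TIMEOUT_MS} ms")
    # Halving the batch size doubles the number of (smaller) requests.
    print(requests_per_split(batch_size=512))
```

Lowering the batch size makes each request smaller and so less likely to exceed the timeout, at the cost of more round trips.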
> >> >>
> >> >> Regards,
> >> >> P.
> >> >>
> >> >> On Mon, Feb 27, 2012 at 21:54, aaron morton <aa...@thelastpickle.com> wrote:
> >> >>
> >> >> What settings do you have for cassandra.range.batch.size and
> >> >> rpc_timeout_in_ms? Have you tried reducing the first and/or
> >> >> increasing the second?
> >> >>
> >> >> Cheers
> >> >>
> >> >>
> >> >> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
> >> >>
> >> >> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> >> >>
> >> >> Did you see the notes here?
> >> >>
> >> >> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
> >> >>
> >> >> I'm not sure what you mean by the notes.
> >> >>
> >> >> I'm using the mapred.* settings suggested there:
> >> >>
> >> >>     <property>
> >> >>         <name>mapred.max.tracker.failures</name>
> >> >>         <value>20</value>
> >> >>     </property>
> >> >>     <property>
> >> >>         <name>mapred.map.max.attempts</name>
> >> >>         <value>20</value>
> >> >>     </property>
> >> >>     <property>
> >> >>         <name>mapred.reduce.max.attempts</name>
> >> >>         <value>20</value>
> >> >>     </property>
> >> >>
> >> >> But I still see timeouts that I didn't get with cassandra-all 0.8.7.
> >> >>
> >> >> P.
> >> >>
> >
> >
>