I remember a bug in the ColumnFamilyInputFormat class in 0.8.10. It was a test rpc_endpoints == "0.0.0.0" in place of rpc_endpoint.equals("0.0.0.0"). Maybe it can help you.
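For readers unfamiliar with this class of bug, here is a minimal sketch of why comparing strings with == can silently fail in Java. The class and method names are hypothetical and this is not the actual ColumnFamilyInputFormat code; it only assumes that the address string arrives as a separate object (for example, one deserialized from an RPC response) rather than as the same interned literal.

```java
public class RpcAddressCheck {
    // Hypothetical helper: compares object identity, which is what == does for Strings.
    static boolean isWildcardByIdentity(String rpcEndpoint) {
        return rpcEndpoint == "0.0.0.0";
    }

    // Hypothetical helper: compares character content, and is safe when the argument is null.
    static boolean isWildcardByValue(String rpcEndpoint) {
        return "0.0.0.0".equals(rpcEndpoint);
    }

    public static void main(String[] args) {
        // new String(...) forces a distinct object, standing in for a value read off the wire.
        String fromWire = new String("0.0.0.0");
        System.out.println(isWildcardByIdentity(fromWire)); // false
        System.out.println(isWildcardByValue(fromWire));    // true
    }
}
```

With the identity test, the wildcard address would never be recognized, so any fallback meant for "0.0.0.0" would be skipped.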
On 6 March 2012 at 12:18, Florent Lefillâtre <flefi...@gmail.com> wrote:

> Excuse me, I had not understood.
> So, for me, the problem comes from the change in the ColumnFamilyInputFormat
> class between 0.8.7 and 0.8.10, where the splits are created (0.8.7 uses
> endpoints and 0.8.10 uses rpc_endpoints).
> With your config, the split fails, so Hadoop doesn't run a Map task on
> approximately 16384 rows (your cassandra.input.split.size) but on all the
> rows of a node (certainly more than 16384).
> However, Hadoop estimates the task progress on 16384 inputs, which is why
> you have something like 9076.81%.
>
> If you can't change the rpc_address configuration, I don't know how you can
> solve your problem :/, sorry.
>
> On 6 March 2012 at 11:53, Patrik Modesto <patrik.mode...@gmail.com> wrote:
>
>> Hi Florent,
>>
>> I don't change the server version; it is Cassandra 0.8.10. I only
>> change the version of cassandra-all in the pom.xml of the mapreduce
>> job.
>>
>> I have 'rpc_address: 0.0.0.0' in cassandra.yaml, because I want
>> Cassandra to bind RPC to all interfaces.
>>
>> Regards,
>> P.
>>
>> On Tue, Mar 6, 2012 at 09:44, Florent Lefillâtre <flefi...@gmail.com> wrote:
>> > Hi, I had the same problem on hadoop 0.20.2 and cassandra 1.0.5.
>> > In my case the split of the token range failed.
>> > I have commented out the line 'rpc_address: 0.0.0.0' in cassandra.yaml.
>> > Maybe check whether your configuration changed between 0.8.7 and 0.8.10.
>> >
>> > On 6 March 2012 at 09:32, Patrik Modesto <patrik.mode...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> I was recently trying a Hadoop job + cassandra-all 0.8.10 again, and the
>> >> timeouts I get are not because Cassandra can't handle the
>> >> requests. I've noticed there are several tasks that show progress of
>> >> several thousand percent. It seems like they are looping over their
>> >> range of keys. I've run the job with debug enabled and the ranges look
>> >> OK, see http://pastebin.com/stVsFzLM
>> >>
>> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
>> >> number of mappers the job creates:
>> >> 0.8.7: 4680
>> >> 0.8.10: 595
>> >>
>> >> Task                               Complete
>> >> task_201202281457_2027_m_000041    9076.81%
>> >> task_201202281457_2027_m_000073    9639.04%
>> >> task_201202281457_2027_m_000105    10538.60%
>> >> task_201202281457_2027_m_000108    9364.17%
>> >>
>> >> None of this happens with cassandra-all 0.8.7.
>> >>
>> >> Regards,
>> >> P.
>> >>
>> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto <patrik.mode...@gmail.com> wrote:
>> >> > I'll alter these settings and will let you know.
>> >> >
>> >> > Regards,
>> >> > P.
>> >> >
>> >> > On Tue, Feb 28, 2012 at 09:23, aaron morton <aa...@thelastpickle.com> wrote:
>> >> >> Have you tried lowering the batch size and increasing the timeout?
>> >> >> Even just to get it to work.
>> >> >>
>> >> >> If you get a TimedOutException, it means CL number of servers did not
>> >> >> respond in time.
>> >> >>
>> >> >> Cheers
>> >> >>
>> >> >> -----------------
>> >> >> Aaron Morton
>> >> >> Freelance Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >>
>> >> >> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>> >> >>
>> >> >> Hi aaron,
>> >> >>
>> >> >> these are our current settings:
>> >> >>
>> >> >> <property>
>> >> >>   <name>cassandra.range.batch.size</name>
>> >> >>   <value>1024</value>
>> >> >> </property>
>> >> >>
>> >> >> <property>
>> >> >>   <name>cassandra.input.split.size</name>
>> >> >>   <value>16384</value>
>> >> >> </property>
>> >> >>
>> >> >> rpc_timeout_in_ms: 30000
>> >> >>
>> >> >> Regards,
>> >> >> P.
>> >> >>
>> >> >> On Mon, Feb 27, 2012 at 21:54, aaron morton <aa...@thelastpickle.com> wrote:
>> >> >>
>> >> >> What settings do you have for cassandra.range.batch.size
>> >> >> and rpc_timeout_in_ms? Have you tried reducing the first and/or
>> >> >> increasing the second?
>> >> >>
>> >> >> Cheers
>> >> >>
>> >> >> -----------------
>> >> >> Aaron Morton
>> >> >> Freelance Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >>
>> >> >> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>> >> >>
>> >> >> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> >> >>
>> >> >> Did you see the notes here?
>> >> >>
>> >> >> I'm not sure what you mean by the notes.
>> >> >>
>> >> >> I'm using the mapred.* settings suggested there:
>> >> >>
>> >> >> <property>
>> >> >>   <name>mapred.max.tracker.failures</name>
>> >> >>   <value>20</value>
>> >> >> </property>
>> >> >> <property>
>> >> >>   <name>mapred.map.max.attempts</name>
>> >> >>   <value>20</value>
>> >> >> </property>
>> >> >> <property>
>> >> >>   <name>mapred.reduce.max.attempts</name>
>> >> >>   <value>20</value>
>> >> >> </property>
>> >> >>
>> >> >> But I still see the timeouts that I don't get with cassandra-all 0.8.7.
>> >> >>
>> >> >> P.
>> >> >>
>> >> >> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
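
As a rough illustration of the progress figures quoted in this thread, the sketch below assumes (following the explanation given in the thread, not the actual Hadoop or Cassandra source) that a task reports progress as rows read divided by the expected split size. The class and method names are hypothetical. Under that assumption, a figure such as 9076.81% against a split size of 16384 would correspond to roughly 1.49 million rows read by a single task.

```java
public class SplitProgress {
    // Assumed model: progress is rows read relative to the expected rows in the split.
    static double progressPercent(long rowsRead, long expectedRows) {
        return 100.0 * rowsRead / expectedRows;
    }

    // Inverts the model above to estimate how many rows a reported percentage implies.
    static long impliedRows(double percent, long expectedRows) {
        return Math.round(percent / 100.0 * expectedRows);
    }

    public static void main(String[] args) {
        long splitSize = 16384; // cassandra.input.split.size from the thread
        // A task that reads exactly one split's worth of rows reports 100%.
        System.out.println(progressPercent(splitSize, splitSize));
        // Estimated row count behind the 9076.81% figure, under the assumed model.
        System.out.println(impliedRows(9076.81, splitSize));
    }
}
```

If the split size were honored, no task should exceed about 100% under this model, which is consistent with the behaviour reported for cassandra-all 0.8.7.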