Writing to Cassandra from map/reduce jobs over HDFS shouldn't be a problem.
We're doing it in our cluster and I know of others doing the same thing. You
might just make sure the number of reducers (or mappers) writing to Cassandra
doesn't overwhelm it. There's no data locality for writes, though.
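One way to cap write concurrency is to fix the number of reduce tasks on the job. A minimal sketch, assuming the Cassandra-bundled ColumnFamilyOutputFormat and ConfigHelper from the 0.7/0.8-era Hadoop integration (class and method names from memory, and the keyspace/column family names are placeholders, not from this thread):

```java
import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CassandraWriteJob {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "write-to-cassandra");

        // Cap how many reducers write to Cassandra at once so the
        // cluster isn't overwhelmed; tune this to your node count.
        job.setNumReduceTasks(6);

        // "Keyspace1"/"Standard1" are hypothetical names for illustration.
        ConfigHelper.setOutputColumnFamily(job.getConfiguration(),
                                           "Keyspace1", "Standard1");
        ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInitialAddress(job.getConfiguration(), "127.0.0.1");
        job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

        // ... set mapper/reducer classes and job input here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is a configuration sketch, not a complete job; it needs the Hadoop and Cassandra jars of that release on the classpath.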
Hi Jeremy,
yes, the setup on the data-nodes is:
- Hadoop DataNode
- Hadoop TaskTracker
- CassandraDaemon
However - the map input is not read from Cassandra. I am running a write
stress test - no reads (well, from time to time I check the produced items
using cassandra).
Udo,
One thing to get out of the way - you're running task trackers on all of your
cassandra nodes, right? That is the first and foremost way to get good
performance. Otherwise you don't have data locality, which is really the point
of map/reduce: co-locating your data and the processes operating on it.
Hi Jeremy,
thanks for the link.
I doubled the rpc_timeout (to 20 seconds) and reduced the range batch size to
2048, but I still get timeouts...
Udo
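For reference, the two knobs mentioned above live in different places: the RPC timeout is a server-side setting in cassandra.yaml, while the range batch size is a Hadoop job property. A hedged sketch, assuming the `ConfigHelper` helper and the `cassandra.range.batch.size` property from the 0.7/0.8 Hadoop integration:

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;

public class TuningSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Fewer rows per get_range_slices call means each Thrift request
        // does less work and is less likely to hit the server's rpc_timeout.
        ConfigHelper.setRangeBatchSize(conf, 2048);

        // Equivalent raw property, if you prefer -D on the command line:
        //   -Dcassandra.range.batch.size=2048

        // The timeout itself is not a job setting; in cassandra.yaml:
        //   rpc_timeout_in_ms: 20000
    }
}
```

Note that raising rpc_timeout_in_ms requires a restart of the Cassandra nodes to take effect.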
On 29.04.2011 at 18:53, Jeremy Hanna wrote:
> It sounds like there might be some tuning you can do to your jobs - take a
> look at the wiki's HadoopSupport page, specifically the Troubleshooting
> section: http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
It sounds like there might be some tuning you can do to your jobs - take a look
at the wiki's HadoopSupport page, specifically the Troubleshooting section:
http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
On Apr 29, 2011, at 11:45 AM, Subscriber wrote:
> Hi all,
>
> We want to sh