I don't think the statement below accurately describes data mining, or the use
of Cassandra for data mining. All the techniques I am familiar with, for either
data mining or machine learning (of which data mining is a subset), make one or
more sequential scans through the data to extract statistics or build models.
The real question is how well Cassandra performs on sequential scans through
the data. The Hadoop model works very well for many machine learning problems
because it is oriented toward sequential scans. The speed of the Hadoop
interface to Cassandra would therefore have a lot of bearing on how applicable
Cassandra is to data mining or machine learning problems.
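
To make the "sequential scan" point concrete, here is a rough Python sketch of
the access pattern I mean: a single pass over every record, accumulating
statistics as it goes. The names (scan_statistics, the in-memory people list)
and the numeric "age" field are assumptions for illustration only. In practice
the rows would come from whatever bulk reader is used (for example a Hadoop
job reading from Cassandra), not from a Python list.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def scan_statistics(rows: Iterable[Dict[str, object]]) -> Tuple[int, Counter, float, float]:
    """Make one sequential pass over rows and return summary statistics.

    Returns (row count, counts per gender, mean age, population variance of age).
    The "age" field is hypothetical; it is not part of the original question.
    """
    count = 0
    by_gender = Counter()
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for row in rows:
        count += 1
        by_gender[row["gender"]] += 1
        age = float(row["age"])
        delta = age - mean
        mean += delta / count
        m2 += delta * (age - mean)
    variance = m2 / count if count else 0.0
    return count, by_gender, mean, variance


# Stand-in data; a real job would stream millions of rows from storage.
people = [
    {"first_name": "Ann", "last_name": "Lee", "gender": "F", "age": 30},
    {"first_name": "Bob", "last_name": "Ray", "gender": "M", "age": 40},
    {"first_name": "Cat", "last_name": "Poe", "gender": "F", "age": 50},
]

count, by_gender, mean_age, var_age = scan_statistics(iter(people))
print(count, dict(by_gender), mean_age, var_age)
```

Nothing here looks up a row by key. Every record is read once, in whatever
order the store hands it over, which is why scan throughput matters more than
ad-hoc query support for this kind of work.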

-------------
Sincerely,
David G. Boney
dbon...@semanticartifacts.com
http://www.semanticartifacts.com




On Jan 20, 2011, at 6:35 AM, David Boxenhorn wrote:

> Cassandra is not a good solution for data mining type problems, since it 
> doesn't have ad-hoc queries. Cassandra is designed to maximize throughput, 
> which is not usually a problem for data mining. 
> 
> On Thu, Jan 20, 2011 at 2:07 PM, Surender Singh <suriait2...@gmail.com> wrote:
> Hi All
> 
> I want to use Apache Cassandra to store information (such as first name, last
> name, gender, and address) about 2 million people. I then need to perform
> analytics and reporting on that data.
> Do I need to store the information about the 2 million people in MySQL first
> and then transfer it into Cassandra?
> 
> Please help me, as I am new to Apache Cassandra.
> 
> If you have a similar use case, please share it.
> 
> Thanks and regards
> Surender Singh
> 
> 
