Hello! I am working on a project for which I have to evaluate and recommend the implementation of a new database system with the following major characteristics:
* Operational scalability
* Low cost
* Ability to serve both as a data storage facility and an advanced data manipulation tool
* Speed of execution
* Real-time writing capability, with potential to record millions of client data records in real time
* Flexibility: ability to support all client data types and formats, structured and unstructured
* Capability to support multiple data centers and geographies
* Ability to provide data infrastructure solutions for clients with small and Big Data needs
* Full and flawless integration with the following 3 infrastructures:
  (1) A data mining application (IBM SPSS Modeler) that imports/exports data from/to an SQL database
  (2) A partner platform, based on an Oracle Database (CSV data import/export)
  (3) Various client SQL databases, whose data elements will be uploaded and replicated in the recommended database system

As a result of my research, I am planning to recommend the implementation of Apache Cassandra NoSQL DB, hosted on Amazon Elastic Compute Cloud (Amazon EC2).

I realize that the biggest challenge among the 3 points above is probably the last one, since for each client we need to custom-build and replicate their database, changing the data model from SQL to NoSQL. Points (1) and (2), by contrast, relate only to transferring data back and forth between SQL and NoSQL environments.

My question is: how easy or difficult is it to build a GUI/API that can handle the integration in the 3 points above, with respect to transferring data (upstream/downstream) between the Cassandra NoSQL and SQL environments?

Do you have any other comments or suggestions that I should consider?

Thanks a lot for your involvement and have a great day!

Sincerely,
Krassimir Kostov