Bear in mind that you won't be able to merely "tune" your schema - you will need to completely redesign your data model. Step one is to look at all of the queries you need to perform and work out what flat, denormalized data model they will need in order to execute performantly in a NoSQL database. No JOINs. No ad hoc queries. Secondary indexes are supported, but not advised. The general model is that you have a "query table" for each form of query, with the primary key adapted to the needs of that query. That means a lot of denormalization and repetition of data.

The new, automated Materialized View feature of Cassandra 3.0 can help with that a lot, but it is not yet quite stable enough for production (there is no DataStax Enterprise (DSE) release with 3.0 yet). Triggers are supported, but not advised - better to do that processing at the application level. DSE also supports Hadoop and Spark for batch/analytics and Solr for search and ad hoc queries (or use Stratio or Stargate for Lucene queries).
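As a rough sketch of the query-table approach (table and column names here are hypothetical, just for illustration): suppose you need to look users up both by id and by email. Rather than one normalized table plus an index, you maintain one table per query pattern and write the same data to both:

```sql
-- One table per query pattern; the same user data lives in both.

-- Query: fetch a user by id.
CREATE TABLE users_by_id (
    user_id uuid PRIMARY KEY,
    email   text,
    name    text
);

-- Query: fetch a user by email. Same data, different primary key.
CREATE TABLE users_by_email (
    email   text PRIMARY KEY,
    user_id uuid,
    name    text
);

-- The application writes to both tables on every insert/update,
-- e.g. in a logged batch so the two copies stay in step:
BEGIN BATCH
    INSERT INTO users_by_id (user_id, email, name)
        VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'ann@example.com', 'Ann');
    INSERT INTO users_by_email (email, user_id, name)
        VALUES ('ann@example.com', 62c36092-82a1-3a00-93d1-46196ee77204, 'Ann');
APPLY BATCH;
```

On Cassandra 3.0, the materialized view feature mentioned above can maintain the second table for you instead of the application doing dual writes, along these lines (again a sketch, with the same hypothetical names):

```sql
CREATE MATERIALIZED VIEW users_by_email AS
    SELECT email, user_id, name FROM users_by_id
    WHERE email IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (email, user_id);
```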
Best to start with a basic proof-of-concept implementation to get your feet wet and learn the ins and outs before making a full commitment.

Is this a Java app? The Java Driver is where you need to get started for ingesting and querying data. It's a bit more sophisticated than a simple JDBC interface. Most of your queries will need to be rewritten anyway, even though CQL syntax does indeed look a lot like SQL, but much of that will be because your data model will need to be made NoSQL-compatible.

That should get you started.

-- Jack Krupansky

On Tue, Jan 5, 2016 at 10:52 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> I understand, Ravi, we have our application layers well defined. The
> major changes will be in the database access layers, and entities will be
> changed. The schema will be modified to tune the efficiency of the data
> store chosen.
>
> We have been using Mongo as a cache for a long time now, but as it's a
> document store and since we have a crisp, well-defined schema, we chose
> to go with a columnar database.
>
> Our data size has been growing very rapidly. Currently it is 200 GB with
> indexes; in a couple of years it will grow to approximately 5 TB. And we
> may need to run procedures to aggregate data and update tables.
>
> On Tue, Jan 5, 2016 at 6:54 PM, Ravi Krishna <sravikrish...@gmail.com>
> wrote:
>
>> You are moving from a SQL database to C*? I hope you are aware of the
>> differences between a NoSQL database like C* and an RDBMS. To keep it
>> short, the app has to change significantly.
>>
>> Please read the documentation on the differences between NoSQL and
>> RDBMS.
>>
>> Thanks.
>>
>> On Tue, Jan 5, 2016 at 6:20 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I'm planning to shift from a SQL database to a columnar NoSQL
>>> database; we have narrowed our choices down to Cassandra and HBase.
>>> I would really appreciate it if someone with decent experience with
>>> both could give me an honest comparison on the parameters below (links
>>> to neutral benchmarks/blogs also appreciated):
>>>
>>> 1. Data consistency (eventual consistency allowed, but define "eventual")
>>> 2. Ease of scaling up
>>> 3. Manageability
>>> 4. Failure recovery options
>>> 5. Secondary indexing
>>> 6. Data aggregation
>>> 7. Query language (3rd-party wrapper solutions also allowed)
>>> 8. Security
>>> 9. *Commercial support for quick solutions to issues.*
>>> 10. Running batch jobs on data, like MapReduce or some common
>>> aggregation functions using row scans. Any other packages for Cassandra
>>> to achieve this?
>>> 11. Triggering specific updates on tables used for secondary indexes.
>>> 12. Please consider that our DB will be the source of truth, with no
>>> specific requirement of immediate data consistency amongst nodes.
>>>
>>> Regards,
>>> Bhuvan Rawal
>>> SDE