Hi Jack,

We value reliability and consistency over performance right now. In the e-commerce industry we can expect unexpected spikes at odd times.
I'd be grateful if you could tell me about reliability and failover scenarios.

On Wed, Jan 6, 2016 at 2:59 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> DataStax has documented quite a few customers/case studies:
> http://www.datastax.com/resources/casestudies
>
> Materialized Views should be considered if you can go straight to 3.0, but
> you can always do the same synthesized views yourself in your app, which is
> current standard best practice anyway. MV is just a way to automate that
> best practice.
>
> The key to performance is to characterize your load requirements and then
> make sure to provision your cluster with enough nodes to support that load.
> You'll have to do a proof of concept implementation to verify your own
> requirements. For example, start with a 6- or 8-node cluster for a subset of
> the data and add nodes as needed to accommodate load. The trick is to limit
> the amount of data on each node so that incoming requests can be processed as
> rapidly as possible to meet latency requirements, and then to scale up load
> capacity by adding nodes.
>
> -- Jack Krupansky
>
> On Tue, Jan 5, 2016 at 4:02 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> *Thanks Jack* *for the detailed advice*.
>>
>> Yes, it is a Java application.
>>
>> We already have a denormalized view of our data in place, which we store
>> in MongoDB as a cache; however, we will get our hands dirty before
>> implementation. We would like to have a single DB view and replace MongoDB
>> & MySQL with a single data store. In terms of numbers, we can expect 10
>> million create/update requests a day and ~500 million read requests.
>>
>> The question here is not "should I or should I not", but "which one".
>>
>> A lot of the features you have mentioned are supported but not advisable:
>> *(automated Materialized View feature) (Triggers are supported, but not
>> advised) (Secondary indexes are supported, but not advised).* By when do
>> you believe that these will be stable enough to use for enterprise
>> implementation?
>>
>> We have made up our minds as far as the shift to NoSQL is concerned, as
>> MySQL is not able to serve our purpose and is currently a bottleneck in
>> the design.
>>
>> From all the benchmarks we have analyzed for our use case, Cassandra
>> seems to be doing better as far as performance is concerned. Our only
>> concern is how Cassandra compares with HBase as a primary database. By
>> primary database I mean these attributes: data consistency, transaction
>> management and rollback, brisk failure recovery, cross-datacenter
>> replication and partition-aware sharding.
>>
>> The general opinion of Cassandra is that it's more of a cache, and as we
>> are going to be replacing our primary data store we need something fast
>> but not at the expense of reliability. Can you guide me towards a case
>> study where someone has tuned it to perform reliably for most use cases?
>>
>> Also, I'd be grateful if someone could direct me to a repository where I
>> can find major customers of the DBs and their case studies.
>>
>> Thanks & Regards,
>> Bhuvan
>>
>> On Tue, Jan 5, 2016 at 9:56 PM, Jack Krupansky <jack.krupan...@gmail.com>
>> wrote:
>>
>>> Bear in mind that you won't be able to merely "tune" your schema - you
>>> will need to completely redesign your data model. Step one is to look at
>>> all of the queries you need to perform and get a handle on what flat,
>>> denormalized data model they will need to execute performantly in a NoSQL
>>> database. No JOINs. No ad hoc queries. Secondary indexes are supported, but
>>> not advised. The general model is that you have a "query table" for each
>>> form of query, with the primary key adapted to the needs of the query. That
>>> means a lot of denormalization and repetition of data. The new, automated
>>> Materialized View feature of Cassandra 3.0 can help with that a lot, but is
>>> a new feature and not quite stable enough for production (no DataStax
>>> Enterprise (DSE) release with 3.0 yet). Triggers are supported, but not
>>> advised - better to do that processing at the application level. DSE also
>>> supports Hadoop and Spark for batch/analytics and Solr for search and ad
>>> hoc queries (or use Stratio or Stargate for Lucene queries).
>>>
>>> Best to start with a basic proof of concept implementation to get your
>>> feet wet and learn the ins and outs before making a full commitment.
>>>
>>> Is this a Java app? The Java Driver is where you need to get started in
>>> terms of ingesting and querying data. It's a bit more sophisticated than
>>> just a simple JDBC interface. Most of your queries will need to be
>>> rewritten anyway even though the CQL syntax does indeed look a lot like
>>> SQL, but much of that will be because your data model will need to be made
>>> NoSQL-compatible.
>>>
>>> That should get you started.
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Jan 5, 2016 at 10:52 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> I understand, Ravi. We have our application layers well defined. The
>>>> major changes will be in the database access layers, and entities will
>>>> be changed. The schema will be modified to tune the efficiency of the
>>>> chosen data store.
>>>>
>>>> We have been using Mongo as a cache for a long time now, but since it's
>>>> a document store and we have a crisp, well-defined schema, we chose to
>>>> go with a columnar database.
>>>>
>>>> Our data size has been growing very rapidly. Currently it is 200 GB with
>>>> indexes; in a couple of years it will grow to approx. 5 TB. We may also
>>>> need to run procedures to aggregate data and update tables.
>>>>
>>>> On Tue, Jan 5, 2016 at 6:54 PM, Ravi Krishna <sravikrish...@gmail.com>
>>>> wrote:
>>>>
>>>>> You are moving from a SQL database to C*??? I hope you are aware of
>>>>> the differences between a NoSQL database like C* and an RDBMS. To keep
>>>>> it short, the app has to change significantly.
>>>>>
>>>>> Please read the documentation on the differences between NoSQL and
>>>>> RDBMS.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On Tue, Jan 5, 2016 at 6:20 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm planning to shift from a SQL database to a columnar NoSQL
>>>>>> database; we have narrowed our choices to Cassandra and HBase. I would
>>>>>> really appreciate it if someone with decent experience of both could
>>>>>> give me an honest comparison on the parameters below (links to neutral
>>>>>> benchmarks/blogs also appreciated):
>>>>>>
>>>>>> 1. Data Consistency (Eventual consistency allowed but define
>>>>>> "eventual")
>>>>>> 2. Ease of Scaling Up
>>>>>> 3. Manageability
>>>>>> 4. Failure Recovery options
>>>>>> 5. Secondary Indexing
>>>>>> 6. Data Aggregation
>>>>>> 7. Query Language (3rd party wrapper solutions also allowed)
>>>>>> 8. Security
>>>>>> 9. *Commercial Support for quick solutions to issues*.
>>>>>> 10. Running batch jobs on data, like MapReduce or some common
>>>>>> aggregation functions using row scans. Any other packages for
>>>>>> Cassandra to achieve this?
>>>>>> 11. Trigger specific updates on tables used for secondary index.
>>>>>> 12. Please consider that our DB will be the source of truth, with no
>>>>>> specific requirement of immediate data consistency amongst nodes.
>>>>>>
>>>>>> Regards,
>>>>>> Bhuvan Rawal
>>>>>> SDE
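
As a starting point for the "characterize your load requirements" step discussed in this thread, the quoted daily volumes can be turned into average per-second rates. The sketch below is only a rough illustration, not a capacity plan. It assumes the ~500 million reads are also per day and that traffic is spread evenly over 24 hours; PEAK_FACTOR is a hypothetical spike multiplier that does not come from the thread and should be replaced with a measured value.

```python
# Rough load characterization from the daily volumes quoted in the thread.
SECONDS_PER_DAY = 24 * 60 * 60

WRITES_PER_DAY = 10_000_000   # create/update requests per day (from the thread)
READS_PER_DAY = 500_000_000   # read requests; ASSUMED here to be per day as well

# HYPOTHETICAL multiplier for traffic spikes; not a figure from the thread.
PEAK_FACTOR = 5


def average_rate(requests_per_day: int) -> float:
    """Average requests per second, assuming evenly spread traffic."""
    return requests_per_day / SECONDS_PER_DAY


avg_writes = average_rate(WRITES_PER_DAY)
avg_reads = average_rate(READS_PER_DAY)

print(f"average writes/s: {avg_writes:.0f}")
print(f"average reads/s:  {avg_reads:.0f}")
print(f"read/write ratio: {READS_PER_DAY / WRITES_PER_DAY:.0f}:1")
print(f"peak reads/s at {PEAK_FACTOR}x: {avg_reads * PEAK_FACTOR:.0f}")
```

Under these assumptions the averages come out to roughly 116 writes/s and 5,800 reads/s, a read-heavy 50:1 mix. The node count itself still has to come from measured latency in a proof of concept, as suggested above.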