Re: Requesting some details for my use case

Bhuvan Rawal Thu, 07 Jan 2016 00:24:49 -0800

Hi Jack,

Right now we value reliability and consistency over performance. In the
e-commerce industry, traffic can spike unexpectedly at odd times.

I'll be grateful if you could tell me about reliability and failover scenarios.
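A rough way to frame reliability and failover in Cassandra: each partition is
stored on RF replicas, and a request at a given consistency level needs
acknowledgements from a minimum number of them. The gap between the two is how
many replicas of that partition can be down before the request fails. The
sketch below assumes a single datacenter and models only the ONE, QUORUM and
ALL levels in plain Java. It is not Java Driver code, and multi-datacenter
levels such as LOCAL_QUORUM are not modeled.

```java
/** Simplified single-datacenter model of how many replica failures a request survives. */
final class Failover {
    private Failover() {}

    /** Subset of Cassandra consistency levels, modeled by name only (not driver types). */
    enum Level {
        ONE, QUORUM, ALL;

        /** Replica acknowledgements required for a given replication factor. */
        int required(int replicationFactor) {
            switch (this) {
                case ONE:    return 1;
                case QUORUM: return replicationFactor / 2 + 1; // strict majority
                default:     return replicationFactor;         // ALL
            }
        }
    }

    /** Replicas of one partition that may be down while requests at this level still succeed. */
    static int toleratedReplicaFailures(int replicationFactor, Level level) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor - level.required(replicationFactor);
    }
}
```

With RF = 3, QUORUM requests keep succeeding with one replica of a partition
down and ALL with none; RF = 5 lets QUORUM survive two replica failures.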

On Wed, Jan 6, 2016 at 2:59 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> DataStax has documented quite a few customers/case studies:
> http://www.datastax.com/resources/casestudies
>
> Materialized Views should be considered if you can go straight to 3.0, but
> you can always build the same synthesized views yourself in your app, which
> is the current standard best practice anyway. MV is just a way to automate
> that best practice.
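To illustrate what maintaining a synthesized view in the app involves: every
change is written to the base table and to each query table derived from it,
and when a view's key column changes the stale view row must also be deleted.
That read-before-write bookkeeping is what MV automates. A minimal in-memory
sketch, assuming a hypothetical orders table keyed by order id with a view
keyed by customer id; the two HashMaps only stand in for Cassandra tables.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical "orders" base table plus an app-maintained "orders_by_customer" view. */
final class OrderStore {
    /** Base table: order_id -> customer_id. */
    private final Map<String, String> ordersById = new HashMap<>();
    /** Query table: customer_id -> order_ids. */
    private final Map<String, Set<String>> ordersByCustomer = new HashMap<>();

    /** Upsert an order and keep the view consistent with the base table. */
    void upsert(String orderId, String customerId) {
        // Read-before-write: put() returns the previous view key, if any.
        String previousCustomer = ordersById.put(orderId, customerId);
        if (previousCustomer != null && !previousCustomer.equals(customerId)) {
            // The view key changed, so the old view row must be removed.
            Set<String> old = ordersByCustomer.get(previousCustomer);
            old.remove(orderId);
            if (old.isEmpty()) {
                ordersByCustomer.remove(previousCustomer);
            }
        }
        ordersByCustomer.computeIfAbsent(customerId, k -> new HashSet<>()).add(orderId);
    }

    /** Served from the view instead of scanning the base table. */
    Set<String> ordersFor(String customerId) {
        return Collections.unmodifiableSet(
                ordersByCustomer.getOrDefault(customerId, Collections.emptySet()));
    }
}
```

Against a real cluster the read and the two writes are separate requests, so
they are not atomic as they are here; grouping the view writes in a logged
batch is a common way to keep them together.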
>
> The key to performance is to characterize your load requirements and then
> make sure to provision your cluster with enough nodes to support that load.
> You'll have to do a proof of concept implementation to verify your own
> requirements. For example, start with a 6- or 8-node cluster for a subset of
> the data and add nodes as needed to accommodate load. The trick is to limit
> the amount of data on each node so that incoming requests can be processed
> as rapidly as possible to meet latency requirements, and then to scale up
> load capacity by adding nodes.
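Once you pick a ceiling for data per node, the data side of this sizing is
simple arithmetic. A back-of-the-envelope sketch that assumes a hypothetical
target of 500 GB per node and a replication factor of 3. Both are placeholders
for whatever the proof of concept shows, not recommendations. It applies them
to the ~5 TB projection Bhuvan gives further down the thread. It ignores
compaction headroom, and data volume after denormalization will differ from
the current MySQL figure.

```java
/** Back-of-the-envelope node count from data volume; all inputs are assumptions. */
final class ClusterSizing {
    private ClusterSizing() {}

    /**
     * Minimum nodes so that each node holds at most targetGbPerNode of data
     * once every partition is stored replicationFactor times.
     */
    static int nodesForData(double logicalDataGb, int replicationFactor, double targetGbPerNode) {
        if (replicationFactor < 1 || targetGbPerNode <= 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double totalGb = logicalDataGb * replicationFactor;
        // Never fewer nodes than replicas: each replica lives on a distinct node.
        return (int) Math.max(replicationFactor, Math.ceil(totalGb / targetGbPerNode));
    }
}
```

With these placeholder figures, 200 GB fits on the 3-node minimum while 5 TB
needs about 30 nodes; request load, not just data volume, may push the count
higher.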
>
> -- Jack Krupansky
>
> On Tue, Jan 5, 2016 at 4:02 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Thanks, Jack, for the detailed advice.
>>
>> Yes, it is a Java application.
>>
>> We already have a denormalized view of our data in place, which we store
>> in MongoDB as a cache; however, we will get our hands dirty before the
>> implementation. We would like to have a single DB view and replace MongoDB
>> and MySQL with a single data store. In terms of numbers, we expect 10
>> million create/update requests and ~500 million read requests a day.
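Converting those daily totals into per-second rates makes them easier to
compare with benchmark figures. A small sketch; the peak factor is a
hypothetical multiplier for e-commerce spikes, not a measured value.

```java
/** Converts daily request volumes into average and assumed-peak per-second rates. */
final class RequestRates {
    static final int SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

    private RequestRates() {}

    /** Average requests per second if the daily volume were spread evenly. */
    static double averagePerSecond(long requestsPerDay) {
        return (double) requestsPerDay / SECONDS_PER_DAY;
    }

    /** Average rate scaled by a hypothetical peak-to-average factor. */
    static double peakPerSecond(long requestsPerDay, double peakFactor) {
        return averagePerSecond(requestsPerDay) * peakFactor;
    }
}
```

That works out to roughly 116 writes/s and 5,800 reads/s on average, a
read-heavy ratio of about 50:1; the cluster should be sized for the peak your
own traffic history shows, not the average.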
>>
>> The question here is not "should I or should I not", but "which one".
>>
>> A lot of the features you mentioned are supported but not advised: the
>> automated Materialized View feature, triggers, and secondary indexes. By
>> when do you believe these will be stable enough for an enterprise
>> implementation?
>>
>> We have made up our minds as far as the shift to NoSQL is concerned, as
>> MySQL is not able to serve our purpose and is currently a bottleneck in the
>> design.
>>
>> From all the benchmarks we have analyzed for our use case, Cassandra
>> seems to perform better. Our only concern is how Cassandra compares with
>> HBase as a primary database. By primary database I mean these attributes:
>> data consistency, transaction management and rollback, quick failure
>> recovery, cross-datacenter replication and partition-aware sharding.
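On partition-aware sharding: Cassandra places each row by hashing its
partition key to a token, and nodes own ranges of tokens on a ring (the default
partitioner uses Murmur3). Under SimpleStrategy, the remaining replicas go to
the next nodes clockwise. A simplified sketch of that idea. It uses
String.hashCode as a hypothetical stand-in for Murmur3, and it ignores vnodes
and datacenter/rack awareness (NetworkTopologyStrategy).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.TreeMap;

/** Simplified token ring with SimpleStrategy-style replica placement. */
final class TokenRing {
    // token -> node; a node owns the range (previous token, its token]
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node, int token) {
        ring.put(token, node);
    }

    /** Stand-in hash for illustration; Cassandra's default partitioner uses Murmur3. */
    static int token(String partitionKey) {
        return partitionKey.hashCode();
    }

    /** First node at or after the key's token (wrapping), then the next distinct nodes clockwise. */
    List<String> replicasFor(String partitionKey, int replicationFactor) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in ring");
        }
        int distinctNodes = new HashSet<>(ring.values()).size();
        int wanted = Math.min(replicationFactor, distinctNodes);
        List<String> replicas = new ArrayList<>();
        Integer t = ring.ceilingKey(token(partitionKey));
        if (t == null) {
            t = ring.firstKey(); // past the highest token: wrap to the start of the ring
        }
        while (replicas.size() < wanted) {
            String node = ring.get(t);
            if (!replicas.contains(node)) {
                replicas.add(node);
            }
            Integer next = ring.higherKey(t);
            t = (next != null) ? next : ring.firstKey();
        }
        return replicas;
    }
}
```

Because placement depends only on the partition key, any node (or a
token-aware client) can compute where a row lives without a central lookup.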
>>
>> The general opinion of Cassandra is that it's more of a cache, and since
>> we are going to replace our primary data store, we need something fast but
>> not at the expense of reliability. Can you guide me towards a case study
>> where someone has tuned it to perform reliably for most use cases?
>>
>> Also, I'll be grateful if someone can direct me to a repository where I can
>> find the major customers of each DB and their case studies.
>>
>> Thanks & Regards,
>> Bhuvan
>>
>> On Tue, Jan 5, 2016 at 9:56 PM, Jack Krupansky <jack.krupan...@gmail.com>
>> wrote:
>>
>>> Bear in mind that you won't be able to merely "tune" your schema; you
>>> will need to completely redesign your data model. Step one is to look at
>>> all of the queries you need to perform and work out what flat,
>>> denormalized data model they will need to execute efficiently in a NoSQL
>>> database. No JOINs. No ad hoc queries. Secondary indexes are supported, but
>>> not advised. The general model is that you have a "query table" for each
>>> form of query, with the primary key adapted to the needs of the query. That
>>> means a lot of denormalization and repetition of data. The new, automated
>>> Materialized View feature of Cassandra 3.0 can help with that a lot, but it
>>> is a new feature and not quite stable enough for production (there is no
>>> DataStax Enterprise (DSE) release with 3.0 yet). Triggers are supported,
>>> but not advised; it is better to do that processing at the application
>>> level. DSE also supports Hadoop and Spark for batch/analytics and Solr for
>>> search and ad hoc queries (or use Stratio or Stargate for Lucene queries).
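A query table pairs a partition key (which rows are stored together) with
clustering columns (how rows are sorted inside the partition), chosen so that
the query becomes a single-partition slice. A minimal in-memory model of that
layout, assuming a hypothetical "latest orders for a customer" query: the
outer map plays the partition key (customer_id), and a descending TreeMap
plays the clustering column (order time, newest first). It is an analogy for
the storage layout, not driver code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical query table, roughly:
 * PRIMARY KEY ((customer_id), order_time) WITH CLUSTERING ORDER BY (order_time DESC)
 */
final class OrdersByCustomerTable {
    // partition key -> (clustering key -> order_id), clustering sorted newest first
    private final Map<String, NavigableMap<Long, String>> partitions = new HashMap<>();

    void insert(String customerId, long orderTimeMillis, String orderId) {
        partitions
            .computeIfAbsent(customerId, k -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(orderTimeMillis, orderId); // same primary key overwrites, like an upsert
    }

    /** Like: SELECT order_id FROM orders_by_customer WHERE customer_id = ? LIMIT ? */
    List<String> latestOrders(String customerId, int limit) {
        List<String> result = new ArrayList<>();
        NavigableMap<Long, String> partition = partitions.get(customerId);
        if (partition == null) {
            return result;
        }
        for (String orderId : partition.values()) { // already in clustering order
            if (result.size() == limit) {
                break;
            }
            result.add(orderId);
        }
        return result;
    }
}
```

A second query pattern, such as orders by status, would get its own table with
its own primary key, which is where the repetition of data comes from.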
>>>
>>> Best to start with a basic proof of concept implementation to get your
>>> feet wet and learn the ins and outs before making a full commitment.
>>>
>>> Is this a Java app? The Java Driver is where you need to get started in
>>> terms of ingesting and querying data. It's a bit more sophisticated than
>>> a simple JDBC interface. Most of your queries will need to be rewritten
>>> even though CQL syntax does look a lot like SQL, largely because your data
>>> model will need to be made NoSQL-compatible.
>>>
>>> That should get you started.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Jan 5, 2016 at 10:52 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> I understand, Ravi. We have our application layers well defined; the
>>>> major changes will be in the database access layers and the entities.
>>>> The schema will be modified to suit the efficiency of the chosen data
>>>> store.
>>>>
>>>> We have been using MongoDB as a cache for a long time now, but as it's a
>>>> document store and we have a crisp, well-defined schema, we chose to go
>>>> with a columnar database.
>>>>
>>>> Our data size has been growing very rapidly. Currently it is 200 GB
>>>> including indexes; in a couple of years it will grow to approximately
>>>> 5 TB. We may also need to run procedures to aggregate data and update
>>>> tables.
>>>>
>>>> On Tue, Jan 5, 2016 at 6:54 PM, Ravi Krishna <sravikrish...@gmail.com>
>>>> wrote:
>>>>
>>>>> You are moving from a SQL database to C*? I hope you are aware of the
>>>>> differences between a NoSQL database like C* and an RDBMS. In short, the
>>>>> app has to change significantly.
>>>>>
>>>>> Please read the documentation on the differences between NoSQL and RDBMS.
>>>>>
>>>>> thanks.
>>>>>
>>>>> On Tue, Jan 5, 2016 at 6:20 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm planning to shift from a SQL database to a columnar NoSQL database,
>>>>>> and we have narrowed our choices down to Cassandra and HBase. I would
>>>>>> really appreciate it if someone with decent experience with both could
>>>>>> give me an honest comparison on the parameters below (links to neutral
>>>>>> benchmarks/blogs are also appreciated):
>>>>>>
>>>>>> 1. Data Consistency (Eventual consistency allowed but define
>>>>>> "eventual")
>>>>>> 2. Ease of Scaling Up
>>>>>> 3. Manageability
>>>>>> 4. Failure Recovery options
>>>>>> 5. Secondary Indexing
>>>>>> 6. Data Aggregation
>>>>>> 7. Query Language (3rd party wrapper solutions also allowed)
>>>>>> 8. Security
>>>>>> 9. *Commercial Support for quick solutions to issues*.
>>>>>> 10. Running batch jobs on the data, such as MapReduce or common
>>>>>> aggregation functions using row scans. Are there any other packages for
>>>>>> Cassandra to achieve this?
>>>>>> 11. Triggering specific updates on tables used for secondary indexes.
>>>>>> 12. Please consider that our DB will be the source of truth, with no
>>>>>> specific requirement of immediate data consistency amongst nodes.
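On defining "eventual": in Cassandra, consistency is tunable per request. If
the number of replicas that acknowledge a write (W) plus the number consulted
on a read (R) exceeds the replication factor (RF), any read set overlaps any
write set, so a successful read includes a replica holding the latest
acknowledged write. Otherwise a read may return stale data until replication
catches up (hinted handoff, read repair, anti-entropy repair). A minimal
sketch of that rule, assuming a single datacenter.

```java
/** Single-datacenter overlap rule for tunable consistency (R + W > RF). */
final class Consistency {
    private Consistency() {}

    /**
     * True when every set of R read replicas must share at least one node with
     * every set of W write replicas (pigeonhole: R + W > RF forces an overlap).
     */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        if (replicationFactor < 1
                || readReplicas < 1 || readReplicas > replicationFactor
                || writeReplicas < 1 || writeReplicas > replicationFactor) {
            throw new IllegalArgumentException("replica counts must be between 1 and RF");
        }
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With RF = 3, QUORUM writes plus QUORUM reads (2 + 2 = 4 > 3) always overlap,
while ONE/ONE (1 + 1) does not, which is the "eventual" case.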
>>>>>>
>>>>>> Regards,
>>>>>> Bhuvan Rawal
>>>>>> SDE
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
