Re: data agility

Milind Parikh Sun, 20 Nov 2011 14:51:44 -0800

For 99% of current applications requiring a persistent datastore, Oracle,
PgSQL and MySQL variants will suffice.


For the remaining 1% of applications, consider C* if you

         (a) have given up on distributed transactions ("ACID"ly, but
NOT "BASE"ically)
         (b) are wondering about this newfangled horizontal scalability
buzzword and why disks cannot just spin faster and faster
         (c) need/want to design optimized query paths for your data, given
a and b

Rewording a, b and c:
          a.1 Cassandra provides best-in-class low-latency asynchronous
replication with battle-hardened mechanisms to manage "eventual
consistency" in an inherently disordered ("entroprophized") world... ACID
versus BASE transactions.
          b.1 Cassandra's write path is completely optimized. It will write
as fast as the disk will allow; but the most important feature is that
if you need to write faster than an individual server will allow, you add more
servers. The locality of data principle, the inexorably faster computations
and the anti-entropy services enable you to cloud-scale.
          c.1 Writing is easy; but then you actually need to find the
data, and do it at scale, speed-wise. The columnar nature of Cassandra,
the design of its internals and support at the API level
(composite indexes) make fast querying possible.
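
To make c.1 a bit more concrete, here is a minimal in-memory sketch of a
"query-first" layout for time-series data. It is plain Python and NOT the
Cassandra API; the class WideRowStore, its methods, and the sensor/day key
layout are all hypothetical names chosen for illustration. The idea it
shows: pick a composite partition key so that one query touches one row,
and keep the columns inside that row sorted so a time range is a
contiguous slice.

```python
import bisect
from collections import defaultdict


class WideRowStore:
    """Toy model of a query-first layout: one sorted row per partition key.

    Hypothetical illustration only; it does not talk to Cassandra.
    """

    def __init__(self):
        # partition key -> list of (timestamp, value), kept sorted by timestamp
        self._rows = defaultdict(list)

    def write(self, sensor_id, day, timestamp, value):
        # Composite partition key (sensor_id, day) bounds the size of each row.
        row = self._rows[(sensor_id, day)]
        bisect.insort(row, (timestamp, value))

    def read_range(self, sensor_id, day, start_ts, end_ts):
        # One partition lookup plus a contiguous slice over [start_ts, end_ts);
        # data for other sensors or days is never scanned.
        row = self._rows.get((sensor_id, day), [])
        lo = bisect.bisect_left(row, (start_ts,))
        hi = bisect.bisect_left(row, (end_ts,))
        return row[lo:hi]


store = WideRowStore()
store.write("s1", "2011-11-20", 30, 1.5)
store.write("s1", "2011-11-20", 10, 0.5)
store.write("s1", "2011-11-20", 20, 1.0)
store.write("s2", "2011-11-20", 15, 9.0)

# Readings for sensor s1 with 10 <= timestamp < 30, already in time order.
result = store.read_range("s1", "2011-11-20", 10, 30)
print(result)
```

The trade-off is the one from (c): the layout answers "readings for one
sensor in a time window" cheaply, but a different question (say, all sensors
above some value) would need its own layout written alongside this one.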

Milind


On Sun, Nov 20, 2011 at 2:19 PM, Dotan N. <dip...@gmail.com> wrote:

> Thanks Aaron, I kept this use-case free so as to focus on the higher-level
> description; that might not have been a good idea.
> But generally I think I got a better intuition from the various answers,
> thanks!
>
>
> --
> Dotan, @jondot <http://twitter.com/jondot>
>
>
>
> On Sun, Nov 20, 2011 at 11:52 PM, Aaron Turner <synfina...@gmail.com> wrote:
>
>> Sounds like you need to figure out what your product is going to do
>> and what technology will best fit those requirements.  I know you're
>> worried about being agile and all that, but scaling requires you to
>> use the right tool for the job. Worry about new requirements when they
>> rear their ugly head rather than a dozen "what if" scenarios.
>>
>> You can scale MySQL/etc and Cassandra, MongoDB, etc to 10-200M "users"
>> depending on what you're asking your datastore to do.  You haven't
>> defined that really at all other than some comments about wanting to
>> do some map/reduce jobs.
>>
>> Really what you should be doing is figuring out what kind of data you
>> need to store and your needs like access patterns, availability, ACID
>> compliance, etc and then figure out what technology is the best fit.
>> There are tons of "Cassandra vs X" comparisons for every NoSQL DB in
>> existence.
>>
>> Other than that, the map/reduce on Cassandra is more job based rather
>> than useful for interactive queries so if that is important then
>> Cassandra prolly isn't a good fit.  You did mention time series data
>> too, and that's a sweet spot for Cassandra and not something I
>> personally would put in a document based datastore like MongoDB.
>>
>> Good luck.
>> -Aaron
>>
>> On Sun, Nov 20, 2011 at 1:24 PM, Dotan N. <dip...@gmail.com> wrote:
>> > Jahangir, thanks! However, I've noted that we may very well need to scale to
>> > 200M users or "entities" within a short amount of time - say a year or two,
>> > 10M within a few months.
>> >
>> > --
>> > Dotan, @jondot
>> >
>> >
>> > On Sun, Nov 20, 2011 at 11:14 PM, Jahangir Mohammed
>> > <md.jahangi...@gmail.com> wrote:
>> >>
>> >> IMHO, you should start with a very simple RDBMS while
>> >> getting a handle on Cassandra or another NoSQL technology. Start out
>> >> simple, but always be aware and conscious of the next thing you will have in
>> >> your stack. It's time-consuming to work with new technology if you are in the phase
>> >> of prototyping something fast and geared towards a VC demo. In most
>> >> cases, you won't need NoSQL for a while unless there is a very strong case.
>> >>
>> >> Thanks,
>> >> Jahangir
>> >>
>> >> On Nov 20, 2011 4:04 PM, "Dotan N." <dip...@gmail.com> wrote:
>> >>>
>> >>> Thanks David.
>> >>> Stephen: thanks for the tip, we can run a recommended configuration, so
>> >>> that wouldn't be an issue. I guess I can narrow this down: my questions are
>> >>> about complexity of development.
>> >>> After digesting David's answer, I guess my follow-up questions would be:
>> >>> - how would you process data in a Cassandra cluster, typically? via
>> >>> one-off coded offline jobs?
>> >>> - how easy is map/reduce on existing data? (I just looked at Brisk, but it
>> >>> may be unrelated; in any case, not much is written about it)
>> >>> - how would you do analytics over a Cassandra cluster?
>> >>> - given the common examples of time series, how would you recommend to
>> >>> aggregate (sum, avg, facet) and provide statistics over the collected data?
>> >>> for example, if it were some kind of logs and you'd like to group on certain
>> >>> fields in it, or provide a histogram over it.
>> >>> Thanks!
>> >>>
>> >>> --
>> >>> Dotan, @jondot
>> >>>
>> >>>
>> >>> On Sun, Nov 20, 2011 at 10:32 PM, Stephen Connolly
>> >>> <stephen.alan.conno...@gmail.com> wrote:
>> >>>>
>> >>>> if your startup is bootstrapping then cassandra is sometimes too heavy to
>> >>>> start with.
>> >>>>
>> >>>> i.e. it needs to be fed ram... you're not going to seriously run it in
>> >>>> less than 1gb per node... that level of ram commitment can be too much while
>> >>>> bootstrapping.
>> >>>>
>> >>>> if your startup has enough cash to pay for 3-5 recommended spec (see
>> >>>> wiki) nodes to be up 24/7 then cassandra is a good fit...
>> >>>>
>> >>>> a friend of mine is bootstrapping a startup and had to drop back to
>> >>>> mysql while he finds his pain points and customers... he knows he will end
>> >>>> up jumping back to cassandra when he gets enough customers (or a VC) but for
>> >>>> now the running costs are too much to pay from his own pocket... note that
>> >>>> the jdbc driver and cql will make jumping back easy for him (as he still
>> >>>> tests with c*... just runs at present against mysql.... nuts eh!)
>> >>>>
>> >>>> - Stephen
>> >>>>
>> >>>> ---
>> >>>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> >>>> words and other nonsense are a direct result of using swype to type on the
>> >>>> screen
>> >>>>
>> >>>> On 20 Nov 2011 19:07, "Dotan N." <dip...@gmail.com> wrote:
>> >>>>>
>> >>>>> Hi all,
>> >>>>> my question may be more philosophical than technically related
>> >>>>> to Cassandra, but please bear with me.
>> >>>>> Given that a young startup may not know its product fully at the early
>> >>>>> stages, but that it definitely points to ~200M users,
>> >>>>> would Cassandra be the right way to go?
>> >>>>> That is, the requirement is for a large data store that can move with
>> >>>>> product changes and requirements swiftly.
>> >>>>> Given that in Cassandra one thinks hard about the queries, and then
>> >>>>> builds a model to suit them best, I was thinking of
>> >>>>> this situation as problematic.
>> >>>>> So here are some questions:
>> >>>>> - would it be wiser to start with a more agile data store (such as
>> >>>>> mongodb) and then progress onto Cassandra, when the product itself
>> >>>>> solidifies?
>> >>>>> - given that we start with Cassandra from the get go, what is a common
>> >>>>> (and quick in terms of development) way or practice to change data, change
>> >>>>> schemas, as the product evolves?
>> >>>>> - is it even smart to start with Cassandra? would only startups whose
>> >>>>> core business is big data start with it from the get go?
>> >>>>> - how would you do map/reduce with Cassandra? how agile is that? (for
>> >>>>> example, can you run map/reduce _very_ frequently?)
>> >>>>> Thanks!
>> >>>>> --
>> >>>>> Dotan, @jondot
>> >>>
>> >
>> >
>>
>>
>>
>> --
>> Aaron Turner
>> http://synfin.net/         Twitter: @synfinatic
>> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
>> Windows
>> Those who would give up essential Liberty, to purchase a little temporary
>> Safety, deserve neither Liberty nor Safety.
>>     -- Benjamin Franklin
>> "carpe diem quam minimum credula postero"
>>
>
>
