Hi Ryan,

Thank you very much. This helps a lot.

On Sun, Dec 14, 2014 at 9:14 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>
> Well, your data model looks fine at a glance: a lot of tables, but they
> appear to map to logically obvious query paths. This denormalization
> will make your queries fast but eat up more disk, and if disk is really a
> pain point, I'd suggest looking at your economics a bit and weighing your
> tradeoffs.
>
>    1. If you want less disk usage and can afford longer query
>    times, switch from denormalized views to indexes. You'll get
>    better disk space savings, at the cost of more round trips on a read
>    (read the index value, get the partition key, do another read).
>    2. If you really need queries to be as fast as possible, then you're
>    on the right path, but you'll have to realize this is the cost of scale.
>    Even with relational databases, I've had to use a similar
>    strategy in the past to speed up lookups (fewer distinct query
>    parameters in that case, and more queries that would normally require
>    lots of joins).
>
> Hope this helps explain the tradeoffs and costs.
>
> On Sun, Dec 14, 2014 at 6:01 AM, Chamila Wijayarathna <
> cdwijayarat...@gmail.com> wrote:
>>
>> Hello all,
>>
>> We are trying to develop a language corpus using Cassandra as its
>> storage medium.
>>
>> https://gist.github.com/cdwijayarathna/7550176443ad2229fae0 shows the
>> types of information we need to extract from the corpus interface.
>> So we designed the schema at
>> https://gist.github.com/cdwijayarathna/6491122063152669839f to use as
>> the database. Our target is to develop a corpus with 100+ million words.
>>
>> So far we have inserted about 1.5 million words, and the database has used
>> about 14GB of space. Is this normal, or are we doing something wrong?
>> Is there any issue in our data model?
>>
>> Thank you!
>> --
>> *Chamila Dilshan Wijayarathna,*
>> SMIEEE, SMIESL,
>> Undergraduate,
>> Department of Computer Science and Engineering,
>> University of Moratuwa.
>
> --
> Ryan Svihla
> Solution Architect

--
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.
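
For a rough sense of scale, the figures quoted in the thread (about 1.5 million words in about 14GB, with a target of 100+ million words) can be projected forward. This is only a back-of-envelope sketch: it assumes disk usage grows linearly with the number of words and takes 1GB as 10^9 bytes, and it ignores compaction, compression, and replication, none of which are described in the thread. The function names are made up for illustration.

```python
# Back-of-envelope projection from the figures quoted in the thread.
# Assumptions (not stated in the thread): disk usage scales linearly
# with word count, and 1 GB = 10**9 bytes.

WORDS_INSERTED = 1_500_000    # "about 1.5 million words"
DISK_USED_GB = 14             # "about 14GB"
TARGET_WORDS = 100_000_000    # "100+ million words"


def bytes_per_word(disk_gb: float, words: int) -> float:
    """Average on-disk footprint of one word, across all tables."""
    return disk_gb * 10**9 / words


def projected_disk_gb(disk_gb: float, words: int, target_words: int) -> float:
    """Disk needed at target_words if the current ratio holds."""
    return disk_gb * target_words / words


if __name__ == "__main__":
    per_word = bytes_per_word(DISK_USED_GB, WORDS_INSERTED)
    total = projected_disk_gb(DISK_USED_GB, WORDS_INSERTED, TARGET_WORDS)
    print(f"average footprint: {per_word:.0f} bytes per word")
    print(f"projected at target: {total:.0f} GB")
```

Under those assumptions this works out to roughly 9.3KB per word and a little over 930GB at 100 million words, which is the cost the denormalized tables trade for fast reads.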