Hi Ryan,

Thank you very much. This helps a lot.

On Sun, Dec 14, 2014 at 9:14 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>
> Well, your data model looks fine at a glance: a lot of tables, but they
> appear to map to logically obvious query paths. This denormalization
> will make your queries fast but eat up more disk. If disk is really a
> pain point, I'd suggest looking at your economics a bit and weighing
> your tradeoffs.
>
>
>    1. If you want less disk usage and can afford longer query times,
>    switch from denormalized views to indexes. You'll save disk space
>    at the cost of more round trips on a read (read the index value to
>    get the partition key, then do another read).
>    2. If you really need queries to be as fast as possible, then you're
>    on the right path, but you'll have to accept that this is the cost of
>    scale. Even with relational databases I've had to use a similar
>    strategy in the past to speed up lookups (fewer distinct query
>    parameters in that case, and more queries that would normally require
>    lots of joins).
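
The two options above can be sketched with a small in-memory model. This is only an illustration, not Cassandra code: the classes, the field names (id, word, author, year) and the "cells stored" measure are all hypothetical stand-ins, and real disk usage also depends on compression, compaction and replication.

```python
from collections import defaultdict


class DenormalizedStore:
    """Option 2: one full copy of every row per query path (hypothetical model)."""

    def __init__(self):
        self.by_word = defaultdict(list)
        self.by_author = defaultdict(list)
        self.cells_stored = 0

    def insert(self, row):
        # The whole row is written once for each query path.
        self.by_word[row["word"]].append(dict(row))
        self.by_author[row["author"]].append(dict(row))
        self.cells_stored += 2 * len(row)

    def find_by_author(self, author):
        # A single read returns the complete rows.
        return list(self.by_author[author]), 1


class IndexedStore:
    """Option 1: one base copy plus a small index entry (hypothetical model)."""

    def __init__(self):
        self.base = {}                          # partition key -> full row
        self.author_index = defaultdict(list)   # author -> partition keys
        self.cells_stored = 0

    def insert(self, row):
        self.base[row["id"]] = dict(row)
        self.author_index[row["author"]].append(row["id"])
        self.cells_stored += len(row) + 1       # row cells + one index cell

    def find_by_author(self, author):
        keys = self.author_index[author]        # round trip 1: read the index
        rows = [self.base[k] for k in keys]     # one more read per partition key
        return rows, 1 + len(keys)


# Made-up sample rows, used only to compare the two models.
rows = [
    {"id": 1, "word": "alpha", "author": "a1", "year": 2010},
    {"id": 2, "word": "beta", "author": "a1", "year": 2011},
    {"id": 3, "word": "alpha", "author": "a2", "year": 2012},
]

denorm, indexed = DenormalizedStore(), IndexedStore()
for r in rows:
    denorm.insert(r)
    indexed.insert(r)

print("cells stored:", denorm.cells_stored, "vs", indexed.cells_stored)
print("round trips for author a1:",
      denorm.find_by_author("a1")[1], "vs", indexed.find_by_author("a1")[1])
```

In this toy model the denormalized layout stores more cells but answers in one read, while the indexed layout stores fewer cells and pays one extra read per matching partition key.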
>
> Hope this helps explain tradeoffs and costs.
>
> On Sun, Dec 14, 2014 at 6:01 AM, Chamila Wijayarathna <
> cdwijayarat...@gmail.com> wrote:
>>
>> Hello all,
>>
>> We are trying to develop a language corpus by using Cassandra as its
>> storage medium.
>>
>> https://gist.github.com/cdwijayarathna/7550176443ad2229fae0 shows the
>> types of information we need to extract through the corpus interface.
>> We designed the schema at
>> https://gist.github.com/cdwijayarathna/6491122063152669839f to use as
>> the database. Our target is to develop a corpus with 100+ million words.
>>
>> So far we have inserted about 1.5 million words, and the database has
>> used about 14 GB of space. Is this normal, or are we doing something
>> wrong? Is there any issue in our data model?
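
As a rough back-of-the-envelope check on the figures above, the per-word cost and a projection to the 100 million word target can be computed as below. This is a sketch under loud assumptions: growth is taken to be linear, a gigabyte is taken as 10^9 bytes, and the function name is hypothetical. Real usage may grow differently as compaction runs or as the vocabulary stops adding new entries.

```python
GB = 10**9  # assumption: decimal gigabytes


def project_disk_usage(words_inserted, bytes_used, target_words):
    """Linearly extrapolate disk usage; returns (bytes per word, projected bytes)."""
    bytes_per_word = bytes_used / words_inserted
    return bytes_per_word, bytes_per_word * target_words


# Figures quoted in the message: about 1.5 million words, about 14 GB,
# with a target of at least 100 million words.
per_word, projected = project_disk_usage(1_500_000, 14 * GB, 100_000_000)

print(f"{per_word / 1000:.1f} kB per word")
print(f"{projected / GB:.0f} GB projected at 100 million words")
```

Under these assumptions the current load works out to roughly 9 kB per word, which would put the full corpus somewhere near a terabyte before replication.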
>>
>> Thank You!
>> --
>> *Chamila Dilshan Wijayarathna,*
>> SMIEEE, SMIESL,
>> Undergraduate,
>> Department of Computer Science and Engineering,
>> University of Moratuwa.
>>
>
>
> --
>
> Ryan Svihla
> Solution Architect, DataStax
>

-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.
