Hello all,

We are trying to develop a language corpus using Cassandra as its
storage backend.

https://gist.github.com/cdwijayarathna/7550176443ad2229fae0 shows the types
of information we need to extract through the corpus interface.
We designed the schema at
https://gist.github.com/cdwijayarathna/6491122063152669839f to use as the
database. Our target is to develop a corpus with 100+ million words.

So far we have inserted about 1.5 million words, and the database has used
about 14 GB of disk space. Is this normal, or are we doing something wrong?
Is there an issue in our data model?
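
To put the numbers in perspective, here is a rough back-of-the-envelope
sketch based only on the figures above. It assumes disk usage grows linearly
with the number of words and that 1 GB = 10^9 bytes; both are assumptions,
not things we have measured, and the function names are just for illustration.

```python
GB = 10**9  # assumption: decimal gigabytes


def bytes_per_word(total_bytes, words):
    """Average on-disk bytes consumed per inserted word."""
    return total_bytes / words


def projected_bytes(total_bytes, words, target_words):
    """Linear extrapolation of disk usage to the target corpus size."""
    return bytes_per_word(total_bytes, words) * target_words


used = 14 * GB            # space used so far (from the figures above)
inserted = 1_500_000      # words inserted so far
target = 100_000_000      # target corpus size in words

per_word = bytes_per_word(used, inserted)
projected = projected_bytes(used, inserted, target)

print(f"{per_word / 1000:.1f} KB per word")
print(f"{projected / GB:.0f} GB projected at {target:,} words")
```

If the growth really is linear, that is roughly 9 KB per word and close to
1 TB for the full corpus, which is why we suspect the data model.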

Thank You!
-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.
