Hello all,

We are trying to build a language corpus that uses Cassandra as its storage medium.
https://gist.github.com/cdwijayarathna/7550176443ad2229fae0 shows the types of information we need to extract through the corpus interface. Based on that, we designed the schema at https://gist.github.com/cdwijayarathna/6491122063152669839f for the database.

Our target is a corpus of 100+ million words. So far we have inserted about 1.5 million words, and the database already uses about 14 GB of space. Is this normal, or are we doing something wrong? Is there an issue with our data model?

Thank you!

-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.
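
P.S. For a sense of scale, here is a rough back-of-envelope sketch of what the figures above imply. It assumes decimal gigabytes (10^9 bytes) and that space grows linearly with the number of words, which may well not hold in practice. The function names are only illustrative and are not part of our code.

```python
# Rough estimate only: assumes 1 GB = 10**9 bytes and linear growth.

def bytes_per_word(used_gb: float, words_inserted: int) -> float:
    """Average storage consumed per inserted word, in bytes."""
    return used_gb * 1e9 / words_inserted


def projected_gb(used_gb: float, words_inserted: int, target_words: int) -> float:
    """Linear extrapolation of the space needed for the target corpus size."""
    return bytes_per_word(used_gb, words_inserted) * target_words / 1e9


if __name__ == "__main__":
    per_word = bytes_per_word(14, 1_500_000)
    total = projected_gb(14, 1_500_000, 100_000_000)
    print(f"about {per_word:.0f} bytes per word")
    print(f"about {total:.0f} GB for 100 million words")
```

At that rate each word costs roughly 9 KB, and the full corpus would need on the order of 900 GB, which is why we are asking whether this is expected.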