Mark,

Thanks for your suggestion. You're right that storing one file in multiple columns of a single row is not a good idea: the heap space problem would still exist, since the whole row still has to fit in memory. I took your advice and stored the file across multiple rows instead, and it works. I can even store a single 2 GB file.
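
For anyone following the thread, here is a minimal sketch of the multi-row approach in Python. It is only an illustration under stated assumptions, not the exact code used: a plain dict stands in for the column family, and the chunk size, the "<file_id>:<index>" row-key scheme, and the function names are all hypothetical. The point is that the file is read and written one chunk at a time, so no single row (and no single in-memory buffer) ever holds the whole file.

```python
import io

# Hypothetical chunk size (1 MiB per row); tune it for your own cluster.
CHUNK_SIZE = 1024 * 1024


def iter_chunk_rows(file_id, stream, chunk_size=CHUNK_SIZE):
    """Yield (row_key, chunk) pairs, reading one chunk at a time."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Hypothetical row-key scheme: "<file_id>:<zero-padded chunk index>".
        yield "%s:%08d" % (file_id, index), chunk
        index += 1


def store_file(store, file_id, stream, chunk_size=CHUNK_SIZE):
    """Write each chunk to its own row, then record the chunk count."""
    count = 0
    for row_key, chunk in iter_chunk_rows(file_id, stream, chunk_size):
        store[row_key] = {"data": chunk}
        count += 1
    # A small metadata row lets a reader find every chunk again.
    store[file_id] = {"chunks": count}
    return count


def load_file(store, file_id):
    """Yield the stored chunks back in their original order."""
    for index in range(store[file_id]["chunks"]):
        yield store["%s:%08d" % (file_id, index)]["data"]


# Usage with an in-memory stand-in; a real client would replace `store`.
store = {}
payload = b"x" * 2500
store_file(store, "foo.flv", io.BytesIO(payload), chunk_size=1000)
restored = b"".join(load_file(store, "foo.flv"))
```

With a real Cassandra client, the dict assignments would become inserts into a column family, and the metadata row could also carry the file name and total size.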

On Mon, Apr 26, 2010 at 6:12 PM, Mark Robson <mar...@gmail.com> wrote:

> On 26 April 2010 00:57, Shuge Lee <shuge....@gmail.com> wrote:
>>
>> In Python:
>>
>> keyspace.columnfamily[key][column] = value
>>
>> files.video[uuid.uuid4()]['name'] = 'foo.flv'
>> files.video[uuid.uuid4()]['path'] = '/var/files/foo.flv'
>
> Hi.
> Storing the filename in the database will not solve the file storage
> problem. Cassandra is a distributed database, and a file stored locally will
> not be available on other client nodes.
> If you're using Cassandra at all, that probably implies that you have lots
> of client nodes. A non-redundant NFS server (for example) would not offer
> high availability, so would be inadequate for the OP's situation.
> Storing files *IN* Cassandra is very useful because you can then retrieve
> them from anywhere with high availability.
> However, as others have discussed, they should be split across multiple
> columns, or if very big, multiple rows.
> I prefer to split by row because this scales better to very large files.
> During compaction, as is well noted, Cassandra needs the entire row in
> memory, which will cause a FAIL once you have files more than a few gigs.
> Mark

--
Best Regards

Jeff Zhang