2010/4/28 Даниел Симеонов <dsimeo...@gmail.com>:
> Hi Sylvain,
> Thank you very much! I still have some further questions. I didn't find
> how the row cache is configured.
Provided you don't use trunk but something stable like 0.6.1 (which you
should), it is in storage-conf.xml. It is one of the options in the column
family definitions (it is documented in the file).

> Regarding the splitting of rows, I understand that it is not so
> necessary; still, I am curious whether it is implementable in client
> code.

Well, I'm not sure there is any simple way to do it (at least not
efficiently). Counting the number of columns in a row is expensive, plus
there is no easy way to implement counters in Cassandra (even though
https://issues.apache.org/jira/browse/CASSANDRA-580 will make that better
someday).

> Best regards, Daniel.
>
> 2010/4/28 Sylvain Lebresne <sylv...@yakaz.com>:
>>
>> 2010/4/28 Даниел Симеонов <dsimeo...@gmail.com>:
>> > Hi,
>> > I have a question: if a row in a column family has only columns, are
>> > all of the columns deserialized into memory when you need any one of
>> > them? As I understood it, that is the case.
>>
>> No, it's not. Only the columns you request are deserialized into
>> memory. The only thing is that, as of now, the entire row is
>> deserialized at once during compaction, so it still has to fit in
>> memory. But depending on the typical size of your columns, you can
>> easily have millions of columns in a row without it being a problem at
>> all.
>>
>> > And if the column family is a super column family, is only the
>> > (entire) super column brought into memory?
>>
>> Yes, that part is true. That is the problem with the current
>> implementation of super columns. While you can have lots of columns in
>> one row, you probably don't want lots of columns in one super column
>> (but it is no problem to have lots of super columns in one row).
>>
>> > What about the row cache, is it different from the memtable?
>>
>> Be careful with the row cache. If the row cache is enabled, then yes,
>> any read in a row will read the entire row.
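For reference, in 0.6 the row cache is turned on per column family in
storage-conf.xml. A sketch of what such a definition might look like (the
attribute names and values here are from memory of the 0.6-era format, so
verify them against the comments in your own storage-conf.xml):

```xml
<!-- Illustrative 0.6-style column family definition.
     RowsCached / KeysCached accept absolute counts or percentages;
     a row cache of 0 (the default) disables row caching entirely. -->
<ColumnFamily Name="Events"
              CompareWith="LongType"
              RowsCached="10000"
              KeysCached="100000"/>
```

As Sylvain notes below, a cached row is read whole, so this only pays off
for column families whose rows are small or always read in full.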
>> So you typically don't want to use the row cache on a column family
>> whose rows have lots of columns (unless you always read all the
>> columns in the row each time, of course).
>>
>> > I have another question. Let's say there is only data to be
>> > inserted, and the solution is to add columns to rows in a column
>> > family. Is it possible in Cassandra to split a row once a certain
>> > threshold is reached, say 100 columns per row? And what about
>> > concurrent inserts?
>>
>> No, Cassandra can't do that for you. But you should be okay with what
>> you describe below. That is, if a given row corresponds to an hour of
>> data, that will limit its size. And again, the number of columns in a
>> row is not really limited as long as the overall size of the row fits
>> easily in memory.
>>
>> > The original data model and use case is to insert timestamped data
>> > and to make range queries. The original keys of the CF rows were of
>> > the form <id>.<timestamp>, each with a single column of data, and
>> > OPP was used. This is not an optimal solution, since some nodes get
>> > hotter than others. I am thinking of changing the model to have keys
>> > like <id>.<year/month/day> and then a list of columns with
>> > timestamps within that range, using the RandomPartitioner, or
>> > keeping OPP but preprocessing part of the key with MD5, i.e. the key
>> > is MD5(<id>.<year/month/day>) + "hour of the day". The only problem
>> > is how to deal with the large number of columns being inserted into
>> > a particular row.
>> > Thank you very much!
>> > Best regards, Daniel.
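The MD5-prefixed key scheme Daniel describes can be built entirely
client-side. A minimal sketch of just the key layout, assuming UTC day
buckets with an hour suffix (the function name and the `.` separator are
my own illustrative choices, not anything from Cassandra):

```python
import hashlib
import time


def bucketed_row_key(entity_id: str, ts: float) -> str:
    """Build a row key of the form MD5(<id>.<year/month/day>) + hour.

    The day bucket caps how many columns accumulate in any one row,
    while hashing the <id>.<day> prefix spreads keys evenly across the
    ring even under an order-preserving partitioner. All samples for a
    given id and hour land in one row, so the columns (keyed by
    timestamp) can still be range-sliced within that hour.
    """
    t = time.gmtime(ts)
    day = time.strftime("%Y/%m/%d", t)
    hour = time.strftime("%H", t)
    digest = hashlib.md5(f"{entity_id}.{day}".encode()).hexdigest()
    return f"{digest}.{hour}"
```

Writes for the same id and hour are idempotent on the key, so concurrent
inserters need no coordination: they all derive the same row key and
simply add columns to it.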