>>>>> "Christopher" == Christopher Meiklejohn <cmeiklej...@basho.com> writes:
>> The intention is to store log messages in each element of the set:
>> either as a string (syslog or json, or whatever else the user sees fit),
>> or as a map of key-value pairs (where values themselves can be maps
>> too).
>>
>> On average, the log messages are a few kilobytes in size. There may be
>> exceptions, but >1mb ones are fairly rare. How much data the set would
>> hold... now that's a question that can't really be answered. It is
>> really up to the syslog-ng user to configure that.

Christopher> I’m referring to the size of the entire set, not the objects that will be members of
Christopher> the set. Therefore, the performance penalty seen when using large objects would
Christopher> be observed as soon as the size of the entire set (or map) has reached ~1 MB.

Christopher> Given that restriction, I’d imagine you would only be able to store a few messages
Christopher> in each set. That granularity seems like you are no longer getting the benefits
Christopher> of the set.

The granularity is configurable by the user: if they have small (say, a
few hundred byte long) messages, then we can store a reasonable amount
of them in a single Set. For example, assuming an average length of 384
bytes / message (the longest line in today's logs on my laptop), a Set
would be able to store about 2k messages. That's not too bad.

Christopher> Additionally, the primary benefit of the data types in Riak is that they converge
Christopher> deterministically when dealing with concurrent
Christopher> operations.

Not only that: using sets makes the keys predictable. If I want to
retrieve the logs, with sets, I can retrieve the
2015-05-01T15:12:10-T15:12:15 key for example, and have all the logs
from those 5 seconds. If I used one message per Riak object, it would be
much harder to read the data back.

There may be multiple threads adding to the same set, so the
deterministic convergence is useful still.
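
To make the arithmetic and the key scheme above concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: the ~1 MB guideline is taken as exactly 10**6 bytes, the bucket width is assumed to divide a day evenly, and the names (messages_per_set, bucket_key, keys_for_range) are hypothetical, not part of syslog-ng or any Riak client. It does not talk to Riak at all; it only derives the keys a reader or writer would use.

```python
from datetime import datetime, timedelta

# Assumption: the "~1 MB" guideline from the thread, taken as 10**6 bytes.
SET_SIZE_LIMIT = 1_000_000


def messages_per_set(avg_message_bytes, limit=SET_SIZE_LIMIT):
    """Rough number of messages that fit in one Set below the size limit."""
    return limit // avg_message_bytes


def bucket_start(ts, width_seconds=5):
    """Round ts down to the start of its fixed-width time bucket."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    offset = int((ts - midnight).total_seconds())
    return midnight + timedelta(seconds=offset - offset % width_seconds)


def bucket_key(ts, width_seconds=5):
    """Key of the bucket containing ts, e.g. 2015-05-01T15:12:10-T15:12:15."""
    start = bucket_start(ts, width_seconds)
    end = start + timedelta(seconds=width_seconds)
    return f"{start:%Y-%m-%dT%H:%M:%S}-T{end:%H:%M:%S}"


def keys_for_range(begin, end, width_seconds=5):
    """Keys of every bucket that overlaps the half-open interval [begin, end)."""
    keys = []
    current = bucket_start(begin, width_seconds)
    while current < end:
        keys.append(bucket_key(current, width_seconds))
        current += timedelta(seconds=width_seconds)
    return keys


if __name__ == "__main__":
    print(messages_per_set(384))
    print(bucket_key(datetime(2015, 5, 1, 15, 12, 13)))
    for key in keys_for_range(datetime(2015, 5, 1, 15, 12, 8),
                              datetime(2015, 5, 1, 15, 12, 21)):
        print(key)
```

Every writer that sees a timestamp in the same 5-second window computes the same key, so concurrent writers add to the same Set, and a reader can list the keys for any time range without an index.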

Christopher> I’m curious if the set is the right choice here; could
Christopher> you just use a custom set format inside of a normal
Christopher> Riak object

That's an option worth considering, yes. Thanks!

Christopher> (or store one message per Riak object, given the write will be an
Christopher> immutable log entry?)

That's the first goal of the project, because that's the easiest and
most straightforward to implement.

The downside of one message per Riak object is that it's hard to
retrieve the data, because making the keys predictable is not going to
be easy. Boxing a few hundred (or few thousand) together into a Set has
the advantage of making the keys predictable at the cost of transferring
more data to the client when looking for a subset of the logs within the
set.

-- 
|8]