Suppose I have a column family (CF) that holds some sort of assets that some users of
my program have access to, and that some do not.  In SQL-ish terms it
would look something like this:

TABLE Assets (
  asset_id serial primary key,
  ...
);

TABLE Users (
  user_id serial primary key,
  user_name text
);

TABLE Permissions (
  asset_id integer references(Assets),
  user_id integer references(Users)
);

Now, I can generate UUIDs for my asset keys without any trouble, so
the serial that I have in my pseudo-SQL Assets table isn't a problem.
My problem is that I can't see a good way to model the relationship
between user ids and assets.  I see one way to do this, which has
problems, and I think I sort of see a second way.

The obvious way to do it is to give the Assets CF a SuperColumn that
somehow enumerates the users allowed to see it, so when retrieving a
specific Asset I can retrieve the users list and ensure that the user
doing the request is allowed to see it.  This has quite a few
problems.  The foremost is that Cassandra doesn't appear to have much
for conflict resolution (at least I can't find any docs on it), so if
two processes try to add permissions to the same Asset, it looks like
one process will win and I have no idea what happens to the loser.
Another problem is that Cassandra's SuperColumns don't appear to be
ideal for storing lists of things; they store maps, which isn't a
terrible problem, but it feels like a bit of a mismatch in my design.
A SuperColumn mapping from user_ids to an empty byte array seems like
it should work pretty efficiently for checking whether a user has
permissions on an Asset, but it also seems pretty evil.
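
To make the map-as-set idea concrete, here is a minimal sketch in plain
Python.  It uses nested dicts as a stand-in for the Assets CF and does not
talk to Cassandra at all.  The names grant, can_see and the "permissions"
SuperColumn are made up for illustration, and the sketch assumes that a
write touches a single column rather than replacing the whole SuperColumn,
which is exactly the behaviour I'm unsure about.

```python
import uuid

# Toy stand-in for the Assets CF:
#   row key -> SuperColumn name -> {column name: value}
# Plain dicts only; no Cassandra client is involved.
assets = {}

def grant(asset_id, user_id):
    """Add user_id as a column name under the 'permissions' SuperColumn."""
    row = assets.setdefault(asset_id, {})
    # The value is an empty byte string; the column name carries the data.
    row.setdefault("permissions", {})[user_id] = b""

def can_see(asset_id, user_id):
    """Membership test: is there a column named user_id for this asset?"""
    return user_id in assets.get(asset_id, {}).get("permissions", {})

asset_id = str(uuid.uuid4())
grant(asset_id, "user-1")
grant(asset_id, "user-2")  # a second writer adding a different column
print(can_see(asset_id, "user-1"), can_see(asset_id, "user-3"))
```

In this toy version the two grants never collide because each one only
adds its own key to the map; whether the real store behaves the same way
under concurrent writers is the open question.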

The other idea that I have is a separate CF for AssetPermissions that
somehow stores pairs of asset_ids and user_ids.  I don't know what
I'd use for a key in that situation, so I haven't really gotten too
far in seeing what else is broken with that idea.  I think it would
get around the race condition, but I don't know how to do it, and I'm
not sure how efficient it could be.
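
One possible shape for that second idea, again as a plain-Python sketch
with a dict standing in for the CF.  The composite key format
"asset_id:user_id" and the helper names permission_key, grant_pair,
revoke_pair and pair_exists are hypothetical, not something taken from
Cassandra or its docs, and the sketch assumes neither id contains a colon.

```python
# Toy stand-in for a separate AssetPermissions CF: row key -> columns.
asset_permissions = {}

def permission_key(asset_id, user_id):
    """Hypothetical composite row key: one row per (asset, user) pair."""
    return f"{asset_id}:{user_id}"

def grant_pair(asset_id, user_id):
    """Create the row for this pair; the row's existence is the permission."""
    asset_permissions[permission_key(asset_id, user_id)] = {"granted": b""}

def revoke_pair(asset_id, user_id):
    """Remove the row for this pair if it exists."""
    asset_permissions.pop(permission_key(asset_id, user_id), None)

def pair_exists(asset_id, user_id):
    """Permission check is a single lookup by the composite key."""
    return permission_key(asset_id, user_id) in asset_permissions

grant_pair("asset-a", "user-1")
grant_pair("asset-a", "user-2")
revoke_pair("asset-a", "user-2")
print(pair_exists("asset-a", "user-1"), pair_exists("asset-a", "user-2"))
```

Each grant writes its own row here, so two writers adding different users
never touch the same key.  The cost in this dict version is that listing
every user for one asset means scanning all keys; I don't know what the
equivalent operation would cost in Cassandra.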

What do people normally use in this situation?  I assume it's a pretty
common problem, but I haven't seen it in the various data modelling
examples on the Wiki.
