Hi everybody,

I'm implementing SCIM (System for Cross-domain Identity Management - https://tools.ietf.org/html/rfc7643 & https://tools.ietf.org/html/rfc7644) with several backends, and I'm scratching my head trying to figure out how to model the data in Riak.
There are 2 types of attributes:

* simple attributes: scalars, such as boolean, string, date, integer, float
* complex attributes: a map of simple attributes

Attributes (simple _and_ complex) can be:

* singular (0 or 1 occurrence)
* multi-valued

Example of a "User" object:

    {
      "userName": "bjen...@example.com",
      "name": {
        "formatted": "Ms. Barbara J Jensen, III",
        "familyName": "Jensen",
        "givenName": "Barbara",
        "middleName": "Jane",
        "honorificPrefix": "Ms.",
        "honorificSuffix": "III"
      },
      "access_rights": [
        "scope_a",
        "scope_b",
        "scope_c"
      ],
      "emails": [
        {
          "value": "bjen...@example.com",
          "type": "work",
          "primary": true
        },
        {
          "value": "b...@jensen.org",
          "type": "home"
        }
      ]
    }

This object contains, respectively, a simple singular attribute (userName), a complex singular attribute (name), a simple multi-valued attribute (access_rights) and a complex multi-valued attribute (emails). SCIM prohibits further nesting of objects (therefore the deepest level is the sub-attribute).

I'd like to implement a Riak backend using the core CRDTs, and also to implement SCIM "search" (https://tools.ietf.org/html/rfc7644#section-3.4.2). In a nutshell, SCIM search allows:

* comparisons: equality, inequality, greater than / less than, attribute is present, starts/ends with (for strings), contains (for strings)
* the logical operators and, or and not
* matching on attributes and sub-attributes (e.g.: name.familyName co "O'Malley" - co is for "contains")
* complex attribute filter grouping (e.g.: userType eq "Employee" and emails[type eq "work" and value co "@example.com"] - eq is for "equal"). In this case the expression between the brackets must evaluate to true on the same complex attribute value

Note that _any_ attribute can be multi-valued.
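To make the "same complex attribute" rule concrete, here is a minimal Python sketch of how I understand the grouping semantics. It is not a SCIM filter parser and has nothing Riak-specific in it: the helper names (as_list, eq, co, all_of, value_path) and the sample user are made up for illustration, and comparisons are case-sensitive for simplicity (it ignores caseExact).

```python
# Hypothetical helpers for illustration only; not part of SCIM or of any Riak client.

def as_list(value):
    """Treat a singular attribute as a one-element list and a missing one as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def eq(attr, expected):
    """attr eq expected: true if any value of attr equals expected."""
    return lambda obj: expected in as_list(obj.get(attr))

def co(attr, needle):
    """attr co needle: true if any string value of attr contains needle."""
    return lambda obj: any(
        isinstance(v, str) and needle in v for v in as_list(obj.get(attr))
    )

def all_of(*preds):
    """Logical 'and' of several predicates."""
    return lambda obj: all(p(obj) for p in preds)

def value_path(attr, pred):
    """attr[pred]: true if at least ONE value of the complex attribute satisfies pred as a whole."""
    return lambda obj: any(
        isinstance(v, dict) and pred(v) for v in as_list(obj.get(attr))
    )

# Made-up user: the "work" email and the "@example.com" email are two different values.
user = {
    "userType": "Employee",
    "emails": [
        {"value": "alice@example.com", "type": "home"},
        {"value": "alice@corp.test", "type": "work"},
    ],
}

# userType eq "Employee" and emails[type eq "work" and value co "@example.com"]
grouped = all_of(
    eq("userType", "Employee"),
    value_path("emails", all_of(eq("type", "work"), co("value", "@example.com"))),
)

# What matching each sub-attribute independently would give instead.
flattened = all_of(
    eq("userType", "Employee"),
    value_path("emails", eq("type", "work")),
    value_path("emails", co("value", "@example.com")),
)

print(grouped(user))    # False: no single email satisfies both conditions
print(flattened(user))  # True: each condition is satisfied by a different email
```

The gap between the two results is exactly what a backend has to preserve: the conditions inside the brackets must be checked per value, not per sub-attribute across all values.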
Regarding the data model with CRDTs, using a map as the base object, here is how each kind of attribute could be implemented:

* simple singular attribute: straightforward (let's forget the data types not natively supported, such as floats, for now)
* complex singular attribute: a map within the map
* simple multi-valued attribute: a set
* complex multi-valued attribute: that's where I can't find a good solution, since sets of maps don't exist in Riak CRDTs (sets can only have string values)

For this last case, I've been thinking of a map of maps using random keys:

    iex(53)> Riak.Search.query("test_index", "my_map_map.attr5_0_map.subattr1_register:*")
    {:ok,
     {:search_results,
      [
        {"test_index",
         [
           {"score", "1.00000000000000000000e+00"},
           {"_yz_rb", "test_bucket"},
           {"_yz_rt", "test"},
           {"_yz_rk", "key2"},
           {"_yz_id", "1*test*test_bucket*key2*20"},
           {"my_map_map.attr1_register", "value1"},
           {"my_map_map.attr2_register", "value2"},
           {"my_map_map.attr3_register", "value3"},
           {"my_map_map.attr4_register", "value4"},
           {"my_map_map.attr5_0_map.subattr1_register", "subvalue1"},
           {"my_map_map.attr5_0_map.subattr2_register", "subvalue2"},
           {"my_map_map.attr5_0_map.subattr3_register", "subvalue3"},
           {"my_map_map.attr5_0_map.subattr4_register", "oups !"},
           {"my_map_map.attr5_1_map.subattr1_register", "subvalue11"},
           {"my_map_map.attr5_1_map.subattr2_register", "subvalue12"},
           {"my_map_map.attr5_1_map.subattr3_register", "subvalue13"},
           {"my_map_map.attr5_1_map.subattr4_register", "oups !"}
         ]},
        {"test_index",
         [
           {"score", "1.00000000000000000000e+00"},
           {"_yz_rb", "test_bucket"},
           {"_yz_rt", "test"},
           {"_yz_rk", "key1"},
           {"_yz_id", "1*test*test_bucket*key1*23"},
           {"my_map_map.attr1_register", "value1"},
           {"my_map_map.attr2_register", "value2"},
           {"my_map_map.attr3_register", "value3"},
           {"my_map_map.attr4_register", "value4"},
           {"my_map_map.attr5_0_map.subattr1_register", "subvalue1"},
           {"my_map_map.attr5_0_map.subattr2_register", "subvalue2"},
           {"my_map_map.attr5_0_map.subattr3_register", "subvalue3"},
           {"my_map_map.attr5_0_map.subattr4_register", "subvalue4"},
           {"my_map_map.attr5_1_map.subattr1_register", "subvalue5"},
           {"my_map_map.attr5_1_map.subattr2_register", "subvalue6"},
           {"my_map_map.attr5_1_map.subattr3_register", "subvalue7"},
           {"my_map_map.attr5_1_map.subattr4_register", "subvalue8"}
         ]}
      ], 1.0, 2}}

Here attr5 is multi-valued (thanks to the _0, _1... suffixes, which could be UUIDs as well). However:

* How to search for any "attr5_*"? It seems that wildcards in query _attributes_ (field names) are not supported (e.g.: Riak.Search.query("test_index", "my_map_map.attr5_*_map.subattr1_register:*") won't work)
* Even if that worked, how to restrict a filter to a single submap? The search "my_map_map.attr5_*_map.subattr1_register:subvalue1 AND my_map_map.attr5_*_map.subattr4_register:subvalue4" should return key1 but not key2.

Are there other approaches worth digging into? I've thought of:

* not storing a whole "user" as one object, but each attribute as an object (the key being {id, attribute}). However, this doesn't enable multi-valued complex attributes either
* using JSON instead of CRDTs: would such search filters be doable? Also, this approach would give up eventual consistency since, as far as I understand, Riak cannot "merge" JSON documents

Any feedback on how you implement such models is welcome.

(Sorry if this is not the right place to ask - if so, feel free to tell me where it's better to ask.)

Cheers,
Paul