Hi everybody,

I'm implementing SCIM (System for Cross-domain Identity Management -
https://tools.ietf.org/html/rfc7643 &
https://tools.ietf.org/html/rfc7644) with several backends, and I'm
scratching my head trying to figure out how to model the data in
Riak.

There are 2 types of attributes:
* simple attributes: scalars, such as boolean, string, date, integer, float
* complex attributes: a map of simple attributes

Attributes (simple _and_ complex) can be:
* singular (0 or 1 occurrence)
* multi-valued

Example of a "User" object:
{
  "userName": "bjen...@example.com",
  "name": {
    "formatted": "Ms. Barbara J Jensen, III",
    "familyName": "Jensen",
    "givenName": "Barbara",
    "middleName": "Jane",
    "honorificPrefix": "Ms.",
    "honorificSuffix": "III"
  },
  "access_rights": [
    "scope_a",
    "scope_b",
    "scope_c"
  ],
  "emails": [
    {
      "value": "bjen...@example.com",
      "type": "work",
      "primary": true
    },
    {
      "value": "b...@jensen.org",
      "type": "home"
    }
  ]
}
These are, respectively, a simple singular attribute, a complex singular
attribute, a simple multi-valued attribute and a complex multi-valued
attribute. SCIM prohibits further nesting of objects (therefore the
deepest level is the sub-attribute).
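
To make the four shapes concrete, here is a minimal Python sketch that
classifies each attribute of a JSON-decoded resource. The classify helper
and the sample values are made up for illustration; they are not part of
SCIM or of any Riak client.

```python
# Hypothetical helper (illustration only): classify a JSON-decoded SCIM
# attribute value into one of the four shapes described above.
def classify(value):
    if isinstance(value, dict):
        return "complex singular"
    if isinstance(value, list):
        # Multi-valued: complex if every element is a map of sub-attributes.
        # An empty list is reported as simple, since its shape is unknown.
        if value and all(isinstance(v, dict) for v in value):
            return "complex multi-valued"
        return "simple multi-valued"
    return "simple singular"


# Sample data with placeholder values (assumed, not taken from a real user).
user = {
    "userName": "bjensen",
    "name": {"familyName": "Jensen", "givenName": "Barbara"},
    "access_rights": ["scope_a", "scope_b"],
    "emails": [{"type": "work", "primary": True}, {"type": "home"}],
}

shapes = {attr: classify(value) for attr, value in user.items()}
```

Since SCIM stops at the sub-attribute level, one level of inspection is
enough and no recursion is needed.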

I'd like to implement a Riak backend using the core CRDTs, and also to
implement SCIM "search"
(https://tools.ietf.org/html/rfc7644#section-3.4.2). In a nutshell,
SCIM search allows:
* comparisons: equal, not equal, greater/less than, attribute is
present, starts/ends with (for strings), contains (for strings)
* the logical operators and, or and not
* matching on attributes and sub-attributes (e.g.: name.familyName co
"O'Malley" - co is for "contains")
* complex attribute filter grouping (e.g.: userType eq "Employee" and
emails[type eq "work" and value co "@example.com"] - eq is for
"equal"). In this case the expression between the brackets must
evaluate to true on the same complex attribute value
Note that _any_ attribute can be multi-valued.
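
The "same complex attribute" rule is the part that matters for the data
model, so here is a small Python sketch of it. It only handles eq and co,
ignores SCIM's case-sensitivity rules, and uses helper names and sample
addresses that I made up for the illustration.

```python
# Only the two operators quoted above are modelled: eq (equal), co (contains).
def compare(op, actual, expected):
    if actual is None:
        return False
    if op == "eq":
        return actual == expected
    if op == "co":
        return isinstance(actual, str) and expected in actual
    raise ValueError(f"unsupported operator: {op}")


def matches_grouped(values, conditions):
    """emails[a and b]: one element must satisfy every condition."""
    return any(
        all(compare(op, v.get(sub), exp) for sub, op, exp in conditions)
        for v in values
    )


def matches_flat(values, conditions):
    """Flat conjunction: each condition may be met by a different element."""
    return all(
        any(compare(op, v.get(sub), exp) for v in values)
        for sub, op, exp in conditions
    )


# Placeholder data: the work address is not at example.com.
emails = [
    {"value": "work@example.org", "type": "work"},
    {"value": "home@example.com", "type": "home"},
]
conditions = [("type", "eq", "work"), ("value", "co", "@example.com")]

grouped = matches_grouped(emails, conditions)  # no single element matches both
flat = matches_flat(emails, conditions)        # conditions met by two elements
```

Here grouped is False while flat is True, which is exactly the distinction
the backend has to preserve.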

Regarding the data model with CRDTs and using a map as the base
object, implementing:
* simple singular attribute: straightforward (let's forget the data
types not natively supported such as floats for now)
* complex singular attribute: a map of maps
* simple multi-valued attribute: use of a set
* complex multi-valued attribute: that's where I can't find a good
solution since sets of maps don't exist in Riak CRDTs (sets can only
have string values)
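
As a rough illustration of that mapping, the Python sketch below flattens
a resource into field names that use the _register, _set and _map suffixes
seen in Riak Search results for maps. It does not talk to Riak; the
flatten helper, the positional slot keys and the choice to store every
scalar as a register are assumptions made for the example.

```python
# Hypothetical helper: map a JSON-decoded SCIM resource to flat,
# Riak-Search-style field names. No Riak client is involved.
def flatten(resource):
    fields = {}
    for attr, value in resource.items():
        if isinstance(value, dict):
            # Complex singular attribute: an embedded map.
            for sub, v in value.items():
                fields[f"{attr}_map.{sub}_register"] = v
        elif isinstance(value, list) and value and all(
            isinstance(v, dict) for v in value
        ):
            # Complex multi-valued attribute: one embedded map per element,
            # keyed by position (an assumption; a UUID would also work).
            for i, element in enumerate(value):
                for sub, v in element.items():
                    fields[f"{attr}_{i}_map.{sub}_register"] = v
        elif isinstance(value, list):
            # Simple multi-valued attribute: a set of strings.
            fields[f"{attr}_set"] = sorted(value)
        else:
            # Simple singular attribute, stored as a register for simplicity.
            fields[f"{attr}_register"] = value
    return fields


# Placeholder resource (values are illustrative).
resource = {
    "userName": "bjensen",
    "name": {"familyName": "Jensen"},
    "access_rights": ["scope_b", "scope_a"],
    "emails": [{"type": "work"}, {"type": "home"}],
}
fields = flatten(resource)
```

The last branch of the list handling is the awkward one: the slot key ends
up inside the field name, which is what makes searching across elements
difficult.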

For this latter case, I've been thinking of a map of maps using random keys:
iex(53)> Riak.Search.query("test_index", "my_map_map.attr5_0_map.subattr1_register:*")
{:ok,
 {:search_results,
  [
    {"test_index",
     [
       {"score", "1.00000000000000000000e+00"},
       {"_yz_rb", "test_bucket"},
       {"_yz_rt", "test"},
       {"_yz_rk", "key2"},
       {"_yz_id", "1*test*test_bucket*key2*20"},
       {"my_map_map.attr1_register", "value1"},
       {"my_map_map.attr2_register", "value2"},
       {"my_map_map.attr3_register", "value3"},
       {"my_map_map.attr4_register", "value4"},
       {"my_map_map.attr5_0_map.subattr1_register", "subvalue1"},
       {"my_map_map.attr5_0_map.subattr2_register", "subvalue2"},
       {"my_map_map.attr5_0_map.subattr3_register", "subvalue3"},
       {"my_map_map.attr5_0_map.subattr4_register", "oups !"},
       {"my_map_map.attr5_1_map.subattr1_register", "subvalue11"},
       {"my_map_map.attr5_1_map.subattr2_register", "subvalue12"},
       {"my_map_map.attr5_1_map.subattr3_register", "subvalue13"},
       {"my_map_map.attr5_1_map.subattr4_register", "oups !"}
     ]},
    {"test_index",
     [
       {"score", "1.00000000000000000000e+00"},
       {"_yz_rb", "test_bucket"},
       {"_yz_rt", "test"},
       {"_yz_rk", "key1"},
       {"_yz_id", "1*test*test_bucket*key1*23"},
       {"my_map_map.attr1_register", "value1"},
       {"my_map_map.attr2_register", "value2"},
       {"my_map_map.attr3_register", "value3"},
       {"my_map_map.attr4_register", "value4"},
       {"my_map_map.attr5_0_map.subattr1_register", "subvalue1"},
       {"my_map_map.attr5_0_map.subattr2_register", "subvalue2"},
       {"my_map_map.attr5_0_map.subattr3_register", "subvalue3"},
       {"my_map_map.attr5_0_map.subattr4_register", "subvalue4"},
       {"my_map_map.attr5_1_map.subattr1_register", "subvalue5"},
       {"my_map_map.attr5_1_map.subattr2_register", "subvalue6"},
       {"my_map_map.attr5_1_map.subattr3_register", "subvalue7"},
       {"my_map_map.attr5_1_map.subattr4_register", "subvalue8"}
     ]}
  ], 1.0, 2}}

Here attr5 is multi-valued (thanks to the _0, _1... suffixes, which could
be UUIDs as well). However:
* How to search for any "attr5_*"? It seems that wildcards in query
_field names_ are not supported (e.g.: Riak.Search.query("test_index",
"my_map_map.attr5_*_map.subattr1_register:*") won't work)
* Even if that worked, how to restrict the match to a single submap? The
search "my_map_map.attr5_*_map.subattr1_register:subvalue1 AND
my_map_map.attr5_*_map.subattr4_register:subvalue4" should return key1
but not key2, and should only match when both terms hold in the same
submap.
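
To picture what such a query would have to express, here is a Python
sketch that expands the wildcard by hand into one AND group per slot,
joined with OR. It assumes the slot keys are known in advance (true for
_0, _1..., not for UUIDs) and that values need no query escaping; the
per_slot_query helper is hypothetical and only builds a string.

```python
# Hypothetical helper: build a Lucene/Solr-style query string in which
# every condition of a group is pinned to the same slot.
def per_slot_query(prefix, slots, conditions):
    groups = []
    for slot in slots:
        terms = [
            f"{prefix}_{slot}_map.{sub}_register:{value}"
            for sub, value in conditions
        ]
        groups.append("(" + " AND ".join(terms) + ")")
    return " OR ".join(groups)


query = per_slot_query(
    "my_map_map.attr5",
    [0, 1],  # assumed to be the only slots in use
    [("subattr1", "subvalue1"), ("subattr4", "subvalue4")],
)
print(query)
```

The query grows with the number of slots and conditions, so this only
looks workable if the number of values per attribute is small and bounded.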

Are other approaches worth digging into? I've thought of:
* not storing a whole "user" as one object, but each attribute as its own
object (the key being {id, attribute}). However, it doesn't enable
multi-valued complex attributes either
* using JSON instead of CRDTs: would such search filters be doable?
Also, concurrent updates would not converge automatically with this
approach since, as far as I understand, Riak cannot "merge" JSON documents

Any feedback on how you implement such models is welcome.

(Sorry if this is not the right place to ask - if so, feel free to tell
me where it would be better to ask.)

Cheers,

Paul

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com