Re: Question regarding structural sharing, records, and maps

Trenton Strong Fri, 29 Jul 2011 13:34:14 -0700

On Jul 29, 4:38 am, Ken Wesson <[email protected]> wrote:
> Yes. In fact this should work with records, too -- all the "native"
> fields of the record need to be copied for each version, but most will
> typically be pointers and most of those will typically be to existing
> objects. Only the ones that were changed will point to new nested
> objects. So unless your records have huge numbers of predefined fields
> there shouldn't be much time or memory cost in copying them.


Thanks for clearing that up.  I wasn't exactly sure how the memory
model of records worked.  The semantics that records provide seem like
a good fit for applications like this.
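
For anyone following along, here is a minimal sketch of the sharing
described above. The record name Node and its fields are made up for
illustration; the point is only that assoc on a record copies the field
slots, not the objects they point to.

```clojure
;; Hypothetical record type, used only for this illustration.
(defrecord Node [contents child1 child2])

(def big-contents (vec (range 1000)))

;; Original version of the node.
(def n1 (Node. big-contents nil nil))

;; "Changing" one field yields a new record; the other fields are
;; copied as references to the same existing objects.
(def n2 (assoc n1 :child1 (Node. [:x] nil nil)))

;; The unchanged field points at the very same vector in both versions.
(identical? (:contents n1) (:contents n2))
```

The old version n1 is untouched, so both versions remain usable side by
side.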

> Probably. Any two changes to the tree will cause one to retry in the
> single-ref case, unless you're sure all the changes will commute.
> (Changes to distinct fields where one change doesn't depend on what
> the other changes will commute, and changes to the same field that
> don't matter as to sequencing, such as inc and dec, will commute, but
> otherwise no.)

Yeah, tree updates are pretty much strictly non-commutative.  The
retry overhead will probably grow quite fast with the number of
concurrent updates.

>
> You will probably want something structured more like a database, with
> "tables" that are maps and "rows" that are records or more complex,
> nested structures, such that most concurrent changes will not affect a
> common row. The "tables" can then be refs of maps of refs, with most
> changes operating on the row-refs. The table refs only need to change
> if rows are inserted or deleted; those could be bottlenecks, though if
> all objects have GUIDs as keys in these tables then row insertions and
> deletions will commute, leaving only the GUID generators for each
> table.

I think I understand the structure here, but just to be clear, you're
proposing flattening the tree structure from something like this:

(ref
{
  :contents (some stuff),

  :child-node1 {
    :contents (other things)
    :child-node1 ...
  },

  :child-node2 {
    :contents (thingy1 thingy2)
    ...
  }
  ...
})

into something like this?

(ref
{
  :some-guid  (ref {        ; could be a map, record, or what have you
    :contents (some stuff)
    :child-node1 :another-guid
    :child-node2 :and-another-guid
  })

  :another-guid (ref {
    :contents (other things)
    ...
  })

  :and-another-guid (ref {
    :contents (thingy1 thingy2)
    ...
  })
})

Interesting.  I feel like I should have thought of this, since I've
done something similar in more object-oriented code, though riddled with
lots of fine-grained locking.  Here, if updating one row references
another, the transaction is going to handle the snapshotting.  I begin
to see...
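
To check my understanding, a rough sketch of the flattened layout as
runnable code. All names here (table, add-row!, update-contents!) are
hypothetical, and plain keywords stand in for real GUIDs.

```clojure
;; The "table": a ref holding a map from ID to row-ref.
(def table (ref {}))

(defn add-row!
  "Inserts a new row-ref under id. Only insertion touches the table ref;
  assoc under distinct keys commutes, so commute is used."
  [id row]
  (dosync
    (commute table assoc id (ref row))))

(defn update-contents!
  "Applies f (plus args) to the :contents of one row. The table ref is
  only read here; the row-ref is the only thing altered."
  [id f & args]
  (dosync
    (let [row (get @table id)]
      (apply alter row update-in [:contents] f args))))

(add-row! :some-guid    {:contents [:some :stuff]
                         :child-node1 :another-guid})
(add-row! :another-guid {:contents [:other :things]})

;; Touches only the :another-guid row.
(update-contents! :another-guid conj :more)
```

Two transactions updating different rows never alter the same ref, so
they should not force each other to retry.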

>
> Unfortunately, (let [new-id (commute table-x-guid-counter inc)])
> sounds nice but might result in new-id taking on the same value in two
> concurrent transactions, so you'll need to use alter and GUID
> generation may be a bottleneck. But it may be much faster than your
> other, more complex transactions. Be sure to do it separately, e.g.:
>
> (let [new-id (dosync (alter table-x-guid-counter inc))]
>   (dosync
>     (commute table-x assoc new-id some-thingy)))
>
> You can even use atoms for the guid counters instead of refs, and then
> the ! in swap! will remind you to get the ID before entering a
> transaction. Do as much of what's needed to generate some-thingy
> before the transaction, too -- though if it depends on other refs'
> values, and those values should be current rather than it being
> acceptable for them to be values that were there recently but might
> have changed, then some of the work will need to be done in the
> transaction.
>
> > I've considered wrapping each node in
> > a ref and performing updates that way, or possibly taking the async
> route and dispatching calls using agents, but my experience there is
> > nil.  Any insight there?
>
> I'd avoid having refs nesting more than two deep. Complex arrangements
> of cross-references among nodes can create a giant headache in
> combination with refs and transactions, so using a "database" approach
> with "table" maps, ID token keys, and cross-references to ID tokens
> rather than objects is often sensible. In particular, complex objects
> with nested refs effectively have mutable state inside them, which is
> usually not desired in Clojure. (One can argue, though, that the ref
> object is not different in principle from the ID token and removes a
> layer of indirection, and possibly obviates the need to generate IDs
> from counters that act as bottlenecks -- though, really, the Java new
> operator's allocation of memory now takes the role of ID generator.
> I'm given to understand modern VMs do some clever things to make new
> able to be fairly concurrent, though, such as giving each thread its
> own subset of the eden generation to allocate in.)

Makes sense.  I need to look into generating GUIDs quickly in a
concurrent manner for other parts of the application anyway.  I'm sure
that if this becomes the bottleneck, there are workarounds like pooling
that can be used to alleviate some of that.
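
As a starting point, a small sketch of the two options I have in mind.
The names id-counter, next-id!, random-id and table-x are placeholders;
swap! and java.util.UUID/randomUUID are the only library calls used.

```clojure
;; Option 1: an atom-backed counter, as suggested above. swap! is
;; atomic and returns the new value, so no two callers see the same ID.
(def id-counter (atom 0))

(defn next-id! []
  (swap! id-counter inc))

;; Option 2: random UUIDs from the JDK, which need no shared counter.
(defn random-id []
  (java.util.UUID/randomUUID))

;; Hypothetical table, only here to show the ordering of the steps.
(def table-x (ref {}))

;; Get the ID before entering the transaction, then insert with commute.
(let [new-id (next-id!)]
  (dosync
    (commute table-x assoc new-id {:contents [:thingy1 :thingy2]})))
```

Calling next-id! outside the dosync matters: inside a transaction it
would run again on every retry and burn through IDs.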

Thanks for taking the time to craft such a thoughtful reply; it is
really helpful.
