Re: Modelling complex data structures (graphs and trees for example)

James Keats Sat, 09 Jul 2011 08:26:07 -0700

Well, if it's a project you own then you're free to do whatever you
want, but if you're only an employee then I urge you to consider
carefully what you're about to do, and to be as conservative as you can
be about it. :-)

On Jul 9, 2:15 pm, Colin Yates <[email protected]> wrote:
> I did think about moving this logic to the database, but I am toying around
> with a different model - having the entire data set in memory (possibly
> across multiple nodes using messaging infrastructure to communicate).  The
> reason for this is:
>
>  - writes are very small but reads are very high
>  - each read typically requires complex processing
>  - most operations cover a large part of the entire dataset
>
> Paying the cost of having the entire data set *efficiently* available for
> the application (Clojure in this case) means:
>
>  - less dependence on (probably hard to test) yet-another-bit-of-tech.
>  Integration testing DAOs or Repositories always seems like a lot of work.
>  Reducing the technical pieces just makes things much easier
>  - I am hoping clever use of persistent structures will help here, as there
> is a lot of commonality in the data itself (i.e. 5 projects might actually
> share 80% of the same state).  Clever use in constructing these might pay
> dividends...
>  - I don't think I can offload *all* processing onto a third party
> technology so I need the ability to deal with large data sets in memory with
> real-time (whatever that means) performance - if I need it for one, I may as
> well use it for all.
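A minimal sketch of the "commonality" point above, assuming projects are plain nested maps. The names `base-project` and `variant-project` and all keys and values are invented for illustration, not taken from the thread:

```clojure
;; Hypothetical project data; the keys and values are invented for illustration.
(def base-project
  {:name     "base"
   :settings {:currency :gbp}
   :nodes    {:a {:cost 1} :b {:cost 2}}})

;; "Changing" one nested value returns a new map; base-project is untouched.
(def variant-project
  (assoc-in base-project [:nodes :b :cost] 5))

;; The untouched sub-maps are the very same objects in both versions.
(identical? (:settings base-project) (:settings variant-project))
;; => true
(identical? (get-in base-project [:nodes :a])
            (get-in variant-project [:nodes :a]))
;; => true
```

Only the path from the root to the changed value is rebuilt, so projects derived from a common base share whatever they did not change.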
>
> Ambitious, and full of hairy concerns!  But the idea of moving away from
> single-threaded web-based applications with big powerful data engines to a
> single chunk of logic that occasionally throws state to a fairly dumb
> persistent store is certainly not new ground, and seems to offer a much more
> powerful architecture.
>
> For example, dealing with historical data is always a pain point.  What I
> want is the ability to snapshot the entire system whenever anything changes,
> to allow us to see how the system (or client rather) has improved.  In a
> relational database, this would be ridiculous, so I captured a "snapshot of
> interesting data".  Tomorrow they realise that something else was
> interesting....  We also played with document stores (MongoDB) which makes
> the job much much smaller - just cloning a single document (and related
> data), but then it has to be hydrated, so for ease of use a snapshot is
> taken every X period, even if the data hasn't changed.  Yuck.
>
> Now Clojure appears, with its extremely efficient (in terms of memory) way
> of storing data, and suddenly it feels like storing a representation every
> time the structure changes (which is only once or twice a week) and then
> realising the entire history in memory is now do-able.  This means if a
> Project only changed 5 times over a 3 month period there would only be 5
> instances of that project in storage.  Calculating how each project
> contributes to a historical chart broken down by day (or hour whatever) is
> much much easier to do in Java/Clojure/whatever than in the third party store
> of choice.  I am asserting that providing a sequence for a project for every
> day over the last year when there are only 5 snapshots will certainly not
> consume sizeOfProject * daysInYear memory.
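The memory claim above can be sketched roughly as follows, assuming snapshots live in a sorted map keyed by the day on which they were taken. The names `project-snapshots`, `snapshot-on` and `daily-history`, the day numbers, and the project contents are all hypothetical:

```clojure
;; Hypothetical: a project that changed on only 5 days of the year,
;; keyed by day number (1..365).
(def project-snapshots
  (sorted-map 1   {:name "P" :nodes #{:a}}
              40  {:name "P" :nodes #{:a :b}}
              90  {:name "P" :nodes #{:a :b :c}}
              200 {:name "P" :nodes #{:b :c}}
              300 {:name "P" :nodes #{:b :c :d}}))

(defn snapshot-on
  "Returns the most recent snapshot taken on or before `day`, or nil."
  [snaps day]
  (some-> (rsubseq snaps <= day) first val))

(defn daily-history
  "Lazy seq of [day snapshot] pairs for every day from `from` to `to` inclusive."
  [snaps from to]
  (for [day (range from (inc to))]
    [day (snapshot-on snaps day)]))

;; 365 entries, but each one only points at one of the 5 stored snapshots.
(count (daily-history project-snapshots 1 365))
;; => 365
(count (distinct (map second (daily-history project-snapshots 1 365))))
;; => 5
```

Each day's entry holds a reference to an existing snapshot rather than a copy, so the cost of the year-long sequence is roughly the 5 snapshots plus 365 small pairs.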
>
> (Not sure that was the best example of the pain points I am trying to solve
> actually :), but anyway).
>
> I guess, after 15 years of using the "web, app-logic, database"
> template-cutter I am giving myself a clean piece of paper and asking "what
> do you want to do and what is the simplest way to do it", and keeping
> everything in the application layer (rather than the persistence layer)
> seems appealing.
>
> We aren't dealing with billions of rows - I still need to experiment, but it
> feels like having our entire data set in memory is possible on a fairly
> beefy server.  I appreciate the JVM isn't the best wrt huge heaps, but I can
> work around that (with multiple virtual machines each running their own JVM
> and using ActiveMQ for example).  Clojure's STM seems to be the final step
> on the ladder to reach this goal.
>
> I have previously considered CouchDB (for its views), Hadoop (for its highly
> scalable and parallelisable map/reduce execution), Cassandra for its ability
> to store huge amounts of highly nested structure, and Neo4j to store large
> numbers of small nodes that are heavily inter-related.  And of course,
> MongoDB, which I am currently using in production.  I also considered Erlang
> and Scala for their distributed VM actor models, but I am really really sold
> on the power of LISP macros.
>
> I dunno - might be a fool's errand, but spreading the complexity over that
> much technology just seems like hard work.  *If* the working set can be
> stored in current memory then I think a much simpler, and much more powerful
> solution will emerge.  Sure, I am putting all my eggs in
> Clojure+my-own-ability, and at risk of re-inventing the wheel, but maybe
> that is the right thing to do  - building the simplest and most elegant
> solution with new tools.
>
> I probably ate something that disagreed with me, but I just want to break
> free from the shackles of these heavy-weight tools and fly!  OK - that's
> enough.
>
> Or, it might all be a catastrophic failure and I will be signing up to
> Careers 2.0 :)
>
> Col
>
> P.S.  Usual disclaimer - still only written three lines of Clojure :)
>
> On 8 July 2011 20:57, James Keats <[email protected]> wrote:
>
> > On Jun 16, 3:08 pm, Colin Yates <[email protected]> wrote:
> > > (newbie warning)
>
> > > Our current solution is an OO implementation in Groovy and Java.  We
> > > have a (mutable) Project which has a DAG (directed acyclic graph).
> > > This is stored as a set of nodes and edges.  There are multiple
> > > implementations of nodes (which may themselves be Projects).  There
> > > are also multiple implementations of edges.
>
> > > My question isn't how to do this in a functional paradigm, my first
> > > question is *how do I learn* to do this in a functional paradigm.  I
> > > want to be able to get the answer myself ;).  To that end, are there
> > > any "domain driven design with functional programming" type resources?
>
> > > A more specific question is how do I model a graph?  These graphs can
> > > be quite extensive, with mutations on the individual nodes as well as
> > > the structure (i.e. adding or removing branches).  Does this mean that
> > > every node would be a ref?  I think the general answer is that
> > > the aggregate roots are refs, meaning they are an atomic block, but is
> > > there any more guidance?
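One possible shape for the "aggregate root as a ref" idea asked about above, as a rough sketch only: the graph is a single immutable value, and one ref holds it per project rather than one ref per node. The names `empty-graph`, `add-node`, `add-edge`, `remove-node` and `project-graph` are hypothetical, and the sketch does not check that the graph stays acyclic:

```clojure
;; The graph as one immutable value:
;; {:nodes {id attrs}, :edges {id #{successor-ids}}}
(def empty-graph {:nodes {} :edges {}})

(defn add-node [g id attrs]
  (assoc-in g [:nodes id] attrs))

(defn add-edge [g from to]
  ;; fnil supplies an empty set the first time `from` gets an edge.
  (update-in g [:edges from] (fnil conj #{}) to))

(defn remove-node
  "Removes a node, its outgoing edges, and any edges pointing at it."
  [g id]
  (-> g
      (update-in [:nodes] dissoc id)
      (update-in [:edges] dissoc id)
      (update-in [:edges]
                 (fn [es] (into {} (for [[k vs] es] [k (disj vs id)]))))))

;; One ref per aggregate root (the project), not one per node.
(def project-graph (ref empty-graph))

;; Several structural changes applied as one atomic transaction.
(dosync
  (alter project-graph add-node :a {:label "start"})
  (alter project-graph add-node :b {:label "end"})
  (alter project-graph add-edge :a :b))
```

Because the graph functions are pure, they can be tried at the REPL on plain maps first; the ref only comes into play where changes must be coordinated.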
>
> > May I humbly suggest that this ought to be a database-ish concern
> > rather than a middleware one? Have you looked at Neo4j for example? A
> > quick google found this:
>
> > http://wiki.neo4j.org/content/Roles
>
> > "This is an implementation of an example found in the article A Model
> > to Represent Directed Acyclic Graphs (DAG) on SQL Databases by Kemal
> > Erdogan. ... In Neo4j storing the roles is trivial, as working with
> > graphs is what Neo4j was designed for"
>
> > I would humbly suggest that you use as much of the database
> > functionality as possible for your data needs and avoid replicating it
> > in your middleware. I hope this works. :-)
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure" group.
> > To post to this group, send email to [email protected]
> > Note that posts from new members are moderated - please be patient with
> > your first post.
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> >http://groups.google.com/group/clojure?hl=en
