Re: Modelling complex data structures (graphs and trees for example)

Colin Yates Sat, 09 Jul 2011 08:37:02 -0700

he he :)

Well, conservative might be a run-of-the-mill Java/Spring/Hibernate
application with all of that fun as those are the tools which I am most
familiar with.


I am not going to type another long email, but it is interesting how people
define "risk" and "conservative".  I *do not* think "doing the same thing as
always and hope for the best" is the right answer.  If the question is "give
us something that takes years to develop and can be maintained by a team of
interchangeable average devs doing the same old thing" then sure - this is
not the answer - in any way :).

I think the question being asked is "can you provide a solution which allows
us to respond very quickly to changing requirements that will be developed
by yourself and whoever else you think you need" and just maybe this is the
right answer...who knows - it is all an experiment.

I am very fortunate that I am paid to work in an organisation where any
technical solution is evaluated based on "I know how to design, which are
the right tools" rather than the *very* entrenched "I know some tools, what
can I build with them".  It might have something to do with me being the
technical authority here...not sure :)!

On 9 July 2011 16:25, James Keats <james.w.ke...@gmail.com> wrote:

>
> Well if it's a project you own then you're free to do whatever you
> want, but if you're only an employee then I urge you to consider
> carefully what you're about to do, and be as conservative as you could
> be about it. :-)
>
> On Jul 9, 2:15 pm, Colin Yates <colin.ya...@gmail.com> wrote:
> > I did think about moving this logic to the database, but I am toying around
> > with a different model - having the entire data set in memory (possibly
> > across multiple nodes using messaging infrastructure to communicate).  The
> > reason for this is:
> >
> >  - writes are very small but reads are very high
> >  - each read typically requires complex processing
> >  - most operations cover a large part of the entire dataset
> >
> > Paying the cost of having the entire data set *efficiently* available for
> > the application (Clojure in this case) means:
> >
> >  - less dependence on (probably hard to test) yet-another-bit-of-tech.
> >  Integration testing DAOs or Repositories always seems like a lot of work.
> >  Reducing the technical pieces just makes things much easier
> >  - I am hoping clever use of persistent structures will help here, as there
> > is a lot of commonality in the data itself (i.e. 5 projects might actually
> > share 80% of the same state).  Clever use in constructing these might pay
> > dividends...
> >  - I don't think I can offload *all* processing onto a third party
> > technology so I need the ability to deal with large data sets in memory in
> > real time (whatever that means) - if I need it for one, I may as well use it
> > for all.
> >
> > Ambitious, and full of hairy concerns!  But the idea of moving away from
> > single-threaded web-based applications with big powerful data engines to a
> > single chunk of logic that occasionally throws state to a fairly dumb
> > persistent store is certainly not new ground, and seems to offer a much more
> > powerful architecture.
> >
> > For example, dealing with historical data is always a pain point.  What I
> > want is the ability to snapshot the entire system whenever anything changes,
> > to allow us to see how the system (or client rather) has improved.  In a
> > relational database, this would be ridiculous, so I captured a "snapshot of
> > interesting data".  Tomorrow they realise that something else was
> > interesting....  We also played with document stores (MongoDB) which make
> > the job much much smaller - just cloning a single document (and related
> > data), but then it has to be hydrated, so for ease of use a snapshot is
> > taken every X period, even if the data hasn't changed.  Yuck.
> >
> > Now Clojure appears, with its extremely efficient (in terms of memory) way
> > of storing data, and suddenly it feels like storing a representation every
> > time the structure changes (which is only once or twice a week) and then
> > realising the entire history in memory is now do-able.  This means if a
> > Project only changed 5 times over a 3 month period there would only be 5
> > instances of that project in storage.  Calculating how each project
> > contributes to a historical chart broken down by day (or hour, whatever) is
> > much much easier to do in Java/Clojure/whatever than in the third party
> > store of choice.  I am asserting that providing a sequence for a project for
> > every day over the last year when there are only 5 snapshots will certainly
> > not consume sizeOfProject * daysInYear memory.
> >
> > (Not sure that was the best example of the pain points I am trying to solve
> > actually :), but anyway).
> >
> > I guess, after 15 years of using the "web, app-logic, database"
> > template-cutter I am giving myself a clean piece of paper and asking "what
> > do you want to do and what is the simplest way to do it", and keeping
> > everything in the application layer (rather than the persistence layer)
> > seems appealing.
> >
> > We aren't dealing with billions of rows - I still need to experiment, but it
> > feels like having our entire data set in memory is possible on a fairly
> > beefy server.  I appreciate the JVM isn't the best wrt huge heaps, but I can
> > work around that (with multiple virtual machines each running their own JVM
> > and using ActiveMQ for example).  Clojure's STM seems to be the final step
> > on the ladder to reach this goal.
> >
> > I have previously considered CouchDB (for its views), Hadoop (for its highly
> > scalable and parallelisable map/reduce execution), Cassandra for its ability
> > to store huge amounts of highly nested structure, Neo4j to store large
> > numbers of small nodes that are heavily inter-related.  And of course,
> > MongoDB, which I am currently using in production.  I also considered Erlang
> > and Scala for their distributed VM actor models, but I am really really sold
> > on the power of LISP macros.
> >
> > I dunno - might be a fool's errand, but spreading the complexity over that
> > much technology just seems like hard work.  *If* the working set can be
> > stored in current memory then I think a much simpler, and much more powerful
> > solution will emerge.  Sure, I am putting all my eggs in
> > Clojure+my-own-ability, at the risk of re-inventing the wheel, but maybe
> > that is the right thing to do - building the simplest and most elegant
> > solution with new tools.
> >
> > I probably ate something that disagreed with me, but I just want to break
> > free from the shackles of these heavy-weight tools and fly!  OK - that's
> > enough.
> >
> > Or, it might all be a catastrophic failure and I will be signing up to
> > Careers 2.0 :)
> >
> > Col
> >
> > P.S.  Usual disclaimer - still only written three lines of Clojure :)
> >
> > On 8 July 2011 20:57, James Keats <james.w.ke...@gmail.com> wrote:
> >
> > > On Jun 16, 3:08 pm, Colin Yates <colin.ya...@gmail.com> wrote:
> > > > (newbie warning)
> >
> > > > Our current solution is an OO implementation in Groovy and Java.  We
> > > > have a (mutable) Project which has a DAG (directed acyclic graph).
> > > > This is stored as a set of nodes and edges.  There are multiple
> > > > implementations of nodes (which may themselves be Projects).  There
> > > > are also multiple implementations of edges.
> >
> > > > My question isn't how to do this in a functional paradigm, my first
> > > > question is *how do I learn* to do this in a functional paradigm.  I
> > > > want to be able to get the answer myself ;).  To that end, are there
> > > > any "domain driven design with functional programming" type resources?
> >
> > > > A more specific question is how do I model a graph?  These graphs can
> > > > be quite extensive, with mutations on the individual nodes as well as
> > > > the structure (i.e. adding or removing branches).  Does this mean that
> > > > every node would be a ref?  I think the general answer is that
> > > > the aggregate roots are refs, meaning they are an atomic block, but is
> > > > there any more guidance?
> >
> > > May I humbly suggest that this ought to be a database-ish concern
> > > rather than a middleware one? Have you looked at neo4j for example? A
> > > quick google found this:
> >
> > > http://wiki.neo4j.org/content/Roles
> >
> > > "This is an implementation of an example found in the article A Model
> > > to Represent Directed Acyclic Graphs (DAG) on SQL Databases by Kemal
> > > Erdogan. ... In Neo4j storing the roles is trivial, as working with
> > > graphs is what Neo4j was designed for"
> >
> > > I would humbly suggest that you use as much of the database
> > > functionality as possible for your data needs and avoid replicating it
> > > in your middleware. I hope this works. :-)
> >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Clojure" group.
> > > To post to this group, send email to clojure@googlegroups.com
> > > Note that posts from new members are moderated - please be patient with
> > > your first post.
> > > To unsubscribe from this group, send email to
> > > clojure+unsubscr...@googlegroups.com
> > > For more options, visit this group at
> > > http://groups.google.com/group/clojure?hl=en
>
>
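
The claim quoted above, that a day-by-day sequence over a year with only 5
snapshots will not consume sizeOfProject * daysInYear memory, can be
illustrated with a minimal sketch.  It is written in Python purely for
illustration (the thread itself contains no code); the snapshot dates, the
project contents and the `daily_history` helper are all hypothetical.  Each
day's entry is only a reference to one of the stored snapshot objects, so the
year-long sequence holds 365 references but just 5 distinct states.  Clojure's
persistent structures can additionally share unchanged parts between
successive snapshots, which this sketch does not attempt.

```python
from bisect import bisect_right
from datetime import date, timedelta

# Hypothetical data: (day the snapshot was taken, immutable project state).
# The list must be sorted by date.
SNAPSHOTS = [
    (date(2011, 1, 10), ("Project A", ("task-1",))),
    (date(2011, 2, 3), ("Project A", ("task-1", "task-2"))),
    (date(2011, 2, 20), ("Project A", ("task-1", "task-2", "task-3"))),
    (date(2011, 3, 14), ("Project A", ("task-2", "task-3"))),
    (date(2011, 3, 30), ("Project A", ("task-2", "task-3", "task-4"))),
]


def daily_history(snapshots, start, end):
    """Return one (day, state) pair per day from start to end inclusive.

    The state is the latest snapshot taken on or before that day, or None if
    no snapshot exists yet.  Snapshot objects are reused, never copied.
    """
    days_taken = [taken for taken, _ in snapshots]
    history = []
    day = start
    while day <= end:
        # Index of the latest snapshot taken on or before `day`.
        i = bisect_right(days_taken, day) - 1
        history.append((day, snapshots[i][1] if i >= 0 else None))
        day += timedelta(days=1)
    return history


history = daily_history(SNAPSHOTS, date(2011, 1, 1), date(2011, 12, 31))
# Count distinct state objects by identity, ignoring days with no snapshot.
distinct = {id(state) for _, state in history if state is not None}
print(len(history), len(distinct))  # 365 5
```

The same lookup could be made lazy (a generator instead of a list) if even
the 365 references per project turn out to matter.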

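On the original question of whether every node needs to be a ref: one way to
picture the "aggregate roots are refs" answer is to treat the whole DAG as a
single immutable value and keep exactly one mutable cell per Project.  The
following is a minimal Python sketch under that assumption.  `Dag`,
`ProjectRef` and the node names are hypothetical, and the lock-based cell is
only a rough stand-in for a Clojure ref or atom, not a description of how
Clojure's STM works.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Dag:
    """An immutable graph value: every update returns a new Dag."""
    nodes: frozenset = frozenset()
    edges: frozenset = frozenset()  # set of (source, target) pairs

    def add_node(self, node):
        return Dag(self.nodes | {node}, self.edges)

    def add_edge(self, source, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must already be nodes")
        # A new edge source->target closes a cycle if source is already
        # reachable from target.
        if self._reaches(target, source):
            raise ValueError(f"edge {source}->{target} would create a cycle")
        return Dag(self.nodes, self.edges | {(source, target)})

    def _reaches(self, start, goal):
        """Depth-first search: is `goal` reachable from `start`?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(t for s, t in self.edges if s == node)
        return False


class ProjectRef:
    """One mutable cell per aggregate root (hypothetical stand-in for a ref)."""

    def __init__(self, dag):
        self._dag = dag
        self._lock = threading.Lock()

    def deref(self):
        return self._dag

    def swap(self, update, *args):
        # Apply a pure function to the current value and store the result.
        # If `update` raises, the stored value is left untouched.
        with self._lock:
            self._dag = update(self._dag, *args)
            return self._dag


project = ProjectRef(Dag())
for name in ("design", "build", "test"):
    project.swap(Dag.add_node, name)

before = project.deref()  # an old value stays valid after later changes
project.swap(Dag.add_edge, "design", "build")
project.swap(Dag.add_edge, "build", "test")

try:
    project.swap(Dag.add_edge, "test", "design")
except ValueError as err:
    print("rejected:", err)
```

Because every `Dag` is a value, `before` doubles as a free snapshot of the
project at that point, and readers never see a half-applied structural change.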
