I did think about moving this logic to the database, but I am toying around with a different model - having the entire data set in memory (possibly across multiple nodes using messaging infrastructure to communicate). The reasons for this are:
- write volume is very low but read volume is very high
- each read typically requires complex processing
- most operations cover a large part of the entire dataset

Paying the cost of having the entire data set *efficiently* available for the application (Clojure in this case) means:

- less dependence on (probably hard to test) yet-another-bit-of-tech. Integration testing DAOs or Repositories always seems like a lot of work. Reducing the number of technical pieces just makes things much easier
- I am hoping clever use of persistent structures will help here, as there is a lot of commonality in the data itself (i.e. 5 projects might actually share 80% of the same state). Clever use in constructing these might pay dividends...
- I don't think I can offload *all* processing onto a third party technology, so I need the ability to deal with large data sets in memory in real time (whatever that means) - if I need it for one, I may as well use it for all.

Ambitious, and full of hairy concerns! But the idea of moving away from single-threaded web-based applications with big powerful data engines to a single chunk of logic that occasionally throws state to a fairly dumb persistent store is certainly not new ground, and seems to offer a much more powerful architecture.

For example, dealing with historical data is always a pain point. What I want is the ability to snapshot the entire system whenever anything changes, to allow us to see how the system (or client rather) has improved. In a relational database this would be ridiculous, so I captured a "snapshot of interesting data". Tomorrow they realise that something else was interesting.... We also played with document stores (MongoDB), which make the job much, much smaller - just cloning a single document (and related data), but then it has to be hydrated, so for ease of use a snapshot is taken every X period, even if the data hasn't changed. Yuck.
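A minimal sketch of the persistent-structures point above (projects sharing most of their state), assuming a project is just a plain map. The keys, values and sizes here are made up for illustration and are not our real model:

```clojure
;; Hypothetical project: the keys and values are invented for illustration.
(def base-project
  {:name  "base"
   :nodes (vec (range 100000))   ; stand-in for a large chunk of shared state
   :owner "team-a"})

;; "Changing" one field returns a new map. The untouched values are
;; reused by reference rather than copied.
(def variant (assoc base-project :owner "team-b"))

;; Both projects point at the very same :nodes vector object.
(identical? (:nodes base-project) (:nodes variant))

;; The original is unaffected by the "change".
(:owner base-project)

;; Updating one element of the big vector also yields a new vector
;; while the old one stays intact.
(def nodes2 (assoc (:nodes base-project) 0 :changed))
(nth (:nodes base-project) 0)
(nth nodes2 0)
```

Whether real projects overlap enough for this to matter is something I still need to measure.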
Now Clojure appears, with its extremely efficient (in terms of memory) way of storing data, and suddenly it feels like storing a representation every time the structure changes (which is only once or twice a week) and then realising the entire history in memory is now do-able. This means if a Project only changed 5 times over a 3 month period there would only be 5 instances of that project in storage. Calculating how each project contributes to a historical chart broken down by day (or hour, whatever) is much, much easier to do in Java/Clojure/whatever than in the third party store of choice. I am asserting that providing a sequence for a project for every day over the last year when there are only 5 snapshots will certainly not consume sizeOfProject * daysInYear memory. (Not sure that was the best example of the pain points I am trying to solve actually :), but anyway).

I guess, after 15 years of using the "web, app-logic, database" template-cutter, I am giving myself a clean piece of paper and asking "what do you want to do and what is the simplest way to do it", and keeping everything in the application layer (rather than the persistence layer) seems appealing.

We aren't dealing with billions of rows - I still need to experiment, but it feels like having our entire data set in memory is possible on a fairly beefy server. I appreciate the JVM isn't the best wrt huge heaps, but I can work around that (with multiple virtual machines each running their own JVM and using ActiveMQ, for example). Clojure's STM seems to be the final step on the ladder to reach this goal.

I have previously considered CouchDB (for its views), Hadoop (for its highly scalable and parallelisable map/reduce execution), Cassandra (for its ability to store huge amounts of highly nested structure) and Neo4j (to store large numbers of small nodes that are heavily inter-related). And of course MongoDB, which I am currently using in production.
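Going back to the snapshot example (five snapshots, a value for every day of the year), here is a rough sketch of what I have in mind. It assumes the history of a project is a sorted map from day number to snapshot; `history`, `snapshot-on` and `daily-states` are hypothetical names, and the data is invented:

```clojure
;; Hypothetical history: day number -> snapshot taken on that day.
;; Only five snapshots exist for the whole year.
(def history
  (sorted-map
    0   {:name "p1" :version 1}
    40  {:name "p1" :version 2}
    90  {:name "p1" :version 3}
    200 {:name "p1" :version 4}
    300 {:name "p1" :version 5}))

(defn snapshot-on
  "Latest snapshot taken on or before `day`, or nil if there is none."
  [history day]
  (some-> (rsubseq history <= day) first val))

(defn daily-states
  "Lazy sequence with one snapshot per day from `first-day` to `last-day`."
  [history first-day last-day]
  (map #(snapshot-on history %) (range first-day (inc last-day))))

(def year (daily-states history 0 364))

;; 365 entries in the sequence, but only five distinct snapshots behind them.
(count year)
(count (distinct year))

;; Days 41 and 89 both resolve to the very same object (the day-40 snapshot),
;; so each extra day costs a reference rather than a copy of the project.
(identical? (nth year 41) (nth year 89))
```

The sequence holds references to the existing snapshots, which is why it should not cost anything like sizeOfProject * daysInYear.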
I also considered Erlang and Scala for their distributed VM actor models, but I am really, really sold on the power of Lisp macros.

I dunno - might be a fool's errand, but spreading the complexity over that much technology just seems like hard work. *If* the working set can be stored in current memory then I think a much simpler, and much more powerful, solution will emerge.

Sure, I am putting all my eggs in the Clojure+my-own-ability basket, at the risk of re-inventing the wheel, but maybe that is the right thing to do - building the simplest and most elegant solution with new tools.

I probably ate something that disagreed with me, but I just want to break free from the shackles of these heavy-weight tools and fly! OK - that's enough.

Or, it might all be a catastrophic failure and I will be signing up to Careers 2.0 :)

Col

P.S. Usual disclaimer - I have still only written three lines of Clojure :)

On 8 July 2011 20:57, James Keats <james.w.ke...@gmail.com> wrote:
>
>
> On Jun 16, 3:08 pm, Colin Yates <colin.ya...@gmail.com> wrote:
> > (newbie warning)
> >
> > Our current solution is an OO implementation in Groovy and Java. We
> > have a (mutable) Project which has a DAG (directed acyclic graph).
> > This is stored as a set of nodes and edges. There are multiple
> > implementations of nodes (which may themselves be Projects). There
> > are also multiple implementations of edges.
> >
> > My question isn't how to do this in a functional paradigm, my first
> > question is *how do I learn* to do this in a functional paradigm. I
> > want to be able to get the answer myself ;). To that end, are there
> > any "domain driven design with functional programming" type resources?
> >
> > A more specific question is how do I model a graph? These graphs can
> > be quite extensive, with mutations on the individual nodes as well as
> > the structure (i.e. adding or removing branches). Does this mean that
> > every node would be a ref?
> > I think the general answer is that
> > the aggregate roots are refs, meaning they are an atomic block, but is
> > there any more guidance?
>
> May I humbly suggest that this ought to be a database-ish concern
> rather than a middleware one? Have you looked at neo4j for example? A
> quick google found this:
>
> http://wiki.neo4j.org/content/Roles
>
> "This is an implementation of an example found in the article A Model
> to Represent Directed Acyclic Graphs (DAG) on SQL Databases by Kemal
> Erdogan. ... In Neo4j storing the roles is trivial, as working with
> graphs is what Neo4j was designed for"
>
> I would humbly suggest that you use as much of the database
> functionality as possible for your data needs and avoid replicating it
> in your middleware. I hope this works. :-)
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en