Thanks for the pointer. Datomic is definitely on my short list on the persistence side of things. My workflow is unfortunately fairly varied: some longer-term batch jobs, and some very-soft-realtime jobs (seconds, not milliseconds).
With a larger dataset (multi-terabyte, maybe) something like Hadoop (/Cascalog?) + HBase would be a natural fit, but when you're just around a terabyte or so it's a bit more ambiguous.

On Monday, August 20, 2012 8:55:46 PM UTC-4, ronen wrote:
>
> Terabyte size and chain of dependent tasks might hint toward
> Cascalog <https://github.com/nathanmarz/cascalog/wiki>; this assumes that
> you're doing batch job processing (on top of hadoop)
>
> If you need a more soft real time datalog based query then I would check
> datomic <http://www.datomic.com/> although from your description it
> sounds less so.
>
> Ronen
>
> On Tuesday, August 21, 2012 3:14:23 AM UTC+3, Leif wrote:
>>
>> +1. I know of a couple tools in python for this purpose that are called
>> "workflow management systems." It would be good to know if there is a
>> robust one in clojure.
>>
>> On Monday, August 20, 2012 12:18:54 AM UTC-4, matt hoffman wrote:
>>>
>>> I have a problem that I'm trying to figure out how to tackle. I'm new to
>>> Clojure, but I'm interested, and perhaps this will be my excuse to give it
>>> a try. Any of the following answers would help:
>>> "What you're describing really sounds like X"
>>> "You could think of that problem like this, instead"
>>> "You may want to search for term 'Y'...it sounds related" (I imagine I'm
>>> probably describing some well-established domain...I just don't know the
>>> right terms to search for)
>>>
>>> So, the problem:
>>> I have an app that is in production doing some fairly complex
>>> calculations on large-ish (terabyte-range) amounts of data. The
>>> calculations are expressed as chains of dependent tasks, where each task
>>> can have a number of inputs and outputs. But the code has become hard to
>>> maintain, full of accidental complexity and very difficult for newer
>>> developers to understand. So, I'm trying to find the right abstractions to
>>> put in place to keep things simple.
>>> One of the sources of complexity is the intermingling of code involving
>>> loading data, dividing up data to be executed in parallel, processing data,
>>> persisting data, and handling the execution flow on an individual datum
>>> (configuring pipelines of components, etc.) I'd like to keep the functions
>>> pure and push the other concerns off to a framework -- and, ideally, not
>>> have to write that framework.
>>>
>>> So I think my problem statement is this:
>>> I'd like to be able to define functions that specify, somehow, what
>>> input they want, and perhaps what output they produce. Then I'd like to
>>> push the concern of how those inputs are calculated -- loaded from a db,
>>> calculated from source data -- off on some other party.
>>>
>>> For example, if I define a function that requires "foo", and I call that
>>> function without providing "foo", I'd like for _something_ to step in and
>>> say, "Ok, you require foo. I have this function over here that produces
>>> foo. Let me call that for you, then hand you the output." Perhaps, instead
>>> of a framework that transparently looks up and executes that function and
>>> provides a Future for the result, I can explicitly build a
>>> dependency graph up-front containing all the functions required to produce
>>> the end result, and then execute them all in order... I think the effect is
>>> the same.
>>>
>>> From a bit of searching I've done today, dataflow programming like
>>> clojure.contrib.dataflow sounds like it might be close to what I'm looking
>>> for, but I'd love to hear ideas. Am I describing something that already
>>> exists? Would this actually be simpler than it seems using some clever
>>> macros? Are there some keywords I should search for to get started? Or
>>> perhaps I'm coming at this problem wrong, and I should think about it a
>>> different way...
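
For what it's worth, the "you require foo, I have a function that produces foo" idea from the original question can be sketched in a few lines. This is only a rough illustration, written in Python rather than Clojure, and it is not how clojure.contrib.dataflow, Cascalog, or Datomic work. Every name in it (`Resolver`, `produces`, `resolve`, and the three example functions) is hypothetical, and it assumes that a function's parameter names are the names of the inputs it needs.

```python
import inspect


class Resolver:
    """Maps a value name to the function that produces it (hypothetical API)."""

    def __init__(self):
        self._producers = {}

    def produces(self, name):
        """Decorator: register the decorated function as the producer of `name`."""
        def register(fn):
            self._producers[name] = fn
            return fn
        return register

    def resolve(self, name, provided=None):
        """Return the value for `name`, computing any missing inputs on demand."""
        cache = dict(provided or {})  # caller-supplied values take precedence
        return self._resolve(name, cache, in_progress=set())

    def _resolve(self, name, cache, in_progress):
        if name in cache:
            return cache[name]
        if name not in self._producers:
            raise KeyError(f"nothing produces {name!r}")
        if name in in_progress:
            raise ValueError(f"dependency cycle at {name!r}")
        in_progress.add(name)
        fn = self._producers[name]
        # Assumption: the parameter names of a producer are its required inputs.
        needed = inspect.signature(fn).parameters
        args = {dep: self._resolve(dep, cache, in_progress) for dep in needed}
        in_progress.discard(name)
        cache[name] = fn(**args)  # each value is computed at most once per call
        return cache[name]


resolver = Resolver()


@resolver.produces("raw")
def load_raw():
    # Stand-in for loading from a database or source files.
    return [1, 2, 3, 4]


@resolver.produces("foo")
def compute_foo(raw):
    return sum(raw)


@resolver.produces("report")
def build_report(foo, raw):
    return {"total": foo, "count": len(raw)}


result = resolver.resolve("report")
```

Calling `resolver.resolve("report", {"foo": 99})` skips `compute_foo` and uses the supplied value instead, so the functions themselves stay pure and the lookup, ordering, and caching live in one place. Parallel execution and persistence are left out of this sketch entirely.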