Re: Good resources on dataflow based programming

Daniel Kersten Thu, 26 Dec 2013 07:35:10 -0800

*"Given an infinite number of cores, the time to process a set of dataflow
functions is equivalent to the time that the longest function took to
do its processing."*


It sounds like you've just discovered Amdahl's Law :-D
https://en.wikipedia.org/wiki/Amdahl%27s_law
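
To put numbers on the quoted claim and on the efficiency formula from
Stephen's message further down, here is a minimal sketch in Python. It
assumes the dataflow functions are fully independent and that each one gets
its own core; the name dataflow_stats and the sample durations are made up
for illustration.

```python
def dataflow_stats(durations):
    """Return (wall_time, efficiency) for independent dataflow functions.

    durations: running time of each function, in any consistent unit.
    Assumes one core per function, so every function runs concurrently.
    """
    if not durations:
        raise ValueError("need at least one dataflow function")
    longest = max(durations)
    # With unlimited cores, everything finishes when the slowest one does.
    wall_time = longest
    # Useful work done, divided by the core-time that was reserved.
    efficiency = sum(durations) / (len(durations) * longest)
    return wall_time, efficiency


# Three functions, one of which takes twice as long as the others.
wall_time, efficiency = dataflow_stats([2.0, 4.0, 2.0])

# Splitting the longest function into two halves, as the message suggests.
split_wall_time, split_efficiency = dataflow_stats([2.0, 2.0, 2.0, 2.0])
```

In this toy case, splitting the slowest function halves the wall time and
brings the efficiency up to 1.0. Once functions depend on each other, the
bound becomes the longest chain of dependent functions rather than the
single longest function.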

As for the articles, the hierarchies one is interesting to me mainly
because a dataflow network is basically a flat hierarchy: the network will
still run if you remove, exchange or add to parts of it. I noticed this a
lot when I did some Max/MSP development and it changed how I approached
problems. It was a very exploratory approach, but the difference from using a
REPL is that it's easy to change code deeply integrated in other code and
re-evaluate whichever parts you want to test (actually, I do this in
Clojure by evaluating my code from my editor rather than in a REPL, but it
still feels different from rewiring a visual language).

In Pedestal, you can see this when you remove, exchange or add, e.g.,
transform functions - this does not affect the rest of the dataflow at all
as they are independent and isolated. I find this a very powerful and
pleasant way to program, which is why I linked that article.
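
A toy sketch of that property, in plain Python rather than Pedestal's actual
API (run_network, the path names and the functions are all hypothetical):
each node is a function plus the list of paths it reads, and the wiring
lives in a table, so a node can be exchanged or removed without touching the
others.

```python
def run_network(nodes, inputs):
    """Evaluate a tiny dataflow network.

    nodes:  dict mapping an output path to (function, list of input paths)
    inputs: dict of externally supplied path values
    Nodes whose inputs never become available are simply left uncomputed.
    """
    values = dict(inputs)
    pending = dict(nodes)
    progress = True
    while pending and progress:
        progress = False
        for path, (fn, deps) in list(pending.items()):
            if all(dep in values for dep in deps):
                values[path] = fn(*(values[dep] for dep in deps))
                del pending[path]
                progress = True
    return values


nodes = {
    "total": (lambda a, b: a + b, ["a", "b"]),
    "doubled": (lambda t: t * 2, ["total"]),
}
result = run_network(nodes, {"a": 1, "b": 2})

# Exchange one node: "doubled" is untouched and still works.
swapped = run_network(dict(nodes, total=(lambda a, b: a * b, ["a", "b"])),
                      {"a": 1, "b": 2})

# Remove one node: the rest of the network still runs.
without = run_network({k: v for k, v in nodes.items() if k != "doubled"},
                      {"a": 1, "b": 2})
```

The functions themselves never name each other; only the table says which
paths feed which node.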

The other article obviously doesn't apply to pedestal-app, because
JavaScript cannot run the code in parallel, but I found that conceptually
it has helped me understand how and why to isolate code and how it all fits
together. I also envision that the pedestal dataflow system will eventually
become part of pedestal-service (they ported it to use core.async; I'm
assuming this was to make that possible), in which case it actually
can run in parallel and pipelined.

I hope the articles made things clearer and not more confusing!


On 25 December 2013 02:40, Stephen Cagle <same...@gmail.com> wrote:

> Just a quick thought I had as I was walking home.
>
> Given an infinite number of cores, the time to process a set of dataflow
> functions is equivalent to the time that the longest function took to
> do its processing. The efficiency is the (sum of time that all the dataflow
> functions took) / ( (count of the dataflow functions) * (the time of the
> longest running dataflow function) ). Given this, optimization is really
> simple. Take the longest running dataflow function, and see if you can
> somehow split it into smaller functions. Nothing profound here, but I
> thought it was interesting how "evident" optimizations might be when you
> use a dataflow processing model.
>
> On Tuesday, December 24, 2013 3:50:43 PM UTC-8, Stephen Cagle wrote:
>>
>> One thing that I am seeing on a re-read is that I conflated the notion of
>> the data flow function and the paths. I was sort of thinking that the data
>> flow functions "sit" at a particular path location. Similar to how a value
>> "sits" in a location in memory. It is more appropriate to say that the data
>> flow function is associated (referred isn't quite right) with a particular
>> path location.
>>
>> Taking this a bit further, a data flow function DOES NOT actually know
>> who its inputs are. It only knows that it takes input values in/of a
>> certain form. The schema that specifies how one set of paths maps to another
>> path is possibly separate from the schema that specifies which dataflow
>> function is associated with which path.
>>
>> On Tuesday, December 24, 2013 1:36:16 PM UTC-8, Stephen Cagle wrote:
>>>
>>> Thank you. I only read the last two articles so far; some notes.
>>>
>>> http://my.opera.com/Vorlath/blog/2008/01/06/simple-example-of-the-difference-between-imperative-functional-and-data-flow
>>>
>>> I realized that I really wasn't getting what dataflow was about. I was
>>> viewing dataflow paths as a sort of hook that I could hang values on. I had
>>> never externalized the fact that each path only refers to the things it
>>> inputs upon. Specifically, I was modeling some of the patterns like path A
>>> passes a value to B, B does some computation, and puts the result on path
>>> A. This isn't necessarily wrong, but it appears that I was using dataflow
>>> paths in a way similar to function evaluation.
>>>
>>> He makes a big deal out of not needing to know where his inputs come
>>> from, as well as not needing to invoke a function to create his inputs, but
>>> it still seems he must have a reference to his inputs.
>>>
>>> The effectively automatic parallelization of the code is pretty neat.
>>> Not so much because it is parallelized (which can be done in other
>>> systems), but mostly because it required no forethought or synchronization.
>>> It is automatically parallelized and pipelined. Neato. Of course, we aren't
>>> going to get much of that in js without some work.
>>>
>>> The second to last paragraph was another head turner. I had previously
>>> viewed every dataflow node/path as a loop that just waits for a change in
>>> its inputs and computes a new value when one of them changes. However, in
>>> his system recursive calls are also parallelized and pipelined. I am not
>>> quite sure how this would be implemented, but it seems neat. Again though,
>>> probably not relevant to js.
>>>
>>> http://my.opera.com/Vorlath/blog/2008/07/25/hierarchies-and-equivalence
>>>
>>> This one is more of a philosophy/notice of intent piece. This one should
>>> probably be read after the previous article. If I hadn't read the previous
>>> article, I would understand the words he is saying, but not their
>>> implications. There were many parts in this article that I disagreed with,
>>> but it seems wasteful to argue over individual sentences.
>>>
>>> Honestly, too many ideas came at me too fast. I would have to re-read at
>>> a later point to have much of an opinion. Having said that, the content may
>>> be good (or not), but I am not sure this article would change any opinions
>>> unless the person was already predisposed.
>>>
>>> On Tuesday, December 24, 2013 9:39:58 AM UTC-8, Dan Kersten wrote:
>>>>
>>>> Here's some resources to get you started learning about dataflow as a
>>>> paradigm. From this you should be able to figure out how Pedestal's
>>>> dataflow system fits in.
>>>>
>>>> A list of existing dataflow languages and systems:
>>>> http://stackoverflow.com/questions/461796/dataflow-programming-languages/949771#949771
>>>> I would suggest looking into a few
>>>> of these to learn the concepts. I'd also suggest trying out one or more of
>>>> the visual dataflow languages to get a feel for how problems can be solved
>>>> in this paradigm and how non-pure functions fit in. I've personally used
>>>> both Max/MSP and Synthmaker and found the concepts really clicked after
>>>> using them for a while.
>>>>
>>>> A work in progress book on which you may also want to keep an eye:
>>>> http://dataflowbook.com/cms/
>>>>
>>>> Also, if you can ignore the tone, this article has helped me better
>>>> understand how code fits together in a dataflow system (a flat hierarchy):
>>>> http://my.opera.com/Vorlath/blog/2008/07/25/hierarchies-and-equivalence
>>>> and this one talks about how dataflow differs from imperative
>>>> programming:
>>>> http://my.opera.com/Vorlath/blog/2008/01/06/simple-example-of-the-difference-between-imperative-functional-and-data-flow
>>>> But be warned that the author of those is very opinionated and feels that
>>>> he is writing about the "one true way" ;-)
>>>>
>>>>
>>>>
>>>> On 23 December 2013 19:32, Stephen Cagle <sam...@gmail.com> wrote:
>>>>
>>>>> < Cross posted from pedestal-users group, as many people who are not
>>>>> using pedestal may still know about dataflow, terms like effect are used
>>>>> in the context of pedestal >
>>>>>
>>>>> Pedestal seems strongly based on dataflow based programming. The
>>>>> gigantic tutorial just sort of jumps in on it.
>>>>>
>>>>> Based on my usage thus far, dataflow seems really good at modeling
>>>>> problems that are self contained. In their pure forms (no external inputs
>>>>> or internal effects), dataflow seems almost like a circuit diagram.
>>>>>
>>>>> However, I quickly get confused once we start dealing with effects.
>>>>> Seems that there is a large "Here be Dragons" area of code in that region
>>>>> (services.cljs).
>>>>>
>>>>> I feel that this may not be dataflow's fault; I just haven't got my
>>>>> head around it. When I look at dataflow, I feel like it is constraining me
>>>>> to a particular way of solving a problem. The upside of this is that I
>>>>> have made my logic declarative and potentially easier to reason about.
>>>>> This is a trade-off that seems similar to the trade-offs one makes when
>>>>> moving from a mutable procedural programming model to an immutable
>>>>> functional model. I have yet to personally get substantial benefit from
>>>>> dataflow, but that does not mean I will not with more mastery.
>>>>>
>>>>> I am wondering if there are any getting started guides for
>>>>> dataflow programming that you (the community) would recommend. I would be
>>>>> especially interested in "recipe books" for dataflow based programming.
>>>>> How do you really do asynchronous processing with dataflow? What if your
>>>>> asyncs may return in a random order but must be processed in a specified
>>>>> order? A few books/articles/whatever on how experts think through these
>>>>> problems could be quite beneficial.
>>>>>
>>>>> --
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To post to this group, send email to clo...@googlegroups.com
>>>>> Note that posts from new members are moderated - please be patient
>>>>> with your first post.
>>>>> To unsubscribe from this group, send email to
>>>>> clojure+u...@googlegroups.com
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/clojure?hl=en
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to clojure+u...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>
>>>>
