Re: How is the emphasis of “data over code” different than the XML advocacy of the early 2000s?

Timothy Baldridge Wed, 03 Feb 2016 07:46:10 -0800

I find this subject interesting, as I was discussing it with a
co-worker recently. There are a few points I'd like to make:


Firstly, data is often a form of a DSL (domain-specific language).
Libraries like Onyx often contain (as Lucas mentioned) a parser that walks
the data and performs some actions based on it. That's also known as an
evaluator. These libraries also often optimize the data by composing
functions or emitting Clojure code, aka... a compiler.

So when we say that something is "fully data driven", we have to realize
that we are in essence writing a language. A language with a familiar
syntax, but a language with different semantics. Documenting those
semantics is critical.
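
A minimal sketch of the evaluator/compiler distinction, using a made-up
pipeline DSL. The :op/:n keys and every function name here are hypothetical
and are not taken from Onyx:

```clojure
;; Hypothetical mini-DSL: a pipeline is a vector of step maps.
(def pipeline
  [{:op :add :n 2}
   {:op :mul :n 10}])

;; "Evaluator": walk the data and act on each step as we go.
(defn run-step [x {:keys [op n]}]
  (case op
    :add (+ x n)
    :mul (* x n)))

(defn evaluate [steps x]
  (reduce run-step x steps))

;; "Compiler": turn each step into a function once, then compose them.
(defn step->fn [{:keys [op n]}]
  (case op
    :add #(+ % n)
    :mul #(* % n)))

(defn compile-pipeline [steps]
  ;; comp applies right-to-left, so reverse to keep pipeline order.
  (apply comp (reverse (map step->fn steps))))

(evaluate pipeline 1)            ;; (1 + 2) * 10 => 30
((compile-pipeline pipeline) 1)  ;; => 30
```

Both paths give the same answer, but what :add and :mul mean is decided
entirely by run-step and step->fn. That is the semantics that needs
documenting.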

So why not just write in code to begin with? Well, often we wish to
programmatically manipulate the inputs to these libraries before execution,
so we want our language to be in a format that is easy to manipulate. "Why
not Lisp code?" you may ask. Well, that often comes down to ease of
processing.

This code is easy to read:

(if x :foo :bar)

But this code is easier to process programmatically:

{:op :if
 :children [:test :then :else]
 :test {:op :local :name 'x}
 :then {:op :const :val :foo}
 :else {:op :const :val :bar}}


I never really want to hand-write the latter, but I don't want to write a
program to analyze the former.
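
To illustrate why the map form is easier to process, here is a small sketch
that walks it generically. The helper names (nodes, locals, swap-branches)
are my own for this example and are not from any particular library:

```clojure
(def ast
  {:op :if
   :children [:test :then :else]
   :test {:op :local :name 'x}
   :then {:op :const :val :foo}
   :else {:op :const :val :bar}})

;; :children names the keys that hold sub-nodes, so one generic walk
;; covers every node type without knowing anything about :if.
(defn nodes [node]
  (cons node (mapcat #(nodes (get node %)) (:children node))))

;; Analysis: collect the names of all locals referenced in the tree.
(defn locals [node]
  (set (keep #(when (= :local (:op %)) (:name %)) (nodes node))))

;; Transformation: plain map functions are enough to rewrite a node.
(defn swap-branches [if-node]
  (assoc if-node :then (:else if-node) :else (:then if-node)))

(locals ast)                        ;; => #{x}
(:val (:then (swap-branches ast)))  ;; => :bar
```

Doing the same against (if x :foo :bar) would mean knowing the argument
positions of every special form and macro that could show up.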

So, all that is a round-about way of saying my preferred pattern is the
following:

1) Write my library using functions and immutable data for all inputs,
preferably without positional arguments: each function takes one or
more maps. Positional arguments are hard to emit programmatically.

2) Write helper functions to allow users to construct data for my system
using the APIs from #1. These will basically generate data from positional
arguments.

3) If needed, write macros and DSLs that parse user-friendly inputs and
emit my data DSL format.

4) If needed, optimize performance by writing DSL "compilers" or emitting
records/protocols.
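
A rough sketch of steps 1 to 3, assuming a toy "task" library. All of the
names (run-task, task, deftask) are invented for illustration; step 4 is
left out:

```clojure
;; 1) Core API: takes a map of immutable data, no positional arguments.
(defn run-task [{:keys [f args]}]
  (apply f args))

;; 2) Helper: positional arguments in, data out.
(defn task [f & args]
  {:f f :args (vec args)})

;; 3) Macro: friendlier syntax that expands into the helper call.
(defmacro deftask [sym f & args]
  `(def ~sym (task ~f ~@args)))

(deftask add-three + 1 2)

(run-task add-three)                         ;; => 3
;; The task is still just a map, so code can rewrite it before running.
(run-task (update add-three :args conj 10))  ;; => 13
```

The macro and helper are conveniences only. Anything that generates tasks
programmatically can skip them and build the map directly.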


In short, configure your code with data, make your data palatable with
code.

Timothy


On Wed, Feb 3, 2016 at 2:04 AM, <lucas.bradstr...@onyxplatform.org> wrote:

> Hi Josh,
>
> I am one of the core Onyx developers, so I am biased in some respects. I'm
> going to speak only to the specific advantages that data > code gives Onyx.
>
> An advantage with Onyx is the ability to build up your jobs dynamically
> using data that is easily transformable by code, using all of the functions
> that you use in Clojure, e.g. conj, assoc, update, get, etc. Data in Clojure
> is far more easily manipulated by core functions than XML, ensuring that
> you can do things like build up a job from a base system, add arbitrary
> numbers and types of tasks, parameters, lifecycles, and options to your job
> for different purposes.
>
> This ensures that Onyx is very flexible - complex jobs do not have to be
> simply stored in lengthy static EDN files, they can be built by code from
> job to job, depending on your needs. To give an example, imagine a case
> where you wanted to load data from an arbitrary number of queue
> datasources, and an input plugin only allows a single queue name to be read
> from in a single task - you can easily transform your job's workflow and
> catalog to expand out an arbitrary number of tasks to read from these
> queues, annotating the input data with the queue name, all directed at
> another task that you define. If you wish to sometimes add some debugging
> metrics, you can do so by transforming the job, etc. If tasks within a job
> are not the correct level of granularity, you could instead dynamically
> build multiple jobs and submit them all to the cluster.
>
> Mike brings up a good point around performance concerns around data >
> code. With respect to Onyx, the "dataness" of Onyx jobs is very often
> compiled down to records and more performant representations. This ensures
> that the dataness at the user level isn't lost, while ensuring performance
> for the common case. In some ways you can think of the data in the Onyx job
> as the AST for the Onyx job, which is validated, and then compiled for
> performance. It would be quite easy to build code over this data which
> ensured you never had to touch the data that defined the job, especially
> since the core code functionality, a task's onyx/fn, is a plain Clojure
> function, operating using whatever Clojure/Java objects you want. We don't
> think this is generally a good idea, but you have the ability should you
> need it.
>
> When these jobs are submitted to the cluster, this data is serialized and
> stored in ZooKeeper, to be read back by the cluster for scheduling
> purposes. This data is human readable when viewed in a dashboard, usable in
> ClojureScript (even allowing jobs to be built and dispatched by web clients
> - at which point you may need a data representation anyway), or
> transformable e.g. if you inspect a previous job's end state and data in
> order to migrate between jobs.
>
> By defining an information model and documentation around the core data
> representation, we can easily present specific documentation to users when
> their jobs fail schema validation for any reason, see
> https://github.com/onyx-platform/onyx/blob/0.8.x/src/onyx/information_model.cljc
> for the model and documentation map that we use for error messages, and how
> we have additionally leveraged this information model to build a
> ClojureScript page that is a handy reference guide for users
> http://www.onyxplatform.org/docs/cheat-sheet/latest/#/trigger-entry.
>
> Hopefully this answers some of your questions around why I like this
> technique for Onyx, even if I didn't answer your overarching question.
>
> Cheers,
>
> Lucas
>
>
> On Tuesday, February 2, 2016 at 6:02:23 AM UTC+8, Josh Tilles wrote:
>>
>> As I’m watching Michael Drogalis’s Clojure/Conj 2015 presentation “Onyx:
>> Distributed Computing for Clojure”
>> <https://youtube.com/watch?v=YlfA8hFs2HY&t=734>, I'm distracted by a
>> nagging worry that we —as a community— are somehow falling into the same
>> trap as those advocating XML in the early 2000s. That said, it's a very
>> *vague* unease, because I don’t know much about why the industry seems
>> to have rejected XML as “bad”; by the time I started programming
>> professionally there was already a consensus that XML sucked, and that
>> libraries/frameworks that relied heavily on XML configuration files were to
>> be regarded with suspicion and/or distaste.
>>
>> So, am I incorrect in seeing a similarity between the “data > code”
>> mentality and the rise of XML? Or, assuming there is a legitimate
>> parallel, is it perhaps unnecessary to be alarmed? Does the tendency to use
>> edn instead of XML sidestep everything that went wrong in the 2000s? Or is
>> it the case that the widespread backlash against XML threw a baby out with
>> the bathwater, forgetting the advantages of data over code?
>>
>> Cheers,
>> Josh
>>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
“One of the main causes of the fall of the Roman Empire was that–lacking
zero–they had no way to indicate successful termination of their C
programs.”
(Robert Firth)

