Hi Josh,

I am one of the core Onyx developers, so I am biased in some respects. I'm 
going to speak only to the specific advantages that data > code gives Onyx.

An advantage of Onyx is the ability to build up your jobs dynamically 
using data that is easily transformed by code, with all of the functions 
you already use in Clojure, e.g. conj, assoc, update, get, etc. Data in 
Clojure is far more easily manipulated by the core functions than XML, 
which means you can do things like build up a job from a base system and 
add arbitrary numbers and types of tasks, parameters, lifecycles, and 
options to the job for different purposes.
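
To make that concrete, here is a minimal sketch. The exact keys and values 
are illustrative rather than a complete, submittable job, but the shape (a 
plain map with :workflow and :catalog) matches how Onyx jobs are expressed:

(def base-job
  {:workflow [[:in :process] [:process :out]]
   :catalog []
   :lifecycles []
   :task-scheduler :onyx.task-scheduler/balanced})

;; Because the job is plain data, ordinary core functions compose it.
(defn add-task [job task-map]
  (update job :catalog conj task-map))

(defn with-batch-size [job n]
  (update job :catalog
          (fn [catalog]
            (mapv #(assoc % :onyx/batch-size n) catalog))))

(-> base-job
    (add-task {:onyx/name :process
               :onyx/fn :my.app/transform   ; hypothetical function var
               :onyx/type :function})
    (with-batch-size 20))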

This makes Onyx very flexible: complex jobs do not have to be stored in 
lengthy static EDN files; they can be built by code from job to job, 
depending on your needs. To give an example, imagine that you want to load 
data from an arbitrary number of queue datasources, but the input plugin 
only allows a single queue name to be read in a single task. You can 
easily transform your job's workflow and catalog to expand out an 
arbitrary number of tasks that read from those queues, annotating the 
input data with the queue name, all directed at another task that you 
define. If you sometimes want to add debugging metrics, you can do so by 
transforming the job, and so on. If tasks within a job are not the right 
level of granularity, you could instead dynamically build multiple jobs 
and submit them all to the cluster.
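
A rough sketch of that queue example, building on the base-job above. The 
plugin keyword and the :queue/name key are placeholders for whatever input 
plugin you are actually using, not a real plugin API:

(defn add-queue-input [job queue-name]
  (let [task (keyword (str "in-" queue-name))]
    (-> job
        (update :workflow conj [task :process])
        (update :catalog conj
                {:onyx/name task
                 :onyx/plugin :my.plugin.queue/input ; hypothetical plugin
                 :onyx/type :input
                 :onyx/medium :queue
                 :queue/name queue-name              ; hypothetical key
                 :onyx/batch-size 20}))))

(def job-with-queues
  (reduce add-queue-input base-job ["orders" "payments" "refunds"]))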

Mike brings up a good point about performance concerns with data > code. 
With respect to Onyx, the "dataness" of Onyx jobs is very often compiled 
down to records and other more performant representations. The dataness at 
the user level isn't lost, while performance is preserved for the common 
case. In some ways you can think of the data in the Onyx job as the AST 
for the job, which is validated and then compiled for performance. It 
would be quite easy to build code over this data that ensured you never 
had to touch the data defining the job, especially since the core piece of 
code, a task's onyx/fn, is a plain Clojure function operating on whatever 
Clojure/Java objects you want. We don't think this is generally a good 
idea, but you have the ability should you need it.
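
For instance, the onyx/fn referenced from the catalog entry above is 
nothing more than an ordinary function over segment maps (a made-up 
example, but this is the general shape):

(ns my.app)

;; Segments are plain maps; this function can use any Clojure/Java objects
;; internally and simply returns the transformed segment.
(defn transform [segment]
  (assoc segment :processed-at (System/currentTimeMillis)))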

When these jobs are submitted to the cluster, this data is serialized and 
stored in ZooKeeper, to be read back by the cluster for scheduling 
purposes. The data is human-readable when viewed in a dashboard, usable in 
ClojureScript (even allowing jobs to be built and dispatched by web 
clients, at which point you may need a data representation anyway), and 
transformable, e.g. if you inspect a previous job's end state and data in 
order to migrate between jobs.
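
Submission itself is just a call that takes the job as data. Roughly, from 
memory of the onyx.api namespace (peer-config here stands for your peer 
client configuration map, e.g. the ZooKeeper address):

(require '[onyx.api])

;; job-with-queues is the plain data map built earlier.
(onyx.api/submit-job peer-config job-with-queues)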

By defining an information model and documentation around the core data 
representation, we can easily present specific documentation to users when 
their jobs fail schema validation for any reason. See 
https://github.com/onyx-platform/onyx/blob/0.8.x/src/onyx/information_model.cljc
for the model and documentation map that we use for error messages. We 
have additionally leveraged this information model to build a 
ClojureScript page that serves as a handy reference guide for users: 
http://www.onyxplatform.org/docs/cheat-sheet/latest/#/trigger-entry.

Hopefully this answers some of your questions about why I like this 
technique for Onyx, even if I didn't answer your overarching question.

Cheers,

Lucas

On Tuesday, February 2, 2016 at 6:02:23 AM UTC+8, Josh Tilles wrote:
>
> As I’m watching Michael Drogalis’s Clojure/Conj 2015 presentation “Onyx: 
> Distributed Computing for Clojure” 
> <https://youtube.com/watch?v=YlfA8hFs2HY&t=734>, I'm distracted by a 
> nagging worry that we —as a community— are somehow falling into the same 
> trap as those advocating XML in the early 2000s. That said, it's a very 
> *vague* unease, because I don’t know much about why the industry seems to 
> have rejected XML as “bad”; by the time I started programming 
> professionally there was already a consensus that XML sucked, and that 
> libraries/frameworks that relied heavily on XML configuration files were to 
> be regarded with suspicion and/or distaste.
>
> So, am I incorrect in seeing a similarity between the “data > code” 
> mentality and the rise of XML? Or, assuming there is a legitimate 
> parallel, is it perhaps unnecessary to be alarmed? Does the tendency to use 
> edn instead of XML sidestep everything that went wrong in the 2000s? Or is 
> it the case that the widespread backlash against XML threw a baby out with 
> the bathwater, forgetting the advantages of data over code?
>
> Cheers,
> Josh
>
