Re: How is the emphasis of “data over code” different than the XML advocacy of the early 2000s?

Alan Thompson Wed, 03 Feb 2016 09:36:52 -0800

Very good points, Timothy!

On Wed, Feb 3, 2016 at 7:45 AM, Timothy Baldridge <tbaldri...@gmail.com>
wrote:


> I find this subject interesting as I was just discussing this with a
> co-worker recently. There are a few points I'd like to make:
>
> Firstly, data is often a form of a DSL (domain specific language).
> Libraries like Onyx often contain (as Lucas mentioned) a parser that walks
> the data and performs some actions based on that. That's also known as an
> evaluator. These libraries also often optimize the data by composing
> functions or emitting Clojure code, aka...a compiler.
>
> So when we say that something is "fully data driven", we have to realize
> that we are in essence writing a language. A language with a familiar
> syntax, but a language with different semantics. Documenting those
> semantics is critical.
>
> So why not just write in code to begin with? Well, often we wish to
> programmatically manipulate the inputs to these libraries before execution. So
> we want our language to be in a format that is easy to manipulate. "Why not
> lisp code?" you may ask. Well that's often a question about ease of
> processing.
>
> This code is easy to read:
>
> (if x :foo :bar)
>
> But this code is easier to process programmatically:
>
> {:op :if
>  :children [:test :then :else]
>  :test {:op :local :name 'x}
>  :then {:op :const :val :foo}
>  :else {:op :const :val :bar}}
>
>
> I never really want to hand-write the latter, but I don't want to write a
> program to analyze the former.
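
To make the "easier to process" point concrete, here is a minimal sketch of an evaluator over the map form above. The names evaluate, count-nodes and if-node are made up for illustration, and env is assumed to be a plain map from local symbols to values; this is not how any particular library does it.

```clojure
;; Hypothetical mini-evaluator: dispatch on the :op key of each node.
(defmulti evaluate (fn [node _env] (:op node)))

(defmethod evaluate :const [node _env]
  (:val node))

(defmethod evaluate :local [node env]
  (get env (:name node)))

(defmethod evaluate :if [node env]
  (if (evaluate (:test node) env)
    (evaluate (:then node) env)
    (evaluate (:else node) env)))

(def if-node
  {:op :if
   :children [:test :then :else]
   :test {:op :local :name 'x}
   :then {:op :const :val :foo}
   :else {:op :const :val :bar}})

;; :children lists the keys that hold sub-nodes, so a generic tool can
;; walk any node without knowing what its :op means.
(defn count-nodes [node]
  (reduce + 1 (map #(count-nodes (get node %)) (:children node))))

(evaluate if-node {'x true})  ;; => :foo
(evaluate if-node {'x nil})   ;; => :bar
(count-nodes if-node)         ;; => 4
```

Doing the same over (if x :foo :bar) would mean pattern-matching on list positions and special-casing every form by hand.
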
>
> So, all that is a round-about way of saying my preferred pattern is the
> following:
>
> 1) Write my library using functions and immutable data for all inputs,
> preferably without positional arguments: each function takes one or more
> maps. Positional arguments are hard to emit programmatically.
>
> 2) Write helper functions that let users construct data for my system
> using the APIs from #1. These basically generate data from positional
> arguments.
>
> 3) If needed, write macros and DSLs that parse user-friendly inputs and
> emit my data DSL format.
>
> 4) If needed, optimize performance by writing DSL "compilers" or emitting
> records/protocols.
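
Steps 1 to 3 above could look roughly like the sketch below. Every name in it (run-task, task, deftask and the :id, :f, :input keys) is hypothetical and only there to show the layering, not taken from any real library.

```clojure
;; Step 1: the core function takes one map, no positional arguments.
(defn run-task [{:keys [id f input]}]
  {:id id :result (f input)})

;; Step 2: a helper that turns positional arguments into that map.
(defn task [id f input]
  {:id id :f f :input input})

;; Step 3: a small macro as friendlier syntax that emits the same data.
(defmacro deftask [sym f input]
  `(def ~sym (task ~(keyword (name sym)) ~f ~input)))

(deftask double-it #(* 2 %) 21)

(run-task double-it)                   ;; => {:id :double-it, :result 42}
;; Because the task is just a map, code can rewrite it before running it.
(run-task (assoc double-it :input 5))  ;; => {:id :double-it, :result 10}
```

The macro stays optional sugar: anything it produces can also be built, inspected, or modified as a plain map.
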
>
>
> In short, configure your code with data, make your data palatable with
> code.
>
> Timothy
>
>
> On Wed, Feb 3, 2016 at 2:04 AM, <lucas.bradstr...@onyxplatform.org> wrote:
>
>> Hi Josh,
>>
>> I am one of the core Onyx developers, so I am biased in some respects.
>> I'm only going to speak to the specific advantages that data > code gives
>> Onyx.
>>
>> An advantage with Onyx is the ability to build up your jobs dynamically
>> using data that is easily transformable by code, using all of the functions
>> that you use in clojure e.g. conj, assoc, update, get, etc. Data in clojure
>> is far more easily manipulated by core functions than XML, ensuring that
>> you can do things like build up a job from a base system, add arbitrary
>> numbers and types of tasks, parameters, lifecycles, and options to your job
>> for different purposes.
>>
>> This ensures that Onyx is very flexible - complex jobs do not have to be
>> simply stored in lengthy static EDN files, they can be built by code from
>> job to job, depending on your needs. To give an example, imagine a case
>> where you wanted to load data from an arbitrary number of queue
>> datasources, and an input plugin only allows a single queue name to be read
>> from in a single task - you can easily transform your job's workflow and
>> catalog to expand out an arbitrary number of tasks to read from these
>> queues, annotating the input data with the queue name, all directed at
>> another task that you define. If you wish to sometimes add some debugging
>> metrics, you can do so by transforming the job, etc. If tasks within a job
>> are not the correct level of granularity, you could instead dynamically
>> build multiple jobs and submit them all to the cluster.
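
A rough sketch of the queue example described above. The data is only shaped like an Onyx job (a :workflow of [from to] pairs and a :catalog of task maps); the :queue/name key, the task names and the helper functions are assumptions for illustration, and a real catalog entry would need more keys than shown here.

```clojure
;; Hypothetical base job: one processing task feeding one output task.
(def base-job
  {:workflow [[:process :write-output]]
   :catalog  [{:onyx/name :process      :onyx/type :function}
              {:onyx/name :write-output :onyx/type :output}]})

(defn queue-task-name [queue]
  (keyword (str "read-" queue)))

(defn add-queue-input
  "Adds one input task for `queue` and routes it to :process."
  [job queue]
  (let [task-name (queue-task-name queue)]
    (-> job
        (update :workflow conj [task-name :process])
        (update :catalog conj {:onyx/name  task-name
                               :onyx/type  :input
                               :queue/name queue})))) ; hypothetical key

;; Expand the job for an arbitrary number of queues with plain reduce.
(def job (reduce add-queue-input base-job ["orders" "payments" "refunds"]))

(:workflow job)
;; => [[:process :write-output]
;;     [:read-orders :process]
;;     [:read-payments :process]
;;     [:read-refunds :process]]
```

Nothing here is specific to a job format: it is just conj and update over vectors and maps, which is the point being made.
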
>>
>> Mike brings up a good point about performance concerns with data >
>> code. With respect to Onyx, the "dataness" of Onyx jobs is very often
>> compiled down to records and more performant representations. This ensures
>> that the dataness at the user level isn't lost, while ensuring performance
>> for the common case. In some ways you can think of the data in the Onyx job
>> as the AST for the Onyx job, which is validated, and then compiled for
>> performance. It would be quite easy to build code over this data which
>> ensured you never had to touch the data that defined the job, especially
>> since the core code functionality, a task's onyx/fn, is a plain clojure
>> function, operating using whatever Clojure/Java objects you want. We don't
>> think this is generally a good idea, but you have the ability should you
>> need it.
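
The "validate, then compile to a record" idea might be sketched like this. It is a toy under stated assumptions, not Onyx's actual implementation: the :task/name and :task/fn keys, the CompiledTask record and both functions are invented for the example, and it assumes Clojure 1.9+ for qualified-symbol?.

```clojure
;; Hypothetical compiled form: a record holding the resolved function.
(defrecord CompiledTask [task-name f])

(defn validate-task
  "Returns the task map unchanged, or throws with the offending data attached."
  [task]
  (cond
    (not (keyword? (:task/name task)))
    (throw (ex-info ":task/name must be a keyword" {:task task}))

    (not (qualified-symbol? (:task/fn task)))
    (throw (ex-info ":task/fn must be a namespace-qualified symbol" {:task task}))

    :else task))

(defn compile-task
  "Validates the task map, then resolves :task/fn to the function it names."
  [task]
  (validate-task task)
  (if-let [v (resolve (:task/fn task))]
    (->CompiledTask (:task/name task) @v)
    (throw (ex-info "Cannot resolve :task/fn" {:task task}))))

(def compiled
  (compile-task {:task/name :increment :task/fn 'clojure.core/inc}))

((:f compiled) 41) ;; => 42
```

The user-facing map stays plain, serializable data, while the hot path only ever touches the record and the already-resolved function.
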
>>
>> When these jobs are submitted to the cluster, this data is serialized and
>> stored in ZooKeeper, to be read back by the cluster for scheduling
>> purposes. This data is human readable when viewed in a dashboard, usable in
>> ClojureScript (even allowing jobs to be built and dispatched by web clients
>> - at which point you may need a data representation anyway), or
>> transformable e.g. if you inspect a previous job's end state and data in
>> order to migrate between jobs.
>>
>> By defining an information model and documentation around the core data
>> representation, we can easily present specific documentation to users when
>> their jobs fail schema validation for any reason. See
>> https://github.com/onyx-platform/onyx/blob/0.8.x/src/onyx/information_model.cljc
>> for the model and documentation map that we use for error messages. We
>> have additionally leveraged this information model to build a
>> ClojureScript page that is a handy reference guide for users:
>> http://www.onyxplatform.org/docs/cheat-sheet/latest/#/trigger-entry.
>>
>> Hopefully this answers some of your questions around why I like this
>> technique for Onyx, even if I didn't answer your overarching question.
>>
>> Cheers,
>>
>> Lucas
>>
>>
>> On Tuesday, February 2, 2016 at 6:02:23 AM UTC+8, Josh Tilles wrote:
>>>
>>> As I’m watching Michael Drogalis’s Clojure/Conj 2015 presentation
>>> “Onyx: Distributed Computing for Clojure”
>>> <https://youtube.com/watch?v=YlfA8hFs2HY&t=734>, I'm distracted by a
>>> nagging worry that we —as a community— are somehow falling into the same
>>> trap as those advocating XML in the early 2000s. That said, it's a very
>>> *vague* unease, because I don’t know much about why the industry seems
>>> to have rejected XML as “bad”; by the time I started programming
>>> professionally there was already a consensus that XML sucked, and that
>>> libraries/frameworks that relied heavily on XML configuration files were to
>>> be regarded with suspicion and/or distaste.
>>>
>>> So, am I incorrect in seeing a similarity between the “data > code”
>>> mentality and the rise of XML? Or, assuming there is a legitimate
>>> parallel, is it perhaps unnecessary to be alarmed? Does the tendency to use
>>> edn instead of XML sidestep everything that went wrong in the 2000s? Or is
>>> it the case that the widespread backlash against XML threw a baby out with
>>> the bathwater, forgetting the advantages of data over code?
>>>
>>> Cheers,
>>> Josh
>>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to clojure+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> “One of the main causes of the fall of the Roman Empire was that–lacking
> zero–they had no way to indicate successful termination of their C
> programs.”
> (Robert Firth)
>

