On Tuesday, June 12, 2012 2:18:03 AM UTC-6, Christophe Grand wrote:
>
> Hi,
>
> To contrast our experiences of the language and the different approaches 
> to deal with some problems:
>
> On Sun, Jun 10, 2012 at 4:47 AM, Kurt Harriger <kurtharri...@gmail.com>wrote:
>
>>  Many will say that side-effecting functions are more difficult to test 
>> than pure functions... However, after writing about 4000 lines of clojure 
>> code, I realized that things in practice are never quite as simple as they 
>> seem.  As functions are composed the data structures they work with grow 
>> larger and more complex and this leads to maps containing maps containing 
>> lists containing maps and a minor change downstream can ripple through the 
>> program.  Tests become significantly more complex and fragile as the input 
>> and output structures grow in complexity.
>>
>
> Do you test only the functions, or have you also introduced "lint" 
> functions which check the shape of your data? To me, these are pretty 
> useful: you can use them in tests, pre/postconditions, middlewares to guard 
> against untrusted sources, etc. 
>
>
I do use pre-conditions where the test condition is simple, e.g. is this a 
string? does the map have a :field-type? However, I get a lot of my input 
data from HTTP requests as JSON, which has similar structures but different 
semantics, so I often do not have preconditions where the type is not 
explicit. For example, a string could be a list id, a contact id, or an 
encoded JSON document. While it is possible to parse a string as JSON to 
verify its type, this seems very computationally expensive, so the type is 
usually inferred from the context. 
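For the simple cases, the preconditions look something like this (a sketch; 
new-contact and its field names are hypothetical):

```clojure
;; Sketch of the simple shape checks; `new-contact` and the
;; field names here are hypothetical examples.
(defn new-contact
  [contact]
  {:pre [(map? contact)
         (string? (:name contact))
         (every? :field-type (:fields contact))]}
  ;; ... construct and return the contact ...
  contact)
```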

I also felt that explicit type checking went against the spirit of duck 
typing. There were a couple of times I added type checks only to realize 
that the checking was too strict, i.e. nil was an acceptable value and now 
the code threw an assertion error. In many cases a function simply takes an 
argument and passes it to another function, so I don't really care what 
type the argument is as long as there is an implementation of the other 
function which supports that data type. Maybe no such implementation exists 
yet, but one might in the future; should my code unnecessarily constrain 
the type based on an implementation detail? 

I have never been truly sold on duck typing, however, as I often find 
myself debugging exceptions thrown deep down in the call stack: an error 
that could have been caught earlier, when the problem was obvious, is 
instead allowed to penetrate deep into the call stack, often into a 
third-party library that never expected that type of input, where the 
problem is no longer obvious. In OO the argument is an interface, which 
says nothing about the structure of the object, only that it provides the 
desired behavior. Protocols are a step in this direction; however, if you 
extend a protocol to maps, then (satisfies? TheProtocol {}) will return 
true for ALL maps, making satisfies? an otherwise useless precondition. 
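A minimal sketch of the problem (TheProtocol and its method are placeholder 
names):

```clojure
(defprotocol TheProtocol
  (do-something [this]))

;; Extend the protocol to all maps via their common interface:
(extend-protocol TheProtocol
  clojure.lang.IPersistentMap
  (do-something [this] :something))

;; Now every map satisfies the protocol, whatever its shape:
(satisfies? TheProtocol {})                 ;=> true
(satisfies? TheProtocol {:unrelated :map})  ;=> true
```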

For example, in my first contact model I had {... :fields [{:field-type 
:email :value "" ...}]}. I later realized that the endpoint providing the 
contact information already grouped the fields, and I was filtering a lot 
based on :field-type anyway, so a more effective data structure was 
{:emails [{:value "" ...}]}; :field-type was just an implementation detail. 
The email map still has a value and associated behavior, but it no longer 
contains a :field-type. However, phone numbers have exactly the same 
{:value ""} structure, so the only way to determine whether something is an 
email or a phone number is from context or by parsing the string.
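Side by side, the two shapes (values elided as above; the :phones key is a 
hypothetical name for the phone group):

```clojure
;; Before: the field type is explicit in the data
{:fields [{:field-type :email :value ""}
          {:field-type :phone :value ""}]}

;; After: grouped by type; :field-type is gone, but an email entry and
;; a phone entry are now structurally indistinguishable
{:emails [{:value ""}]
 :phones [{:value ""}]}
```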


 
>
>> This reminded me of another OO code smell... "Don't talk to strangers" 
>> and the Law of Demeter. Instead of sending and returning maps of lists of 
>> maps, I started returning maps of functions. This provided additional 
>> decoupling that enabled me to refactor a bit more easily. Additionally, 
>> maps of maps of lists of maps often need to be fully computed, whereas a 
>> map containing functions allows me to defer computation until it is 
>> actually required, which in many cases may be never. 
>>
>
> Basically you are returning a "lazy map": a map of keys to delayed values 
> (why not use delays instead of fns as values?). While it is sometimes a 
> necessity to do so, the implied trade-off must not be overlooked: the map 
> can't be treated as a value anymore: if you call the pure function which 
> generates such a lazy map twice with the same arguments, you get two 
> lazy maps which are not equal! I'm not even speaking about being equal to 
> their non-lazy counterparts (which makes them a bit harder to test). 
>

This is an excellent point. Initially I started passing maps of functions 
into other functions as an optional parameter map for testing, e.g. (defn 
new-correction [.... & {:keys [get-current-date] :or {get-current-date 
get-current-date}}] ...). This solved one problem but created others: the 
function was now completely deterministic and easy to test, but with more 
than a function or two it quickly became verbose, so I created a factory 
function to build the map, and not long after realized these were 
"objects". At this point I started using records and protocols instead of 
maps, even an empty record, since really what I cared about was that the 
type had associated behavior. So while I used maps of functions in some 
cases, I later refactored away from maps to records. However, this has 
created an entirely new set of frustrations. If I implement the protocol 
within the (defrecord ...) form, I find that changing the implementation 
requires restarting the JVM, to which I have probably lost a good hour 
debugging problems I had already fixed and that only needed a restart. If 
I use (extend Type Protocol {...}) I no longer need to restart the JVM, 
but now the record type will no longer implement the associated Java 
interface. This is generally OK; however, I found that protocols cannot 
extend other protocols, so if that is desired you either need to extend 
the (:on-interface Protocol), extend all known implementations, or create 
an adapter to delegate to the other protocol. 
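A sketch of the two styles and their trade-off (the protocol and record 
names here are placeholders):

```clojure
(defprotocol Contactable
  (render [this]))

;; Style 1: in-line implementation. The record class implements the
;; protocol's backing interface directly, but re-evaluating the body
;; regenerates the class, so existing instances go stale until they
;; are recreated (or the JVM is restarted).
(defrecord Email [value]
  Contactable
  (render [this] (:value this)))

;; Style 2: extend after the fact. The implementation map can be
;; re-evaluated freely at the REPL, but the record class no longer
;; implements the protocol's backing Java interface.
(defrecord Phone [value])
(extend Phone
  Contactable
  {:render (fn [this] (:value this))})
```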

I also tried using multimethods instead of protocols, but I still found I 
needed to restart the JVM frequently. I think whenever I re-evaluated the 
namespace containing the record type and the (defmethod ...) forms, a new 
class would be emitted and the objects in my swank session needed to be 
recreated. 

 
>
>> Although very idiomatic to use keywords to get from maps, I have started 
>> to think of this as a code smell and instead prefer to (def value :value) 
>> and use this var instead of the keyword because it allows me to later 
>> replace the implementation or rename properties if it is necessary to 
>> refactor and I want to minimize changes to existing code or make changes to 
>> the existing code in small incremental units rather than all at once.
>>
>
> I think this is a premature optimization. If you need to get rid of 
> keyword access later on, you can either do what you propose and modify all 
> the call sites to remove the colon (but the important thing is that it 
> won't change the shape of your code; it's a minor refactoring) OR, if you 
> are really stuck and don't want to touch the codebase, don't forget that in 
> Clojure (unless you are using interop) you are always one abstraction away: 
> you can define your own associative type which will know how to respond to 
> lookups. Plus, doing so you may even choose to leverage the 
> optimized code path for keyword lookups (see IKeywordLookup.java). 
>

I disagree. Ironically, if you told a Java developer that getters were a 
premature optimization and he should use fields instead, he would look at 
you funny. Getters are not an optimization; if anything they carry a minor 
performance penalty. However, using fields makes one very dependent on 
implementation details, and doing so is considered bad practice. I don't 
see how this is any different in Clojure. For example, my email data 
structure currently has just {:value "u...@domain.com"}, but for various 
reasons it might actually be beneficial to use {:user "user" :domain 
"domain.com"}. Using (def value :value) has practically no overhead, but 
would make this change trivial. However, if I had to find and replace 
:value everywhere, the change becomes significantly more difficult and 
error prone. Even if I grep for :value, many maps that aren't emails 
contain :value, so I would need to proceed carefully to ensure not only 
that I replaced them all, but that I did not accidentally replace a :value 
belonging to a different type. I have also found that if I mistype a 
keyword I get nil... but if I mistype value the compiler will complain that 
it cannot find the definition. This also allows me to use emacs to find all 
usages and to "lean on the compiler" during any refactoring. 
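To make the indirection concrete (a sketch; the :user/:domain split is the 
hypothetical future change described above):

```clojure
;; One indirection point instead of a bare keyword at every call site:
(def value :value)

(value {:value "u...@domain.com"})  ; keyword lookup, same as (:value m)

;; If the representation later becomes {:user "user" :domain "domain.com"},
;; only this one definition changes; call sites stay (value email):
;; (defn value [{:keys [user domain]}] (str user "@" domain))
```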

Overriding IKeywordLookup will only get us so far. Perhaps the consumer 
uses the map as a sequence of key-value pairs, which are really MapEntry 
objects destructured with nth, or perhaps they don't destructure but use 
key and val instead? And what if the user tries to assoc a new value? 
Clojure will change the field within the record and return a new instance 
of that record, which still carries the overridden IKeywordLookup behavior, 
so the record will appear not to have been updated. Even if I could 
override all of this to make it work, I would much rather find and replace 
all usages of :value as necessary... or better still, just change a single 
occurrence of (def value :value). 

There is a quote: "better to have 100 functions that work on one data 
structure than 10 functions on 10 data structures." I once repeated this 
quote myself; however, in retrospect I completely disagree with it. These 
functions are often leaky and expose implementation details, which makes 
changes unnecessarily difficult. While it may be possible to re-implement 
these functions to preserve the desired behavior, it is usually not ideal 
and often even more difficult to maintain. "I would rather provide 10 
functions that are relevant to the problem domain than support 100 
functions which are not." 

There are still a lot of things I like about Clojure: immutable persistent 
data structures, equality semantics, metaprogramming, and embedded 
languages such as core.logic, Cascalog, etc. But abstraction and data 
hiding are different things. In OO I too think private methods are a code 
smell (they should be moved to public methods on another class), but the 
Clojure community seems to believe encapsulation is only about data hiding 
and does not currently seem to value the age-old abstraction principle.

Kurt 


> Christophe
>  

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
