Debugging idea for a constrained situation

fahptv Thu, 21 Jan 2016 10:45:34 -0800

While not strictly Clojure-related, I thought I'd share this idea with you 
here because (1) I came up with it while thinking about design from a 
Clojure / functional point of view and (2) I respect your opinion. It's 
very likely you'll have better ideas...


*I'm in a highly constrained situation*

   - When an Incident occurs (a possible bug or bad behavior), I'm told days
   later, and I never have access to the machine where the code ran.
   - I only have the following:
      - a log file with size constraints (no more than n KB)
      - the version of the code that was running
   - I don't have the following:
      - core dumps (they'd be bigger than the n KB anyway; besides, no one
      knows a priori when to persist one for Incidents where the code fully
      believed it was doing fine)
      - complete info on how to re-create the environment (only partial
      info), because
         - the code can be running on any kind of machine with any kind
         of configuration
         - there are a lot of other applications of various versions
         running as well
         - besides, even if I had complete info, actually re-creating such
         an environment would be very time-consuming and error-prone
         
Figuring out what went wrong has been *painful*. 


But if I had access to all the values that a program *obtained/received*
from its environment leading up to the Incident, then I could just have my
program use these values while running in a debugger.


*The basic idea*

   1. Log *external* values used by the program over time in production. 
   Don't worry about internal / local values since they are all derived 
   functionally from these external values.
   2. When an Incident occurs, load this log and a final time-stamp into 
   the program's "state map"
   3. Any time the program needs a value from the outside, it uses a value 
   from the state map instead
   4. Set breakpoints and debug away (I'm stuck using C++ (sadness!))
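
Since the program is in C++, steps 1 to 3 above might be sketched as a single record/replay wrapper that every external read goes through. This is a minimal sketch under assumptions, not a worked-out design: the class name `ExternalValues`, the tab-separated `key<TAB>value` log format, and replaying each key's values in the order they were logged are all hypothetical choices, and keys and values are assumed to contain no tabs or newlines.

```cpp
#include <deque>
#include <functional>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical "state map" wrapper: every external read goes through get().
// Assumption: keys and values contain no tab or newline characters.
class ExternalValues {
public:
    // Record mode (production): fetch from the environment and log each value.
    explicit ExternalValues(std::ostream& log) : log_(&log) {}

    // Replay mode (debugging): load a recorded log into per-key queues.
    explicit ExternalValues(std::istream& recorded) {
        std::string line;
        while (std::getline(recorded, line)) {
            auto tab = line.find('\t');
            if (tab == std::string::npos) continue;  // skip malformed lines
            state_[line.substr(0, tab)].push_back(line.substr(tab + 1));
        }
    }

    // The one place the program asks the outside world for a value.
    std::string get(const std::string& key,
                    const std::function<std::string()>& fetch) {
        if (log_ == nullptr) {  // replay: never touch the real environment
            auto it = state_.find(key);
            if (it == state_.end() || it->second.empty())
                throw std::runtime_error("no recorded value for " + key);
            std::string v = it->second.front();  // values come back in log order
            it->second.pop_front();
            return v;
        }
        std::string v = fetch();            // record: read the real value
        *log_ << key << '\t' << v << '\n';  // ...and append it to the log
        return v;
    }

private:
    std::ostream* log_ = nullptr;  // null means replay mode
    std::map<std::string, std::deque<std::string>> state_;
};
```

In production the object would be built over the size-limited log; after an Incident it would be built from that log before starting the task in question, so the `fetch` callbacks are never called and breakpoints see exactly the recorded inputs. Running out of recorded values throws, which marks the point where the recording ends.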

I like this because minimal time is spent re-creating the crime scene. I 
just have to tweak the program to start the task / thread in question after 
it's done loading the state. I won't have to ask QA "do you have a test 
environment where this problem is reproducible?" And I won't be making any 
mistakes in reproducing the Incident because all the values used will be 
loaded in an automated fashion.


*Considerations*

   - Since values may be large, I may have to tweak the logging so that a
   value logged earlier can be re-used if it hasn't changed, instead of
   logging it all over again (current / expired / re-use).
      - The program would check whether a value has changed only for values
      that have "expired". A value expires when the task it's related to has
      finished; when that task starts up again, the program would check the
      map for a value it needs, find that it has expired, and fetch the
      current one from the environment. Then it can decide how to log it in
      the state log.
      - I have to make sure every value I need would be within the most
      recent n KB of log. I may have a separate thread that logs a snapshot
      of the entire state every n KB.
   - I'm forced to change "every" external access into a conditional that
   checks the state map first.
   - Sections of code can always opt out of this as long as
      - I don't think I'll need to debug them, especially if they've been
      working fine for months, and
      - they're basically separate from the rest of the code (i.e. they won't
      be involved in re-creating any Incidents in other code).
   - State maps make the code more testable, since I can make the
   program "see" any kind of arbitrary weirdness.
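
The re-use and snapshot considerations could look roughly like the following log writer. It is a sketch under assumptions: `ReuseLogger`, the three record types (`V` for a new value, `R` for a re-used one, `S` for a snapshot line), and counting bytes written since the last snapshot are all hypothetical. The snapshot is taken inline rather than on a separate thread, and deciding *when* an expired value gets re-fetched is left to the caller.

```cpp
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical writer: logs a value in full only when it differs from the
// last value logged for that key, otherwise writes a short re-use record.
// After every `snapshotEvery` bytes of V/R records it writes all current
// values, so the newest part of a truncated log is self-sufficient.
// Record format (an assumption): "V\tkey\tvalue", "R\tkey", "S\tkey\tvalue".
class ReuseLogger {
public:
    ReuseLogger(std::ostream& out, std::size_t snapshotEvery)
        : out_(out), snapshotEvery_(snapshotEvery) {}

    void log(const std::string& key, const std::string& value) {
        auto it = last_.find(key);
        if (it != last_.end() && it->second == value) {
            write("R\t" + key + "\n");                  // unchanged: re-use
        } else {
            last_[key] = value;                          // new or changed
            write("V\t" + key + "\t" + value + "\n");
        }
        if (sinceSnapshot_ >= snapshotEvery_) snapshot();
    }

private:
    void write(const std::string& record) {
        out_ << record;
        sinceSnapshot_ += record.size();
    }

    void snapshot() {
        for (const auto& kv : last_)
            out_ << "S\t" << kv.first << '\t' << kv.second << '\n';
        sinceSnapshot_ = 0;  // snapshot bytes don't count toward the next one
    }

    std::ostream& out_;
    std::size_t snapshotEvery_;
    std::size_t sinceSnapshot_ = 0;
    std::map<std::string, std::string> last_;  // last value logged per key
};
```

On replay, a reader would start at the last complete snapshot in the surviving n KB and apply the `V` and `R` records after it. Provided `snapshotEvery` plus the size of one snapshot stays comfortably below n KB, an `R` record should never refer to a value that was truncated away.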
   
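The "conditional that checks the state map first", and the testability it buys, might be sketched with a plain `std::map` as the state map. The names `StateMap`, `external`, and `retryCount` and the `"retries"` key are hypothetical; the lambda stands in for a real read from the environment.

```cpp
#include <exception>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical state map: key -> value the program would otherwise fetch.
using StateMap = std::map<std::string, std::string>;

// The conditional every external access turns into: use the state map's
// value if one is present, otherwise go to the real environment.
std::string external(const StateMap& state, const std::string& key,
                     const std::function<std::string()>& fetch) {
    auto it = state.find(key);
    return it != state.end() ? it->second : fetch();
}

// Hypothetical program logic: read a retry count from configuration.
int retryCount(const StateMap& state) {
    // The lambda stands in for a real configuration read.
    std::string raw = external(state, "retries", [] { return std::string("3"); });
    try {
        int n = std::stoi(raw);
        return n < 0 ? 0 : n;  // clamp negative counts to zero
    } catch (const std::exception&) {
        return 0;  // unparseable or out-of-range input: no retries
    }
}
```

A test can hand `retryCount` a negative number, garbage text, or a value too large for an `int` just by pre-filling the map, without touching a real machine. Code that opts out simply calls the environment directly instead of going through `external`.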
I'm very interested to know what you think of this. It does smell 
heavy-handed to me -- but having something like it would alleviate a ton of 
pain... It could be worth it. Thanks in advance for any feedback.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en