Amazonica performance: options?

Dave Tenny Fri, 28 Mar 2014 08:07:10 -0700

I'm trying to code some amazonica-based solutions in a nontrivial AWS 
environment.
I work with many AWS accounts, and it isn't unusual to see a thousand 
instances running on one account, with similar excesses in other types of 
AWS resources.  So if you're doing an ec2-describe-instances (or amazonica 
equivalent), it must not choke in this environment.


I like the way amazonica does all the bean marshalling for me so I can 
express queries simply.  But the returned datasets need to be more 
pragmatic/performant.

The problem for me is that Amazonica doesn't seem up to the task of dealing 
with queries that return large volumes of data.
It has nothing to do with reflection I suspect, and more to do with 
unwieldy amounts of duplicate information in the result unmarshalling 
process.
The "clojure all the way down" philosophy results in duplicated information, 
and just printing the result to a file takes a long time.
If I accidentally let the output go to an emacs cider repl buffer, 
things get so wedged that I may as well kill -9 emacs.
(Known cider repl issues here, it isn't all amazonica).

For example:  here's how long it takes to run the java based ec2 cli to 
describe images on an account:

$ time ec2-describe-images >/tmp/ec2-cli-images.out

real    0m11.484s
user    0m2.564s 
sys     0m0.129s 


And here's how long it takes from a 'lein repl' to run the same query on 
the same account:

(time (with-output ["/tmp/clj-awz-images.out"] (println (ec2/describe-images))))
"Elapsed time: 194685.552683 msecs"

Now the amount of data printed by the EC2 CLI is of course much smaller 
than the output from Amazonica.
Amazonica returns everything in gory duplicate map detail and the EC2 CLI 
does not, as evidenced by the relative output sizes:

-rw-rw-r--.  1 dave dave 17201290 Mar 28 10:35 clj-awz-images.out
-rw-rw-r--.  1 dave dave    99342 Mar 28 10:26 ec2-cli-images.out.11.5s

Where the amazonica output starts with:
{:images [{:hypervisor xen, :state available, :virtualization-type 
paravirtual, :root-device-type instance-store,
... and goes on like that with duplicate keywords all the way down.

Anyway, my goal isn't to turn amazonica into ec2 cli.  But even the most 
trivial operations in amazonica (especially the most trivial, i.e. those 
lacking filters against large data sets) pretty much whack me left and 
right with CPU-wedged tools and (completely unacceptable) long waits for 
results.

Any suggestions on how to use amazonica in a way where the output is ... 
different, and minimal/workable?

Or am I left with going to another package or writing my own code against 
the Java SDK APIs directly?

I'm pretty sure the results need to be structures whose relationship to 
data values is implicit (and not explicit in map keys). I don't see any 
options in amazonica to change this, however.

Thanks for suggestions, forgive me if I've missed something obvious.  I'm 
just trying to see what's out there and at the same time move along quickly 
enough that I can get some usable tools for work (so I can lose all my 
python and bash scripts for various interfaces, I want clojure!).

- Dave

