Recommendations for a schema-based data language for use in Hadoop?

Ryan Schmitt Tue, 04 Aug 2015 19:06:12 -0700

Hi Clojure people,

I'm currently working on some problems in the big data space, and I'm more 
or less starting from scratch with the Hadoop ecosystem. I was looking at 
ways to work with data in Hadoop, and I realized that (because of how 
InputFormat splitting works) this is a use case where it's actually pretty 
important to use a data language with an external schema. This probably 
means ruling out EDN (for performance and space-efficiency reasons) and 
Fressian (managing the Fressian caching domain seems like it could get 
complicated), which are my default solutions for everything, so now I'm 
back to the drawing board. I'd rather not use something braindead like JSON 
or CSV.


It seems like there are a few language-agnostic data languages that are 
popular in this space, such as:

* Thrift
* Protobuf
* Avro

But since the Clojure community has very high standards for data languages, 
as well as a number of different libraries that run code on Hadoop, I was 
wondering if anyone could provide a recommendation for a fast, extensible, 
and well-designed data language to use. (Recommendations of what to avoid 
are also welcome.)

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
