Hi All,
I'm a newbie to Clojure and Big Data, and I'm starting with Hadoop.
I have installed Hortonworks HDP 3.1.
I have to design a Big Data layer that ingests large IoT and social media
datasets, processes the data with MapReduce jobs, and produces aggregations
to store in HBase tables.
For now, my
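For the HBase side, a minimal sketch of writing one aggregated row from
Clojure via the standard HBase Java client might look like this (the table
name, column family, and row key are invented placeholders):

(import '(org.apache.hadoop.hbase HBaseConfiguration TableName)
        '(org.apache.hadoop.hbase.client ConnectionFactory Put)
        '(org.apache.hadoop.hbase.util Bytes))

;; hypothetical table "iot_aggregates" with one column family "agg"
(with-open [conn  (ConnectionFactory/createConnection (HBaseConfiguration/create))
            table (.getTable conn (TableName/valueOf "iot_aggregates"))]
  (let [put (Put. (Bytes/toBytes "device-42|2019-06-01"))]
    ;; one aggregated value per column qualifier
    (.addColumn put (Bytes/toBytes "agg") (Bytes/toBytes "avg_temp") (Bytes/toBytes "21.5"))
    (.put table put)))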
I've found Clojure to be an excellent fit for big data processing for a few
reasons:
- the nature of big data is that it is often unstructured or
semi-structured, and Clojure's immutable, ad hoc, map-based orientation is
well suited to this (a small sketch follows after this list)
- much of the big data ecosystem is Java or JVM-based, and Clojure's
excellent Java interop makes it a natural glue language
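To illustrate the first point, a tiny sketch of aggregating records of
varying shape with plain Clojure maps (field names are invented):

;; records from different sources rarely share a schema
(def events
  [{:device "a1" :temp 21.5}
   {:device "a1" :temp 22.0 :humidity 40}
   {:user "bob" :likes 3}])

;; count events per device, ignoring whatever other fields are present
(reduce (fn [acc {:keys [device]}]
          (if device (update acc device (fnil inc 0)) acc))
        {}
        events)
;; => {"a1" 2}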
1. I think this is a great idea if it is really necessary. I would be in
favor of a reify++ alone to simplify things. I find reify amazing at code
compression and heavily use it via type-specific macros to implement
interfaces that, for instance, support a particular primitive type (a small
reify sketch follows after this list).
2. Is
5. If you need a concrete class definition that then implements a set of
type-specific interfaces, this would seem to fall into the category of
gen-class, assuming you could specify the interfaces with type
specifications. I can't immediately place a way to do this with anything
mentioned above. It
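A minimal sketch of the reify pattern from #1, implementing a
primitive-double JDK interface directly (no macro layer shown here):

(def doubler
  (reify java.util.function.DoubleUnaryOperator
    ;; applyAsDouble takes and returns a primitive double
    (applyAsDouble [_ x] (* 2.0 x))))

(.applyAsDouble doubler 21.0) ;; => 42.0

For #5, when a named concrete class really is required, gen-class with
:implements (and AOT compilation) is still the usual route.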
I'm glad someone else is thinking on this too!
#2 - For my case at the moment (Apache Beam), I believe we will always know
the types in advance, so using a Java class is workable, but of course a
(proxy++) would be ideal. Beam asks us to extend an abstract generic class,
so we must use (proxy). I
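Not Beam-specific, but as a generic illustration of why (proxy) is needed
there, a sketch extending an abstract generic class (reify only handles
interfaces):

;; java.util.AbstractList is abstract and generic; proxy can subclass it
(def squares
  (proxy [java.util.AbstractList] []
    (get [i] (* i i))
    (size [] 10)))

(.get squares 3)  ;; => 9
(count squares)   ;; => 10

With Beam itself the generic type parameters are the sticking point: an
anonymous proxy class can't declare them the way a named Java subclass does,
which is presumably why knowing the types in advance matters.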
My biased first reaction to Hadoop is: do you really need it? It has a
separate runtime and some overhead. It seems to me it is much easier to use
Kafka, probably Kafka Connect to get data in and out, and Streams/KSQL to
process the data. Because of Java interop and the nice generic Kafka API,
it's really easy to work with from Clojure.
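For example, a minimal Kafka Streams topology from Clojure via plain
interop (topic names and the broker address are placeholders):

(import '(java.util Properties)
        '(org.apache.kafka.common.serialization Serdes)
        '(org.apache.kafka.streams KafkaStreams StreamsBuilder StreamsConfig)
        '(org.apache.kafka.streams.kstream ValueMapper))

(def props
  (doto (Properties.)
    (.put StreamsConfig/APPLICATION_ID_CONFIG "iot-demo")
    (.put StreamsConfig/BOOTSTRAP_SERVERS_CONFIG "localhost:9092")
    (.put StreamsConfig/DEFAULT_KEY_SERDE_CLASS_CONFIG (.getClass (Serdes/String)))
    (.put StreamsConfig/DEFAULT_VALUE_SERDE_CLASS_CONFIG (.getClass (Serdes/String)))))

(def builder (StreamsBuilder.))

;; read "iot-raw", transform each value, write to "iot-clean"
(-> (.stream builder "iot-raw")
    (.mapValues (reify ValueMapper
                  (apply [_ v] (.toUpperCase ^String v))))
    (.to "iot-clean"))

(def streams (KafkaStreams. (.build builder) props))
(.start streams)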