Clojure is a good choice for Big Data? Which clojure/Hadoop work to use?

2019-07-02 Thread orazio
Hi all, I'm a newbie to Clojure and Big Data, and I'm starting with Hadoop. I have installed Hortonworks HDP 3.1. I have to design a Big Data layer that ingests large IoT and social media datasets, processes the data with MapReduce jobs, and produces aggregations to store in HBase tables. For now, my

Re: Clojure is a good choice for Big Data? Which clojure/Hadoop work to use?

2019-07-02 Thread atdixon
I've found Clojure to be an excellent fit for big data processing for a few reasons:
- the nature of big data is that it is often unstructured or semi-structured, and Clojure's immutable ad hoc map-based orientation is well suited to this
- much of the big data ecosystem is Java or JVM-based (a

Re: Java Interop on steroids?

2019-07-02 Thread Chris Nuernberger
eglue,
1. I think this is a great idea if it is really necessary. I would be in favor of a reify++ alone to simplify things. I find reify amazing at code compression and use it heavily via type-specific macros to implement interfaces that, for instance, support a particular primitive type.
2. Is

Re: Java Interop on steroids?

2019-07-02 Thread Chris Nuernberger
5. If you need a concrete class definition that then implements a set of type-specific interfaces, this would seem to fall into the category of gen-class, assuming you could specify the interfaces with type specifications. I can't immediately place a way to do this with anything mentioned above. It

Re: Java Interop on steroids?

2019-07-02 Thread atdixon
I'm glad someone else is thinking about this too! #2 - For my case at the moment (Apache Beam), I believe we will always know the types in advance, so using a Java class is workable, but of course a (proxy++) would be ideal. Beam asks us to extend an abstract generic class, so we must use (proxy). I

Clojure is a good choice for Big Data? Which clojure/Hadoop work to use?

2019-07-02 Thread 'Gerard Klijs' via Clojure
My biased first reaction to Hadoop is: do you really need it? It has a separate runtime and some overhead. And it seems to me it is much easier to use Kafka, probably with Connect to get data in and out and Streams/KSQL to process the data. Because of Java interop and the nice generic Kafka API it's really eas