"Batch" - doing things in chunks
"Processing" - THE WORLD :-) because it means so many different things to so many folks (including your boss)
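To make "doing things in chunks" concrete, here is a minimal sketch in plain Python (not Spark, and not anyone's production code) of a batch aggregation: records are read in fixed-size chunks, each chunk is reduced to partial (sum, count) pairs, and the partials are merged into per-key averages. The record shape `(sensor_id, value)`, the function names, and the chunk size are all assumptions made up for illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (sensor_id, value)
Reading = Tuple[str, float]
Partial = Dict[str, Tuple[float, int]]


def chunks(records: Iterable[Reading], size: int) -> Iterator[List[Reading]]:
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def aggregate_chunk(batch: List[Reading]) -> Partial:
    """Per-chunk step: partial (sum, count) per sensor."""
    partial: Partial = {}
    for sensor_id, value in batch:
        total, count = partial.get(sensor_id, (0.0, 0))
        partial[sensor_id] = (total + value, count + 1)
    return partial


def merge(a: Partial, b: Partial) -> Partial:
    """Combine step: add up the partial sums and counts of two results."""
    merged = dict(a)
    for sensor_id, (total, count) in b.items():
        t, c = merged.get(sensor_id, (0.0, 0))
        merged[sensor_id] = (t + total, c + count)
    return merged


def batch_average(records: Iterable[Reading], chunk_size: int = 2) -> Dict[str, float]:
    """Average value per sensor, computed one chunk at a time."""
    totals: Partial = {}
    for batch in chunks(records, chunk_size):
        totals = merge(totals, aggregate_chunk(batch))
    return {k: total / count for k, (total, count) in totals.items()}


readings = [("s1", 10.0), ("s2", 4.0), ("s1", 20.0), ("s2", 6.0), ("s1", 30.0)]
print(batch_average(readings))  # {'s1': 20.0, 's2': 5.0}
```

Keeping (sum, count) pairs rather than running averages is what lets the chunks be merged in any order; a distributed engine applies the same idea across machines instead of across loop iterations.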
Without a doubt, you will love Apache Spark for your batch processing, and for writing Spark programs to conquer any world you are building.
Take the time to install Spark in standalone deploy mode and then use its powerful Spark Shell <https://spark.apache.org/docs/latest/quick-start.html> (it has the feel of a Clojure REPL!).
If you just want to jump onto a public cluster and try Spark, then I would suggest Databricks <https://databricks.com/spark/about>.
Spend time reading about the features under the Libraries drop-down menu on the Apache Spark website <https://spark.apache.org/>.
You might even be encouraged enough to write an official API in Clojure for Apache Spark within a year! (win-win)

One note of caution: if you are building something for the long term, you will eventually need data versioning, ACID transactions, and schema evolution. For this I use Delta Lake <https://delta.io/> (not Datomic), since it's fully compatible with Spark.

Best of luck!

Thad
https://www.linkedin.com/in/thadguidry/

On Thu, Jul 4, 2019 at 3:22 AM orazio <orazio.pist...@gmail.com> wrote:

> Hi @atdixon and Thad, thanks for your help.
>
> Here are more details about my project.
> My big data layer is inspired by the Lambda architecture. The pipeline
> includes the following layers, with the tool chosen for each:
> - *NiFi* for *data ingestion* and for publishing data/messages on a Kafka topic.
> - *Kafka* as *message broker*, which with Kafka Connect lets me store
> data in MongoDB (MongoDB sink, 1-day retention period) and HDFS
> (HDFS sink, 1-year retention period).
> - *Real-time processing* with *MongoDB*, using its built-in query engine,
> which provides extensive querying, filtering, and searching abilities.
> - *Batch processing* of data stored on HDFS, which performs data
> aggregation and stores the result in an HBase table. *?* The question is:
> which tool do you suggest for processing the data stored on HDFS?
> - *Serving layer* with *HBase/Phoenix* to store and allow access to the
> batch view.
>
> Now I'm asking for your help in choosing *the most appropriate tool to
> execute batch jobs (MapReduce)* that will have to aggregate data.
> Nathan Marz suggests Clojure/Cascalog. Do you know of other excellent
> Clojure/Hadoop work in the community for data processing?
> If you know of particularly appropriate tools, I could also consider
> other work/libraries outside the Clojure community.
>
> Thanks
>
> On Wednesday, July 3, 2019 at 14:56:09 UTC+2, Thad Guidry wrote:
>>
>> "The best code is never written"
>>
>> https://zeppelin.apache.org/
>> https://nifi.apache.org/
>>
>> Thad
>> https://www.linkedin.com/in/thadguidry/
>>
>> On Tue, Jul 2, 2019 at 11:07 AM orazio <orazio...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I'm a newbie to Clojure/Big Data, and I'm starting with Hadoop.
>>> I have installed Hortonworks HDP 3.1.
>>> I have to design a Big Data layer that ingests large IoT datasets and
>>> social media datasets, processes the data with MapReduce jobs, and
>>> produces aggregations to store in HBase tables.
>>>
>>> For now, my focus is on the data processing issue. My questions are:
>>> Is Clojure a good choice for distributed data processing on Hadoop?
>>> I found Cascalog, a fully-featured data processing and querying library
>>> for Clojure or Java. But are there any active maintainers for this
>>> library?
>>> Do you know of other excellent Clojure/Hadoop work in the community for
>>> data processing?
>>>
>>> I would appreciate some help.
>>>
>>> Orazio
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clo...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to clo...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/clojure/fbc26ffb-5f00-46a7-bf33-7a899f1ffead%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.