Thad,

Your approach seems very promising to me for a lot of jobs.  Spark runs on
top of many things.

As for a Clojure layer on top, what do you think about Sparkling
<http://gorillalabs.github.io/sparkling/>?

On Thu, Jul 4, 2019 at 8:43 AM Thad Guidry <thadgui...@gmail.com> wrote:

> "Batch" - doing things in chunks
> "Processing" - THE WORLD :-)  because it means so many different things to
> so many folks (including your boss)
>
> Without a doubt, you will love Apache Spark for your batch processing and
> for writing Spark programs to conquer any world you are building.
> Spend time installing Spark in standalone deploy mode and then use its
> powerful Spark Shell <https://spark.apache.org/docs/latest/quick-start.html>
> (it feels like the Clojure REPL!).
> If you just want to jump onto a public cluster and try Spark, then I
> would suggest Databricks <https://databricks.com/spark/about>.
> Spend time reading about the features under the Libraries drop-down menu
> on the Apache Spark website <https://spark.apache.org/>.
>
> You might even be encouraged enough to write an official API in Clojure
> for Apache Spark within a year!  (win-win)
>
> One note of caution: if you are building something for the long term, you
> will eventually need data versioning, ACID transactions, and schema
> evolution. For this I use Delta Lake <https://delta.io/> (not Datomic),
> since it's fully compatible with Spark.
>
> Best of luck!
> Thad
> https://www.linkedin.com/in/thadguidry/
>
>
> On Thu, Jul 4, 2019 at 3:22 AM orazio <orazio.pist...@gmail.com> wrote:
>
>> Hi @atdixon and Thad, thanks for your help.
>>
>> Here are more details about my project.
>> My big data layer is inspired by the Lambda architecture. The pipeline
>> includes the following layers, with the tool chosen for each:
>> - *NiFi* for *data ingestion*, publishing data/messages to a Kafka
>> topic.
>> - *Kafka* as *message broker*, which, with Kafka Connect, lets me store
>> data in MongoDB (MongoDB sink, 1 day retention period) and HDFS
>> (HDFS sink, 1 year retention period).
>> - *Real-time processing* with *MongoDB*, using its built-in query engine,
>> which provides extensive querying, filtering, and searching abilities.
>> - *Batch processing* of data stored on HDFS, which performs data
>> aggregation and stores the result in an HBase table. *?* The question is:
>> which tool do you suggest for processing the data stored on HDFS?
>> - *Serving layer* with *HBase/Phoenix* to store and allow access to the
>> batch view.
>>
>> Now I'm asking for your help to choose *the most appropriate tool to
>> execute batch jobs (MapReduce)* that will aggregate the data.
>> Nathan Marz suggests Clojure/Cascalog. Do you know of other excellent
>> Clojure/Hadoop work in the community for data processing?
>> If you know of particularly appropriate tools, I could also consider
>> other work/libraries outside the Clojure community.
>>
>> Thanks
>>
>>
>>
>> On Wednesday, July 3, 2019 at 14:56:09 UTC+2, Thad Guidry wrote:
>>>
>>> "The best code is never written"
>>>
>>> https://zeppelin.apache.org/
>>> https://nifi.apache.org/
>>>
>>> Thad
>>> https://www.linkedin.com/in/thadguidry/
>>>
>>>
>>> On Tue, Jul 2, 2019 at 11:07 AM orazio <orazio...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I'm a newbie to Clojure/Big Data, and I'm starting with Hadoop.
>>>> I have installed Hortonworks HDP 3.1.
>>>> I have to design a Big Data layer that ingests large IoT datasets and
>>>> social media datasets, processes the data with MapReduce jobs, and
>>>> produces aggregations to store in HBase tables.
>>>>
>>>> For now, my focus is on the data processing issue. My question
>>>> is: Is Clojure a good choice for distributed data processing on Hadoop?
>>>> I found Cascalog, a fully-featured data processing and querying library
>>>> for Clojure or Java. But are there any active maintainers for this
>>>> library?
>>>> Do you know of other excellent Clojure/Hadoop work in the community
>>>> for data processing?
>>>>
>>>> I would appreciate some help.
>>>>
>>>> Orazio
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clo...@googlegroups.com
>>>> Note that posts from new members are moderated - please be patient with
>>>> your first post.
>>>> To unsubscribe from this group, send email to
>>>> clo...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to clo...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/clojure/fbc26ffb-5f00-46a7-bf33-7a899f1ffead%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/clojure/fbc26ffb-5f00-46a7-bf33-7a899f1ffead%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
