Many thanks for your clarifications.
I don't have a team of engineers. It is just me, which, in all modesty, 
I think is no small thing.
I'm not familiar with Clojure; I know the Java programming language.
The Lambda architecture pipeline I want to build will not be made 
entirely with Clojure. As described earlier in this thread, I will use 
existing tools that I don't need to develop (NiFi, Kafka, MongoDB, Hadoop, 
HBase).
Let's focus only on the batch layer of the Lambda architecture.
My concern is that I have not found a tool that the Big Data community 
recognizes as the best for distributed processing (MapReduce) of 
historical data on HDFS.
The MapReduce algorithms I have to implement are a word count over 
social media messages (Twitter, Facebook, Telegram) and IoT data analysis 
and aggregation (such as average values every 30 minutes, every hour, and 
every day).
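
To make the word count concrete, here is a minimal sketch in plain Java 
(no Hadoop, Spark, or Cascalog involved). It only illustrates the map step 
(split each message into words) and the reduce step (sum the occurrences 
per word) on an in-memory list. The class name, the method names, and the 
tokenization rule are assumptions made for illustration, not part of any 
framework mentioned in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class: in-memory word count, not a Hadoop job.
final class WordCountSketch {

    // "Map" step: split one message into lower-cased words.
    // Assumption: a word is any run of letters or digits.
    static List<String> tokenize(String message) {
        return Arrays.stream(message.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    // "Reduce" step: group identical words and count their occurrences.
    // A TreeMap keeps the result sorted by word.
    static Map<String, Long> countWords(List<String> messages) {
        return messages.stream()
                .flatMap(message -> tokenize(message).stream())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList(
                "Hello big data",
                "hello Hadoop, hello HBase");
        System.out.println(countWords(messages));
    }
}
```

A distributed framework would run the same two steps, with the map step 
spread over the HDFS blocks and the reduce step applied per word.
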
In his book Big Data: Principles and best practices of scalable realtime 
data systems, Nathan Marz suggests Clojure/Cascalog for distributed data 
processing on Hadoop HDFS.
I'm asking whether Clojure/Cascalog would be a good choice for processing 
datasets (MapReduce) and storing the resulting aggregations in HBase, or 
whether you would suggest something else.
Otherwise, if you know an existing, well documented, well googleable 
Java framework for distributed data processing that can store the 
resulting aggregations in HBase, I would appreciate your advice about it.

Thanks again.
Orazio

On Friday, 5 July 2019 at 19:43:16 UTC+2, ri...@chartbeat.com wrote:
>
> As much as I would love to convert a new data engineer to the ways of 
> clojure, in my opinion, choosing a language to solve a problem is rarely a 
> wise move. Do you have a team of engineers ready and willing to learn 
> clojure or are you doing this yourself? We do a lot of work with all of the 
> tools you mention (in clojure) but we built a lot of the frameworks 
> ourselves or wrote wrappers around java tools. Not for the newbie... if 
> your goal is to build this pipeline for your boss and you have any sort of 
> deadline do yourself a favor and pick an existing, well documented, well 
> googleable framework in a language that your team is familiar with. There 
> are a ton of hurdles with everything you mentioned without even getting to 
> clojure. You’re jumping in the deep end of the pool with no life jacket and 
> you don’t know how to swim.
>
> That said, if you ignore my advice you will learn a lot and we will be 
> here to help, just be warned 😎
>
> On Jul 4, 2019, at 2:09 PM, Thad Guidry <thadg...@gmail.com> wrote:
>
> Christian writes really good tools.  Sparkling is no exception.
> I have yet to use it in production myself however, since I haven't had the 
> need to use Clojure directly to solve any "data aggregation" problems.  
> Spark and other tools do that well enough, naturally.
>
> As far as using a tool/programming language to solve "data integration" 
> problems in large enterprise environments, I will ALWAYS use Open Source 
> tools for that purpose.  Clojure is no exception.  But I do tend to choose 
> open source hammers to drive nails.  Sometimes Clojure is missing the 
> handle on its hammer, as we have all experienced, but that's on us since WE 
> have the power to make Clojure better.  But often TIME is what we lack to 
> build better API's, libraries, tools for Clojure expansion.
>
> The Apache ecosystem offers many tools & libraries for "big data" and 
> "data integration"  which I often turn to first because I lack TIME for 
> building (long tail), but have enough TIME for learning new things (shorter 
> tail that helps the long tail).
> https://projects.apache.org/projects.html?category 
>
> Thad
> https://www.linkedin.com/in/thadguidry/
>
>
> On Thu, Jul 4, 2019 at 12:37 PM Chris Nuernberger <ch...@techascent.com>
> wrote:
>
>> Thad,
>>
>> Your approach seems very promising to me for a lot of jobs.  Spark runs on 
>> top of many things.
>>
>> As far as a clojure layer on top, what do you think about sparkling 
>> <http://gorillalabs.github.io/sparkling/>?
>>
>> On Thu, Jul 4, 2019 at 8:43 AM Thad Guidry <thadg...@gmail.com> wrote:
>>
>>> "Batch" - doing things in chunks
>>> "Processing" - THE WORLD :-)  because it means so many different things 
>>> to so many folks (including your boss)
>>>
>>> Without a doubt, you will love Apache Spark for your batch processing 
>>> and writing Spark Programs to conquer any World you are building.
>>> Spend time to install Spark standalone deploy and then use its powerful 
>>> Spark Shell <https://spark.apache.org/docs/latest/quick-start.html> 
>>> (the feeling of Clojure REPL  !!)
>>> If you just want to jump in to a public cluster and Try Spark, then I 
>>> would suggest Databricks <https://databricks.com/spark/about>. 
>>> Spend time reading the features under Libraries drop-down menu on Apache 
>>> Spark website <https://spark.apache.org/>.
>>>
>>> You might even be encouraged enough to write an official API in Clojure 
>>> for Apache Spark within a year!  (win-win)
>>>
>>> One note of caution: if you are building something for the long term, you 
>>> will eventually need data versioning, ACID transactions, and schema 
>>> evolution. For this I use Delta Lake <https://delta.io/> (not Datomic), 
>>> since it's fully compatible with Spark.
>>>
>>> Best of luck!
>>> Thad
>>> https://www.linkedin.com/in/thadguidry/
>>>
>>>
>>> On Thu, Jul 4, 2019 at 3:22 AM orazio <orazio...@gmail.com> wrote:
>>>
>>>> Hi @atdixon and Thad, thanks for your help.
>>>>
>>>> Here are more details about my project.
>>>> My big data layer is inspired by the Lambda architecture. The pipeline 
>>>> includes the following layers, with the tool chosen for each:
>>>> - *NiFi* for *data ingestion* and for publishing data/messages to Kafka 
>>>> topics.
>>>> - *Kafka* as the *message broker*, which, with Kafka Connect, lets me 
>>>> store data in MongoDB (MongoDB sink, 1 day retention period) and in 
>>>> HDFS (HDFS sink, 1 year retention period).
>>>> - *Real time processing* with *MongoDB*, using its built-in query engine, 
>>>> which provides extensive querying, filtering, and searching abilities.
>>>> - *Batch processing* of the data stored on HDFS, which performs data 
>>>> aggregation and stores the results in an HBase table. *?* The question is: 
>>>> which tool do you suggest for processing the data stored on HDFS?
>>>> - *Serving layer* with *HBase/Phoenix* to store the batch views and allow 
>>>> access to them.
>>>>
>>>> Now I'm asking for your help in choosing *the most appropriate tool to 
>>>> execute batch jobs (MapReduce)* that will aggregate the data.
>>>> Nathan Marz suggests Clojure/Cascalog. Do you know of other excellent 
>>>> Clojure/Hadoop work in the community for data processing?
>>>> If you know of particularly appropriate tools, I could also consider 
>>>> other work/libraries outside the Clojure community.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> On Wednesday, 3 July 2019 at 14:56:09 UTC+2, Thad Guidry wrote:
>>>>>
>>>>> "The best code is never written"
>>>>>
>>>>> https://zeppelin.apache.org/ 
>>>>> https://nifi.apache.org/  
>>>>>  
>>>>> Thad
>>>>> https://www.linkedin.com/in/thadguidry/
>>>>>
>>>>>
>>>>> On Tue, Jul 2, 2019 at 11:07 AM orazio <orazio...@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm a newbie with Clojure and Big Data, and I'm starting with Hadoop.
>>>>>> I have installed Hortonworks HDP 3.1.
>>>>>> I have to design a Big Data layer that ingests large IoT datasets and 
>>>>>> social media datasets, processes the data with MapReduce jobs, and 
>>>>>> stores the resulting aggregations in HBase tables.
>>>>>>
>>>>>> For now, my focus is on the data processing issue. My question 
>>>>>> is: is Clojure a good choice for distributed data processing on Hadoop?
>>>>>> I found Cascalog, a fully featured data processing and querying 
>>>>>> library for Clojure or Java. But does this library have any active 
>>>>>> maintainers?
>>>>>> Do you know of other excellent Clojure/Hadoop work in the community 
>>>>>> for data processing?
>>>>>>
>>>>>> I would appreciate some help.
>>>>>>
>>>>>> Orazio
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Clojure" group.
>>>>>> To post to this group, send email to clo...@googlegroups.com
>>>>>> Note that posts from new members are moderated - please be patient 
>>>>>> with your first post.
>>>>>> To unsubscribe from this group, send email to
>>>>>> clo...@googlegroups.com
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/clojure?hl=en
>>>>>> --- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Clojure" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to clo...@googlegroups.com.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/clojure/fbc26ffb-5f00-46a7-bf33-7a899f1ffead%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/clojure/fbc26ffb-5f00-46a7-bf33-7a899f1ffead%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>> -- 
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clo...@googlegroups.com 
>>>> <javascript:>
>>>> Note that posts from new members are moderated - please be patient with 
>>>> your first post.
>>>> To unsubscribe from this group, send email to
>>>> clo...@googlegroups.com <javascript:>
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Clojure" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to clo...@googlegroups.com <javascript:>.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/clojure/25a56148-9231-4a1b-8bba-8cb79776ba6b%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/clojure/25a56148-9231-4a1b-8bba-8cb79776ba6b%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>> -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com 
>>> <javascript:>
>>> Note that posts from new members are moderated - please be patient with 
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clo...@googlegroups.com <javascript:>
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to clo...@googlegroups.com <javascript:>.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/clojure/CAChbWaP7jdLY0DRBwMAu2jWi_YbV2xqf2Y_az00Jb8U_ctv%3DFw%40mail.gmail.com
>>>  
>>> <https://groups.google.com/d/msgid/clojure/CAChbWaP7jdLY0DRBwMAu2jWi_YbV2xqf2Y_az00Jb8U_ctv%3DFw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>> -- 
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clo...@googlegroups.com 
>> <javascript:>
>> Note that posts from new members are moderated - please be patient with 
>> your first post.
>> To unsubscribe from this group, send email to
>> clo...@googlegroups.com <javascript:>
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to clo...@googlegroups.com <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/clojure/CADbpEJtRLqEpD5nzq5eUwUqXYtE7na87j043LqnqwdUaOWjfSA%40mail.gmail.com
>>  
>> <https://groups.google.com/d/msgid/clojure/CADbpEJtRLqEpD5nzq5eUwUqXYtE7na87j043LqnqwdUaOWjfSA%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com <javascript:>
> Note that posts from new members are moderated - please be patient with 
> your first post.
> To unsubscribe from this group, send email to
> clo...@googlegroups.com <javascript:>
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> --- 
> You received this message because you are subscribed to the Google Groups 
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to clo...@googlegroups.com <javascript:>.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/clojure/CAChbWaNzPoCmYtK4iunpgazyLPFPn83rYzdVP-MQeZVsszr7fw%40mail.gmail.com
>  
> <https://groups.google.com/d/msgid/clojure/CAChbWaNzPoCmYtK4iunpgazyLPFPn83rYzdVP-MQeZVsszr7fw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
