Awesome!

On Wednesday, 27 February 2019 09:17:33 UTC-8, Chris Nuernberger wrote:

> Clojurians,
>
> Good morning from (again) snowy Boulder!
>
> Following lots of discussion and interaction with many people around the
> Clojure and ML worlds, TechAscent has built a foundation intended to let
> the average Clojurist do high-quality machine learning of the kind they
> are likely to encounter in their day-to-day work.
>
> This isn't a deep learning framework; I already tried that in a bespoke
> fashion, and I think the mxnet bindings are great.
>
> This is specifically for the use case where you have data coming in from
> multiple data sources and you need to do the cleaning, processing, and
> feature augmentation before running some set of simple models, then
> gridsearch across a range of models and go about your business from
> there. Think small to medium sized Datomic databases and the like.
> Everyone has a little data before they have a lot, and I think this scale
> captures a far wider range of possible use cases.
>
> The foundation comes in two parts.
>
> The first is the ETL library:
>
> https://github.com/techascent/tech.ml.dataset
>
> This library is a column-store design sitting on top of tablesaw. The
> Clojure ML group profiled lots of different libraries and we found that
> tablesaw works great.
>
> The ETL language is composed of three sub-languages. First, a
> set-invariant column selection language. Second, a minimal functional
> math language along the lines of APL or J. Finally, a pipeline concept
> that lets you describe an ETL pipeline as data: you create the pipeline,
> run it on training data, and it records context. During inference later
> you just use the saved pipeline from the first operation.
>
> This is the second large ETL system I have worked on; the first was one
> named Alteryx.
>
> The next library is a general ML framework:
>
> https://github.com/techascent/tech.ml
>
> The library has bindings to xgboost, smile, and libsvm. Libsvm doesn't
> get the credit it deserves, by the way, as it works extremely well on
> small-n problems. xgboost works well on everything, and smile contains
> lots of different types of models that may or may not work well depending
> on the problem, as well as clustering and a lot of other machine-learny
> things.
>
> For this case, my interest wasn't a clear exposition of all the different
> things smile can do so much as getting a wide enough range of model
> generators to be effective. For a more thorough binding to smile, check
> out:
>
> https://github.com/generateme/fastmath
>
> I built a Clojure version of a very involved Kaggle problem as an example
> using clojupyter and oz as a proof of concept:
>
> https://github.com/cnuernber/ames-house-prices/blob/master/ames-housing-prices-clojure.md
>
> Enjoy :-).
>
> Compliments of the TechAscent Crew & Clojure ML Working Group
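For anyone trying to picture the pipeline-as-data idea from the announcement, here is a minimal sketch in plain Clojure. This is not tech.ml.dataset's actual ETL language; the step names and the fit-pipeline/apply-pipeline helpers are made up for illustration. The point is just that the pipeline is a data structure, fitting it on training data records per-step context (fill values, means, standard deviations), and inference replays the saved steps without re-estimating anything:

(ns etl-sketch.core)

;; A toy "dataset": a map of column name -> vector of values.
(def train-ds {:lot-area   [8450.0 9600.0 11250.0 nil]
               :sale-price [208500.0 181500.0 223500.0 140000.0]})

;; Each pipeline step is plain data: [op column].
(def pipeline
  [[:replace-missing :lot-area]
   [:std-scale :lot-area]
   [:log :sale-price]])

(defn mean [xs] (/ (reduce + xs) (count xs)))

(defn std-dev [xs]
  (let [m (mean xs)]
    (Math/sqrt (mean (map #(let [d (- % m)] (* d d)) xs)))))

;; Fitting a step returns the context recorded from the training data.
(defmulti fit-step (fn [[op] _ds] op))
(defmethod fit-step :replace-missing [[_ col] ds]
  {:fill (mean (remove nil? (ds col)))})
(defmethod fit-step :std-scale [[_ col] ds]
  (let [xs (remove nil? (ds col))]
    {:mean (mean xs) :std (std-dev xs)}))
(defmethod fit-step :log [_ _] {})

;; Applying a step uses only the saved context, never the new data's stats.
(defmulti apply-step (fn [[op] _ctx _ds] op))
(defmethod apply-step :replace-missing [[_ col] ctx ds]
  (update ds col (partial mapv #(or % (:fill ctx)))))
(defmethod apply-step :std-scale [[_ col] {:keys [mean std]} ds]
  (update ds col (partial mapv #(/ (- % mean) std))))
(defmethod apply-step :log [[_ col] _ ds]
  (update ds col (partial mapv #(Math/log %))))

;; Training: walk the steps in order, recording context and transforming
;; as we go so later steps see earlier steps' output.
(defn fit-pipeline [pipeline ds]
  (reduce (fn [{:keys [ds fitted]} step]
            (let [ctx (fit-step step ds)]
              {:ds     (apply-step step ctx ds)
               :fitted (conj fitted [step ctx])}))
          {:ds ds :fitted []}
          pipeline))

;; Inference: replay the saved steps with their recorded context.
(defn apply-pipeline [fitted ds]
  (reduce (fn [ds [step ctx]] (apply-step step ctx ds)) ds fitted))

(def fitted (:fitted (fit-pipeline pipeline train-ds)))
(apply-pipeline fitted {:lot-area [nil 7500.0] :sale-price [155000.0 230000.0]})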
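And a rough sketch of the gridsearch-across-models idea, again in plain Clojure rather than tech.ml's actual API; the train, rmse, and gridsearch functions below are toy stand-ins for the real xgboost/smile/libsvm model generators and their option maps:

(ns gridsearch-sketch.core)

;; Toy single-feature regression data.
(def train-xs [1.0 2.0 3.0 4.0 5.0])
(def train-ys [1.1 2.1 2.9 4.2 4.8])
(def test-xs  [1.5 3.5])
(def test-ys  [1.6 3.4])

;; A toy "model generator": a ridge-like fit whose slope is shrunk by
;; :lambda.  It stands in for a real model generator taking an option map.
(defn train [{:keys [lambda]} xs ys]
  (let [slope (/ (reduce + (map * xs ys))
                 (+ lambda (reduce + (map * xs xs))))]
    (fn [x] (* slope x))))

;; Root-mean-square error of a model on held-out data.
(defn rmse [model xs ys]
  (Math/sqrt (/ (reduce + (map (fn [x y]
                                 (let [d (- (model x) y)] (* d d)))
                               xs ys))
                (count xs))))

;; The grid: every combination of the listed option values.
(def option-grid {:lambda [0.0 0.1 1.0 10.0]})

(defn cartesian [m]
  (reduce (fn [acc [k vs]]
            (for [a acc v vs] (assoc a k v)))
          [{}]
          m))

;; Train one model per option map, score each on held-out data, keep the
;; option map with the lowest RMSE.
(defn gridsearch [option-grid]
  (->> (cartesian option-grid)
       (map (fn [opts]
              (let [model (train opts train-xs train-ys)]
                {:options opts :rmse (rmse model test-xs test-ys)})))
       (sort-by :rmse)
       first))

(gridsearch option-grid)

Presumably the real thing does quite a bit more (cross-validation, per-model option spaces, multiple model families in one search), but the basic shape of the loop -- enumerate option maps, train, score, pick the best -- is the idea the announcement describes.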