Awesome!

On Wednesday, 27 February 2019 09:17:33 UTC-8, Chris Nuernberger wrote:

> Clojurians,
>
> Good morning from (again) snowy Boulder!
>
> Following lots of discussion and interaction with many people around the
> Clojure and ML worlds, TechAscent has built a foundation intended to let
> the average Clojurist do high-quality machine learning of the kind they
> are likely to encounter in their day-to-day work.
>
> This isn't a deep learning framework; I already tried that in a bespoke
> fashion, and I think the mxnet bindings are great.
>
> This is specifically for the use case where you have data coming in from
> multiple data sources and you need to do the cleaning, processing, and
> feature augmentation before running some set of simple models, then
> gridsearch across a range of models and go about your business from
> there. Think small to medium sized Datomic databases and the like.
> Everyone has a little data before they have a lot, and I think this scale
> captures a far wider range of possible use cases.
>
> The foundation comes in two parts.
>
> The first is the ETL library:
>
> https://github.com/techascent/tech.ml.dataset
>
> This library is a column-store design sitting on top of tablesaw. The
> Clojure ML group profiled lots of different libraries and we found that
> tablesaw works great.
>
> The ETL language is composed of three sub-languages. First, a
> set-invariant column selection language. Second, a minimal functional
> math language along the lines of APL or J. Finally, a pipeline concept
> that lets you describe an ETL pipeline as data: you create the pipeline,
> run it on training data, and it records context. During inference later
> you just use the saved pipeline from the first operation.
>
> This is the second large ETL system I have worked on; the first was one
> named Alteryx.
>
> The next library is a general ML framework:
>
> https://github.com/techascent/tech.ml
>
> The library has bindings to xgboost, smile, and libsvm. Libsvm doesn't
> get the credit it deserves, by the way, as it works extremely well on
> small-n problems. xgboost works well on everything, and smile contains
> lots of different types of models that may or may not work well depending
> on the problem, as well as clustering and a lot of other machine-learny
> things.
>
> For this case, my interest wasn't a clear exposition of all the different
> things smile can do so much as getting a wide enough range of model
> generators to be effective. For a more thorough binding to smile, check
> out:
>
> https://github.com/generateme/fastmath
>
> I built a Clojure version of a very involved Kaggle problem as an example
> using clojupyter and oz as a proof of concept:
>
> https://github.com/cnuernber/ames-house-prices/blob/master/ames-housing-prices-clojure.md
>
> Enjoy :-).
>
> Compliments of the TechAscent Crew & Clojure ML Working Group
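For anyone trying to picture the pipeline-as-data idea from the announcement, here is a minimal sketch in plain Clojure. This is not tech.ml.dataset's actual ETL language; the step names and the fit-pipeline/apply-pipeline helpers are made up for illustration. The point is just that the pipeline is a data structure, fitting it on training data records per-step context (fill values, means, standard deviations), and inference replays the saved steps without re-estimating anything:

(ns etl-sketch.core)

;; A toy "dataset": a map of column name -> vector of values.
(def train-ds {:lot-area   [8450.0 9600.0 11250.0 nil]
               :sale-price [208500.0 181500.0 223500.0 140000.0]})

;; Each pipeline step is plain data: [op column].
(def pipeline
  [[:replace-missing :lot-area]
   [:std-scale :lot-area]
   [:log :sale-price]])

(defn mean [xs] (/ (reduce + xs) (count xs)))

(defn std-dev [xs]
  (let [m (mean xs)]
    (Math/sqrt (mean (map #(let [d (- % m)] (* d d)) xs)))))

;; Fitting a step returns the context recorded from the training data.
(defmulti fit-step (fn [[op] _ds] op))
(defmethod fit-step :replace-missing [[_ col] ds]
  {:fill (mean (remove nil? (ds col)))})
(defmethod fit-step :std-scale [[_ col] ds]
  (let [xs (remove nil? (ds col))]
    {:mean (mean xs) :std (std-dev xs)}))
(defmethod fit-step :log [_ _] {})

;; Applying a step uses only the saved context, never the new data's stats.
(defmulti apply-step (fn [[op] _ctx _ds] op))
(defmethod apply-step :replace-missing [[_ col] ctx ds]
  (update ds col (partial mapv #(or % (:fill ctx)))))
(defmethod apply-step :std-scale [[_ col] {:keys [mean std]} ds]
  (update ds col (partial mapv #(/ (- % mean) std))))
(defmethod apply-step :log [[_ col] _ ds]
  (update ds col (partial mapv #(Math/log %))))

;; Training: walk the steps in order, recording context and transforming
;; as we go so later steps see earlier steps' output.
(defn fit-pipeline [pipeline ds]
  (reduce (fn [{:keys [ds fitted]} step]
            (let [ctx (fit-step step ds)]
              {:ds     (apply-step step ctx ds)
               :fitted (conj fitted [step ctx])}))
          {:ds ds :fitted []}
          pipeline))

;; Inference: replay the saved steps with their recorded context.
(defn apply-pipeline [fitted ds]
  (reduce (fn [ds [step ctx]] (apply-step step ctx ds)) ds fitted))

(def fitted (:fitted (fit-pipeline pipeline train-ds)))
(apply-pipeline fitted {:lot-area [nil 7500.0] :sale-price [155000.0 230000.0]})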
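And a rough sketch of the gridsearch-across-models idea, again in plain Clojure rather than tech.ml's actual API; the train, rmse, and gridsearch functions below are toy stand-ins for the real xgboost/smile/libsvm model generators and their option maps:

(ns gridsearch-sketch.core)

;; Toy single-feature regression data.
(def train-xs [1.0 2.0 3.0 4.0 5.0])
(def train-ys [1.1 2.1 2.9 4.2 4.8])
(def test-xs  [1.5 3.5])
(def test-ys  [1.6 3.4])

;; A toy "model generator": a ridge-like fit whose slope is shrunk by
;; :lambda.  It stands in for a real model generator taking an option map.
(defn train [{:keys [lambda]} xs ys]
  (let [slope (/ (reduce + (map * xs ys))
                 (+ lambda (reduce + (map * xs xs))))]
    (fn [x] (* slope x))))

;; Root-mean-square error of a model on held-out data.
(defn rmse [model xs ys]
  (Math/sqrt (/ (reduce + (map (fn [x y]
                                 (let [d (- (model x) y)] (* d d)))
                               xs ys))
                (count xs))))

;; The grid: every combination of the listed option values.
(def option-grid {:lambda [0.0 0.1 1.0 10.0]})

(defn cartesian [m]
  (reduce (fn [acc [k vs]]
            (for [a acc v vs] (assoc a k v)))
          [{}]
          m))

;; Train one model per option map, score each on held-out data, keep the
;; option map with the lowest RMSE.
(defn gridsearch [option-grid]
  (->> (cartesian option-grid)
       (map (fn [opts]
              (let [model (train opts train-xs train-ys)]
                {:options opts :rmse (rmse model test-xs test-ys)})))
       (sort-by :rmse)
       first))

(gridsearch option-grid)

Presumably the real thing does quite a bit more (cross-validation, per-model option spaces, multiple model families in one search), but the basic shape of the loop -- enumerate option maps, train, score, pick the best -- is the idea the announcement describes.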