Dan,

That's a huge number of stats packages that would become available, assuming we achieve interop with Renjin's dataframes. I'll look into it as well. My priorities are first to get something working for $DAYJOB, then to build a more generally useful package, and finally to add extras such as interop.
- Arthur

On Wednesday, March 9, 2016 at 7:04:17 PM UTC-5, Daniel Slutsky wrote:
>
> Thank you for raising this question.
>
> By the way, one desired feature for a Clojure dataframe abstraction would
> be good interop with Renjin's dataframes.
> Renjin is a JVM-based rewrite of (a subset of) R. It offers a large number
> of JVM-based statistical libraries. Most of them rely on the dataframe
> abstraction for their data. R is also very Lisp-like in its data
> representation, so wrapping all this with Clojure would be a delight.
>
> On Thursday, March 10, 2016 at 1:47:44 AM UTC+2, Christopher Small wrote:
>>
>> If you're going to do any work in this area, I would highly encourage you
>> to do it as part of the core.matrix library. That is what Incanter is or
>> will be using for its dataset implementation. But it's nice that those
>> abstractions and implementations are separate from Incanter itself, since
>> Incanter is a rather large dependency.
>>
>> Core.matrix is certainly (in my eyes) becoming the de facto matrix
>> computation library in the Clojure ecosystem, and I think in the level of
>> interop between different implementations there, and extent of utilization
>> by the Clojure community, we rival the Python offerings. However, while
>> core.matrix has some dataset protocols, API functions and basic
>> implementations, there's still some work to get the full expressiveness of
>> the data.frame pattern as seen in R and Pandas. Specifically, there is no
>> support for setting rownames (or arbitrary "name" assignments beyond that
>> of a single dimension (columns...)). This is something I started working on
>> a while back, but wasn't able to finish. I could potentially push what I
>> came up with to a fork, but unfortunately, I don't have any more time to
>> work on the problem at the moment.
>>
>> Mike Anderson is a great project maintainer, and will probably be happy
>> to help guide you in stitching together a solution.
>>
>> Best
>>
>> Chris
>>
>> On Wednesday, March 9, 2016 at 12:57:31 PM UTC-8, arthur.ma...@gmail.com
>> wrote:
>>>
>>> Is there any desire or need for a Clojure DataFrame?
>>>
>>> By DataFrame, I mean a structure similar to R's data.frame, and Python's
>>> pandas.DataFrame.
>>>
>>> Incanter's DataSet may already be fulfilling this purpose, and if so,
>>> I'd like to know if and how people are using it.
>>>
>>> From some quick research, I see that some prior work has been done in
>>> this space, such as:
>>>
>>> * https://github.com/cardillo/joinery
>>> * https://github.com/mattrepl/data-frame
>>> * http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes
>>>
>>> Rather than going off and creating a competing implementation
>>> (https://xkcd.com/927/), I'd like to know if anyone here is actively
>>> working on, or would like to work on, a DataFrame and related utilities for
>>> Clojure (and by extension Java). Is it something that's sorely needed, or
>>> is everybody happy with using Incanter or some other library that I'm not
>>> aware of? If there's already a de facto standard out there, would anyone
>>> care to please point it out?
>>>
>>> As background information:
>>>
>>> My specific use-case is in NLP and ML, where I often explore and
>>> prototype in Python, but I'm then left to deal with a smattering of
>>> libraries on the JVM (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP,
>>> etc.), each with their own ad-hoc implementations of algorithms, matrices,
>>> and utilities for reading data. It would be great to have a unified way to
>>> explore my data in the Clojure REPL, and then serve the same code and
>>> models in production.
>>>
>>> I would love for Clojure to have a broadly compatible ecosystem similar
>>> to Python's Numpy/Pandas/Scikit-*/Scipy/matplotlib/GenSim, etc.
>>> Core.Matrix and Incanter appear to fulfill a large chunk of those roles,
>>> but I am not aware if they've yet become the de facto standards in the
>>> community.
>>>
>>> Any feedback is greatly appreciated.
>>>
>>

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en