From a brief look at the source, the dataframes in Renjin and Spark do not 
look easy to extract from their respective codebases. I agree that N-D 
DataFrames would be a good addition to the ecosystem, similar to what 
Python's xarray (xarray.pydata.org) aims for. However, it is not a priority 
for me at this time. Thanks for pointing out the DataSet proposal. I'll 
take a look at that later.
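
To make the protocol-based strategy quoted below concrete, here is a minimal 
sketch in plain Clojure. It is only an illustration under my own assumptions: 
PDataFrame, ColumnFrame, RowFrame, and column-mean are hypothetical names 
invented for this example and are not part of core.matrix or its Dataset 
protocols. The point is just that generic code written against a protocol 
runs unchanged on two different back-ends.

```clojure
;; Hypothetical protocol for illustration only.
;; This is NOT the actual core.matrix Dataset API.
(defprotocol PDataFrame
  (column-names [df] "Returns the column names as a vector.")
  (column [df col-name] "Returns the values of one column as a vector."))

;; Back-end 1: column-oriented, a map from column name to a vector of values.
(defrecord ColumnFrame [columns]
  PDataFrame
  (column-names [_] (vec (keys columns)))
  (column [_ col-name] (vec (get columns col-name))))

;; Back-end 2: row-oriented, a vector of row maps.
(defrecord RowFrame [rows]
  PDataFrame
  (column-names [_] (vec (keys (first rows))))
  (column [_ col-name] (mapv #(get % col-name) rows)))

;; Generic code written once against the protocol.
;; Assumes the column is non-empty and numeric.
(defn column-mean [df col-name]
  (let [xs (column df col-name)]
    (/ (reduce + xs) (count xs))))

(def by-column (->ColumnFrame {:height [1.5 2.5] :weight [60 80]}))
(def by-row    (->RowFrame [{:height 1.5 :weight 60}
                            {:height 2.5 :weight 80}]))

(column-mean by-column :weight) ;; => 70
(column-mean by-row :weight)    ;; => 70
```

A wrapper around a Spark or Renjin dataframe would, in principle, be one more 
implementation of the same protocol, which is why the caller code would not 
need to change.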

On a slightly related note, where is the best place to ask core.matrix 
questions? I have a few small questions about sparse matrix support in 
core.matrix, in particular which sparse formats are implemented.

On Thursday, March 10, 2016 at 7:45:44 PM UTC-5, Mikera wrote:
>
> core.matrix maintainer here.
>
> I think it would be great to have more work on dataframe-type support. I 
> think the right strategy is as follows:
> a) Make use of the core.matrix Dataset protocols where possible (or add 
> new ones)
> b) Create implementation(s) for these protocols for whatever back-end data 
> frame implementation is being used
>
> The beauty of core.matrix is that we *can* support multiple 
> implementations without fragmentation, because the protocol based approach 
> means that every implementation can use the same API. This is already 
> working well for the array programming APIs (it's easy to mix and match 
> Clojure data structures, Vectorz Java-based arrays, GPU backed arrays in 
> computations). We just need to do the same for DataFrames.
>
> Now: the current core.matrix Dataset API is a bit focused on 2D data 
> tables, but I think it can be extended to general N-dimensional dataframe 
> capability. Would be a great project for someone to take on, happy to give 
> guidance and help merge in changes as needed.
>
> I don't have a particularly strong opinion on which Dataframe 
> implementations are best, but it looks like Spark and Renjin are both great 
> candidates and would be very useful additions to the Clojure numerical 
> ecosystem. If we do things right, they should interoperate easily with the 
> core.matrix APIs, making Clojure ideal for "glue" code across such 
> implementations.
>
> On Thursday, 10 March 2016 04:57:31 UTC+8, arthur.ma...@gmail.com wrote:
>>
>> Is there any desire or need for a Clojure DataFrame?
>>
>>
>> By DataFrame, I mean a structure similar to R's data.frame, and Python's 
>> pandas.DataFrame.
>>
>> Incanter's DataSet may already be fulfilling this purpose, and if so, I'd 
>> like to know if and how people are using it.
>>
>> From quickly researching, I see that some prior work has been done in 
>> this space, such as:
>>
>> * https://github.com/cardillo/joinery
>> * https://github.com/mattrepl/data-frame
>> * 
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes
>>
>> Rather than going off and creating a competing implementation (
>> https://xkcd.com/927/), I'd like to know if anyone here is actively 
>> working on, or would like to work on a DataFrame and related utilities for 
>> Clojure (and by extension Java)? Is it something that's sorely needed, or 
>> is everybody happy with using Incanter or some other library that I'm not 
>> aware of? If there's already a de facto standard out there, would anyone 
>> care to please point it out?
>>
>> As background information:
>>
>> My specific use-case is in NLP and ML, where I often explore and 
>> prototype in Python, but I'm then left to deal with a smattering of 
>> libraries on the JVM (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP, 
>> etc.), each with their own ad-hoc implementations of algorithms, matrices, 
>> and utilities for reading data. It would be great to have a unified way to 
>> explore my data in the Clojure REPL, and then serve the same code and 
>> models in production.
>>
>> I would love for Clojure to have a broadly compatible ecosystem similar 
>> to Python's NumPy/pandas/scikit-*/SciPy/matplotlib/Gensim, etc. core.matrix 
>> and Incanter appear to fulfill a large chunk of those roles, but I am not 
>> aware whether they have become the de facto standards in the community yet.
>>
>> Any feedback is greatly appreciated.
>>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en