Re: feature selection and sparse vector support

2014-04-11 Thread Ignacio Zendejas
Here's the JIRA: https://issues.apache.org/jira/browse/SPARK-1473

Future discussions should take place in its comments section.

Thanks.

On Fri, Apr 11, 2014 at 11:26 AM, Ignacio Zendejas <ignacio.zendejas...@gmail.com> wrote:
> Thanks for the response, Xiangrui.
>
> And sounds good, Héctor.

Re: feature selection and sparse vector support

2014-04-11 Thread Ignacio Zendejas
Thanks for the response, Xiangrui. And sounds good, Héctor. Look forward to working on this together. A common interface is definitely required. I'll create a JIRA shortly and will explore design options myself to bring ideas to the table. cheers. On Fri, Apr 11, 2014 at 5:44 AM, Héctor Mour

Re: feature selection and sparse vector support

2014-04-11 Thread Héctor Mouriño-Talín
Hi, Regarding the implementation of feature selection techniques, I'm implementing some iterative algorithms based on a paper by Gavin Brown et al. [1]. In this paper, they propose a common framework for many information-theoretic criteria, namely those that use relevancy (mutual information bet

Re: feature selection and sparse vector support

2014-04-10 Thread Xiangrui Meng
Hi Ignacio, Please create a JIRA and send a PR for the information gain computation, so it is easy to track the progress. The sparse vector support for NaiveBayes is already implemented in branch-1.0 and master. You only need to provide an RDD of sparse vectors (created from Vectors.sparse). MLU

feature selection and sparse vector support

2014-04-10 Thread Ignacio Zendejas
Hi, again - As part of the next step, I'd like to make a more substantive contribution and propose some initial work on feature selection, primarily as it relates to text classification. Specifically, I'd like to contribute very straightforward code to perform information gain feature evaluation.
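
For readers new to the term, here is a minimal sketch of what information gain feature evaluation could look like for text classification. It is not the code proposed in this thread or in SPARK-1473. It assumes binary term presence per document, runs in plain Python on a local list rather than on Spark RDDs, and the names `entropy` and `information_gain` are made up for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(class; term) = H(class) - H(class | term present or absent).

    docs   -- list of token sets, one per document (assumed representation)
    labels -- list of class labels, aligned with docs
    term   -- the candidate feature to score
    """
    n = len(docs)
    present = Counter()  # class counts among documents containing the term
    absent = Counter()   # class counts among documents without the term
    for tokens, label in zip(docs, labels):
        (present if term in tokens else absent)[label] += 1

    h_class = entropy(list(Counter(labels).values()))

    # Conditional entropy: weight each partition by its share of documents.
    h_cond = 0.0
    for part in (present, absent):
        size = sum(part.values())
        if size:
            h_cond += (size / n) * entropy(list(part.values()))
    return h_class - h_cond
```

Scoring every term in the vocabulary this way and keeping the top k is the usual way such a score is turned into a feature selector. A distributed version would aggregate the same per-term, per-class counts across partitions before computing the entropies.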