Here's the JIRA:
https://issues.apache.org/jira/browse/SPARK-1473
Future discussions should take place in its comments section.
Thanks.

On Fri, Apr 11, 2014 at 11:26 AM, Ignacio Zendejas <ignacio.zendejas...@gmail.com> wrote:
Thanks for the response, Xiangrui.
And sounds good, Héctor. I look forward to working on this together.
A common interface is definitely required. I'll create a JIRA shortly and
will explore design options myself to bring ideas to the table.
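To seed that discussion, here is a very rough sketch of what a common interface could look like. All names below (FeatureScorer, TopKFeatureSelector, etc.) are hypothetical, not a proposal:

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical contract: every selection criterion produces one score per feature index.
trait FeatureScorer extends Serializable {
  def score(data: RDD[LabeledPoint], numFeatures: Int): Array[Double]
}

// Any scorer (information gain, mutual-information criteria, chi-square, ...) could then
// plug into a generic top-k selector.
class TopKFeatureSelector(scorer: FeatureScorer, k: Int) extends Serializable {
  def selectedIndices(data: RDD[LabeledPoint], numFeatures: Int): Array[Int] =
    scorer.score(data, numFeatures).zipWithIndex.sortBy(p => -p._1).take(k).map(_._2)
}
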
Cheers.

On Fri, Apr 11, 2014 at 5:44 AM, Héctor Mour... wrote:
Hi,
Regarding the implementation of feature selection techniques, I'm implementing some iterative algorithms based on a paper by Gavin Brown et al. [1]. In this paper, the authors propose a common framework for many Information Theory-based criteria, namely those that use relevancy (mutual information between a candidate feature and the class) ...
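For reference, and to be double-checked against [1], the unified criterion in that framework is, as far as I recall, of the form

  J(X_k) = I(X_k; Y) - \beta \sum_{X_j \in S} I(X_k; X_j) + \gamma \sum_{X_j \in S} I(X_k; X_j \mid Y)

where S is the set of already-selected features and Y is the class; different settings of \beta and \gamma recover specific criteria (e.g. \beta = \gamma = 0 gives plain mutual-information ranking).
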
Hi Ignacio,
Please create a JIRA and send a PR for the information gain
computation, so it is easy to track the progress.
The sparse vector support for NaiveBayes is already implemented in
branch-1.0 and master. You only need to provide an RDD of sparse
vectors (created from Vectors.sparse).
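For example, a minimal sketch of that against branch-1.0 (the vocabulary size, indices, and values below are made up for illustration):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object SparseNaiveBayesExample {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "SparseNaiveBayesExample")

    // Each document as a sparse term-frequency vector over a vocabulary of 1000 terms.
    val docs = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.sparse(1000, Array(1, 42), Array(2.0, 1.0))),
      LabeledPoint(1.0, Vectors.sparse(1000, Array(7, 99), Array(1.0, 3.0)))
    ))

    val model = NaiveBayes.train(docs, 1.0)  // lambda = 1.0 (additive smoothing)
    println(model.predict(Vectors.sparse(1000, Array(42), Array(1.0))))

    sc.stop()
  }
}
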
Hi, again -
As part of the next step, I'd like to make a more substantive contribution
and propose some initial work on feature selection, primarily as it relates
to text classification.
Specifically, I'd like to contribute very straightforward code to perform
information gain feature evaluation.
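To make that concrete, here is a rough sketch, not a proposed API, of information gain for a single binary term-presence feature, IG(C; T) = H(C) - H(C | T). The input name labeledPresence (class label paired with whether the term occurs in the document) is purely illustrative:

import org.apache.spark.SparkContext._  // for reduceByKey on pair RDDs (pre-1.3)
import org.apache.spark.rdd.RDD

object InfoGainSketch {
  // Entropy in bits of a discrete distribution given by raw counts.
  private def entropy(counts: Iterable[Long]): Double = {
    val total = counts.sum.toDouble
    counts.filter(_ > 0).map { c =>
      val p = c / total
      -p * math.log(p) / math.log(2)
    }.sum
  }

  // IG(C; T) = H(C) - [P(t) H(C | t) + P(!t) H(C | !t)] for one term T.
  def informationGain(labeledPresence: RDD[(Int, Boolean)]): Double = {
    // Joint counts of (class label, term present?); small enough to collect to the driver.
    val counts = labeledPresence.map(x => (x, 1L)).reduceByKey(_ + _).collect()
    val total = counts.map(_._2).sum.toDouble

    // Marginal class distribution -> H(C).
    val hC = entropy(counts.groupBy(_._1._1).values.map(_.map(_._2).sum))

    // Conditional entropy H(C | T), weighting the "present" and "absent" branches.
    val hCGivenT = counts.groupBy(_._1._2).values.map { branch =>
      val branchCounts = branch.map(_._2)
      (branchCounts.sum / total) * entropy(branchCounts)
    }.sum

    hC - hCGivenT
  }
}

A per-term version over a whole vocabulary would aggregate (termIndex, label, present) counts in a single pass instead of running one job per term.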