Re: naive bayes text classifier with tf-idf in pyspark

Xiangrui Meng Mon, 09 Feb 2015 11:59:07 -0800

On Fri, Feb 6, 2015 at 2:08 PM, Imran Akbar <im...@infoscoutinc.com> wrote:
> Hi,
>
> I've got the following code that's almost complete, but I have 2 questions:
>
> 1)  Once I've computed the TF-IDF vector, how do I compute the vector for
> each string to feed into the LabeledPoint?
>


If I understand your code correctly, you want to map string labels
to double labels in {0.0, 1.0, ...} to fit NaiveBayes. You can do
that by collecting all distinct labels and creating a map from labels
to indices. (We will add a transformer to make this step easier.)
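
A minimal sketch of that mapping, assuming your data is an RDD of
(label_string, feature_vector) pairs. The helper names
build_label_index and to_labeled_points are made up for illustration;
only LabeledPoint comes from MLlib.

```python
def build_label_index(labels):
    """Map each distinct string label to a double index 0.0, 1.0, ..."""
    # Sorting makes the assignment deterministic across runs.
    return {label: float(i) for i, label in enumerate(sorted(set(labels)))}


def to_labeled_points(pairs_rdd, label_index):
    """Turn an RDD of (label_string, feature_vector) into LabeledPoints.

    Assumes pairs_rdd already holds the TF-IDF vector for each document.
    """
    # Imported here so the pure-Python helper above works without Spark.
    from pyspark.mllib.regression import LabeledPoint
    return pairs_rdd.map(
        lambda pair: LabeledPoint(label_index[pair[0]], pair[1]))


# Hypothetical usage with an RDD named `data`:
#   distinct = data.map(lambda pair: pair[0]).distinct().collect()
#   label_index = build_label_index(distinct)
#   training = to_labeled_points(data, label_index)
```

The map is small (one entry per class), so collecting it to the driver
and letting Spark ship it inside the closure should be fine.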

> 2)  Does MLLib provide any methods to evaluate the model's precision,
> recall, F-score, etc?  All I saw in the documentation was "MLlib supports
> common evaluation metrics for binary classification (not available
> in PySpark). This includes precision, recall, F-measure".  What about other
> classifiers besides binary, and from PySpark?
>

We have evaluation metrics for multiclass classification. But
unfortunately they are not available in Python. I created a JIRA
(SPARK-5694) to track it.
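
In the meantime, here is a rough sketch of computing per-class
precision, recall, and F1 by hand in Python. It is not an MLlib API;
per_class_metrics is a hypothetical helper, and it assumes you have an
RDD of (prediction, label) pairs whose counts you obtain with
countByValue().

```python
def per_class_metrics(pair_counts):
    """Compute (precision, recall, f1) per class.

    pair_counts maps (prediction, label) tuples to how often they occur,
    e.g. the result of prediction_and_label_rdd.countByValue().
    """
    classes = sorted({c for pair in pair_counts for c in pair})
    metrics = {}
    for c in classes:
        true_pos = pair_counts.get((c, c), 0)
        # Everything predicted as c, and everything actually labeled c.
        predicted = sum(n for (p, _), n in pair_counts.items() if p == c)
        actual = sum(n for (_, l), n in pair_counts.items() if l == c)
        precision = float(true_pos) / predicted if predicted else 0.0
        recall = float(true_pos) / actual if actual else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


# Hypothetical usage, with `model` a trained NaiveBayesModel and `test`
# an RDD of LabeledPoint:
#   pairs = test.map(lambda lp: (model.predict(lp.features), lp.label))
#   metrics = per_class_metrics(pairs.countByValue())
```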

> thanks,
> imran

