Thanks Yanbo, will try that!
On Sun, Jul 17, 2016 at 10:26 PM, Yanbo Liang wrote:
Hi Tobi,
Thanks for clarifying the question. It's very straightforward to convert
the filtered RDD to a DataFrame; you can refer to the following code snippet:
from pyspark.sql import Row
rdd2 = filteredRDD.map(lambda v: Row(features=v))
df = rdd2.toDF()
Thanks
Yanbo
2016-07-16 14:51 GMT-07:00
Hi Yanbo,
Appreciate the response. I might not have phrased this correctly, but I
really wanted to know how to convert the pipelined RDD into a DataFrame. I
have seen the example you posted. However, I need to transform all my data,
not just one line. So I did successfully use map to use the chisq sel
Hi Tobi,
The MLlib RDD-based API does support applying the transformation to both a
single Vector and an RDD, but you did not use it in the appropriate way.
Suppose you have an RDD with a LabeledPoint in each row; you can refer to the
following code snippet to train a ChiSqSelectorModel and do the
transformation:
Hi everyone,
I am trying to filter my features based on the spark.mllib ChiSqSelector.
filteredData = vectorizedTestPar.map(lambda lp: LabeledPoint(lp.label,
model.transform(lp.features)))
However, when I do the following, I get the error below. Is there any other
way to filter my data to avoid th