Re: PySpark + Streaming + DataFrames

Tathagata Das Mon, 19 Oct 2015 14:24:19 -0700

RDD and DataFrame are not compatible data types, so you cannot return a
DataFrame where an RDD is expected. What you can do instead is return the
DataFrame's underlying RDD via dataframe.rdd (a property in PySpark, so no
parentheses).
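
For illustration, here is a minimal sketch of that pattern. The names
make_transform, to_df and transform_to_rdd are hypothetical, and it assumes
your batch RDD can be turned into a DataFrame by something like
sqlContext.createDataFrame(rdd). The point is only that the function passed
to DStream.transform() ends by returning df.rdd rather than the DataFrame.

```python
def make_transform(to_df):
    """Build a function suitable for DStream.transform().

    to_df is a hypothetical callable that turns one batch RDD into a
    DataFrame, e.g. lambda rdd: sqlContext.createDataFrame(rdd).
    """
    def transform_to_rdd(rdd):
        df = to_df(rdd)   # do the DataFrame work for this batch
        return df.rdd     # hand the underlying RDD (of Rows) back to the DStream
    return transform_to_rdd


# Usage against a live stream (needs a running StreamingContext and an
# existing sqlContext, so it is left commented out here):
# rows = stream.transform(
#     make_transform(lambda rdd: sqlContext.createDataFrame(rdd)))
```

The resulting DStream carries RDDs of Row objects, so downstream steps can
still read fields by name.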



On Fri, Oct 16, 2015 at 12:07 PM, Jason White <jason.wh...@shopify.com>
wrote:

> Hi Ken, thanks for replying.
>
> Unless I'm misunderstanding something, I don't believe that's correct.
> DStream.transform() accepts a single argument, func. func should be a
> function that accepts a single RDD, and returns a single RDD. That's what
> transform_to_df does, except the RDD it returns is a DF.
>
> I've used DStream.transform() successfully in the past when transforming
> RDDs, so I don't think my problem is there.
>
> I haven't tried this in Scala yet, and all of the examples I've seen on the
> website seem to use foreach instead of transform. Does this approach work
> in Scala?
