Hi,
as said, thanks for the little discussion over mail.
I understand that the action is triggered in the end at the write, and then all
of a sudden everything is executed at once. But I don't really need to trigger
an action before. I am caching somewhere a df that will be reused several
times (sligh
Hello
After I converted the DataFrame to an RDD, I found the type information was
missing.
scala> df.show
+----+---+
|name|age|
+----+---+
|jone| 12|
|rosa| 21|
+----+---+
scala> df.printSchema
root
|-- name: string (nullable = true)
|-- age: integer (nullable = false)
scala> df.rdd.map{ row => (r
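The snippet above is cut off, but the behavior it runs into is that `df.rdd` returns an `RDD[Row]`: the schema stays on the DataFrame, and a `Row` carries no compile-time column types. A minimal sketch (assuming the `df` shown above with columns `name: string` and `age: int`, and a Spark shell where `spark.implicits._` is in scope) of reading the values back with explicit types:

```scala
// df.rdd yields RDD[Row]; each field must be extracted with an explicit type.
val typed = df.rdd.map { row =>
  (row.getAs[String]("name"), row.getAs[Int]("age"))
}
// typed: org.apache.spark.rdd.RDD[(String, Int)]

// Alternatively, convert to a typed Dataset and keep the types throughout:
val ds = df.as[(String, Int)]   // needs import spark.implicits._
```

Which of the two is preferable depends on whether you need RDD operations at all; if not, the typed `Dataset` avoids the manual `getAs` calls entirely.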
This feels like premature optimization, and it's not clear it's optimizing
anything, but maybe.
Caching things that are used once is worse than not caching. It looks like
a straight line through to the write, so I doubt caching helps anything
here.
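To illustrate the distinction: caching only pays off when the same DataFrame feeds more than one action; with a single straight-line write it just adds memory pressure. A minimal sketch, assuming a SparkSession `spark` and hypothetical paths:

```scala
// base is reused by two actions below, so caching avoids recomputing it.
val base = spark.read.parquet("/data/events")   // hypothetical input path
  .filter($"status" === "ok")

base.cache()                       // lazy: materialized on the first action
val total = base.count()           // first action: computes and caches
base.write.parquet("/data/out")    // second action: served from the cache
base.unpersist()                   // release the cached blocks when done
```

With only the `write` and no second action, the `cache()`/`unpersist()` pair can simply be dropped.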
On Fri, Apr 1, 2022 at 2:49 AM Joris Billen wrote:
> Hi,
> as
Hi
I got a dataframe object from another application, meaning this object was
not generated by me.
How can I change the data types of some columns in this dataframe?
For example, change a column's type from Int to Float.
Thanks.
---
Please use cast. Also, I would strongly recommend going through the Spark
documentation; it's pretty good.
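Since DataFrame columns are immutable, the usual pattern is to replace the column with a cast copy via `withColumn`. A sketch against the example above (assuming the column is named `age`):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.FloatType

// Replace the Int column "age" with a Float version of itself.
val casted = df.withColumn("age", col("age").cast(FloatType))
casted.printSchema()   // age now shows as float
```

`cast` also accepts a type name as a string, e.g. `col("age").cast("float")`, which is handy when the target type comes from configuration.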
On Sat, 2 Apr 2022 at 12:43 pm, wrote:
> Hi
>
> I got a dataframe object from other application, it means this obj is
> not generated by me.
> How can I change the data types for some columns in this data