Hi Dongjoon Hyun,
Any input on the issue below would be helpful. Please let us know if we're
missing anything.
Thanks and Regards,
Abhishek
From: Patidar, Mohanlal (Nokia - IN/Bangalore)
Sent: Thursday, January 20, 2022 11:58 AM
To: user@spark.apache.org
Subject: Suspected SPAM - RE: Regardin
You can cast the columns as well. But are the columns strings to begin with?
They could also actually be doubles.
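For instance, a minimal casting sketch (the column names come from the question below; that they start out as strings is an assumption):

from pyspark.sql.functions import col

# Cast the numeric-looking string columns to doubles in place.
df = df.withColumn("salary", col("salary").cast("double")) \
       .withColumn("rate", col("rate").cast("double"))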
On Wed, Jan 26, 2022 at 8:49 PM wrote:
> When creating a dataframe from a list, how can I specify the column type?
>
> such as:
>
> >>> df = spark.createDataFrame(list, ["name","title","salary","rate","insurance"])
from pyspark.sql.types import *
list = [("buck trends", "ceo", 20.00, 0.25, "100")]
schema = StructType([StructField("name", StringType(), True),
                     StructField("title", StringType(), True),
                     StructField("salary", DoubleType(), True),
                     StructField("rate", DoubleType(), True),
                     StructField("insurance", StringType(), True)])
df = spark.createDataFrame(list, schema)
When creating a dataframe from a list, how can I specify the column type?
such as:
df = spark.createDataFrame(list, ["name","title","salary","rate","insurance"])
df.show()
+-----------+-----+------+----+---------+
|       name|title|salary|rate|insurance|
+-----------+-----+------+----+---------+
Hi Aurélien!
Please run
mvn dependency:tree
and check it for Jackson dependencies.
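If the full tree is noisy, the same plugin can filter for the Jackson artifacts
directly (the groupId below covers the core Jackson modules; adjust the pattern
for other Jackson groupIds if needed):

mvn dependency:tree -Dincludes=com.fasterxml.jackson.core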
Feel free to respond with the output if you have any questions about it.
Cheers,
Steve C
> On 22 Jan 2022, at 10:49 am, Aurélien Mazoyer wrote:
>
> Hello,
>
> I migrated my code to Spark 3.2 and I am
unsubscribe
Really depends on what your UDF is doing. You could read 2GB of XML into
much more than that as a DOM representation in memory.
Remember 15GB of executor memory is shared across tasks.
You need to get a handle on what memory your code is using to begin with to
start to reason about whether that's enough.
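To make that concrete (all figures here are illustrative, not from this thread): 15GB of executor heap shared by four concurrent tasks leaves each task well under 4GB once overhead is counted. A minimal sketch of trading concurrency for per-task headroom via configuration:

from pyspark.sql import SparkSession

# Illustrative values: the same 15g heap shared by only 2 concurrent
# tasks leaves far more room for each XML parse.
spark = (SparkSession.builder
         .config("spark.executor.memory", "15g")
         .config("spark.executor.cores", "2")
         .getOrCreate())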
Thanks for your quick response.
For some reason I can't use spark-xml (a schema-related issue).
I've tried reducing the number of tasks per executor by increasing the number
of executors, but it still throws the same error.
I can't understand why even 15GB of executor memory is not sufficient
to parse the XML.
Executor memory used shows data that is cached, not the VM usage. You're
running out of memory somewhere, likely in your UDF, which probably parses
massive XML docs as a DOM first or something. Use more memory, fewer tasks
per executor, or consider using spark-xml if you are really just parsing
pie
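For reference, spark-xml parses records straight into a dataframe, which avoids
hand-building a DOM per file. A minimal sketch (the package coordinates, rowTag
value, and path are placeholders, not from this thread):

# Launch with e.g. --packages com.databricks:spark-xml_2.12:<version>
df = (spark.read.format("xml")
      .option("rowTag", "record")  # element that delimits one row; assumed name
      .load("/path/to/xml/files"))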
I'm doing some complex operations inside a Spark UDF (parsing huge XML).
Dataframe:
| value                 |
| Content of XML File 1 |
| Content of XML File 2 |
| Content of XML File N |
val df = dataframe.select(UDF_to_parse_xml(col("value")))
UDF looks something like:
val XMLelements: Array[MyClass1] = getXMLelemen
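If the UDF materializes each document as a full DOM before extracting elements,
a streaming parse keeps peak memory closer to one element than to the whole
tree. A minimal sketch of that idea in PySpark (the item tag, the count logic,
and the column name are illustrative, not the original UDF):

import xml.etree.ElementTree as ET
from io import StringIO
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

@udf(returnType=IntegerType())
def count_items(xml_string):
    # Walk the document incrementally instead of building a full DOM.
    count = 0
    for _, elem in ET.iterparse(StringIO(xml_string), events=("end",)):
        if elem.tag == "item":
            count += 1
        elem.clear()  # release parsed children so memory stays bounded
    return count

df = dataframe.select(count_items("value"))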