Hello all! I'm working with PySpark, trying to reproduce some of the results
we see on batch through streaming processes, just as a PoC for now. For
this, I'm thinking of trying to interpret the execution plan and eventually
write it back to Python (I'm doing something similar with pandas as well, …
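For the record, a minimal sketch of how I'm pulling the plan out, assuming
Spark 3.x; the app name and the toy query are made up, and _jdf /
queryExecution is an internal interface rather than a stable public API:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("plan-poc").getOrCreate()
    df = (spark.range(100)
          .groupBy((F.col("id") % 2).alias("k"))
          .agg(F.count("*").alias("n")))

    df.explain(mode="formatted")  # human-readable plan (Spark 3.0+)
    # JVM-side optimized logical plan, reached through py4j (internal handle)
    plan_text = df._jdf.queryExecution().optimizedPlan().toString()
    print(plan_text)  # the text I'd then try to map back to Python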
What scanner did you use? Looks like all CVEs you listed for
jackson-databind-xxx.jar are for older versions (2.9.10.x). A quick
search on NVD revealed that there is only one CVE (CVE-2020-36518) that
affects your Spark versions. This CVE (not on your scanned CVE list) is in
jackson-databind.
Thanks, Mich.
But in my experience, many original data sources have abnormal values
included.
I already used rlike and filter to implement the data cleaning, as in this
write-up of mine:
https://bigcount.xyz/calculate-urban-words-vote-in-spark.html
What surprises me is that Spark does the string-to-number conversion
implicitly.
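A tiny repro of what I mean, with made-up data; with ANSI mode off (the
default in Spark 3.x), the failed casts become NULL and avg() just skips
them:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    # "number" is a StringType column holding one abnormal value
    df = spark.createDataFrame([("1",), ("2",), ("x",)], ["number"])
    # avg() implicitly casts the strings to double; "x" casts to NULL
    df.agg(F.avg("number")).show()  # prints 1.5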
agg() and avg() are numeric functions dealing with numeric values. Why is
the column number defined as String type?
Do you perform data cleaning beforehand by any chance? It is good practice.
Alternatively, you can use the rlike() function to filter rows that have
numeric values in a column.
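A one-line sketch of that, assuming the string column is called "number":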
scala> df.filter($"number".rlike("^[0-9]+(\\.[0-9]+)?$")).show()  // keep only rows that look numeric