There should be no need to explicitly make the DF-2 and DF-3 computations
parallel. Spark generates execution plans and decides what can run in
parallel (ideally you should see them running in parallel in the Spark UI).
You should cache DF-1 if possible (either in memory or on disk); otherwise
its computation will be repeated for each downstream DataFrame.
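A minimal sketch of the fan-out pattern being suggested. The names `fan_out`, `proc2`, and `proc3` are placeholders for your own DF-2/DF-3 logic, not Spark APIs: Spark schedules jobs submitted from separate threads concurrently, so driving the two downstream actions from a small thread pool lets them share the cached DF-1.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(df1, procs):
    """Run each proc(df1) on its own thread and collect the results."""
    with ThreadPoolExecutor(max_workers=len(procs)) as pool:
        futures = [pool.submit(p, df1) for p in procs]
        return [f.result() for f in futures]

# With real Spark you would first cache and materialize the shared input,
# e.g. (sketch):
#   df1 = df1.persist(StorageLevel.MEMORY_AND_DISK)
#   df1.count()  # force the cache to fill before fanning out
#   results = fan_out(df1, [proc2, proc3])
```

Without the explicit `persist`/materialize step, each thread's action would recompute DF-1's lineage from scratch.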
Same problem here. A Google search shows a few related JIRA tickets in
"Resolved" state, but I am getting the same error in Spark 3.0.1. I'm
pasting my `spark-shell` output below:
scala> import org.apache.spark.sql.Encoders
import org.apache.spark.sql.Encoders
scala> val linkageBean = Encoders.b
Is there a memory leak in Spark 2.3.3, as mentioned in the JIRA below?
https://issues.apache.org/jira/browse/SPARK-29055
Please let me know how to solve it.
Thanks
Amit
On Fri, Dec 4, 2020 at 1:55 PM Amit Sharma wrote:
> Can someone help me on this please.
>
>
> Thanks
> Amit
>
> On Wed,
We have a Spark job that produces a result DataFrame, say DF-1, at the
end of the pipeline (i.e. Proc-1). From DF-1, we need to create two or
more DataFrames, say DF-2 and DF-3, via additional SQL or ML processes,
i.e. Proc-2 and Proc-3. Ideally, we would like to perform Proc-2 and
Proc-3 in parallel.
Can someone help me on this please.
Thanks
Amit
On Wed, Dec 2, 2020 at 11:52 AM Amit Sharma wrote:
> Hi, I have a Spark streaming job. When I am checking the Executors tab,
> there is a Storage Memory column. It displays used memory / total memory.
> What is used memory? Is it memory in use
Hi everyone
We're using the Spark Thrift Server with Spark 3.0.1 to query Hive over
JDBC with LDAP authentication, and it seems that the
LdapAuthenticationProviderImpl.java provided by the Spark Thrift Server
is badly outdated
(https://github.com/apache/spark/blob/v3.0.1/sql/hi
Hi all,
I have an iterative algorithm in Spark that uses each iteration's output as
the input for the following one, but the size of the data does not change. I
am using localCheckpoint to cut the data's lineage (and also to facilitate
some computations that reuse DataFrames). However, this runs slower and slower a
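For reference, the lineage-truncation loop being described usually looks like the sketch below. The helper name `run_iterations` is ours, not a Spark API; only `localCheckpoint` is. Checkpointing every iteration is often overkill, so a `checkpoint_every` knob is a common compromise.

```python
def run_iterations(df, step, n_iters, checkpoint_every=5):
    """Apply `step` repeatedly, truncating lineage every few iterations.

    `df` is expected to be a Spark DataFrame; localCheckpoint eagerly
    materializes the data on the executors and drops the logical plan,
    so the plan does not grow without bound across iterations. (Note:
    localCheckpoint blocks live on executors, so losing an executor
    loses that data.)
    """
    for i in range(1, n_iters + 1):
        df = step(df)                      # one iteration of the algorithm
        if i % checkpoint_every == 0:
            df = df.localCheckpoint()      # cut the accumulated lineage
    return df
```

The same pattern works with any object exposing a `localCheckpoint()` method, which also makes it easy to unit-test without a Spark session.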
OK, with PyCharm itself I am getting this error:
pyspark.sql.utils.AnalysisException: java.lang.RuntimeException: Error
while running command to get file permissions : java.io.IOException: (null)
entry in command string: null ls -F
C:\Users\admin\PycharmProjects\pythonProject\hive-scratchdir
I gat