Re: How to Spawn Child Thread or Sub-jobs in a Spark Session

2020-12-04 Thread Raghavendra Ganesh
There should not be any need to explicitly make the DF-2 and DF-3 computations parallel. Spark generates execution plans and can decide what to run in parallel (ideally you should see them running in parallel in the Spark UI). You need to cache DF-1 if possible (either in memory or on disk), otherwise computation
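One common pattern for this (a minimal sketch, not the poster's actual code — `proc1Result`, the column names, and the output paths are all placeholders) is to persist DF-1 and then trigger the two downstream actions from separate threads, e.g. with Scala Futures; actions submitted from different threads become separate Spark jobs that the scheduler can run concurrently:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.storage.StorageLevel

// df1 is the output of Proc-1; persist it so Proc-2 and Proc-3
// read the cached data instead of recomputing the whole pipeline.
val df1 = proc1Result.persist(StorageLevel.MEMORY_AND_DISK)
df1.count() // materialize the cache once

// Each action submitted from its own thread becomes a separate Spark job.
val f2 = Future { df1.filter("score > 0.5").write.parquet("/tmp/df2") }
val f3 = Future { df1.groupBy("key").count().write.parquet("/tmp/df3") }

Await.result(Future.sequence(Seq(f2, f3)), Duration.Inf)
```

Without the `persist`, each action would re-run Proc-1 from the source, which is usually the real cost being avoided here.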

Re: Typed dataset from Avro generated classes?

2020-12-04 Thread Nads
Same problem here. A Google search shows a few related JIRA tickets in "Resolved" state, but I am getting the same error in Spark 3.0.1. I'm pasting my `spark-shell` output below: scala> import org.apache.spark.sql.Encoders import org.apache.spark.sql.Encoders scala> val linkageBean = Encoders.b
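For context, the failing pattern being described is roughly the following sketch (`MyAvroRecord` stands in for an Avro-generated class; the names are illustrative, not from the thread):

```scala
import org.apache.spark.sql.Encoders

// Encoders.bean inspects the class as a Java bean. Avro-generated classes
// expose extra properties (e.g. getSchema returning org.apache.avro.Schema),
// which is the kind of field that typically trips up the bean encoder.
val enc = Encoders.bean(classOf[MyAvroRecord])

// records: Seq[MyAvroRecord] built elsewhere
val ds = spark.createDataset(records)(enc)
```

The resolved JIRAs mentioned above targeted specific cases, so hitting the same message on 3.0.1 may simply mean this shape of class is still unsupported by the bean encoder.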

RE: Spark UI Storage Memory

2020-12-04 Thread Jack Yang
unsubscribe

Re: Spark UI Storage Memory

2020-12-04 Thread Amit Sharma
Is there any memory leak in Spark 2.3.3 as mentioned in the JIRA below? https://issues.apache.org/jira/browse/SPARK-29055. Please let me know how to solve it. Thanks Amit On Fri, Dec 4, 2020 at 1:55 PM Amit Sharma wrote: > Can someone help me on this please. > > > Thanks > Amit > > On Wed,

How to Spawn Child Thread or Sub-jobs in a Spark Session

2020-12-04 Thread Artemis User
We have a Spark job that produces a result data frame, say DF-1, at the end of the pipeline (i.e. Proc-1). From DF-1, we need to create two or more data frames, say DF-2 and DF-3, via additional SQL or ML processes, i.e. Proc-2 and Proc-3. Ideally, we would like to perform Proc-2 and Proc-3 in
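One way to run Proc-2 and Proc-3 concurrently inside a single SparkSession is to submit their actions from separate threads and, optionally, enable the FAIR scheduler so neither job starves the other. A sketch under those assumptions (the pool names and the `/* ... */` bodies are placeholders; FAIR mode is set via `spark.scheduler.mode=FAIR` in the Spark config):

```scala
// Assumes spark.scheduler.mode=FAIR in spark-defaults.conf or SparkConf.
val sc = spark.sparkContext

// setLocalProperty is thread-local, so each thread can be pinned to its own pool.
def runInPool(pool: String)(body: => Unit): Thread = {
  val t = new Thread(() => {
    sc.setLocalProperty("spark.scheduler.pool", pool)
    body
  })
  t.start()
  t
}

val t2 = runInPool("proc2") { /* Proc-2 actions on DF-1 */ }
val t3 = runInPool("proc3") { /* Proc-3 actions on DF-1 */ }
Seq(t2, t3).foreach(_.join())
```

As the reply in this thread notes, this only pays off if DF-1 is cached first; otherwise both threads recompute the Proc-1 lineage independently.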

Re: Spark UI Storage Memory

2020-12-04 Thread Amit Sharma
Can someone help me on this please. Thanks Amit On Wed, Dec 2, 2020 at 11:52 AM Amit Sharma wrote: > Hi , I have a spark streaming job. When I am checking the Executors tab, > there is a Storage Memory column. It displays used memory /total memory. > What is used memory. Is it memory in use

Spark thrift server ldap

2020-12-04 Thread mickymiek
Hi everyone. We're using the Spark Thrift Server with Spark 3.0.1. We're using it to query Hive with JDBC queries using LDAP authentication, and it seems that the LdapAuthenticationProviderImpl.java provided by the Spark Thrift Server is way outdated (https://github.com/apache/spark/blob/v3.0.1/sql/hi
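For readers hitting the same setup: the Thrift Server picks up HiveServer2-style authentication settings from `hive-site.xml`. A minimal illustrative fragment (the URL and baseDN are placeholder values, not from this thread):

```xml
<!-- hive-site.xml on the Thrift Server host; values are illustrative -->
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://ldap.example.com:389</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>ou=people,dc=example,dc=com</value>
</property>
```

The poster's point stands: the LDAP provider bundled with Spark's Thrift Server lags behind the one in upstream Hive, so newer Hive LDAP options may not be honored.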

Broadcast size increases with subsequent iterations

2020-12-04 Thread Kalin Stoyanov
Hi all, I have an iterative algorithm in spark that uses each iteration as the input for the following one, but the size of the data does not change. I am using localCheckpoint to cut the data's lineage (and also facilitate some computations that reuse df-s). However, this runs slower and slower a
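A variant worth comparing against (a sketch, with `initialDf`, `step`, and `numIterations` as placeholders for the poster's actual algorithm) is a reliable checkpoint instead of `localCheckpoint`: it writes the data to stable storage and returns a DataFrame whose plan no longer references the prior iterations, which can behave differently from `localCheckpoint` with respect to state accumulating across iterations:

```scala
// Reliable checkpoints need a checkpoint directory; the path is illustrative.
spark.sparkContext.setCheckpointDir("hdfs:///tmp/chk")

var df = initialDf
for (i <- 1 to numIterations) {
  df = step(df)        // one iteration of the algorithm
  df = df.checkpoint() // eager by default: materializes to the checkpoint
                       // dir and truncates the logical plan/lineage
}
```

If the broadcast size still grows with a reliable checkpoint, that would point at something other than lineage (e.g. accumulated session state) as the cause.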

Re: In windows 10, accessing Hive from PySpark with PyCharm throws error

2020-12-04 Thread Mich Talebzadeh
OK, with PyCharm itself, I am getting this error: pyspark.sql.utils.AnalysisException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\Users\admin\PycharmProjects\pythonProject\hive-scratchdir I gat
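The "(null) entry in command string: null ls -F" pattern on Windows commonly indicates that Hadoop's native Windows shim (`winutils.exe`) is missing or that `HADOOP_HOME` is unset, so the permission check has no executable to run. A typical setup (the `C:\hadoop` path is illustrative; the scratch dir path is the one from the error above):

```bat
:: Windows cmd; set before launching PyCharm/PySpark
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%

:: Place a winutils.exe matching your Hadoop version in C:\hadoop\bin,
:: then grant permissions on the Hive scratch directory:
winutils.exe chmod -R 777 C:\Users\admin\PycharmProjects\pythonProject\hive-scratchdir
```

In PyCharm specifically, environment variables set in the run configuration must include these, since the IDE does not inherit a shell profile automatically.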