Re: Question about installing Apache Spark [PySpark] computer requirements

2024-07-29 Thread Sadha Chilukoori
> at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
> at org.apache.spark.scheduler.Task.run(Task.scala:141)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
> at org.apache.spark.uti

Re: Question about installing Apache Spark [PySpark] computer requirements

2024-07-29 Thread Sadha Chilukoori
Hi Mike, I'm not sure about the minimum requirements of a machine for running Spark. But to run some PySpark scripts (and Jupyter notebooks) on a local machine, I found the following steps are the easiest. I installed Amazon Corretto and updated the JAVA_HOME variable as instructed here https:/
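[For readers following this thread: once a JDK such as Corretto is on the machine and JAVA_HOME points at it, a minimal local smoke test might look like the sketch below. It assumes PySpark was installed with "pip install pyspark"; the app name and sample data are made up for illustration.]

    # Minimal local PySpark smoke test (assumes `pip install pyspark`
    # and JAVA_HOME pointing at the JDK install, e.g. Amazon Corretto).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")            # run on all local cores, no cluster needed
        .appName("local-smoke-test")   # arbitrary name for this illustration
        .getOrCreate()
    )

    # Tiny in-memory DataFrame just to confirm the install works end to end.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    df.show()

    spark.stop()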

Re: 7368396 - Apache Spark 3.5.1 (Support)

2024-06-07 Thread Sadha Chilukoori
Hi Alex, Spark is open source software available under the Apache License 2.0 (https://www.apache.org/licenses/); further details can be found on the FAQ page (https://spark.apache.org/faq.html). Hope this helps. Thanks, Sadha On Thu, Jun 6, 2024, 1:32 PM SANTOS SOUZA, ALEX wrote: > H

Re: Spark join produce duplicate rows in resultset

2023-10-22 Thread Sadha Chilukoori
Hi Meena, I'm asking to clarify: are the *on* & *and* keywords optional in the join conditions? Please try this snippet and see if it helps: select rev.* from rev inner join customer c on rev.custumer_id = c.id inner join product p on rev.sys = p.sys and rev.prin = p.prin and rev.scode = p.bcode
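[The same join, written out with PySpark's SQL API so the explicit ON/AND conditions are easy to see. This is a sketch: the table and column names (rev, customer, product, custumer_id, sys, prin, scode, bcode) are copied from the preview above and may not match the actual schema, and it assumes the three tables are already registered as temp views or Hive tables.]

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    # Every join key is stated explicitly; a missing AND condition here is a
    # common cause of duplicate rows in the result set.
    result = spark.sql("""
        SELECT rev.*
        FROM rev
        INNER JOIN customer c
            ON rev.custumer_id = c.id
        INNER JOIN product p
            ON rev.sys = p.sys
           AND rev.prin = p.prin
           AND rev.scode = p.bcode
    """)
    result.show()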

Re: Why the same INSERT OVERWRITE sql , final table file produced by spark sql is larger than hive sql?

2022-10-11 Thread Sadha Chilukoori
I have faced the same problem, where Hive and Spark ORC were both using Snappy compression. Hive 2.1, Spark 2.4.8. I'm curious to learn what the root cause of this could be. -S On Tue, Oct 11, 2022, 2:18 AM Chartist <13289341...@163.com> wrote: > > Hi, All > > I encountered a problem as the e-mai
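[One thing worth checking in this situation is which ORC codec Spark actually applies when writing. The sketch below uses the real Spark SQL conf spark.sql.orc.compression.codec (available since Spark 2.3); the table names target_table and source_table are placeholders, not from the thread.]

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Confirm which codec Spark will use for ORC writes (e.g. 'snappy').
    print(spark.conf.get("spark.sql.orc.compression.codec"))

    # Pin Snappy explicitly before the INSERT OVERWRITE, so Hive and Spark
    # compress the output files the same way.
    spark.conf.set("spark.sql.orc.compression.codec", "snappy")
    spark.sql("INSERT OVERWRITE TABLE target_table SELECT * FROM source_table")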