[spark user] Issues with Overwrite Mode in Spark Write Parquet Using Root User

2024-10-23 Thread Ilango
Hi all, I am using a Spark standalone cluster (open source) with the root user as the Spark user. I am facing an issue when using Overwrite mode in Spark's parquet write. The jobs are failing because the user is unable to delete the tmp folder within the parquet folder. I plan to create a Spark
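
A minimal sketch of the write in question (PySpark, hypothetical output path): Overwrite stages files in a _temporary subfolder inside the target parquet folder and then removes the old contents, so the user running the job needs delete permission on that folder.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("overwrite-example").getOrCreate()
    df = spark.range(10)

    # Overwrite stages data under <path>/_temporary and then replaces the old
    # files, so the Spark user must be able to delete inside this directory.
    df.write.mode("overwrite").parquet("/data/output/events")  # hypothetical path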

Re: Help - Learning/Understanding spark web UI

2024-09-26 Thread Ilango
Hi Karthick, I found one of the Spark Summit talks from a few years back on the Spark UI quite useful. Just search on YouTube. Let me check it out and I will share it with you if I find it again. Thanks, Elango On Thu, 26 Sep 2024 at 4:04 PM, Karthick Nk wrote: > Hi All, > I am looking to deepen my und

[Spark Connect ] Date Data type formatting issue

2024-08-14 Thread Ilango
Hi all, I am experiencing a date field formatting issue when loading data from a Hive table in Spark via Spark Connect (on an AWS EMR cluster) using the R sparklyr package. The date field is converted to a char type, whereas the same field is loaded as a date type when using our on-premise Spark with Y
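
A quick server-side check, sketched in PySpark over Spark Connect rather than sparklyr; the host, table, and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Attach to the Spark Connect endpoint on the EMR cluster (hypothetical host)
    spark = SparkSession.builder.remote("sc://emr-master:15002").getOrCreate()

    df = spark.table("sales_db.orders")
    df.printSchema()  # confirm whether order_date arrives as string or date

    # Cast back to a proper date column if it arrives as a string
    df = df.withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))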

Re: [spark connect] unable to utilize stand alone cluster

2024-08-06 Thread Ilango
on passing the master option to your spark connect > command? > > On Tue, 6 Aug, 2024, 15:36 Ilango, wrote: > >> >> >> >> Thanks Prabodh. I'm having an issue with the Spark Connect connection as >> the `spark.master` value is set to `local[*]` in Spark Con

Re: [spark connect] unable to utilize stand alone cluster

2024-08-06 Thread Ilango
:45 PM, Prabodh Agarwal wrote: > There is an executors tab on spark connect. Its contents are generally > similar to the workers section of the spark master ui. > > You might need to specify the --master option in your spark connect command if > you haven't done so yet. >

[spark connect] unable to utilize stand alone cluster

2024-08-06 Thread Ilango
Hi all, I am evaluating the use of Spark Connect with my Spark stand-alone cluster, which has a master node and 3 worker nodes. I have successfully created a Spark Connect connection. However, when submitting Spark SQL queries, the jobs are being executed only on the master node, and I do not obse
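
As the replies above suggest, the connect server has to be pointed at the standalone master (for example via the --master option when running start-connect-server.sh) rather than running with local[*]. A minimal client-side check, with hypothetical host names:

    from pyspark.sql import SparkSession

    # Attach to the Spark Connect endpoint; the server itself should have been
    # started against spark://<master-host>:7077 rather than local[*]
    spark = SparkSession.builder.remote("sc://master-host:15002").getOrCreate()

    # A small shuffle job; its tasks should now show up on the worker executors
    # in the master UI instead of only on the master node
    print(spark.range(10_000_000).repartition(24).count())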

Re: [Spark Connect] connection issue

2024-07-29 Thread Ilango
spark connect jar file in the > `$SPARK_HOME/jars` directory and remove the `--packages` or the `--jars` > option from your start command. > > On Mon, Jul 29, 2024 at 7:01 PM Ilango wrote: > >> >> Thanks Prabodh, Yes I can see the spark connect logs in $SPARK_HOME/

Re: [Spark Connect] connection issue

2024-07-29 Thread Ilango
. Is that not feasible > for you? > For me, logs go to $SPARK_HOME/logs > > On Mon, 29 Jul, 2024, 15:30 Ilango, wrote: > >> >> Hi all, >> >> >> I am facing issues with a Spark Connect application running on a Spark >> standalone cluster (without YARN

[Spark Connect] connection issue

2024-07-29 Thread Ilango
Hi all, I am facing issues with a Spark Connect application running on a Spark standalone cluster (without YARN and HDFS). After executing the start-connect-server.sh script with the specified packages, I observe a process ID for a short period but am unable to see the corresponding port (default
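
A minimal sketch for checking whether the server ever starts listening, assuming the default Spark Connect port 15002 and a hypothetical host name; if the port never opens, the server logs under $SPARK_HOME/logs usually show why:

    import socket

    # Probe the Spark Connect port from the client machine
    with socket.socket() as s:
        s.settimeout(5)
        is_open = s.connect_ex(("spark-master", 15002)) == 0

    print("Spark Connect port is open" if is_open else "port closed; check $SPARK_HOME/logs")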

[spark connect] issue in testing spark connect

2024-07-19 Thread Ilango
Hi all, I am currently using a Spark standalone cluster, which is functioning as expected. Users are able to connect to the cluster and submit jobs without any issues. I am also testing the Spark Connect capability, which will enable external clients to submit jobs to the cluster. To start the

Re: Spark stand-alone mode

2023-10-16 Thread Ilango
g.html#dynamic-resource-allocation > > On Mon, Sep 18, 2023 at 3:53 PM Ilango wrote: > >> >> Thanks all for your suggestions. Noted with thanks. >> Just wanted to share a few more details about the environment: >> 1. We use NFS for data storage and data is in parquet for
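
A sketch of the dynamic allocation settings from the linked docs, adapted for a standalone cluster; the master URL and values are placeholders to tune:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("spark://master-host:7077")  # hypothetical standalone master
        .config("spark.dynamicAllocation.enabled", "true")
        # shuffle tracking lets dynamic allocation work without an external shuffle service
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "1")
        .config("spark.dynamicAllocation.maxExecutors", "8")
        .getOrCreate()
    )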

Re: Spark stand-alone mode

2023-09-18 Thread Ilango
>> any loss, damage or destruction of data or any other property which may >> arise from relying on this email's technical content is explicitly >> disclaimed. The author will in no case be liable for any monetary damages >> arising from such loss, damage or destruction. >> >

Spark stand-alone mode

2023-09-14 Thread Ilango
Hi all, We have 4 HPC nodes and installed Spark individually on all nodes. Spark is used in local mode (each driver/executor will have 8 cores and 65 GB) in sparklyr/PySpark using RStudio/Posit Workbench. Slurm is used as the scheduler. As this is local mode, we are facing performance issues (as only o
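
A brief sketch of what moving from per-node local mode to a shared standalone cluster could look like, keeping roughly the per-executor sizing mentioned above (8 cores, ~64 GB); the master URL is hypothetical:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("spark://hpc-node1:7077")    # hypothetical standalone master
        .config("spark.executor.cores", "8")
        .config("spark.executor.memory", "64g")
        .getOrCreate()
    )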

Datastore for GraphX

2015-11-21 Thread Ilango Ravi
Hi I am trying to figure out which datastore I can use for storing data to be used with GraphX. Is there a good graph database out there which I can use for storing graph data for efficient data storage/retrieval? thanks, ravi