Hi all,
I am using a Spark standalone cluster (open source) with the root user as
the Spark user. I am facing an issue when using the Overwrite mode for
Spark parquet writes. The jobs are failing because the user is unable to
delete the tmp folder inside the parquet output folder.
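In case it helps while debugging: Spark stages in-flight output under a `_temporary` directory inside the output path, and the Overwrite cleanup fails when the submitting user lacks write and execute permission on that directory's parent. A minimal stdlib sketch (the helper name and the throwaway path are placeholders, not your actual output path) for checking whether the current user could delete entries under a directory:

```python
import os
import tempfile

def can_delete_children(path: str) -> bool:
    """Return True if the current user can remove entries inside `path`.

    Deleting a file requires write + execute permission on its parent
    directory, which is what Spark's Overwrite cleanup needs on the
    parquet output folder in order to remove `_temporary`.
    """
    return os.access(path, os.W_OK | os.X_OK)

# Demo against a throwaway directory standing in for the parquet output path.
with tempfile.TemporaryDirectory() as out_dir:
    os.mkdir(os.path.join(out_dir, "_temporary"))
    print(can_delete_children(out_dir))
```

Running this check as the same user the executors run as (here, root) against the real output path would show whether the failure is a plain filesystem-permission problem or something else.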
I plan to create a Spark
Hi Karthick,
I found one of the Spark Summit talks from a few years back on the Spark
UI quite useful. Just search on YouTube. Let me check and I will share it
with you if I find it again.
Thanks,
Elango
On Thu, 26 Sep 2024 at 4:04 PM, Karthick Nk wrote:
> Hi All,
> I am looking to deepen my und
Hi all,
I am experiencing a date field formatting issue when loading data from a
Hive table in Spark via Spark Connect (on an AWS EMR cluster) using the R
sparklyr package. The date field is converted to a char type, whereas the
same field is loaded as a date type when using our on-premise Spark with Y
on passing the master option to your spark connect
> command?
>
> On Tue, 6 Aug, 2024, 15:36 Ilango, wrote:
>
>>
>>
>>
>> Thanks Prabodh. I'm having an issue with the Spark Connect connection as
>> the `spark.master` value is set to `local[*]` in Spark Con
:45 PM, Prabodh Agarwal
wrote:
> There is an executors tab on spark connect. Its contents are generally
> similar to the workers section of the spark master ui.
>
> You might need to specify the --master option in your spark connect
> command if you haven't done so yet.
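To make the quoted suggestion concrete: the start script accepts the same options as spark-submit, so the master URL can be passed straight through. A launch-command sketch, assuming a standalone master at spark://master-host:7077 and Spark 3.5 (the hostname and version are placeholders for your setup):

```shell
# Start the Spark Connect server against the standalone master rather than
# the default local[*]; hostname and version below are placeholders.
"$SPARK_HOME/sbin/start-connect-server.sh" \
  --master spark://master-host:7077 \
  --packages org.apache.spark:spark-connect_2.12:3.5.1
```

With `--master` set, SQL queries submitted over Spark Connect should be scheduled on the worker nodes instead of running only on the master node.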
>
Hi all,
I am evaluating the use of Spark Connect with my Spark stand-alone cluster,
which has a master node and 3 worker nodes. I have successfully created a
Spark Connect connection. However, when submitting Spark SQL queries, the
jobs are being executed only on the master node, and I do not obse
spark connect jar file in the
> `$SPARK_HOME/jars` directory and remove the `--packages` or the `--jars`
> option from your start command.
>
> On Mon, Jul 29, 2024 at 7:01 PM Ilango wrote:
>
>>
>> Thanks Prabodh. Yes, I can see the Spark Connect logs in $SPARK_HOME/
. Is that not feasible
> for you?
> For me the logs go to $SPARK_HOME/logs
>
> On Mon, 29 Jul, 2024, 15:30 Ilango, wrote:
>
>>
>> Hi all,
>>
>>
>> I am facing issues with a Spark Connect application running on a Spark
>> standalone cluster (without YARN
Hi all,
I am facing issues with a Spark Connect application running on a Spark
standalone cluster (without YARN and HDFS). After executing the
start-connect-server.sh script with the specified packages, I observe a
process ID for a short period but am unable to see the corresponding port
(default
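When the connect server process disappears shortly after starting, the reason (a missing package, a port clash, a classpath error) is usually in the daemon log. A sketch of where to look, assuming a default install layout (paths and port are the usual defaults; adjust for your environment):

```shell
# The connect server writes a daemon log under $SPARK_HOME/logs.
tail -n 50 "$SPARK_HOME"/logs/spark-*-org.apache.spark.sql.connect.service.SparkConnectServer-*.out

# Confirm whether the server is actually listening on the default port (15002).
ss -ltn | grep 15002
```

If the log shows the package failing to resolve, placing the spark-connect jar directly in $SPARK_HOME/jars (as suggested later in this thread) sidesteps the --packages download step.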
Hi all,
I am currently using a Spark standalone cluster, which is functioning as
expected. Users are able to connect to the cluster and submit jobs without
any issues.
I am also testing the Spark Connect capability, which will enable external
clients to submit jobs to the cluster. To start the
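For the external-client side of that test, a sketch of how a client would attach over Spark Connect, assuming the server is reachable at connect-host on the default port 15002 (the hostname is a placeholder, and the pyspark client package must be installed on the client machine):

```python
# Sketch only: attaches to an already-running Spark Connect server;
# "connect-host" is a placeholder for the server's hostname.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://connect-host:15002").getOrCreate()
spark.sql("SELECT 1 AS ok").show()
```

The `sc://` URL is the Spark Connect client protocol; no cluster credentials or jars are needed on the client beyond the pyspark package itself.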
g.html#dynamic-resource-allocation
>
> On Mon, Sep 18, 2023 at 3:53 PM Ilango wrote:
>
>>
>> Thanks all for your suggestions. Noted with thanks.
>> Just wanted to share a few more details about the environment
>> 1. We use NFS for data storage and data is in parquet for
>>
>
Hi all,
We have 4 HPC nodes and installed Spark individually on all nodes.
Spark is used in local mode (each driver/executor has 8 cores and 65
GB) via sparklyr/pyspark using RStudio/Posit Workbench. Slurm is used as
the scheduler.
As this is local mode, we are facing a performance issue (as only o
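One hedged option, if the nodes can reach each other over the network: run a standalone master with workers across the four nodes and point sessions at it instead of local mode, so a single job is no longer confined to one machine. A spark-defaults.conf sketch (the hostname and sizes are placeholders matching the 8-core / 65 GB figures above):

```
# Hypothetical values; spark-master:7077 stands in for your master node.
spark.master           spark://spark-master:7077
spark.executor.cores   8
spark.executor.memory  60g
```

Both sparklyr and pyspark sessions started from RStudio/Posit Workbench would then pick up the shared master from this file rather than defaulting to local[*].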
Hi
I am trying to figure out which datastore I can use for storing data to be
used with GraphX. Is there a good graph database out there which I can use
for storing graph data for efficient storage/retrieval?
thanks,
ravi