One issue I've seen is that after about 24 hours, the SparkApplication job
pods seem to be getting evicted .. I've installed the Spark History Server
and am verifying the cause.
It could be due to resource constraints; I'm checking this.
Pls note: the kubeflow spark operator is installed in a separate namespace (so350).
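If the evictions turn out to be node memory pressure, one thing worth checking is whether the driver and executor pods carry explicit resource requests, since pods running past their requests are the first candidates for eviction. A minimal sketch of the relevant SparkApplication fields - the name, image, and sizes here are placeholders, not values from this thread:
```
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: streaming-job                     # hypothetical name
  namespace: spark-apps
spec:
  type: Python
  mode: cluster
  image: "my-registry/spark-app:latest"   # placeholder image
  mainApplicationFile: "local:///opt/spark/custom-dir/main.py"
  driver:
    cores: 1
    memory: "2g"
    memoryOverhead: "512m"                # headroom for off-heap usage
  executor:
    instances: 2
    cores: 2
    memory: "4g"
    memoryOverhead: "1g"
```
Running `kubectl describe pod` on one of the evicted pods should show the eviction reason (e.g. node low on memory) in its events, which would confirm or rule out resource pressure.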
Hello folks,
My colleague has posted this issue on Github:
https://github.com/kubeflow/spark-operator/issues/2491
I'm wondering whether anyone here is using the kubeflow Spark-Operator and
could provide any insight into what's happening here. I know he's been
stumped for a while.
hello all - checking to see if anyone has any input on this
thanks!
On Tue, Mar 25, 2025 at 12:22 PM karan alang wrote:
hello All,
I have kubeflow Spark Operator installed on k8s and, from what I understand,
Spark Shuffle (the external shuffle service) is not officially supported on kubernetes.
Looking for feedback from the community on what approach is being taken to
handle this issue - especially since dynamicAllocation cannot be enabled
without it.
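One approach that works on Kubernetes without the external shuffle service is Spark 3's shuffle tracking, where executors holding live shuffle data are simply kept alive until that data ages out. A sketch of the relevant sparkConf keys, with placeholder executor counts and timeout:
```
sparkConf:
  "spark.dynamicAllocation.enabled": "true"
  # track shuffle files on executors instead of using an external shuffle service
  "spark.dynamicAllocation.shuffleTracking.enabled": "true"
  # how long to keep executors whose shuffle data is no longer referenced (placeholder)
  "spark.dynamicAllocation.shuffleTracking.timeout": "300s"
  "spark.dynamicAllocation.minExecutors": "1"
  "spark.dynamicAllocation.maxExecutors": "10"
```
The trade-off is that executors holding shuffle data stick around longer, so scale-down is less aggressive than it would be with a true external shuffle service.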
Hello All,
I have kubeflow spark operator installed on GKE (in namespace so350), as
well as Spark History Server installed on GKE in namespace shs-350.
The spark job is launched in a separate namespace - spark-apps.
When I launch the spark job, it runs fine and I'm able to see the job
details.
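For the history server in shs-350 to see jobs launched in spark-apps, the application needs to write event logs to a location the history server also reads, typically a GCS bucket in this kind of setup. A sketch of the application side, where the bucket name is an assumption:
```
sparkConf:
  "spark.eventLog.enabled": "true"
  # bucket is hypothetical; must match the history server's log directory
  "spark.eventLog.dir": "gs://my-spark-events/"
```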
[sparkConf jar and classpath entries, truncated in the archive:]
```
"...file:///opt/spark/other-jars/mongo-spark-connector_2.12-3.0.2.jar,file:///opt/spark/other-jars/bson-4.0.5.jar,file:///opt/spark/other-jars/mongodb-driver-sync-4.0.5.jar,file:///opt/spark/other-jars/mongodb-driver-core-4.0.5.jar,file:///opt/spark/other-jars/org.apache.spark_spark-sql-kafka-0-10_2...
...file:///opt/spark/other-jars/org.mongodb_mongodb-driver-sync-4.0.5.jar,file:///opt/spark/other-jars/org.mongodb_bson-4.0.5.jar,file:///opt/spark/other-jars/org.mongodb_mongodb-driver-core-4.0.5.jar"
"spark.executor.extraClassPath": "file:///opt/spark/othe...
...file:///opt/spark/other-jars/org.apache.commons_commons-pool2-2.6.2.jar,file:///opt/spark/other-jars/com.github.luben_zstd-jni-1.4.8-1.jar,file:///opt/spark/other-jars/org.lz4_lz4-java-1.7.1.jar,file:///opt/spark/other-jars/org.xerial.snappy_snappy-java-1.1.8.2.jar,file:///opt/spark/other-jars...
...file:///opt/spark/zips/streams.zip,file:///opt/spark/zips/utils.zip"
```
```
hadoopConf:
  "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
  "fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS"
  "google.cloud.a...                     [truncated in the archive]
```
[application arguments from the chart template:]
```
- "--isdebug={{ .Values.isdebug }}"
- "--istest={{ .Values.isdebug }}"
```
here is a snapshot of the secret:
```
(base) Karans-MacBook-Pro:spark-k8s-operator karanalang$ kc get secret spark-gcs-creds -n so350 -o yaml
apiVersion: v1
data:
  spark-gcs-key.json: <--- KEY --->
kind: Secret
```
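Since the thread is about the secret not being mounted or read, for comparison this is roughly how the operator is usually told to mount a GCP secret; the mount path here is an assumption and has to line up with the keyfile path in hadoopConf:
```
spec:
  driver:
    secrets:
      - name: spark-gcs-creds
        path: /mnt/secrets        # assumed mount path
        secretType: Generic       # mount only; auth comes from the hadoopConf keyfile
  executor:
    secrets:
      - name: spark-gcs-creds
        path: /mnt/secrets
        secretType: Generic
```
secretType: GCPServiceAccount would additionally export GOOGLE_APPLICATION_CREDENTIALS, but the operator's convention assumes a key file named key.json; with a key named spark-gcs-key.json, the Generic type plus an explicit keyfile path may be the safer combination.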
Where is the checkpoint location? Not in GCS?
Probably the location of the checkpoint is there - and you don't have
permissions for that...
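If the checkpoint does live in GCS, the same service account needs write access to that path. Purely as an illustration, the checkpoint location could be made explicit by passing it to the job; the flag name and bucket below are hypothetical:
```
spec:
  arguments:
    - "--checkpoint-location=gs://my-bucket/checkpoints/streaming-job"   # hypothetical flag and bucket
```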
On Thu, Oct 3, 2024 at 02:43, karan alang wrote:
This seems to be the cause of this ->
github.com/kubeflow/spark-operator/issues/1619 .. the secret is not getting
mounted due to this error -> MountVolume.SetUp failed for volume
"spark-conf-volume-driver"
I'm getting the same error in the event logs, and the mounted secret is not
getting read.
I've got kubeflow spark-operator installed on K8s (GKE), and I'm running a
structured streaming job which reads data from kafka .. the job is run
every 10 mins.
It is giving the error shown below:
```
Traceback (most recent call last):
  File "/opt/spark/custom-dir/main.py
[traceback truncated in the archive]
```