May I know, is spark.sql.shuffle.partitions=auto only available on Databricks?
What about on vanilla Spark? When I set this, it gives an error saying an integer is required.
Is there any open-source library that automatically finds the best partition count and block
size for a DataFrame?
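
For what it's worth, "auto" there is a Databricks-specific extension; open-source Spark only accepts an integer. The closest built-in equivalent is Adaptive Query Execution, which coalesces shuffle partitions at runtime. A minimal sketch (the values are illustrative, not tuned numbers):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.adaptive.enabled", "true")                     # on by default since Spark 3.2
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge small shuffle partitions
    .config("spark.sql.shuffle.partitions", "200")                    # integer upper bound AQE starts from
    .getOrCreate()
)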
The Spark history server is set to use s3a, as below:
spark.eventLog.enabled true
spark.eventLog.dir s3a://bucket-test/test-directory-log
Is there any configuration option I can set in the Spark config so that, if the
directory 'test-directory-log' does not exist, it is created automatically before the Spark
history server starts?
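
As far as I know there is no Spark or history-server setting that creates the directory for you; a common workaround is to create it just before startup. A minimal sketch using boto3 (S3 has no real directories, so a zero-byte marker key under the prefix is enough for s3a; credentials are assumed to come from the environment):

import boto3

s3 = boto3.client("s3")  # add endpoint_url=... for non-AWS object stores
# create a directory marker so s3a sees 'test-directory-log' as existing
s3.put_object(Bucket="bucket-test", Key="test-directory-log/")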
Based on this blog post
https://sergei-ivanov.medium.com/why-you-should-not-use-randomsplit-in-pyspark-to-split-data-into-train-and-test-58576d539a36
, I noticed a recommendation against using randomSplit for data splitting due
to data sorting. Is the information provided in the blog accurate?
Because if that's the case, then you'd want to only use 3 layers of
ArrayType when you define the schema.
Best regards, Adrian
On Thu, Jul 27, 2023, 11:04 second_co...@yahoo.com.INVALID
wrote:
I have a pandas DataFrame with a column 'image' holding numpy.ndarray values; the shape is
(500, 333, 3) per image. My pandas DataFrame has 10 rows, so the overall shape is
(10, 500, 333, 3).
When using spark.createDataFrame(panda_dataframe, schema), I need to specify
the schema:
schema = StructType([
StructField(
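
For reference, a minimal sketch of the 3-layer ArrayType schema suggested above; DoubleType is an assumption (match it to the ndarray's dtype), and the ndarray column typically needs .tolist() before createDataFrame:

from pyspark.sql.types import StructType, StructField, ArrayType, DoubleType

schema = StructType([
    StructField(
        "image",                                        # column from the question
        ArrayType(ArrayType(ArrayType(DoubleType()))),  # (500, 333, 3) nesting
        nullable=False,
    )
])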
I ran the following code
spark.sparkContext.list_packages()
on Spark 3.4.1 and I get the error below:
An error was encountered:
AttributeError
Traceback (most recent call last):
  File "/tmp/spark-3d66c08a-08a3-4d4e-9fdf-45853f65e03d/shell_wrapper.py", line 113, in exec
    self._exec_then_eval(co
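
If it helps: sc.list_packages() isn't part of vanilla PySpark (hence the AttributeError); it's provided by some managed notebook environments (Amazon EMR Notebooks, for example). A sketch of how to inspect packages in plain Spark instead (assumes pip is available on the driver):

import subprocess
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Maven packages submitted via --packages / spark.jars.packages, if any:
print(spark.sparkContext.getConf().get("spark.jars.packages", "none"))
# Python packages installed on the driver:
print(subprocess.run(["pip", "list"], capture_output=True, text=True).stdout)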
spark.sparkContext.textFile("s3a://a_bucket/models/random_forest_zepp/bestModel/metadata",
1).getNumPartitions()
When I run the above code, I get the error below. Can you advise how to troubleshoot? I'm
using Spark 3.3.0, and the above file path exists.
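
Without the actual stack trace it's hard to say, but the usual suspects are a missing hadoop-aws jar or missing s3a credentials/endpoint. A sketch of the checks (all values are placeholders, and the hadoop-aws version is an assumption that must match your Hadoop build):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.2")    # version is an assumption
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.endpoint", "https://s3.example.internal")  # for non-AWS stores
    .getOrCreate()
)
rdd = spark.sparkContext.textFile(
    "s3a://a_bucket/models/random_forest_zepp/bestModel/metadata", 1)
print(rdd.getNumPartitions())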
Has anyone successfully run native TensorFlow on Spark? I tested the example at
https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
on Kubernetes (CPU only), running on multiple workers' CPUs. I do not see any
speed-up in training time when increasing the number of slots from 1.
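
For reference, a minimal sketch of how I'd drive that library, based on its README; the train() body is a placeholder. With small models, CPU-only data-parallel training often shows no speed-up, since synchronization costs can eat the gains:

from spark_tensorflow_distributor import MirroredStrategyRunner

def train():
    import tensorflow as tf
    # placeholder model/pipeline; a real job would build a tf.data input here
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")
    # model.fit(dataset, epochs=...)

# num_slots is the number of CPU slots to parallelize over;
# use_gpu=False matches the CPU-only Kubernetes setup in the question
MirroredStrategyRunner(num_slots=8, use_gpu=False).run(train)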
I am able to share the same PVC on Spark 3.3, but from Spark 3.4 onward I get the error
below. I would like all the executors and the driver to mount the same PVC. Is
this a bug? I don't want to use SPARK_EXECUTOR_ID or OnDemand, because then
each executor would use its own unique, separate PVC.
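
For context, this is the static-claim style of configuration in question; 'data' is a placeholder volume name and 'shared-pvc' a placeholder claim name (the PVC needs a ReadWriteMany access mode for several pods to mount it):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName", "shared-pvc")
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path", "/shared")
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName", "shared-pvc")
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path", "/shared")
    .getOrCreate()
)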
Any example of how to read a binary file using PySpark and save it in another
location, i.e. a copy feature?
Thank you, Teoh
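
A minimal sketch using the built-in binaryFile source (paths are placeholders). There is no built-in writer for raw binary files, so the copy step below writes each row's bytes out by hand:

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# each row carries path, modificationTime, length, and content (the raw bytes)
df = spark.read.format("binaryFile").load("s3a://src-bucket/models/")

os.makedirs("/tmp/copied", exist_ok=True)   # local target, just as an example
for row in df.select("path", "content").toLocalIterator():
    name = row["path"].rsplit("/", 1)[-1]
    with open(os.path.join("/tmp/copied", name), "wb") as f:
        f.write(row["content"])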
Good day,
May I know what is the difference between pyspark.sql.dataframe.DataFrame and
pyspark.pandas.frame.DataFrame? Are both stored in Spark's DataFrame format?
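
Both are backed by Spark: pyspark.pandas (the pandas API on Spark, available since Spark 3.2) wraps an ordinary Spark DataFrame behind a pandas-like interface. A quick sketch of converting between the two:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.range(5)        # pyspark.sql.dataframe.DataFrame
psdf = sdf.pandas_api()     # pyspark.pandas.frame.DataFrame, same data underneath
sdf2 = psdf.to_spark()      # back to a Spark SQL DataFrame
print(type(sdf), type(psdf), type(sdf2))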
I'm looking for a way to load a huge Excel file (4-10 GB); I wonder, should I
use the third-party spark-excel library or just use native
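
If you try spark-excel, a hedged sketch; the Maven coordinate, version, and options below are assumptions to check against the spark-excel docs for your Spark/Scala build. maxRowsInMemory matters for multi-GB files because it streams rows instead of loading the whole workbook into memory:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages",
            "com.crealytics:spark-excel_2.12:3.3.1_0.18.5")  # version is an assumption
    .getOrCreate()
)
df = (
    spark.read.format("excel")        # "com.crealytics.spark.excel" on older releases
    .option("header", "true")
    .option("maxRowsInMemory", 1000)  # stream rows rather than load the whole workbook
    .load("s3a://bucket/huge_file.xlsx")
)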
When running a Spark job, I used
"spark.eventLog.dir": "s3a://_some_bucket_on_prem/spark-history",
"spark.eventLog.enabled": true
and I see the job's log shows
22/11/10 06:42:30 INFO SingleEventLogFileWriter: Logging events to
s3a://_some_bucket_on_prem/spark-history/spark-a2befd8cb91341
Currently my PySpark code is able to connect to the Hive metastore at port 9083.
However, with this approach I can't put in place any security mechanism like
LDAP or SQL authentication control. Is there any way to connect from PySpark to the
Spark Thrift Server on port 10000 without exposing the Hive metastore?
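
A minimal sketch of connecting to the Spark Thrift Server over the Hive protocol instead of the metastore, with LDAP handled by the server. It uses the third-party PyHive package (an assumption; any HiveServer2-compatible client would do), and the host and credentials are placeholders:

from pyhive import hive

conn = hive.connect(
    host="thrift-server.example.internal",  # placeholder host
    port=10000,                             # default HiveServer2/Thrift port
    username="ldap_user",
    password="ldap_password",
    auth="LDAP",                            # server must be configured for LDAP auth
)
cur = conn.cursor()
cur.execute("SHOW TABLES")
print(cur.fetchall())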