Hello.
I am running Spark 2.4.4 and have implemented a custom metrics producer. It
works well when I run locally or when I specify the metrics producer only for
the driver. When I ask for executor metrics, I run into ClassNotFoundExceptions.
Is it possible to pass a metrics JAR via --jars? If so, what a
Hi Team,
I would like to share with the community that my blog post on "Apache Spark
Window Functions" has been published. Please find the link below if anyone is interested.
Link:
https://medium.com/expedia-group-tech/deep-dive-into-apache-spark-window-functions-7b4e39ad3c86
Please share your thoughts and feedback.
Regards,
You can always list the S3 output path, of course.
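For what it's worth, a minimal PySpark sketch of that approach, assuming the same s3_output path used in the write (the bucket below is a placeholder) and going through Spark's internal JVM gateway to reach the Hadoop FileSystem API:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
s3_output = "s3://my-bucket/my-table"  # placeholder: use the real output path

# Hadoop FileSystem for the output path, via the (internal) JVM gateway.
jvm = spark.sparkContext._jvm
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
root = jvm.org.apache.hadoop.fs.Path(s3_output)
fs = root.getFileSystem(hadoop_conf)

def list_dirs(path):
    # Immediate sub-directories of a path (day=..., hour=..., country=...).
    return [s.getPath() for s in fs.listStatus(path) if s.isDirectory()]

# Three nested levels correspond to partitionBy("day", "hour", "country").
for day in list_dirs(root):
    for hour in list_dirs(day):
        for country in list_dirs(hour):
            print(country.toString())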
On Thu, Jun 25, 2020 at 7:52 AM Tzahi File wrote:
> Hi,
>
> I'm using pyspark to write df to s3, using the following command:
> "df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)".
>
> Is there any way to get the partitions created?
You can use the catalog APIs; see the following:
https://stackoverflow.com/questions/54268845/how-to-check-the-number-of-partitions-of-a-spark-dataframe-without-incurring-the/54270537
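If the output is (or gets) registered in the catalog as a partitioned table, the Hive-style partitions can also be listed from metadata alone via SQL; a minimal sketch, with a hypothetical table name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# "events" is a hypothetical table name assumed to point at the parquet output.
# Each returned row looks like: day=2020-06-20/hour=1/country=US
spark.sql("SHOW PARTITIONS events").show(100, truncate=False)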
On Thu, Jun 25, 2020 at 6:19 AM Tzahi File wrote:
> I don't want to query with a distinct on the partitioned columns; the df contains over 1 billion records.
I don't want to query with a distinct on the partitioned columns; the df
contains over 1 billion records.
I just want to know the partitions that were created.
On Thu, Jun 25, 2020 at 4:04 PM Jörn Franke wrote:
> By doing a select on the df?
>
> On 25.06.2020 at 14:52, Tzahi File wrote:
>
By doing a select on the df?
> On 25.06.2020 at 14:52, Tzahi File wrote:
>
>
> Hi,
>
> I'm using pyspark to write df to s3, using the following command:
> "df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)".
>
> Is there any way to get the partitions created?
Hi,
I'm using pyspark to write df to s3, using the following command:
"df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)".
Is there any way to get the partitions created?
e.g.
day=2020-06-20/hour=1/country=US
day=2020-06-20/hour=2/country=US
..
--
Tzahi File
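Not the S3 setup itself, but a small self-contained PySpark sketch (made-up data, local path) showing the day=/hour=/country= directory layout that partitionBy produces, and how listing the output path recovers exactly the partition strings asked about above:

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Tiny illustrative DataFrame with the same partition columns as the question.
df = spark.createDataFrame(
    [("2020-06-20", 1, "US", 42), ("2020-06-20", 2, "US", 7)],
    ["day", "hour", "country", "value"],
)

out = "/tmp/partition_demo"  # placeholder local path
df.write.partitionBy("day", "hour", "country").mode("overwrite").parquet(out)

# Walking the output directory prints, e.g.:
#   day=2020-06-20/hour=1/country=US
#   day=2020-06-20/hour=2/country=US
for root, dirs, files in os.walk(out):
    if not dirs:  # leaf partition directory
        print(os.path.relpath(root, out))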
I know I can arrive at the same result with this code,

import org.apache.spark.sql.functions.sum
import spark.implicits._   // for the 'id column syntax

val range100 = spark.range(1, 101).agg(sum('id) as "sum").first.get(0)
println(f"sum of range100 = $range100")

so I am not stuck; I was just curious why the code breaks with the currently
linked libraries.
spark.range(1,101).r
May I suggest amending your ./dev/make-distribution.sh to check whether these
two previously mentioned packages are installed and, if not, to install them
as part of the build process. The build time will increase if the packages are
not installed; a long build process is normal.