Hello.
I am running Spark 2.4.4 and have implemented a custom metrics producer. It
works well when I run locally or when I specify the metrics producer only for
the driver. When I ask for executor metrics, I run into ClassNotFoundExceptions.
Is it possible to pass a metrics JAR via --jars? If so, what a
Hi Team,
I would like to share with the community that my blog post on "Apache Spark
Window Functions" has been published. Please find the link below if anyone is interested.
Link:
https://medium.com/expedia-group-tech/deep-dive-into-apache-spark-window-functions-7b4e39ad3c86
Please share your thoughts and feedback.
Regards,
You can always list the S3 output path, of course.
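For what it's worth, a minimal PySpark sketch of that approach, assuming the same s3_output path used in the write (the bucket below is a placeholder) and going through Spark's internal JVM gateway to reach the Hadoop FileSystem API:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
s3_output = "s3://my-bucket/my-table"  # placeholder: use the real output path

# Hadoop FileSystem for the output path, via the (internal) JVM gateway.
jvm = spark.sparkContext._jvm
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
root = jvm.org.apache.hadoop.fs.Path(s3_output)
fs = root.getFileSystem(hadoop_conf)

def list_dirs(path):
    # Immediate sub-directories of a path (day=..., hour=..., country=...).
    return [s.getPath() for s in fs.listStatus(path) if s.isDirectory()]

# Three nested levels correspond to partitionBy("day", "hour", "country").
for day in list_dirs(root):
    for hour in list_dirs(day):
        for country in list_dirs(hour):
            print(country.toString())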
On Thu, Jun 25, 2020 at 7:52 AM Tzahi File wrote:
> Hi,
>
> I'm using pyspark to write df to s3, using the following command:
> "df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)".
>
> Is there any way to get the partitions created?
You can use the catalog APIs; see the following:
https://stackoverflow.com/questions/54268845/how-to-check-the-number-of-partitions-of-a-spark-dataframe-without-incurring-the/54270537
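If the output is (or gets) registered in the catalog as a partitioned table, the Hive-style partitions can also be listed from metadata alone via SQL; a minimal sketch, with a hypothetical table name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# "events" is a hypothetical table name assumed to point at the parquet output.
# Each returned row looks like: day=2020-06-20/hour=1/country=US
spark.sql("SHOW PARTITIONS events").show(100, truncate=False)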
On Thu, Jun 25, 2020 at 6:19 AM Tzahi File wrote:
> I don't want to query with a distinct on the partitioned columns; the df contains over 1 billion records.
I don't want to query with a distinct on the partitioned columns; the df
contains over 1 billion records.
I just want to know the partitions that were created.
On Thu, Jun 25, 2020 at 4:04 PM Jörn Franke wrote:
> By doing a select on the df?
>
> On 25.06.2020 at 14:52, Tzahi File wrote:
>
By doing a select on the df?
> On 25.06.2020 at 14:52, Tzahi File wrote:
>
>
> Hi,
>
> I'm using pyspark to write df to s3, using the following command:
> "df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)".
>
> Is there any way to get the partitions created?
Hi,
I'm using pyspark to write df to s3, using the following command:
"df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)".
Is there any way to get the partitions created?
e.g.
day=2020-06-20/hour=1/country=US
day=2020-06-20/hour=2/country=US
..
--
Tzahi File
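Not the S3 setup itself, but a small self-contained PySpark sketch (made-up data, local path) showing the day=/hour=/country= directory layout that partitionBy produces, and how listing the output path recovers exactly the partition strings asked about above:

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Tiny illustrative DataFrame with the same partition columns as the question.
df = spark.createDataFrame(
    [("2020-06-20", 1, "US", 42), ("2020-06-20", 2, "US", 7)],
    ["day", "hour", "country", "value"],
)

out = "/tmp/partition_demo"  # placeholder local path
df.write.partitionBy("day", "hour", "country").mode("overwrite").parquet(out)

# Walking the output directory prints, e.g.:
#   day=2020-06-20/hour=1/country=US
#   day=2020-06-20/hour=2/country=US
for root, dirs, files in os.walk(out):
    if not dirs:  # leaf partition directory
        print(os.path.relpath(root, out))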
I know I can arrive at the same result with this code,

import org.apache.spark.sql.functions.sum
import spark.implicits._   // for the 'id column syntax

val range100 = spark.range(1, 101).agg(sum('id) as "sum").first.get(0)
println(f"sum of range100 = $range100")

so I am not stuck; I was just curious why the code breaks with the currently
linked libraries.
spark.range(1,101).r
May I suggest amending your ./dev/make-distribution.sh to check whether these
two previously mentioned packages are installed and, if not, to install them
as part of the build process. The build time will increase if the packages are
not installed; a long build process is normal.