Hello Community,
I am working on optimizing file sizes and the number of files. The
DataFrame API has a function input_file_name which returns the file
name, but I am missing a counterpart that returns the size of the file.
Just the size, as "ls -l" reports it. Is there something like that?
Kind rega
Hi,
I found this (
https://spark.apache.org/docs/latest/api/java/org/apache/spark/util/SizeEstimator.html),
which may be helpful. I use Java:
> org.apache.spark.util.SizeEstimator.estimate(dataset);
On Sat, Jun 18, 2022 at 22:33, mbreuer wrote:
> Hello Community,
>
> I am working on opt
import java.io.File
val someFile = new File("somefile.txt")
val fileSize = someFile.length
This one?
On Sun, Jun 19, 2022 at 4:33 AM mbreuer wrote:
> Hello Community,
>
> I am working on optimizations for file sizes and number of files. In the
> data frame there is a function input_file_name wh
Given that you already know your input files (input_file_name), why not
get their sizes and sum them up?
import java.io.File
import java.net.URI
import org.apache.spark.sql.functions.input_file_name

ds.select(input_file_name.as("filename"))
  .distinct.as[String]
  .map(filename => new F
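
For readers whose input lives on HDFS or an object store, where java.io.File cannot reach the files, here is a minimal sketch of the same idea (collect the distinct input file names, then look up each size and sum them) using the Hadoop FileSystem API. It is not the code from the message above, which is cut off. The input path /tmp/input, the local master setting, and the object name InputFileSizes are assumptions made for illustration only. Note that this reports the on-disk length of each file, whereas SizeEstimator.estimate estimates the in-memory size of a JVM object.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

// Hypothetical driver object; the name is illustrative only.
object InputFileSizes {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("input-file-sizes")
      .master("local[*]") // assumption: local run for demonstration
      .getOrCreate()
    import spark.implicits._

    // Assumption: a directory of text files at this hypothetical location.
    val ds = spark.read.text("/tmp/input")

    // Distinct names of the files that back the rows of the dataset.
    val fileNames: Array[String] = ds
      .select(input_file_name().as("filename"))
      .distinct()
      .as[String]
      .collect()

    // Resolve each name against its own file system (local, HDFS, ...)
    // and read the file length in bytes from the file status.
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    val sizes: Array[(String, Long)] = fileNames.map { name =>
      val path = new Path(name)
      val fs = path.getFileSystem(hadoopConf)
      name -> fs.getFileStatus(path).getLen
    }

    sizes.foreach { case (name, len) => println(s"$name\t$len") }
    println(s"total bytes: ${sizes.map(_._2).sum}")

    spark.stop()
  }
}
```

The size lookup runs on the driver after collect(), which keeps the sketch simple; with a very large number of input files it may be preferable to do the lookup differently.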