input file size

2022-06-18 Thread mbreuer
Hello Community, I am working on optimizing file sizes and the number of files. The DataFrame API has a function input_file_name that returns the file name, but I am missing a counterpart that returns the size of the file. Just the size, as "ls -l" reports it. Is there something like that? Kind rega

Re: input file size

2022-06-18 Thread marc nicole
Hi, I found this (https://spark.apache.org/docs/latest/api/java/org/apache/spark/util/SizeEstimator.html), which may be helpful. I use Java: > org.apache.spark.util.SizeEstimator.estimate(dataset); On Sat, 18 Jun 2022 at 22:33, mbreuer wrote: > Hello Community, > > I am working on opt

Re: input file size

2022-06-18 Thread Yong Walt
    import java.io.File
    val someFile = new File("somefile.txt")
    val fileSize = someFile.length

This one? On Sun, Jun 19, 2022 at 4:33 AM mbreuer wrote: > Hello Community, > > I am working on optimizations for file sizes and number of files. In the > data frame there is a function input_file_name wh

Re: input file size

2022-06-18 Thread Enrico Minack
Given you already know your input files (input_file_name), why not get their sizes and sum them up?

    import java.io.File
    import java.net.URI
    import org.apache.spark.sql.functions.input_file_name

    ds.select(input_file_name.as("filename"))
      .distinct.as[String]
      .map(filename => new F
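The idea in this reply (collect the distinct input file names, look up each file's size, and sum the results) can be sketched in plain Java without Spark. This is only an illustration, not the code from the truncated post. The class name InputFileSizes and the method totalSize are hypothetical. It assumes the names are local file: URIs, such as those input_file_name may return for local files. Files on HDFS or an object store would need the Hadoop FileSystem API rather than java.io.File.

```java
import java.io.File;
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: sums the on-disk sizes of distinct local files.
final class InputFileSizes {

    // fileUris: absolute "file:" URIs (assumption: local file system only).
    // Returns the total size in bytes of the distinct files.
    public static long totalSize(List<String> fileUris) {
        // Deduplicate first, since many rows can come from the same file.
        Set<String> distinct = new LinkedHashSet<>(fileUris);
        long total = 0L;
        for (String uri : distinct) {
            File file = new File(URI.create(uri));
            // File.length() returns 0 when the file does not exist.
            total += file.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Usage: pass one or more file: URIs as command-line arguments.
        System.out.println(totalSize(List.of(args)));
    }
}
```

In a Spark job, the list passed to totalSize would be the collected distinct values of input_file_name. That is reasonable only when the number of input files is small enough to collect on the driver.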