Hi,

Just so that we understand the intention: why do you need to know the
file size? Are you not using a splittable file format?

If you use Spark structured streaming to read the files with a run-once
trigger (Trigger.Once), then I believe you will be able to get the
metadata of the files.
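
For example, on Spark 3.3+ the file-based sources expose a hidden
_metadata column that carries per-file details, including the size. A
minimal sketch, assuming a Parquet input (the path below is hypothetical):

import org.apache.spark.sql.functions.col

// _metadata is a hidden per-file column of file-based sources (Spark 3.3+);
// it must be selected explicitly to appear in the result.
val df = spark.read.parquet("/path/to/data")
df.select(col("_metadata.file_name"), col("_metadata.file_size"))
  .distinct()  // one row per input file
  .show()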



Regards,
Gourav Sengupta

On Sun, Jun 19, 2022 at 8:00 AM Enrico Minack <i...@enrico.minack.dev>
wrote:

> Given you already know your input files (input_file_name), why not getting
> their size and summing this up?
>
> import java.io.File
> import java.net.URI
> import org.apache.spark.sql.functions.{input_file_name, sum}
>
> // assumes spark.implicits._ is in scope (e.g. in spark-shell)
> // for .as[String] and the $"value" column syntax
> ds.select(input_file_name.as("filename"))
>   .distinct.as[String]
>   .map(filename => new File(new URI(filename).getPath).length)
>   .select(sum($"value"))
>   .show()
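>
> Note that java.io.File only resolves paths on the driver's local
> filesystem. If the files live on HDFS or S3, a variant going through the
> Hadoop FileSystem API might look like this (a sketch, assuming the
> driver's Hadoop configuration can reach the storage):
>
> import java.net.URI
> import org.apache.hadoop.fs.Path
> import org.apache.spark.sql.functions.input_file_name
>
> // Collect the distinct file names to the driver, then ask the Hadoop
> // filesystem for each file's length and sum the results.
> val hadoopConf = spark.sparkContext.hadoopConfiguration
> val totalBytes = ds.select(input_file_name.as("filename"))
>   .distinct.as[String]
>   .collect()
>   .map { filename =>
>     val path = new Path(new URI(filename))
>     path.getFileSystem(hadoopConf).getFileStatus(path).getLen
>   }
>   .sum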
>
>
> Enrico
>
>
> Am 19.06.22 um 03:16 schrieb Yong Walt:
>
> import java.io.File
>
> val someFile = new File("somefile.txt")
> val fileSize = someFile.length
>
> This one?
>
>
> On Sun, Jun 19, 2022 at 4:33 AM mbreuer <msbre...@gmail.com> wrote:
>
>> Hello Community,
>>
>> I am working on optimizations for file sizes and number of files. In the
>> data frame there is a function input_file_name which returns the file
>> name. I miss a counterpart to get the size of the file. Just the size,
>> like "ls -l" returns. Is there something like that?
>>
>> Kind regards,
>> Markus
>>
>>
>
