Reading the last line of each file in a set of text files

2021-08-02 Thread Sayeh Roshan
Hi users, Does anyone here have experience writing Spark code that reads just the last line of each text file in a directory, S3 bucket, etc.? I am looking for a solution that doesn't require reading the whole file. Basically, I wonder whether you can create a DataFrame/RDD using file seek. Not s
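The "file seek" idea can be illustrated for a single local file. The sketch below is an editor's illustration, not a reply from the thread. It assumes the file is directly readable from the local filesystem and is UTF-8 encoded. `read_last_line` and `chunk_size` are hypothetical names. The function seeks to the end of the file and reads fixed-size chunks backwards until it finds a newline, so for typical files only the tail is read. For S3, the same idea would presumably map to a ranged GET (an HTTP `Range` header) on the object's tail. In Spark, one option is to distribute a list of paths and map such a function over it, but that assumes every executor can reach the files.

```python
import os


def read_last_line(path: str, chunk_size: int = 4096) -> str:
    """Return the last non-empty line of a file without reading it all.

    Hypothetical helper: seeks backwards from end-of-file in chunks of
    `chunk_size` bytes until a line separator is found. Trailing line
    terminators (and trailing blank lines) are ignored. Assumes UTF-8 text.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()  # current read position, moving towards the start
        buf = b""       # bytes accumulated from `pos` to end-of-file
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
            # Drop trailing newlines so they are not mistaken for the separator.
            body = buf.rstrip(b"\r\n")
            idx = body.rfind(b"\n")
            if idx != -1:
                # Everything after the last separator is the last line.
                return body[idx + 1:].decode("utf-8")
        # No separator found: the file is a single line (or empty).
        return buf.rstrip(b"\r\n").decode("utf-8")
```

Slicing just after a `\n` byte is safe for UTF-8, because that byte never occurs inside a multi-byte character. The decoded line is therefore always complete, even if a chunk boundary split a character.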

Sparse vs. Dense vector memory usage

2021-08-02 Thread Gerard Maas
Dear Spark folks, Is there a guideline somewhere on the density tipping point at which it makes more sense to use a Spark ML dense vector rather than a sparse vector, with regard to memory usage on fairly large (image-processing) vectors? My Google-fu didn't turn up anything useful. Thanks in advance
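A rough estimate of the tipping point follows from the payload layouts. This is an editor's back-of-envelope sketch, not an answer from the thread. It assumes a Spark ML `DenseVector` stores one 8-byte double per element, and a `SparseVector` stores one 4-byte int index plus one 8-byte double per non-zero. JVM object headers and array overhead are ignored. Under those assumptions the break-even density is 8/12 ≈ 2/3. Below that fraction of non-zeros, sparse uses less memory. All names in the code are hypothetical.

```python
# Assumed per-entry payload sizes (JVM object/array overhead ignored).
DOUBLE_BYTES = 8
INT_BYTES = 4


def dense_bytes(size: int) -> int:
    # Dense layout: one double per element, zeros included.
    return size * DOUBLE_BYTES


def sparse_bytes(nnz: int) -> int:
    # Sparse layout: one int index plus one double value per non-zero.
    return nnz * (INT_BYTES + DOUBLE_BYTES)


def prefer_sparse(size: int, nnz: int) -> bool:
    # True when the sparse payload is strictly smaller than the dense one.
    return sparse_bytes(nnz) < dense_bytes(size)


def break_even_density() -> float:
    # Fraction of non-zeros at which both layouts use the same payload bytes.
    return DOUBLE_BYTES / (INT_BYTES + DOUBLE_BYTES)
```

For a vector of 1000 elements, sparse wins up to 666 non-zeros (7992 vs. 8000 bytes) and loses from 667 upwards. Spark's `Vector.compressed` method is documented as returning whichever of the dense or sparse formats uses less storage. It may therefore be worth checking its source for your Spark version to see the exact threshold it applies.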