This blog post, https://sergei-ivanov.medium.com/why-you-should-not-use-randomsplit-in-pyspark-to-split-data-into-train-and-test-58576d539a36 , recommends against using randomSplit to split data because of the sorting it performs on the data. Is the information in the blog post accurate? My understanding is that the sorting is related to how Spark partitions the data. Could anyone clarify whether we should continue using randomSplit to divide our data into training and test sets, or use filter() instead?
Thank you