randomsplit has issue?

second_co...@yahoo.com.INVALID Wed, 31 Jan 2024 02:04:15 -0800

I came across this blog post, which recommends against using randomSplit to
split data because of the way the data is sorted:
https://sergei-ivanov.medium.com/why-you-should-not-use-randomsplit-in-pyspark-to-split-data-into-train-and-test-58576d539a36

Is the information in the blog accurate? My understanding is that the sorting
happens because Spark partitions the data. Could anyone clarify whether we
should keep using randomSplit to divide our data into training and test sets,
or use filter() instead?
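
To make the concern concrete, here is a minimal pure-Python sketch of the mechanism as I understand it. This is a toy model, not Spark code: the helper names (bernoulli_cell_sample, toy_random_split, key_based_split) are hypothetical, and the assumption is that each split re-evaluates the data and keeps rows by position-based random draws from the same seed. The last helper is only my guess at what a filter()-style split could look like, keyed on the row value instead of its position.

```python
import random
import zlib


def bernoulli_cell_sample(rows, lower, upper, seed):
    """Keep rows whose position-based random draw falls in [lower, upper)."""
    rng = random.Random(seed)
    kept = []
    for row in rows:
        draw = rng.random()  # the draw depends on row position, not row content
        if lower <= draw < upper:
            kept.append(row)
    return kept


def toy_random_split(evaluate, weights, seed):
    """Toy model: every split re-evaluates the data (assumed lazy behaviour)."""
    total = sum(weights)
    bounds = [0.0]
    for w in weights:
        bounds.append(bounds[-1] + w / total)
    return [
        bernoulli_cell_sample(evaluate(), lo, hi, seed)
        for lo, hi in zip(bounds, bounds[1:])
    ]


def key_based_split(rows, train_percent):
    """Hypothetical filter-style split: decide from the row value itself."""
    def bucket(row):
        return zlib.crc32(str(row).encode("utf-8")) % 100
    train = [r for r in rows if bucket(r) < train_percent]
    test = [r for r in rows if bucket(r) >= train_percent]
    return train, test


data = list(range(1000))


def stable():
    # Same row order on every evaluation.
    return list(data)


_shuffler = random.Random(0)


def unstable():
    # Row order changes on every evaluation (stand-in for non-deterministic input).
    rows = list(data)
    _shuffler.shuffle(rows)
    return rows


train_s, test_s = toy_random_split(stable, [0.8, 0.2], seed=42)
train_u, test_u = toy_random_split(unstable, [0.8, 0.2], seed=42)
train_k, test_k = key_based_split(unstable(), 80)

overlap_stable = set(train_s) & set(test_s)
overlap_unstable = set(train_u) & set(test_u)
overlap_key = set(train_k) & set(test_k)

print("overlap with stable order:  ", len(overlap_stable))
print("overlap with unstable order:", len(overlap_unstable))
print("overlap with key-based split:", len(overlap_key))
```

In this toy model the splits are disjoint as long as every evaluation sees the rows in the same order, and they overlap once the order changes between evaluations. Whether real Spark jobs hit this case is exactly what I am asking about.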


Thank you
