[SparkStreaming] How To Stop The SparkStreamingContext Gracefully Without Extra Time Cost?

2020-10-11 Thread Lyx
Hi, I've been using StreamingContext.stop(true, true) to stop my application gracefully, meaning it guarantees that all received data is processed before the whole application terminates. It does work, but I also noticed that it leads to extra time spent just waiting for empty rd

[Spark ML] How to get access to MLlib's LogisticRegressionWithSGD after 3.0.0?

2020-09-21 Thread Lyx
Hi, I have updated Spark to version 3.0.0, and it seems that the constructor of LogisticRegressionWithSGD is now private within the mllib package, so I can't instantiate this class. How can I keep using this class rather than switching to LogisticRegressionWithLBFGS?

[Spark SQL] Issue about difference in memory size between DataFrame and RDD

2020-04-19 Thread Lyx
Hello, I've been using Spark for my project these days. However, I noticed that when loading data stored in Hadoop HDFS, there seems to be a huge difference in JVM memory size between using a DataFrame and using an RDD. Below is my shell script when using spark-shell, my original