Hi
Just upgraded to Spark 1.3.1.
I am getting a warning:
Warning (from warnings module):
  File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\sql\context.py", line 191
    warnings.warn("inferSchema is deprecated, please use createDataFrame instead")
UserWarning: inferSchema is deprecated, please use createDataFrame instead
However, the documentation still says to use inferSchema.
Here: http://spark.apache.org/docs/latest/sql-programming-guide.htm in
section
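For reference, the change the warning asks for would look roughly like the sketch below. The helper name to_dataframe and the argument names are only illustrative, and it assumes rows is an RDD of Row objects, as in the guide's inferSchema example.

```python
def to_dataframe(sql_context, rows):
    """Build a DataFrame and let Spark infer the schema from the rows.

    sql_context: a pyspark.sql.SQLContext (hypothetical argument name).
    rows: assumed to be an RDD of pyspark.sql.Row objects.
    """
    # Deprecated call that triggers the warning:
    #     sql_context.inferSchema(rows)
    # Replacement named in the warning message:
    return sql_context.createDataFrame(rows)
```

With the ssc context used in the code further down, this would be called as to_dataframe(ssc, rows).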
Also, I am getting an error from the MLlib ALS.train function when passing
a DataFrame (do I need to convert the DataFrame to an RDD?)
Code:
training = ssc.sql("select userId,movieId,rating from ratings where partitionKey < 6").cache()
print type(training)
model = ALS.train(training,rank,numIter,lmbda)
Error:
<class 'pyspark.sql.dataframe.DataFrame'>
Rank:8 Lmbda:1.0 iteration:10
Traceback (most recent call last):
  File "D:\Project\Spark\code\movie_sql.py", line 109, in <module>
    bestConf = getBestModel(sc,ssc,training,validation,validationNoRating)
  File "D:\Project\Spark\code\movie_sql.py", line 54, in getBestModel
    model = ALS.train(trainingRDD,rank,numIter,lmbda)
  File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py", line 139, in train
    model = callMLlibFunc("trainALSModel", cls._prepare(ratings), rank, iterations,
  File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py", line 127, in _prepare
    assert isinstance(ratings, RDD), "ratings should be RDD"
AssertionError: ratings should be RDD
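Since the assertion only accepts an RDD, one workaround that might apply is to go through the DataFrame's rdd attribute before calling ALS.train. This is a minimal sketch and not a confirmed fix: to_rating_tuple and train_from_dataframe are hypothetical helper names, and it assumes each row holds exactly userId, movieId and rating, in that order.

```python
def to_rating_tuple(row):
    """Convert one (userId, movieId, rating) row into an (int, int, float) tuple."""
    user_id, movie_id, rating = row  # a Row unpacks like an ordinary tuple
    return (int(user_id), int(movie_id), float(rating))


def train_from_dataframe(training, rank, num_iter, lmbda):
    """Train ALS on a DataFrame by first converting it to an RDD of tuples."""
    # Imported here so to_rating_tuple can be used without Spark installed.
    from pyspark.mllib.recommendation import ALS

    # DataFrame.rdd exposes the underlying RDD of Row objects.
    ratings_rdd = training.rdd.map(to_rating_tuple)
    return ALS.train(ratings_rdd, rank, num_iter, lmbda)
```

In the script above this would replace the direct call: model = train_from_dataframe(training, rank, numIter, lmbda).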
--
Best Regards,
Ayan Guha