I found several significant issues and wrote them up on my blog:
https://eilianyu.wordpress.com/2016/10/27/be-aware-of-hidden-data-errors-using-spark-sas7bdat-pacakge-to-ingest-sas-datasets-to-spark/
Hello,
I wonder what the state-of-the-art best practice is for getting the best
performance when running complicated SQL queries today, in 2016. I am new
to this topic and have read about
Hive on Tez
Spark on Hive
Spark SQL 2.0 (it seems Spark 2.0 supports complicated nested queries)
The documentation I read sugge
Hello,
*Question 1:* I am new to Spark. I am trying to train a classification
model on a Spark DataFrame using PySpark, and I created a Spark DataFrame
object in df:
from pyspark.sql.types import *
query = """select * from table"""
df = sqlContext.sql(query)
My question is how
a single Vector, 2nd is RDD[Vector]
>
> Robin
>
> On 12 Feb 2015, at 06:37, Shi Yu wrote:
>
> Hi there,
>
> I am new to spark. When training a model using K-means using the following
> code, how do I obtain the cluster assignment in the next step?
>
>
> val cluster
Hi there,
I am new to spark. When training a model using K-means with the
following code, how do I obtain the cluster assignments in the next
step?
val clusters = KMeans.train(parsedData, numClusters, numIterations)
I searched through many examples, but they mostly just calculate the WSSSE.
I am sti