Format RDD/SchemaRDD contents to screen?

2015-05-29 Thread Minnow Noir
I'm trying to debug query results inside spark-shell, but finding it cumbersome to save to file and then use file system utils to explore the results, and .foreach(print) tends to interleave the results among the myriad log messages. Take() and collect() truncate. Is there a simple way to present
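A minimal sketch of the usual workaround, assuming an existing SparkContext `sc` (the RDD here is a stand-in for the one under inspection): collect to the driver and build one string before printing, so log lines cannot interleave mid-output.

```scala
// Debug-print an RDD's contents without interleaving with Spark's log output.
// collect() pulls everything to the driver -- only appropriate for small results.
val rdd = sc.parallelize(Seq(("a", 1), ("b", 2)))

// mkString produces a single string, so the whole result is emitted in one println.
println(rdd.collect().mkString("\n"))
```

For SchemaRDD/DataFrame results, `show()` renders a formatted table in one shot, e.g. `sqlContext.sql("SELECT * FROM t").show()`.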

Extracting k-means cluster values along with centers?

2015-06-12 Thread Minnow Noir
Greetings. I have been following some of the tutorials online for Spark k-means clustering. I would like to be able to just "dump" all the cluster values and their centroids to text file so I can explore the data. I have the clusters as such: val clusters = KMeans.train(parsedData, numClusters,
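One way to get that dump, sketched under the thread's own names (`parsedData`, `numClusters`; `numIterations` and the output path are assumptions): `clusterCenters` holds the centroids, and `predict()` maps each point to its cluster index, so the two can be paired per record.

```scala
import org.apache.spark.mllib.clustering.KMeans

// Train as in the tutorial the thread follows.
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Pair every input vector with the index of its assigned cluster,
// then look up that cluster's centroid for the same line of output.
parsedData.map { v =>
  val k = clusters.predict(v)
  s"cluster=$k\tcenter=${clusters.clusterCenters(k)}\tpoint=$v"
}.saveAsTextFile("kmeans_dump")  // hypothetical output directory
```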

Convert Spark SQL table to RDD in Scala / error: value toFloat is a not a member of Any

2015-03-22 Thread Minnow Noir
I'm following some online tutorial written in Python and trying to convert a Spark SQL table object to an RDD in Scala. The Spark SQL just loads a simple table from a CSV file. The tutorial says to convert the table to an RDD. The Python is products_rdd = sqlContext.table("products").map(lambda
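The error in the thread's subject line comes from Scala's static typing: fields of a `Row` come back as `Any`, which has no `toFloat`, unlike Python's duck-typed access. A sketch of the fix, assuming the `products` table from the thread (column positions are assumptions): use `Row`'s typed getters or an explicit cast.

```scala
// Each row field is typed Any, so convert via a typed getter or a cast.
val products_rdd = sqlContext.table("products").map { row =>
  // Typed getter, if the column is already numeric:
  val price = row.getDouble(1)
  // Or an explicit cast when the column holds strings:
  // val price = row(1).asInstanceOf[String].toFloat
  (row.getString(0), price)
}
```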

Arguments/parameters in Spark shell scripts?

2015-03-29 Thread Minnow Noir
How does one consume parameters passed to a Scala script via spark-shell -i? 1. If I use an object with a main() method, the println outputs nothing as if not called: import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.SparkConf object Test { de
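This explains the silent main(): `spark-shell -i` only interprets the file line by line in the REPL, so defining an object with a `main` method never invokes it. A sketch of the common workaround (the `MY_ARG` variable name is hypothetical): call the entry point explicitly at the bottom of the script, and smuggle parameters in via environment variables or `-D` system properties.

```scala
// test.scala, loaded with:  MY_ARG=foo spark-shell -i test.scala
object Test {
  def run(): Unit = {
    // -i does not forward script arguments, so read them from the
    // environment (or from a -Dkey=value system property) instead.
    val arg = sys.env.getOrElse("MY_ARG", "default")
    println(s"got: $arg")
  }
}
// Nothing calls main() for you under -i; invoke the entry point yourself.
Test.run()
```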

Query REST web service with Spark?

2015-03-31 Thread Minnow Noir
We have some data on Hadoop that needs to be augmented with data only available to us via a REST service. We're using Spark to search for, and correct, missing data. Even though there are a lot of records to scour for missing data, the total number of calls to the service is expected to be low, so
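A sketch of one common pattern for this, with hypothetical names throughout (`recordsMissingData`, the endpoint URL, and the `lookup` helper are all assumptions): do the HTTP calls inside `mapPartitions`, so per-partition setup happens once rather than per record.

```scala
import scala.io.Source

// Hypothetical REST lookup using only the standard library; in practice an
// HTTP client with timeouts and retries would be safer for a long-running job.
def lookup(id: String): String =
  Source.fromURL(s"http://example.com/api/$id").mkString  // assumed endpoint

// mapPartitions keeps any expensive setup to once per partition, and the
// calls run on the executors, distributing the (low) request volume.
val enriched = recordsMissingData.mapPartitions { iter =>
  iter.map(id => (id, lookup(id)))
}
```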