How to use an anonymous function with DataFrame.explode()?

2017-01-07 Thread dpapathanasiou
I need to take a DataFrame of events and explode them row-wise so that there's at least one representation per time interval (usually a day) between events. Here's a simplified version of the problem, which I have gotten to work in spark-shell: case class Meal (food: String, calories: Double, d
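The per-day fill logic the question describes can be sketched in plain Python, without Spark, to make the intended behavior concrete. This is a hypothetical illustration only: the record shape mirrors the truncated `Meal` case class from the post, the function name `explode_daily` is invented, and in actual Spark this loop body would be the anonymous function passed to the explode/flatMap step.

```python
from datetime import date, timedelta

def explode_daily(events):
    """Given (food, calories, day) tuples sorted by day, emit one row per
    day between consecutive events, carrying the earlier event forward.
    Plain-Python sketch of the row-wise 'explode' the question asks about."""
    rows = []
    for (food, cal, d), (_, _, next_d) in zip(events, events[1:]):
        day = d
        while day < next_d:  # fill every day up to (not including) the next event
            rows.append((food, cal, day))
            day += timedelta(days=1)
    rows.append(events[-1])  # the last event has no successor to fill toward
    return rows

# Two events three days apart -> four rows total (three filled + the last event)
meals = [
    ("oatmeal", 150.0, date(2017, 1, 1)),
    ("salad",   200.0, date(2017, 1, 4)),
]
for row in explode_daily(meals):
    print(row)
```

In Spark itself the same loop would run per input row, producing the zero-or-more output rows that `explode`/`flatMap` expects.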

Re: Spark Read from Google store and save in AWS s3

2017-01-07 Thread neil90
Here is how you would read from Google Cloud Storage (note: you need to create a service account key) -> os.environ['PYSPARK_SUBMIT_ARGS'] = """--jars /home/neil/Downloads/gcs-connector-latest-hadoop2.jar pyspark-shell""" from pyspark import SparkContext, SparkConf from pyspark.sql import SparkSess
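The key point in the reply is ordering: the GCS connector jar must be on the classpath before the JVM starts, so `PYSPARK_SUBMIT_ARGS` has to be set before `pyspark` is imported. A minimal sketch of that setup follows; the jar path is the one from the post (substitute your own download location), and the session-building lines are left as comments since they require a working PySpark install and a service-account key configured for the connector. The bucket name is hypothetical.

```python
import os

# Set submit args BEFORE importing pyspark, so the jar reaches the driver JVM.
# Jar path is from the original post; replace with your local copy.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars /home/neil/Downloads/gcs-connector-latest-hadoop2.jar pyspark-shell"
)

# Only then import pyspark and build the session (requires pyspark plus a
# service-account key configured for the GCS connector):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("gcs-read").getOrCreate()
# df = spark.read.text("gs://your-bucket/some/path")  # hypothetical bucket
```

If the environment variable is set after `pyspark` has been imported, the running JVM never sees the `--jars` argument and reads from `gs://` paths fail.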

Spark 2.0.2, KyroSerializer, double[] is not registered.

2017-01-07 Thread Yan Facai
Hi, all. I enabled Kryo in Spark via spark-defaults.conf: spark.serializer org.apache.spark.serializer.KryoSerializer spark.kryo.registrationRequired true A KryoException is raised while a logistic regression algorithm is running: Note: To register this class use: kryo.register(doub
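The two settings quoted in the message, laid out as they would appear in spark-defaults.conf:

```
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryo.registrationRequired  true
```

With `spark.kryo.registrationRequired` set to `true`, Kryo throws an exception for any class that was not registered in advance, which includes primitive array types such as `double[]` used internally by ML algorithms.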

Re: Spark 2.0.2, KyroSerializer, double[] is not registered.

2017-01-07 Thread smartzjp
You can try the following code: ObjectArraySerializer serializer = new ObjectArraySerializer(kryo, Double[].class); kryo.register(Double[].class, serializer); --- Hi, all. I enabled Kryo in Spark via spark-defaults.conf: spark.serializer org.apache.spark.ser