Spark has sc.wholeTextFiles(), which returns an RDD of tuples. The first
element of each tuple is the file name and the second element is the file
content.
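A minimal sketch of using it (the input path and the line-count step are just
illustrative assumptions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("WholeTextFilesExample").getOrCreate()
val sc = spark.sparkContext

// Returns RDD[(String, String)] of (fileName, fileContent) pairs.
// "/data/input" is a placeholder path.
val files = sc.wholeTextFiles("/data/input")

// For example, count the lines in each file.
val lineCounts = files.mapValues(_.split("\n").length)
lineCounts.collect().foreach { case (name, n) => println(s"$name: $n lines") }
```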
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
This does not look like a Spark error. It looks like YARN has not been able to
allocate resources for the Spark driver. If you check the ResourceManager UI,
you are likely to see the Spark application waiting for resources. Try
reducing the driver memory, and/or look at other bottlenecks based on what you
see there.
Not sure why you are dividing by 1000: from_unixtime expects a long holding
the time in seconds since the Unix epoch, so if your ts column is already in
seconds the division is unnecessary.
The following should work:
val ds = dataset.withColumn("hour", hour(from_unixtime(dataset.col("ts"))))
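A self-contained sketch of the same idea, assuming ts holds epoch seconds
(the sample value and column names are made up; the resulting hour depends on
the session time zone):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{from_unixtime, hour}

val spark = SparkSession.builder().appName("HourFromTs").getOrCreate()
import spark.implicits._

// Assumed sample: "ts" holds seconds since the Unix epoch.
val dataset = Seq(1577880000L).toDF("ts")

// from_unixtime converts epoch seconds to a timestamp string,
// and hour() extracts the hour-of-day from it.
val ds = dataset.withColumn("hour", hour(from_unixtime($"ts")))
ds.show()
```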
Try to_json on the vector column. That should do it.
If your DataFrame has column types like Vector, then you cannot save it as
CSV/text, as flat formats like CSV/text have no direct equivalent for them.
You may need to convert the column type appropriately (e.g. convert the
incompatible column to StringType) before saving the output as CSV.
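One way to do that conversion, sketched with a UDF that stringifies an MLlib
Vector (the column names and output path are placeholders):

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().appName("VectorToCsv").getOrCreate()
import spark.implicits._

val df = Seq((1, Vectors.dense(1.0, 2.0))).toDF("id", "features")

// CSV cannot serialize VectorUDT, so turn the vector into its
// string representation (e.g. "[1.0,2.0]") before writing.
val vecToString = udf((v: Vector) => v.toString)
val flat = df.withColumn("features", vecToString($"features"))

// "/tmp/out" is a placeholder path.
flat.write.option("header", "true").csv("/tmp/out")
```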
Twitter functionality is not part of core Spark. We have successfully used
the following package from Maven Central in the past:
org.apache.bahir:spark-streaming-twitter_2.11:2.2.0
Earlier there used to be a twitter package under Spark itself, but it has not
been updated beyond Spark 1.6.
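One way to pull the Bahir package into a build, sketched as an sbt dependency
(adjust the version to match your Spark and Scala build; `%%` appends the
Scala binary suffix, e.g. `_2.11`):

```scala
// build.sbt -- a sketch; the version must match your Spark release.
libraryDependencies += "org.apache.bahir" %% "spark-streaming-twitter" % "2.2.0"
```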
Looking at the description of the problem, window functions may solve your
issue. They allow an operation over a window that can include records
before/after the particular record.
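A small sketch of that idea: a rolling average over the previous, current,
and next record per key (the column names and data here are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, col}

val spark = SparkSession.builder().appName("WindowExample").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1, 10.0), ("a", 2, 20.0), ("a", 3, 30.0))
  .toDF("key", "seq", "value")

// A window covering one record before through one record after,
// ordered by "seq" within each "key".
val w = Window.partitionBy("key").orderBy("seq").rowsBetween(-1, 1)

df.withColumn("rolling_avg", avg(col("value")).over(w)).show()
```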