Is there a version of withColumn or withColumnRenamed that accepts a Column
instead of a String? That way I could specify the fully qualified name in
cases where there are duplicate column names.
I can drop a column based on a Column-typed argument, so why can't I rename
one based on the same type of argument?
Use case is, I have Datafr
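As far as I know there is no Column-based overload of withColumnRenamed in Spark 2.x, but a common workaround is select with alias, since a column can be disambiguated through the DataFrame it came from. A minimal sketch, assuming two hypothetical DataFrames df1 and df2 that both have "id" and "name" columns:

```scala
import org.apache.spark.sql.functions.col

// Joining two DataFrames that share column names produces duplicates.
val joined = df1.join(df2, df1("id") === df2("id"))

// withColumnRenamed("name", ...) would be ambiguous here, but select + alias
// can target the exact column via the source DataFrame:
val renamed = joined.select(
  df1("id"),
  df1("name").alias("name_left"),   // rename df1's copy
  df2("name").alias("name_right")   // rename df2's copy
)
```

This does force you to enumerate the columns you keep, which is the main ergonomic cost compared to a hypothetical withColumnRenamed(Column, String).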
I've come across an issue with Mesos 1.4.1 and Spark 2.2.1. We launch Spark
tasks using the MesosClusterDispatcher in cluster mode. On a couple of
occasions, we have noticed that when the Spark Driver crashes (due to various
causes such as human error or network error), sometimes, when the Driver is
restarted,
This afternoon @ 3pm pacific I'll be looking at review tooling for Spark &
Beam https://www.youtube.com/watch?v=ff8_jbzC8JI.
Next week's regular Friday code review (this time July 20th @ 9:30am pacific)
will once again probably have more of an ML focus, for folks interested in
watching Spark ML PRs.
Hi,
I would like to compare different implementations of linear regression (and
possibly generalised linear regression) in Spark, and I was wondering why the
functions for linear regression (and GLM) with stochastic gradient descent
were deprecated.
I have found some old posts of people having
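For context: the RDD-based spark.mllib trainers such as LinearRegressionWithSGD were deprecated in favor of the DataFrame-based spark.ml API, which solves the same problems with more robust optimizers (L-BFGS / weighted least squares rather than plain SGD). A sketch of the current API, where `training` and its "features"/"label" columns are assumptions for illustration:

```scala
import org.apache.spark.ml.regression.{GeneralizedLinearRegression, LinearRegression}

// Assumes `training` is a DataFrame with a Vector "features" column
// and a Double "label" column.
val lr = new LinearRegression()
  .setMaxIter(100)
  .setRegParam(0.1)
  .setElasticNetParam(0.5)   // mix of L1 and L2 regularization
val lrModel = lr.fit(training)

// GLM counterpart (here a Gaussian family with identity link,
// i.e. ordinary linear regression):
val glr = new GeneralizedLinearRegression()
  .setFamily("gaussian")
  .setLink("identity")
val glrModel = glr.fit(training)
```

Comparing these against the deprecated SGD versions should mostly show differences in convergence behavior, not in the underlying model.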
I have columns like the ones below:
root
 |-- metadata: struct (nullable = true)
 |    |-- "drop":{"dropPath":"https://dstpath.media27.ec2.st-av.net/drop?source_id: string (nullable = true)
 |    |-- "selection":{"AlllURL":"https://dstpath.media27.ec2.st-av.net/image?source_id: string (
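It's hard to tell from the fragment how those field names were produced, but if the struct fields really do contain characters like `:` and `{`, one way to reference them is to backtick-quote the field name. A sketch under that assumption (the field name below is a shortened placeholder, not the real one):

```scala
import org.apache.spark.sql.functions.col

// Backticks let you reference a struct field whose name contains
// special characters. Substitute the exact name from printSchema().
val cleaned = df.select(
  col("metadata.`\"drop\":{\"dropPath\"`").alias("drop_path")
)
```

If the names came from JSON keys being mangled at ingestion time, fixing the schema upstream is probably cleaner than escaping them everywhere.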
Just thinking out loud… repartition by key? create a composite key based on
company and userid?
How big is your dataset?
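One concrete form of the two-step idea: deduplicate on (company, userId) first, so the expensive shuffle is keyed by the pair rather than by company alone, then do a plain count per company. A sketch, with the column names taken from the question:

```scala
// Two-phase rewrite of count(distinct userId) per company.
// Phase 1 spreads work across (company, userId) pairs;
// phase 2 is a cheap count over already-distinct rows.
val distinctPerCompany = df
  .select("company", "userId")
  .distinct()          // shuffle keyed by (company, userId), not company
  .groupBy("company")
  .count()             // number of distinct userIds per company
```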
> On Jul 13, 2018, at 06:20, 崔苗 wrote:
>
> Hi,
> when I want to count(distinct userId) by company, I hit data skew and the
> task takes too long. How to count disti
Hi,
when I want to count(distinct userId) by company, I hit data skew and the
task takes too long. How can I count distinct by key on skewed data in Spark
SQL?
Thanks for any reply.
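The composite-key idea suggested in the reply can be made explicit with a deterministic salt: hash the userId into N buckets, take partial distinct counts per (company, bucket), then sum the partials. Because each distinct userId always lands in exactly one bucket, the partial counts add up to the true distinct count. A hedged sketch (column names and bucket count are assumptions):

```scala
import org.apache.spark.sql.functions._

val numSalts = 32

val counts = df
  .withColumn("salt", pmod(hash(col("userId")), lit(numSalts)))
  .groupBy("company", "salt")
  .agg(countDistinct("userId").as("partial"))   // skewed key is now split N ways
  .groupBy("company")
  .agg(sum("partial").as("distinct_users"))     // partials are disjoint, so sum is exact
```

If an approximate answer is acceptable, `approx_count_distinct` avoids the skew problem entirely, since it aggregates small sketches instead of shuffling every row.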