Hi. I have a DateType column and I want to filter all the values greater than or
equal to a certain Timestamp. This works; for example,
df.col(columnName).geq(value) evaluates to a column of DateTypes greater than
or equal to value. Except for one case: if the value of the Timestamp is
initialized to "1
I've been trying to figure this one out for some time now. I have JSONs
representing Products that arrive (physically) partitioned by Brand, and I
would like to create a DataFrame from the JSON while also keeping the
partitioning information (Brand):
```
case class Product(brand: String, value: String)
val
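One way to keep the Brand is a sketch like the following, assuming the files sit under brand=<value> directories so that Spark's partition discovery re-adds the column (the base path is hypothetical):

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("products").getOrCreate()
import spark.implicits._

// With a layout like /data/products/brand=acme/part-0.json, Spark's
// partition discovery adds "brand" back as a column, so the case class
// above can be populated from the directory name plus the JSON body.
val products = spark.read.json("/data/products").as[Product]
```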
Try divide and conquer: create a column x for the first character of userid,
and group by company + x. If that is still too large, try the first two
characters.
On 17 July 2018 at 02:25, 崔苗 wrote:
> 30 GB of user data; how do we get a distinct user count after creating a
> composite key based on company and userid?
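A minimal sketch of that approach, assuming a DataFrame users with columns company and userid (the DataFrame name is illustrative):

```
import org.apache.spark.sql.functions._

// Bucket by the first character of userid so each group stays small.
val bucketed = users.withColumn("x", substring(col("userid"), 1, 1))

// Distinct users per (company, bucket).
val perBucket = bucketed
  .groupBy(col("company"), col("x"))
  .agg(countDistinct(col("userid")).as("distinct_users"))

// Each userid falls into exactly one bucket, so the per-bucket
// distinct counts are disjoint and can simply be summed per company.
val perCompany = perBucket
  .groupBy(col("company"))
  .agg(sum(col("distinct_users")).as("distinct_users"))
```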
Hi All,
I am trying to use the Spark 2.2.0 on Kubernetes
(https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0)
code to run Hive queries on a Kerberos-enabled cluster. spark-submit jobs fail
for the Hive queries, but pass when I access HDFS. Is this a known
limitation?
I generally write to Parquet when I want to read the same data repeatedly and
perform a different operation on it each time. This saves database time for
me.
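A minimal sketch of that pattern, assuming an existing SparkSession spark (the JDBC URL, table, and path are hypothetical):

```
// Read from the database once and persist a Parquet copy.
val fromDb = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "big_table")
  .load()

fromDb.write.mode("overwrite").parquet("/data/big_table.parquet")

// Later runs read the Parquet copy instead of hitting the database again.
val reloaded = spark.read.parquet("/data/big_table.parquet")
```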
Thanks
Muthu
On Thu, Jul 19, 2018, 18:34 amin mohebbi
wrote:
> We do have two big tables, each with 5 billion rows, so my que