See
https://databricks.com/blog/2016/05/19/approximate-algorithms-in-apache-spark-hyperloglog-and-quantiles.html
Most likely you do not need exact counts.
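
(Not part of the original reply; a minimal sketch of the approximate aggregates the blog post covers, assuming a SparkSession named spark and a DataFrame with hypothetical columns user_id and latency_ms.)

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.approx_count_distinct

    val spark = SparkSession.builder().appName("approx-counts").getOrCreate()
    val df = spark.read.parquet("/path/to/data")  // hypothetical input

    // HyperLogLog-based distinct count; 0.05 is the maximum relative error allowed.
    df.agg(approx_count_distinct("user_id", 0.05)).show()

    // Approximate quantiles (Greenwald-Khanna); the last argument is the relative
    // error, so 0.0 would force an exact (and much more expensive) computation.
    val quartiles = df.stat.approxQuantile("latency_ms", Array(0.25, 0.5, 0.75), 0.01)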
On Tue, Dec 11, 2018 at 02:09, 15313776907 <15313776...@163.com> wrote:
> I think you can add executor memory.
>
> 15313776907
I think you can add executor memory.
15313776907 <15313776...@163.com>
On 12/11/2018 08:28, lsn24 wrote:
Hello,
I have a requirement where I need to get the total count of rows and the total
count of failedRows, based on a grouping.
The code looks like the following:
myDataset.createO
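
(Not part of the original question; one possible way to express "total rows and failed rows per group" with conditional aggregation, assuming hypothetical columns group_key and status on myDataset.)

    import org.apache.spark.sql.functions.{col, count, lit, sum, when}

    val counts = myDataset
      .groupBy("group_key")                            // hypothetical grouping column
      .agg(
        count(lit(1)).as("totalRows"),                 // all rows in the group
        sum(when(col("status") === "FAILED", 1)        // hypothetical failure marker
              .otherwise(0)).as("failedRows"))

    counts.show()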
I found out the problem. Grouping by a constant column value is indeed
impossible.
The reason it was "working" in my project is that I gave the constant
column an alias that exists in the schema of the dataframe. The dataframe
contained a "data_timestamp" representing an hour, and I added to the
se
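
(Added illustration, not from the thread: a hypothetical sketch of the alias collision described above, assuming a SparkSession named spark and an events table that already has a real data_timestamp column.)

    spark.sql("""
      SELECT '2018-12-11 10:00:00' AS data_timestamp, count(*) AS cnt
      FROM events
      GROUP BY data_timestamp
    """)
    // The query runs, but data_timestamp in the GROUP BY resolves to the table's
    // own column, not the constant alias, so nothing is grouped by the literal.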
Doh :) Thanks.. seems like brain freeze.
On Fri, Feb 6, 2015 at 3:22 PM, Michael Armbrust wrote:
> You can't use columns (timestamp) that aren't in the GROUP BY clause.
> Spark 1.2+ gives you a better error message for this case.
>
> On Fri, Feb 6, 2015 at 3:12 PM, Mohnish Kodnani wrote:
You can't use columns (timestamp) that aren't in the GROUP BY clause.
Spark 1.2+ gives you a better error message for this case.
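
(A short illustration added here, not part of the original reply, assuming a SparkSession named spark and a hypothetical events table with level and timestamp columns.)

    // Fails analysis: timestamp is selected but neither grouped nor aggregated.
    spark.sql("SELECT timestamp, count(*) FROM events GROUP BY level")

    // Works: every non-aggregate column in the SELECT appears in the GROUP BY,
    // or is wrapped in an aggregate function such as max().
    spark.sql("SELECT level, max(timestamp) AS latest, count(*) FROM events GROUP BY level")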
On Fri, Feb 6, 2015 at 3:12 PM, Mohnish Kodnani wrote:
> Hi,
> I am trying to issue a SQL query against a Parquet file and am getting
> errors and would like some help