See
https://databricks.com/blog/2016/05/19/approximate-algorithms-in-apache-spark-hyperloglog-and-quantiles.html
Most probably you do not require exact counts.
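The linked post covers HyperLogLog-based approximate counting. A minimal sketch of using `approx_count_distinct` (the DataFrame and column names here are made up for illustration; any SparkSession will do):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.approx_count_distinct

// Sketch only: `events` and its columns are hypothetical.
val spark = SparkSession.builder()
  .appName("approx-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val events = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("user_id", "value")

// approx_count_distinct uses HyperLogLog++; the second argument is the
// maximum relative standard deviation allowed (here 1%).
events.agg(approx_count_distinct($"user_id", 0.01)).show()
```

The trade-off is a small, bounded error in exchange for constant memory per aggregation, which is usually what you want at scale.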
Am Di., 11. Dez. 2018 um 02:09 Uhr schrieb 15313776907 <15313776...@163.com
>:
> I think you can add executor memory.
Please refer to the structured streaming programming guide, which clearly
describes when a query will have unbounded state.
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#inner-joins-with-optional-watermarking
Quoting the doc:
In other words, you will have to do the following additional steps in the join.
You mean to say that Spark will store all the data in memory forever :)
> On 10-Dec-2018, at 6:16 PM, Sandeep Katta
> wrote:
>
> Hi Abhijeet,
>
> You are using an inner join with unbounded state, which means every record
> in one stream will keep being matched against the other stream indefinitely.
> If you want the intermediate state to be cleaned up, you need to define
> watermarks on both streams and add an event-time range condition to the join.
I think you can add executor memory.
On 12/11/2018 08:28, lsn24 wrote:
Hello,
I have a requirement where I need to get total count of rows and total
count of failedRows based on a grouping.
The code looks like below:
myDataset.createOrReplaceTempView("temp_view");
Dataset<Row> countDataset = sparkSession.sql(
    "SELECT column1, column2, column3, column4, column5, column6, co
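One way to get both counts in a single pass over the grouped data (a sketch only: the original query is truncated, so the grouping columns and the `failed` flag below are assumptions about the schema):

```scala
import org.apache.spark.sql.functions.{col, count, sum, when}

// Sketch: assumes `temp_view` has a boolean `failed` column and that the
// grouping is on column1 and column2. Adjust to the real schema.
val counts = sparkSession.sql(
  """SELECT column1, column2,
    |       COUNT(*) AS totalRows,
    |       SUM(CASE WHEN failed THEN 1 ELSE 0 END) AS failedRows
    |FROM temp_view
    |GROUP BY column1, column2""".stripMargin)

// Equivalent DataFrame API, avoiding the temp view entirely:
val countsDf = myDataset
  .groupBy("column1", "column2")
  .agg(
    count("*").as("totalRows"),
    sum(when(col("failed"), 1).otherwise(0)).as("failedRows"))
```

Computing both aggregates in one `GROUP BY` avoids scanning the dataset twice, which is where a lot of memory and shuffle pressure can come from.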
Hello,
I’m using watermark to join two streams as you can see below:
val order_wm = order_details.withWatermark("tstamp_trans", "20 seconds")
val invoice_wm = invoice_details.withWatermark("tstamp_trans", "20 seconds")
val join_df = order_wm
.join(invoice_wm, order_wm.col("s_order_id") === invo
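The snippet above is truncated, and as discussed earlier in the thread, a watermark alone does not bound state for a stream-stream inner join: you also need an event-time range condition. A sketch of the completed join (the invoice-side column names and the prefix renaming are assumptions, since both streams share the column `tstamp_trans`):

```scala
import org.apache.spark.sql.functions.expr

// Sketch only: rename invoice-side columns to avoid ambiguity, since both
// streams carry `tstamp_trans`. Column names are hypothetical.
val invoice_renamed = invoice_wm
  .withColumnRenamed("tstamp_trans", "invoice_tstamp_trans")
  .withColumnRenamed("s_order_id", "invoice_order_id")

// The time-range condition lets Spark drop state older than the watermark
// plus the interval, keeping the join's state bounded.
val join_df = order_wm.join(
  invoice_renamed,
  expr("""
    s_order_id = invoice_order_id AND
    invoice_tstamp_trans >= tstamp_trans AND
    invoice_tstamp_trans <= tstamp_trans + interval 20 seconds
  """))
```

Without the range condition, every order row must be retained forever in case a matching invoice arrives later, which is exactly the unbounded-state behavior described in the programming guide.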