Hi,
I have an issue with reduce in streaming: I don't get a reduced stream when I
use a custom object.
Here is the code snippet that I used to test this.
The issue is that the reduction clearly works for a simple sequence, but what I
really want to do is
send an array by adding such an array at a t
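The original snippet is cut off above, so here is a minimal PySpark sketch of the kind of element-wise array reduction the post describes (the socket source, port, and comma-separated parsing are assumptions for illustration, not from the original mail):

```python
# Minimal sketch, assuming the DStream API and a comma-separated text source.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="reduce-custom-object")
ssc = StreamingContext(sc, 5)  # 5-second batches

# Parse each record into a list of ints, e.g. "1,2,3" -> [1, 2, 3]
lines = ssc.socketTextStream("localhost", 9999)
arrays = lines.map(lambda line: [int(x) for x in line.split(",")])

# reduce() must return the same type it receives; here two arrays are
# combined element-wise, which is the custom "add" the post is after.
def add_arrays(a, b):
    return [x + y for x, y in zip(a, b)]

arrays.reduce(add_arrays).pprint()

ssc.start()
ssc.awaitTermination()
```

If the custom object is a user-defined class rather than a plain list, it also has to be picklable on the executors for the reduce to succeed.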
No, because you shuffle again each time (you always partition by different
windows).
Instead: could you choose a single window (w3 with more columns = finer
granularity) and then filter out records to achieve the same result?
Or instead:
df.groupBy(a, b, c).agg(sort_array(collect_list(struct(foo, bar, baz))))
and then an
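A hedged PySpark sketch of that suggested single aggregation (the struct() wrapper and the placeholder column names a, b, c, foo, bar, baz are assumptions made only to keep it runnable):

```python
# Sketch only: one groupBy/agg instead of several window specs.
from pyspark.sql import functions as F

result = (
    df.groupBy("a", "b", "c")
      .agg(F.sort_array(F.collect_list(F.struct("foo", "bar", "baz"))).alias("rows"))
)
```

Collecting the columns into a struct keeps them together in the list, and sort_array then orders the structs by their first field.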
With the mentioned parameter the capacity can be increased, but the main
question is really why that is happening.
Even on a really beefy machine, having more than 64 consumers is quite
extreme. Horizontal scaling (more executors) would probably be a better
option for reaching maximum performance.
You can try increasing the setting spark.streaming.kafka.consumer.cache.maxCapacity.
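A hedged sketch of where that setting would go (the value 128 is only an example, and the exact cache property name can differ between the DStream and Structured Streaming APIs, so check the version in use):

```python
# Sketch: raising the Kafka consumer cache capacity at session creation time.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("kafka-consumer-cache")
    .config("spark.streaming.kafka.consumer.cache.maxCapacity", "128")
    .getOrCreate()
)
```

The same property can also be passed on the command line via spark-submit --conf.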
From: Shyam P [mailto:shyamabigd...@gmail.com]
Sent: October 21, 2019 20:43
To: kafka-clie...@googlegroups.com; spark users
Subject: Issue : KafkaConsumer cache hitting max capacity of 64, removing consumer
for CacheKey
Hi,
I am using spark-sql-2.4.1v with Kafka.
I am facing a slow consumer issue.
I see the warning "KafkaConsumer cache hitting max capacity of 64, removing
consumer for
CacheKey(spark-kafka-source-33321dde-bfad-49f3-bdf7-09f95883b6e9--1249540122-executor)"
in the logs.
More on the same:
https://stackoverfl
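For context, a hedged sketch of the kind of job that hits this: each executor caches roughly one Kafka consumer per topic-partition it reads, so once an executor serves more than 64 partitions the cache starts evicting (the topic name and broker address below are placeholders):

```python
# Sketch of a Structured Streaming Kafka read; many partitions per executor
# means many cached consumers, which is what overflows the 64-entry cache.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-slow-consumer").getOrCreate()

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .load()
)
```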
I looked into this, and I found it is possible, like this:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala#L229
Line no 230. This is for executors.
Just want to cross-verify: is that right?
On Mon, 21 Oct 2019, 17:24 Alonso Isidoro Rom
Hi All,
Any suggestions?
Thanks,
-Rishi
On Sun, Oct 20, 2019 at 12:56 AM Rishi Shah
wrote:
> Hi All,
>
> I have a use case where I need to perform nested windowing functions on a
> data frame to get a final set of columns. Example:
>
> w1 = Window.partitionBy('col1')
> df = df.withColumn('sum1',
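A hedged sketch of where that nesting usually goes (the second window spec and the val column are illustrative assumptions, not from the original mail):

```python
# Sketch: stacking aggregations over several window specs.
from pyspark.sql import Window, functions as F

w1 = Window.partitionBy("col1")
w2 = Window.partitionBy("col1", "col2")

df = df.withColumn("sum1", F.sum("val").over(w1))
df = df.withColumn("sum2", F.sum("val").over(w2))
# Later columns can build on sum1/sum2, but each distinct window spec
# generally costs another shuffle, which is why a single groupBy-based
# aggregation (as suggested earlier in the thread) can be cheaper.
```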
Hi,
I want to monitor how much memory each executor and task uses for a given job. Is
there any direct method available that can be used to track this
metric?
--
*Sriram G*
*Tech*
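For the memory-monitoring question above, a hedged sketch of polling per-executor memory through Spark's REST monitoring API (the host/port and the choice of fields are assumptions; the API is served from the application UI):

```python
# Sketch: read executor memory figures from the /api/v1 monitoring endpoints.
import requests

base = "http://localhost:4040/api/v1"
app_id = requests.get(f"{base}/applications").json()[0]["id"]

for ex in requests.get(f"{base}/applications/{app_id}/executors").json():
    print(ex["id"], ex["memoryUsed"], ex["maxMemory"])
```

Task-level numbers are not in this executor summary; those would have to come from the per-stage endpoints or a custom SparkListener.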
Many clusters still use cdh5 and want to continue supporting it; cdh5 is
based on Hadoop 2.6.
melin li wrote on Mon, Oct 21, 2019 at 3:02 PM:
> Many clusters still use cdh5 and hope to continue supporting cdh5; cdh5 is based on Hadoop 2.6.
>
> dev/make-distribution.sh --tgz -Pkubernetes -Pyarn -Phive-thriftserver
> -Phive -Dhadoop.version=2.6.0-cdh5.15.0 -DskipTests
Many clusters still use cdh5 and hope to continue supporting cdh5; cdh5 is based on Hadoop 2.6.
dev/make-distribution.sh --tgz -Pkubernetes -Pyarn -Phive-thriftserver
-Phive -Dhadoop.version=2.6.0-cdh5.15.0 -DskipTests
```
[INFO] Compiling 25 Scala sources to /Users/libinsong/Documents/codes/tongdun/spark-3.0/resource-managers/yarn/target/scala-2.12/
```