Thanks for the detailed explanation. Just tested it, worked like a charm.
On Mon, Jun 20, 2016 at 1:02 PM, N B wrote:
It's actually necessary to retire keys that become "Zero" or "Empty", so to
speak. In your case, the key is "imageURL" and the value is a dictionary, one
of whose fields is the "count" that you are maintaining. For simplicity and
illustration's sake, I will assume imageURL to be a string like "abc".
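As a rough sketch of the functions involved, assuming the values keep a
"count" field as described (all names here are illustrative, not from the
original job):

def add_counts(a, b):
    # Reduce function: merge values as batches enter the window.
    return {"count": a["count"] + b["count"]}

def subtract_counts(a, b):
    # Inverse reduce function: remove values as batches leave the window.
    return {"count": a["count"] - b["count"]}

def keep_live_keys(kv):
    # Filter function: receives each (key, value) pair; only pairs for
    # which it returns True are retained in the window state, so
    # zero-count keys get retired here.
    return kv[1]["count"] > 0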
Hi,
According to the docs (
https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.DStream.reduceByKeyAndWindow),
filterFunc can be used to retain expiring keys. I do not want to retain any
expiring keys, so I do not understand how this can help me stabilize the job.
We had this same issue with the reduceByKeyAndWindow API that you are
using. To fix it, you have to use a different flavor of that API,
specifically the two versions that allow you to pass a 'filter function'
to them. Putting in the filter functions helped stabilize our application
too.
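A rough, self-contained sketch of that flavor in PySpark (the names, the
socket source, and the checkpoint path are placeholders, not the original
code; the durations match the job described below):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="WindowedLinkCounts")
ssc = StreamingContext(sc, 30)        # 30-second batch interval
ssc.checkpoint("/tmp/checkpoint")     # required once an inverse function is used

links = ssc.socketTextStream("localhost", 9999)  # placeholder source

counts = links.map(lambda url: (url, 1)).reduceByKeyAndWindow(
    lambda a, b: a + b,               # add counts entering the window
    lambda a, b: a - b,               # subtract counts leaving the window
    windowDuration=1800,              # 30-minute window
    slideDuration=60,                 # slides every 60 seconds
    filterFunc=lambda kv: kv[1] > 0,  # retire keys once their count hits zero
)
counts.pprint()

ssc.start()
ssc.awaitTermination()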
Hi all,
I have a Python streaming job which is supposed to run 24x7, and I am unable
to stabilize it. The job just counts the number of links shared in a
30-minute sliding window. I am using the reduceByKeyAndWindow operation with
a batch interval of 30 seconds and a slide interval of 60 seconds.
The Kafka queue has a rate of
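For concreteness, the setup described above probably resembles the sketch
below (the message is cut off, so the Kafka topic, broker address, and the
old pyspark.streaming.kafka API are assumptions):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # Spark 1.x-era Kafka API

sc = SparkContext(appName="LinkCounter")
ssc = StreamingContext(sc, 30)                  # 30-second batches
ssc.checkpoint("/tmp/checkpoint")

stream = KafkaUtils.createDirectStream(
    ssc, ["links"], {"metadata.broker.list": "localhost:9092"})

# Count each link over a 30-minute window, sliding every 60 seconds.
# With an inverse function but no filterFunc, keys whose counts reach
# zero are never dropped from the window state, which matches the
# instability discussed in the replies above.
counts = stream.map(lambda kv: (kv[1], 1)).reduceByKeyAndWindow(
    lambda a, b: a + b,
    lambda a, b: a - b,
    1800, 60)
counts.pprint()

ssc.start()
ssc.awaitTermination()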