Hi,
I’m running into a problem using tensorflow-data-validation with the direct runner
to generate statistics from some large datasets (over 400GB).
All workers appear to stop working after the error message
“Keepalive watchdog fired. Closing transport.” This looks like a gRPC
keepalive timeout.
Generally you should not rely on PCollection being ordered, though there
have been discussions about adding some time-ordering semantics.
On Sun, Aug 23, 2020 at 9:06 PM Rui Wang wrote:
> Current Beam model does not guarantee an ordering after a GBK (i.e.
> Combine.perKey() in your). So you ca
Another person reported something similar for Dataflow and it seemed as
though in their scenario they were using locks and either got into a
deadlock or starved processing for long enough that the watchdog also
failed. Are you using locks and/or having really long single element
processing times?
As for the question of writing tests in the face of non-determinism,
you should look into TestStream. MyStatefulDoFn still needs to be
updated to not assume an ordering. (This can be done by setting timers
that provide guarantees that (modulo late data) one has seen all data
up to a certain timesta
Hi contributors,
Sorry to bother you! I ran into a problem while trying to apply a windowing
aggregation Beam SQL query to a Kafka input source.
The details of the question are in the following link:
https://stackoverflow.com/questions/63566057/how-to-integrate-beam-sql-windowing-query-with-kafka
Hello,
Firstly, I am on Java SDK 2.23.0, and we heavily use Elasticsearch as one of
our sinks.
It's been a while since Beam was upgraded to support Elasticsearch versions
greater than 6.x.
Elasticsearch has now moved on to 7.x and we want to use some of their new
security features.
I want to know w
This ticket indicates Elasticsearch 7.x has been supported since Beam 2.19:
https://issues.apache.org/jira/browse/BEAM-5192
Are there any specific features you need that aren't supported?
On Mon, Aug 24, 2020 at 11:33 AM Mohil Khare wrote:
> Hello,
>
> Firstly I am on java sdk 2.23.0 and we hea
Hi,
I checked the query in your SO question and I think the SQL usage is
correct.
My current guess is that the problem lies in how the watermark is generated and
advanced in KafkaIO. Either the watermark didn't pass the end of
your SQL window for aggregation or the data was lagging behind the
Hello Kyle,
Thanks a lot for your prompt reply. Hmm, strange. I think I was getting
an error saying that the version was not supported. I'm not sure whether I had
updated my Beam version the last time I tried with ES 7.x.
Let me try it again and let you know.
Thanks and Regards
Mohil
On Mon, Aug 24, 2020 at 11:54 A
Thanks Reuven for the input and Wang for CC'ing Reuven.
> Generally you should not rely on PCollection being ordered
Is it because Beam splits PCollection into multiple input splits and tries
to process it as efficiently as possible without considering times?
This one is very confusing as I've b