Anyone using this combination for prod? I am planning to use it for a use case
with 15,000 events per second from a few Kafka topics. Though the events are big,
I would just have to take the businessIds, frequency, and first and last event
timestamps and save this into mapGroupsWithState. I need to keep them for
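A minimal sketch of the state-keeping described, assuming a hypothetical Event/BusinessState shape and placeholder Kafka settings; since the retention requirement is cut off above, the timeout duration is a stand-in:

import java.sql.Timestamp

import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// Hypothetical shapes: heavy Kafka events reduced to a small per-businessId summary.
case class Event(businessId: String, eventTime: Timestamp)
case class BusinessState(businessId: String, frequency: Long,
                         firstSeen: Timestamp, lastSeen: Timestamp)

object StateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("state-sketch").getOrCreate()
    import spark.implicits._

    // Placeholder parsing: real payloads would need a proper schema. Here the
    // Kafka value stands in for the businessId and the broker timestamp for
    // the event time.
    val events: Dataset[Event] = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "topic-a,topic-b")           // placeholder
      .load()
      .selectExpr("CAST(value AS STRING) AS businessId",
                  "timestamp AS eventTime")
      .as[Event]

    val summaries = events
      .groupByKey(_.businessId)
      .mapGroupsWithState[BusinessState, BusinessState](
        GroupStateTimeout.ProcessingTimeTimeout) {
        (id: String, batch: Iterator[Event], state: GroupState[BusinessState]) =>
          if (state.hasTimedOut) {
            val last = state.get
            state.remove() // retention elapsed; drop the key from the store
            last
          } else {
            val times = batch.map(_.eventTime.getTime).toSeq
            val prev = state.getOption.getOrElse(
              BusinessState(id, 0L, new Timestamp(times.min), new Timestamp(times.min)))
            // Keep only the small summary, never the big event payloads.
            val updated = BusinessState(
              id,
              prev.frequency + times.size,
              new Timestamp(math.min(prev.firstSeen.getTime, times.min)),
              new Timestamp(math.max(prev.lastSeen.getTime, times.max)))
            state.update(updated)
            state.setTimeoutDuration("1 hour") // stand-in; real retention is cut off above
            updated
          }
      }

    summaries.writeStream.outputMode("update").format("console").start().awaitTermination()
  }
}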
Hi,
did you try exploding the arrays, then doing the aggregation/count, and at
the end applying a UDF to add the 0 values?
My experience is that working on arrays is usually a bad idea.
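A minimal sketch of that suggestion, with a made-up id universe of 1..5; the zero-fill step here uses a left join plus coalesce rather than a literal UDF, which does the same job:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ExplodeCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("explode-count").master("local[*]").getOrCreate()
    import spark.implicits._

    // Made-up input: each record carries an array of numerical ids.
    val records = Seq(
      ("r1", Seq(1L, 2L, 2L)),
      ("r2", Seq(2L, 3L))
    ).toDF("recordId", "ids")

    // 1. Explode the array so every id becomes its own row.
    val exploded = records.select(explode($"ids").as("id"))

    // 2. Aggregate: count occurrences per id.
    val counts = exploded.groupBy($"id").count()

    // 3. Zero-fill ids that never occurred, for an assumed id universe of 1..5.
    val allIds = spark.range(1, 6).toDF("id")
    val withZeros = allIds
      .join(counts, Seq("id"), "left")
      .withColumn("count", coalesce($"count", lit(0L)))

    withZeros.orderBy("id").show()
    spark.stop()
  }
}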
sakag writes:
> Hi all,
>
> We have a rather interesting use case, and are struggling to come up with an
> appr
An interesting puzzle indeed.
What is your measure of "that scales"? Does not fail, does not spill,
does not need a huge amount of memory / disk, is O(N), processes X
records per second per core?
Enrico
On 11.03.20 at 16:59, sakag wrote:
Hi all,
We have a rather interesting use case, an
foreachBatch gives you the micro-batch as a DataFrame, which is
distributed. If you don't call collect on that DataFrame, it shouldn't have
any memory implications on the Driver.
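A minimal sketch of that pattern, using the rate source as a stand-in input and a placeholder output path:

import org.apache.spark.sql.{DataFrame, SparkSession}

object ForeachBatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("foreachbatch-sketch").getOrCreate()

    // The rate source is just a stand-in for any streaming DataFrame.
    val stream = spark.readStream.format("rate").option("rowsPerSecond", "10").load()

    // `batch` is an ordinary distributed DataFrame: this write runs on the
    // executors. Calling batch.collect() here, by contrast, would pull the
    // whole micro-batch into driver memory.
    def writeBatch(batch: DataFrame, batchId: Long): Unit =
      batch.write.mode("append").parquet(s"/tmp/out/batch=$batchId") // placeholder path

    stream.writeStream
      .foreachBatch(writeBatch _)
      .start()
      .awaitTermination()
  }
}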
On Tue, Mar 10, 2020 at 3:46 PM Ruijing Li wrote:
> Hi all,
>
> I’m curious about how foreachBatch works in spark struct
Hi all,
We have a rather interesting use case, and are struggling to come up with an
approach that scales. Reaching out to seek your expert opinion/feedback and
tips.
What we are trying to do is to find the count of numerical ids over a
sliding time window where each of our data records has a
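For the stated goal, counts of ids over a sliding time window, Spark's window() grouping is the natural starting point; a minimal sketch with made-up column names and window sizes:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

object SlidingWindowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("sliding-count").master("local[*]").getOrCreate()
    import spark.implicits._

    // Made-up shape: one row per (timestamp, id).
    val data = Seq(
      ("2020-03-11 10:00:00", 1L),
      ("2020-03-11 10:03:00", 1L),
      ("2020-03-11 10:07:00", 2L)
    ).toDF("ts", "id")
      .withColumn("ts", $"ts".cast("timestamp"))

    // Count each id inside a 10-minute window that slides every 5 minutes.
    val counted = data
      .groupBy(window($"ts", "10 minutes", "5 minutes"), $"id")
      .count()

    counted.orderBy("window").show(truncate = false)
    spark.stop()
  }
}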
Hi Team,
We are trying to read an HBase table from Spark using the hbase-spark connector,
but our job is failing in the filter pushdown part of stage 0, due to the
error below. Kindly help us resolve this issue.
Caused by: java.lang.NoClassDefFoundError:
scala/collection/immutable/StringOps
at Dyn
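For reference, scala.collection.immutable.StringOps exists in Scala 2.12 but moved to scala.collection.StringOps in 2.13, so this error usually points to a Scala binary-version mismatch between the connector jar and the Spark runtime. A minimal sketch of the kind of read described, assuming the hbase-connectors data source with placeholder table and mapping names; the row-key filter is the sort of predicate that gets pushed down in stage 0:

import org.apache.spark.sql.SparkSession

object HBaseReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("hbase-read-sketch").getOrCreate()
    import spark.implicits._

    // Table name and column mapping are placeholders.
    val df = spark.read
      .format("org.apache.hadoop.hbase.spark")
      .option("hbase.table", "my_table")
      .option("hbase.columns.mapping",
        "rowKey STRING :key, value STRING cf:value")
      .load()

    // An equality filter on the row key is the kind of predicate the connector
    // pushes down to HBase, which is where the error above reportedly surfaces.
    df.filter($"rowKey" === "some-key").show()
  }
}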