Anyone using this combination for prod? I am planning to use it for a use case
with 15,000 events per second from a few Kafka topics. Though the events are big,
I would just have to take the businessIds, frequency, and first and last event
timestamps and save this into mapGroupsWithState. I need to keep them for
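A minimal sketch of the state-keeping described, assuming a hypothetical Event/BusinessState shape and placeholder Kafka settings; since the retention requirement is cut off above, the timeout duration is a stand-in:

import java.sql.Timestamp

import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// Hypothetical shapes: heavy Kafka events reduced to a small per-businessId summary.
case class Event(businessId: String, eventTime: Timestamp)
case class BusinessState(businessId: String, frequency: Long,
                         firstSeen: Timestamp, lastSeen: Timestamp)

object StateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("state-sketch").getOrCreate()
    import spark.implicits._

    // Placeholder parsing: real payloads would need a proper schema. Here the
    // Kafka value stands in for the businessId and the broker timestamp for
    // the event time.
    val events: Dataset[Event] = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "topic-a,topic-b")           // placeholder
      .load()
      .selectExpr("CAST(value AS STRING) AS businessId",
                  "timestamp AS eventTime")
      .as[Event]

    val summaries = events
      .groupByKey(_.businessId)
      .mapGroupsWithState[BusinessState, BusinessState](
        GroupStateTimeout.ProcessingTimeTimeout) {
        (id: String, batch: Iterator[Event], state: GroupState[BusinessState]) =>
          if (state.hasTimedOut) {
            val last = state.get
            state.remove() // retention elapsed; drop the key from the store
            last
          } else {
            val times = batch.map(_.eventTime.getTime).toSeq
            val prev = state.getOption.getOrElse(
              BusinessState(id, 0L, new Timestamp(times.min), new Timestamp(times.min)))
            // Keep only the small summary, never the big event payloads.
            val updated = BusinessState(
              id,
              prev.frequency + times.size,
              new Timestamp(math.min(prev.firstSeen.getTime, times.min)),
              new Timestamp(math.max(prev.lastSeen.getTime, times.max)))
            state.update(updated)
            state.setTimeoutDuration("1 hour") // stand-in; real retention is cut off above
            updated
          }
      }

    summaries.writeStream.outputMode("update").format("console").start().awaitTermination()
  }
}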
Hi,
did you try exploding the arrays, then doing the aggregation/count, and at
the end applying a UDF to add the 0 values?
My experience is that working on arrays is usually a bad idea.
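A minimal sketch of that suggestion, with a made-up id universe of 1..5; the zero-fill step here uses a left join plus coalesce rather than a literal UDF, which does the same job:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ExplodeCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("explode-count").master("local[*]").getOrCreate()
    import spark.implicits._

    // Made-up input: each record carries an array of numerical ids.
    val records = Seq(
      ("r1", Seq(1L, 2L, 2L)),
      ("r2", Seq(2L, 3L))
    ).toDF("recordId", "ids")

    // 1. Explode the array so every id becomes its own row.
    val exploded = records.select(explode($"ids").as("id"))

    // 2. Aggregate: count occurrences per id.
    val counts = exploded.groupBy($"id").count()

    // 3. Zero-fill ids that never occurred, for an assumed id universe of 1..5.
    val allIds = spark.range(1, 6).toDF("id")
    val withZeros = allIds
      .join(counts, Seq("id"), "left")
      .withColumn("count", coalesce($"count", lit(0L)))

    withZeros.orderBy("id").show()
    spark.stop()
  }
}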
sakag writes:
> Hi all,
>
> We have a rather interesting use case, and are struggling to come up with an
> appr
An interesting puzzle indeed.
What is your measure of "that scales"? Does not fail, does not spill,
does not need a huge amount of memory / disk, is O(N), processes X
records per second per core?
Enrico
On 11.03.20 at 16:59, sakag wrote:
Hi all,
We have a rather interesting use case, an
foreachBatch gives you the micro-batch as a DataFrame, which is
distributed. If you don't call collect on that DataFrame, it shouldn't have
any memory implications on the Driver.
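A minimal sketch of that pattern, using the rate source as a stand-in input and a placeholder output path:

import org.apache.spark.sql.{DataFrame, SparkSession}

object ForeachBatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("foreachbatch-sketch").getOrCreate()

    // The rate source is just a stand-in for any streaming DataFrame.
    val stream = spark.readStream.format("rate").option("rowsPerSecond", "10").load()

    // `batch` is an ordinary distributed DataFrame: this write runs on the
    // executors. Calling batch.collect() here, by contrast, would pull the
    // whole micro-batch into driver memory.
    def writeBatch(batch: DataFrame, batchId: Long): Unit =
      batch.write.mode("append").parquet(s"/tmp/out/batch=$batchId") // placeholder path

    stream.writeStream
      .foreachBatch(writeBatch _)
      .start()
      .awaitTermination()
  }
}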
On Tue, Mar 10, 2020 at 3:46 PM Ruijing Li wrote:
> Hi all,
>
> I’m curious about how foreachBatch works in spark struct
Hi all,
We have a rather interesting use case, and are struggling to come up with an
approach that scales. Reaching out to seek your expert opinion/feedback and
tips.
What we are trying to do is to find the count of numerical ids over a
sliding time window where each of our data records has a
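For the stated goal, counts of ids over a sliding time window, Spark's window() grouping is the natural starting point; a minimal sketch with made-up column names and window sizes:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

object SlidingWindowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("sliding-count").master("local[*]").getOrCreate()
    import spark.implicits._

    // Made-up shape: one row per (timestamp, id).
    val data = Seq(
      ("2020-03-11 10:00:00", 1L),
      ("2020-03-11 10:03:00", 1L),
      ("2020-03-11 10:07:00", 2L)
    ).toDF("ts", "id")
      .withColumn("ts", $"ts".cast("timestamp"))

    // Count each id inside a 10-minute window that slides every 5 minutes.
    val counted = data
      .groupBy(window($"ts", "10 minutes", "5 minutes"), $"id")
      .count()

    counted.orderBy("window").show(truncate = false)
    spark.stop()
  }
}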
Hi Team,
We are trying to read an HBase table from Spark using the hbase-spark connector,
but our job is failing in the filter pushdown part of stage 0, due to the
error below. Kindly help us resolve this issue.
Caused by: java.lang.NoClassDefFoundError:
scala/collection/immutable/StringOps
at Dyn
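For reference, scala.collection.immutable.StringOps exists in Scala 2.12 but moved to scala.collection.StringOps in 2.13, so this error usually points to a Scala binary-version mismatch between the connector jar and the Spark runtime. A minimal sketch of the kind of read described, assuming the hbase-connectors data source with placeholder table and mapping names; the row-key filter is the sort of predicate that gets pushed down in stage 0:

import org.apache.spark.sql.SparkSession

object HBaseReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("hbase-read-sketch").getOrCreate()
    import spark.implicits._

    // Table name and column mapping are placeholders.
    val df = spark.read
      .format("org.apache.hadoop.hbase.spark")
      .option("hbase.table", "my_table")
      .option("hbase.columns.mapping",
        "rowKey STRING :key, value STRING cf:value")
      .load()

    // An equality filter on the row key is the kind of predicate the connector
    // pushes down to HBase, which is where the error above reportedly surfaces.
    df.filter($"rowKey" === "some-key").show()
  }
}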