Re: [External] Re: [E] Re: Choice of Flink vs Spark for using DataSketches with streaming data

2021-04-08 Thread Alex Garland
users@datasketches.apache.org Subject: [External] Re: Choice of Flink vs Spark for using DataSketches with streaming data Hi, I've implemented jobs using datasketches in Kafka Streams, Flink streaming, and in Spark batch (through the

Re: [E] Re: Choice of Flink vs Spark for using DataSketches with streaming data

2021-04-08 Thread Marko Mušnjak
In the longer term (later this year), one option we might consider is creating an OSS configurable library/framework for running checks based on DataSketches in Flink (we also need to see whether for example Bullet already covers a lot of

Re: [E] Re: Choice of Flink vs Spark for using DataSketches with streaming data

2021-04-08 Thread Will Lauer
ers a lot of what we need in terms of setting up stream queries). If anyone else feels there is a gap and might be interested in collaborating, please let me know and I can publish more details of what we’re proposing if and when that evolves. Many thanks

Re: Choice of Flink vs Spark for using DataSketches with streaming data

2021-04-08 Thread Alex Garland
@datasketches.apache.org Subject: [External] Re: Choice of Flink vs Spark for using DataSketches with streaming data Hi, I've implemented jobs using datasketches in Kafka Streams, Flink streaming, and in Spark batch (through the Hive UDFs provided). Things went smoothly in all setups, with the gotcha

Re: Choice of Flink vs Spark for using DataSketches with streaming data

2021-04-06 Thread Marko Mušnjak
Hi, I've implemented jobs using datasketches in Kafka Streams, Flink streaming, and in Spark batch (through the Hive UDFs provided). Things went smoothly in all setups, with the gotcha that Hive UDFs represent incoming strings as UTF-8 byte arrays (or something like that, I forgot by now), so if yo

Re: Choice of Flink vs Spark for using DataSketches with streaming data

2021-04-06 Thread Jon Malkin
I'll echo what Ben said -- if a pre-existing solution does what you need, certainly use that. Having said that, I want to revisit frequent directions in light of the work Charlie did on using it for ridge regression. And when I asked internally I was told that Flink is where at least my company se

Re: Choice of Flink vs Spark for using DataSketches with streaming data

2021-04-06 Thread Ben Krug
I can't answer about Spark or Flink, but as a Druid person, I'll put in a plug for Druid for the "if necessary" case. It can ingest from Kafka and aggregate and do sketches during ingestion. (It's a whole new ballpark, though, if you're not already using it.) On Tue, Apr 6, 2021 at 9:56 AM Alex