Subject: [External] Re: Choice of Flink vs Spark for using DataSketches with
streaming data
>
>>
>> In the longer term (later this year), one option we might consider is
>> creating an OSS configurable library/framework for running checks based on
>> DataSketches in Flink (we also need to see whether for example Bullet
>> already covers a lot of what we need in terms of setting up stream
> queries). If anyone else feels there is a gap and might be interested in
> collaborating, please let me know and I can publish more details of what
> we’re proposing if and when that evolves.
>
>
>
> Many thanks
>
>
Hi,
I've implemented jobs using datasketches in Kafka Streams, Flink streaming,
and in Spark batch (through the Hive UDFs provided). Things went smoothly
in all setups, with the gotcha that Hive UDFs represent incoming strings as
UTF-8 byte arrays (or something like that, I forget by now), so if yo
I'll echo what Ben said -- if a pre-existing solution does what you need,
certainly use that.
Having said that, I want to revisit frequent directions in light of the
work Charlie did on using it for ridge regression. And when I asked
internally I was told that Flink is where at least my company se
I can't answer about Spark or Flink, but as a Druid person, I'll put in a
plug for Druid for the "if necessary" case. It can ingest from Kafka,
aggregate, and build sketches during ingestion. (It's a whole new ballpark,
though, if you're not already using it.)
On Tue, Apr 6, 2021 at 9:56 AM Alex