Re: Integration of DataSketches into Flink

2020-04-29 Thread leerho
Seth, Thanks for the enthusiastic reply. However, I have some questions ... and concerns :) 1) Create a page on the flink packages website. I looked at this website and it raises a number of red flags for me: - There are no instructions anywhere on the site on how to add a listing. - The

Re: Integration of DataSketches into Flink

2020-04-27 Thread Seth Wiesman
One more point I forgot to mention. Flink SQL supports Hive UDFs[1]. I haven't tested it, but the datasketch hive package should just work out of the box. Seth [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_functions.html On Mon, Apr 27, 2020 at 2:27 PM Se

Re: Integration of DataSketches into Flink

2020-04-27 Thread Seth Wiesman
Hi Lee, I really like this project, I used it with Flink a few years ago when it was still Yahoo DataSketches. The projects clearly complement each other. As Arvid mentioned, the Flink community is trying to foster an ecosystem larger than what is in the main Flink repository. The reason is that t

Re: Integration of DataSketches into Flink

2020-04-27 Thread Flavio Pompermaier
If this can encourage Lee, I'm one of the Flink users that already use datasketches and I found it an amazing library. When I was trying it out (last year) I tried to stimulate some discussion[1] but at that time it was probably too early. I really hope that now things are mature for both communitie

Re: Integration of DataSketches into Flink

2020-04-27 Thread leerho
Hi Arvid, Note: I am dual listing this thread on both dev lists for better tracking. > 1. I'm curious on how you would estimate the effort to port datasketches to Flink? It already has a Java API, but how difficult would it be to subdivide the tasks into parallel chunks of work? Since

Re: Integration of DataSketches into Flink

2020-04-27 Thread Arvid Heise
Hi Lee, I must admit that I also heard of data sketches for the first time (there are really many Apache projects). Datasketches sounds really exciting. As a (former) data engineer, I can 100% say that this is something that (end-)users want and need and it would make so much sense to have it in