Re: Creating custom Spark-Native catalyst/codegen functions

2019-08-21 Thread Georg Heiler
Look at https://github.com/DataSystemsLab/GeoSpark/tree/master/sql/src/main/scala/org/apache/spark/sql/geosparksql for an example. Using custom function registration and functions residing inside Spark's private namespace should work, but I am not aware of a public, user-facing API. Is there any I

Creating custom Spark-Native catalyst/codegen functions

2019-08-21 Thread Arwin Tio
Hi friends, I am looking into converting some UDFs/UDAFs to Spark-Native functions to leverage Catalyst and codegen. Looking through some examples (e.g. https://github.com/apache/spark/pull/7214/files for Levenshtein), it seems like we need to add these functions to the Spark framework

Re: Questions for platform to choose

2019-08-21 Thread Magnus Nilsson
Well, you are posting on the Spark mailing list. For streaming, though, I'd recommend Flink over Spark any day of the week. Flink was written as a streaming platform from the beginning, quickly aligning its API with the theoretical framework of Google's Dataflow whitepaper. It's awesome for streaming.

Re: Questions for platform to choose

2019-08-21 Thread Mich Talebzadeh
Hi, What is the definition of real time here? The engineering definition of real time is roughly "fast enough to be interactive". However, I use a stronger definition: in a real-time application or real-time data, there is no such thing as an answer which is supposed to be late and correct. The timeliness is p

Re: Questions for platform to choose

2019-08-21 Thread Eliza
Also, I found an excellent article for comparison: https://www.linkedin.com/pulse/spark-streaming-vs-flink-storm-kafka-streams-samza-choose-prakash Regards. on 2019/8/21 16:53, Eliza wrote: Hi, on 2019/8/21 16:44, Aziret Satybaldiev wrote: In my experience, Kafka + Spark Streaming + (perhaps HB

Re: Questions for platform to choose

2019-08-21 Thread Eliza
Hi, on 2019/8/21 16:44, Aziret Satybaldiev wrote: In my experience, Kafka + Spark Streaming + (perhaps HBase if you want to store the metrics) is so far the best combo. Not only because the technology is mature, but also because there are a lot of examples available on the web and in books. They t

Re: Questions for platform to choose

2019-08-21 Thread Aziret Satybaldiev
Hi Eliza, In my experience, Kafka + Spark Streaming + (perhaps HBase if you want to store the metrics) is so far the best combo. Not only because the technology is mature, but also because there are a lot of examples available on the web and in books. They typically should cover most of what you woul