Hi Julian,

thanks for your answer and your insights. I agree with you on many points (our last discussion on the Calcite mailing list in particular made me think a lot). So I agree with your "layered" approach, and in fact this is what we currently do (without stating it explicitly enough, I think).

Basically, we do two things, I guess. First, we provide a (Java) DSL to make it easy to write specific operations (and do some very limited optimization, not at all comparable to what Calcite does). Second, we also provide some functions which are useful or necessary for signal processing (smoothing, filtering, ...), and we plan to extend them soon with things like short- or long-term predictions, anomaly detection, and so on. By providing suitable wrappers for all of that, we are able to translate this to "real" streaming engines (currently Flink and Akka Streams) and run it there.

And indeed, MATCH_RECOGNIZE could be a good implementation for many situations (definitely not all), and I hope that I can contribute soon to your recent work (I will continue the discussion on the Calcite list).

But overall I'm really unsure whether our problem can be seen as a problem of relational algebra. I know and like the overall framework very much (I would even say it's one of the most elegant applications of math I've seen so far). But it feels like it doesn't fit that well. As soon as you have a problem where relations are related, even for simple things like LAG or LEAD as window functions, it gets pretty complicated and unnatural with regard to the definition of the algebra. But, as I'm lacking a lot of expertise there, I would love to discuss the matter further with you (but again, I think we should do it on the Calcite list).

The following small ASCII image depicts my thinking about these "layers". From our perspective, MATCH_RECOGNIZE is one way to solve the problem, and we can also provide "native" blocks to run directly on a streaming engine. There are surely pros and cons for both sides:

               O  CRUNCH Evaluation
               |
     ----------------------
     |                    |
     |          STREAM Rel. Expression with MATCH_RECOGNIZE
     |                    |
Streaming Engines         |
                          |
                  SQL based Engines

So, I'm not exactly sure from your mail which approach you would prefer, but my suggestion for the next steps with CRUNCH would be to enrich the DSL, add more domain-specific functions, find more use cases and get more users on board. In other words, work on the semantics side of things. But in parallel we should follow a path towards a better separation of "business logic" and execution, with support for multiple frameworks and especially the relational algebra side. Perhaps we can conclude at some point that we can cover everything with Calcite (I'm skeptical right now), but I think what's needed for this discussion is a solid basis to also show you Calcite devs in depth what exactly we are doing.

Julian

On 16.12.18, 08:20, "Julian Hyde" <jh...@apache.org> wrote:

Hi Julian,

Regarding whether to do this as a streaming engine (with its own query language) or as a framework above a streaming engine, I’d say that’s a false choice. If there is relational algebra inside your system, you can provide a high-level query language that can be translated to a lower-level query language in a streaming engine. This approach of “layered” databases has worked well for me for several projects, and is ever more applicable these days as data is becoming federated.

You and I have discussed SQL’s MATCH_RECOGNIZE clause as a way to build complex time-based logic. You have probably noticed that it is now in Flink, I am working on it in Calcite, and Beam will probably get it at some point.

Even if MATCH_RECOGNIZE doesn’t solve your problem, let’s follow the same approach - convert your problem to a DSL that maps to or extends relational algebra, and then figure out how to translate that to SQL in an underlying engine.

Calcite is a very good platform for building new “data languages”, so let’s carry on talking.

Julian

> On Dec 14, 2018, at 2:11 AM, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
>
> Hi all,
>
> I just joined the incubator mailing list and wanted to introduce myself and possibly also start a discussion about a software project we have developed over the past few years.
> But first things first. My name is Julian Feinauer and I come from Germany, where I run two “start-up” companies that work a lot on “industrial IoT” topics, data science and processing of “larger amounts of data”. We love open source and so we love the ASF. Most notably, I closely follow the Apache Calcite project and hope to find some time soon to contribute a bit more than in the last months. Furthermore, I am engaged in the (incubating) PLC4X project as a (P)PMC member and in the (incubating) Edgent project, where I am trying to “revive” the community as a new (P)PMC member together with Christopher Dutz.
>
> Now to the real topic. Over the last 3 years I have been developing a “Framework/Library” (currently a set of jars) to facilitate processing of time series data. The focus is mostly on processing data from test stands, e.g., automotive tests, driving profiles and so on. Furthermore, over the last year we added a lot of functionality for processing “industrial data”. This means that we want to make it easy to analyze things like “how long did the machine spend in this state”, “when is the following set of bits set” or “notify when the following condition is true for the first time”.
> It is a bit technical and I don’t want to go too deep into it, but generally speaking we try to introduce the “right” semantics to answer the typical questions that come up when analyzing machine or test data. This project is called “CRUNCH” and we are in the process of making it open source under the Apache 2.0 License (it will be moved to a public GitHub repo this year).
>
> As we see a close relationship to other (incubating or TLP) projects, we are wondering whether this project could fit into the incubator. Some examples of Apache projects that we see as “related” are Apache Flink (which we can use as the streaming engine to process the stream) and (incubating) Edgent, which we can also support as a streaming engine and where we are currently trying to find a suitable project goal and community, as some of the (P)PMC members have retired or become inactive. Finally, CRUNCH is a very natural fit with PLC4X because it can directly process the data gathered from PLCs (and in fact we are already using it that way in some of our projects). I had several discussions with some of the PLC4X (P)PMC members, namely Sebastian Rühl and Christopher Dutz, who encouraged me to introduce the project to the incubator because they also see some potential for it to enrich the OSS ecosystem with regard to edge / stream processing of (I)IoT data.
>
> So please feel free to ask questions or share your view on this topic, as I would like to find out whether this project could fit into the Apache ecosystem and the Incubator or not.
>
> Thank you already!
> Julian