Hi Julian and Masayuki,

This indeed sounds quite important. Masayuki, thanks for taking the initiative. I would like to do what I can to help. I can help with writing some of the operators, the UDFs and UDF APIs, and the integration with Calcite.

Thanks,
Walaa

On Fri, Jun 29, 2018 at 11:40 AM Julian Hyde <[email protected]> wrote:

> We already have two JIRA cases for Arrow integration:
> https://issues.apache.org/jira/browse/CALCITE-2040 and
> https://issues.apache.org/jira/browse/CALCITE-2173.
>
> I think this is an extremely important area of work for the Calcite project, because it helps us realize the vision of a deconstructed database[1]. There is a lot of work to do, much of it very interesting (e.g. writing a thread scheduler, IPC mechanisms, and algorithms for sort, join and aggregation that work effectively on Arrow data structures).
>
> If you want to help Masayuki, please step up!
>
> Julian
>
> [1] https://www.slideshare.net/julienledem/from-flat-files-to-deconstructed-database
>
> On Thu, Jun 28, 2018 at 2:24 PM, Michael Mior <[email protected]> wrote:
> > That's great! If you could create a JIRA case to track your progress, that would be helpful for others who might want to follow along or contribute. Thanks!
> >
> > --
> > Michael Mior
> > [email protected]
> >
> > On Tue, Jun 26, 2018 at 10:36, Masayuki Takahashi <[email protected]> wrote:
> >
> >> Hi Julian,
> >>
> >> > Masayuki Takahashi has started to develop an Arrow adapter for Calcite[2], but a lot of work remains to implement all SQL built-in functions and basic relational operators. Building on top of Gandiva we could save a lot of this effort.
> >>
> >> I will start by setting up a Gandiva development environment and then look into a way to incorporate it.
> >>
> >> Thanks.
> >>
> >> On Sat, Jun 23, 2018 at 3:54, Julian Hyde <[email protected]> wrote:
> >> >
> >> > Suppose a company wishes to build a graph database using their own innovative graph index data structure. They nevertheless need to implement core relational algebra, core data types, and core built-in functions (+, CASE, SUM, SUBSTRING). And they want to implement these on a memory-efficient data structure (tens of thousands of rows, stored column-oriented, per memory block). This is a massive effort.
> >> >
> >> > With Calcite+Gandiva+Arrow they just need to create a sequence of relational operators (using RelBuilder, say) and efficient machine code is generated. They can then start adding their own data types, built-in functions, and relational operators, using the same architecture.
> >> >
> >> > Julian
> >> >
> >> > > On Jun 22, 2018, at 11:33 AM, Xiening Dai <[email protected]> wrote:
> >> > >
> >> > > I was in a talk regarding Gandiva yesterday. Impressive work!
> >> > >
> >> > > But I am not sure why Calcite would like to integrate with it. To me, Gandiva is on the execution side. In which scenarios would a query planner need an Arrow engine? I read the original JIRA about implementing a file enumerator, but the intent is still not clear to me. I would appreciate it if you could elaborate. Thanks.
> >> > >
> >> > >> On Jun 22, 2018, at 11:20 AM, Julian Hyde <[email protected]> wrote:
> >> > >>
> >> > >> There is a discussion on dev@arrow about Gandiva, a kernel for Arrow[1].
> >> > >>
> >> > >> I think it would be an interesting library on which to build our Arrow engine. (Without a kernel, Arrow is just a data format, but with Gandiva it becomes an engine upon which we can implement all relational operations, albeit on a multi-threaded single node. Potentially this approach can process each row in a few machine cycles, i.e. billions of records per second. Therefore single-node would be sufficient for many queries.)
> >> > >>
> >> > >> Masayuki Takahashi has started to develop an Arrow adapter for Calcite[2], but a lot of work remains to implement all SQL built-in functions and basic relational operators. Building on top of Gandiva we could save a lot of this effort.
> >> > >>
> >> > >> Julian
> >> > >>
> >> > >> [1] https://lists.apache.org/thread.html/f099b3d1e2aaf9803c5c756f872a594baf17e9f25974e3496c9706d9@%3Cdev.arrow.apache.org%3E
> >> > >>
> >> > >> [2] https://issues.apache.org/jira/browse/CALCITE-2173
> >>
> >> --
> >> 高橋 真之
