Dear Buddhika, thank you for your interest in CouchDB and the CouchDB View Server!
This is an area where you can make significant contributions to CouchDB. It is also a little bit involved, but you seem to have all the skills required to pull this off :) I’m happy to mentor you.

> On 16 Mar 2015, at 10:03, Buddhika Jayawardhana <[email protected]> wrote:
>
> Hi,
> I am an Undergraduate of Department of Computer Science and Engineering
> University of Moratuwa. I have been subscribed to couchdb mailing list
> since months and I have been trying to learn some Erlang to work with
> couchdb. I noticed project "COUCHDB-1743 Make the view server & protocol
> faster" is related to GSoC. I am willing to submit a project proposal for
> this project.
>
> I have theoretical knowledge in software process, design patterns, and
> other Engineering concepts. I've been using 'java', 'C++' for high-level
> programming and 'C', a little bit of assembly for low-level programming and
> PHP and JavaScript for web development. Also I have sound knowledge on
> Erlang. I would be much thankful if you can guide to get familiar with the
> project as soon as possible.
>
> Here are the problems in my mind
>
> - Are the other programming languages that I should get familiar with?

Erlang and JavaScript will do, some knowledge of C to understand the current system will help.

> - What are the technologies I should get familiar with?

General knowledge of Unix/POSIX fundamentals (processes, fds, stdio etc.) will be required. Windows equivalent APIs too (but not strictly a requirement just yet).

> - I can work 40 hours per week for the project. Would that be enough to
> successfully complete the project?

I can’t estimate whether you’d be able to complete this 100%, but I’m sure that this is enough time to make a significant contribution that the community can then take and finish up, should you not get to the end. I.e. don’t worry too much about this :)

> - What are the other resources that I should read before submitting the
> proposal?

Familiarity with the CouchDB source can’t hurt. More in-depth knowledge of Erlang helps as well: http://learnyousomeerlang.com is a great free resource, and the main Erlang docs are worth a read, as are the various print books that are available from various publishers.

It will definitely also help to read through the CouchDB Guide: http://guide.couchdb.org

Some parts of it have already been integrated into http://docs.couchdb.org, which you should also read, especially the bits about Design Documents, Views and List, Show, Validation, Filter and Update functions.

In addition, check out the query_server_spec, it codifies the current query server protocol:

https://github.com/apache/couchdb/blob/master/test/view_server/query_server_spec.rb

> Hope you will guide me through the project.

Again, thanks for taking an interest in this! :)

To get things rolling, here’s my rough idea for how this could play out:

Generally, there are three components: the Erlang part, the JavaScript part, and the JavaScript runtime, or couchjs. We call all these things Query Server or View Server.

The Erlang part lives in https://github.com/apache/couchdb-couch-mrview

The JavaScript part lives in https://github.com/apache/couchdb/tree/master/share/server

The current JavaScript runtime is Spidermonkey. We have our own C-wrapper around Spidermonkey, to make it a CLI tool that talks stdio: https://github.com/apache/couchdb-couch/tree/master/priv/couch_js

We’d generally like to move away from the custom C-wrapped Spidermonkey and have V8 be the execution engine. We’d also like to get away from having to maintain C/C++. It’d probably be simplest to use Node.js as a wrapper, because then many more people can contribute to this. Also, Node.js is good at streaming protocols, so it is a natural fit.

Here is how I would start:

1. Create a new Query Server that *only* handles Show, List, Filter, Validation and Update functions as that is a lot simpler on both the Erlang and JavaScript side.

2. As part of 1: Design a new Query Server protocol that works in a streaming fashion. The current one is request/response based and both sides are waiting for one another while one of them is doing actual work. It’d be nice if both could just keep working on whatever they need to do.

3. Once 1. and 2. are in place and working correctly, expand the new Query Server to also handle Views. At this point, adding view support should not be too complicated anymore.

Things to watch out for:

- map/reduce functions for CouchDB views need to be “pure”, i.e. we need to guarantee they stay the same unless CouchDB can see any changes (and then invalidate the view index). This means we need some extra isolation of the JS execution, and some limitation or observation of the require() system. There is a project that demonstrated we can do this. Jason Smith has run this, but I can’t seem to find it on his GitHub. Jason, do you have any pointers?

- A couchjs process can be used for multiple databases and different access control can be configured per database. Data MUST NOT leak between databases. E.g. errors that are thrown when requesting a view result on database A must not show any process state data that comes from database B (and vice versa).

- The current system works much like CGI. A single process handles one request at a time; if there are two concurrent requests, a new process is spawned. The new Query Server should be able to handle multiple concurrent requests. But there will be a time when a single process is saturated; at that point, we should be able to spawn more Query Servers to help with the load. In the 1./2./3. list above, I’d either solve this upfront, or after 3., depending on what you are more comfortable with. It might be easier to get started without this, but it might be harder to add later and easier overall to have thought this through upfront.

- Windows stdio can be troublesome, beware :)

- Windows process handling can also be troublesome, that’s why we are using https://github.com/apache/couchdb-couch/tree/master/priv/spawnkillable to kill/reap couchjs processes there. Not sure we still need this when we use Node.js, but worth checking out.

- I had a bit of time last year to experiment with streaming Erlang/Node.js communication. It worked fine, but I didn’t get very far (the JavaScript part just echoes commands back to Erlang). The projects could help as inspiration:

  https://github.com/janl/couch_query_server2
  https://github.com/janl/node-couch-query-server2

  The key code is in src/couch_query_server2_sup.erl. It uses the Erlang pid as a stream marker so we can interleave requests. Please excuse the lack of a README or other instructions!

This is all I have for now. Other folks may want to chime in with their opinions :)

If you have any more questions, let me know. If you want to take this into JIRA, let’s open a new ticket.

Best
Jan
--

> Thank You.
>
> --
> *Buddhika Jayawardhana*
> Undergraduate | Department of Computer Science & Engineering
> University of Moratuwa
> *[email protected] <[email protected]>* | LinkedIn
> <http://lk.linkedin.com/in/buddhikajay/>

--
Professional Support for Apache CouchDB:
http://www.neighbourhood.ie/couchdb-support/
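
P.S. To make the "stream marker" idea from the email a bit more concrete, here is a minimal sketch of how the Node.js side might sort interleaved messages back into per-request streams. Everything in it is an assumption for illustration only: the wire format (one JSON array `[streamId, payload]` per line), the function names `createLineDecoder` and `createDemultiplexer`, and the example pid strings are hypothetical and are not taken from couch_query_server2 or from the existing query server protocol.

```javascript
// Hypothetical wire format: one JSON array per line, [streamId, payload].
// streamId stands in for the Erlang pid used as a stream marker.

// Turns arbitrary text chunks (as they would arrive on stdin) into
// complete, parsed messages. Partial lines are buffered until the
// terminating newline arrives.
function createLineDecoder(onMessage) {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    let newline;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline);
      buffer = buffer.slice(newline + 1);
      if (line.length > 0) onMessage(JSON.parse(line));
    }
  };
}

// Collects payloads per stream marker, so messages belonging to
// different requests can arrive in any interleaving.
function createDemultiplexer() {
  const streams = new Map();
  return {
    handle([streamId, payload]) {
      if (!streams.has(streamId)) streams.set(streamId, []);
      streams.get(streamId).push(payload);
    },
    messagesFor(streamId) {
      return streams.get(streamId) || [];
    },
  };
}

const demux = createDemultiplexer();
const feed = createLineDecoder((message) => demux.handle(message));

// Two requests interleaved, with a chunk boundary in the middle of a message.
feed('["<0.101.0>",{"doc":1}]\n["<0.202.0>",{"do');
feed('c":"a"}]\n["<0.101.0>",{"doc":2}]\n');

console.log(demux.messagesFor("<0.101.0>"));
console.log(demux.messagesFor("<0.202.0>"));
```

In a real server, `feed` would be wired to `process.stdin` and each response would be written back with the same marker, so the Erlang side can route it to the waiting process. Keeping a separate entry per marker is also one place where the isolation concern above (no data leaking between databases) would have to be enforced.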
