On Thu, Dec 30, 2010 at 5:34 AM, Harald Schilly <harald.schi...@gmail.com> wrote:
> On Thu, Dec 30, 2010 at 13:24, Alex Leone <acle...@gmail.com> wrote:
>> Comments? It would be good to discuss this before the upcoming bug days if
>> we are going to do anything with the notebook.

Alex -- can you also post your document to the wiki (or a link to it)?

   http://wiki.sagemath.org/Notebook%20scalability

>
> Hi, I read most of it and I also read your posting on mongodb-users ;)
>
> Basically, most of it is also what I would have in my mind ... just some
> notes:
>
> 1. I wouldn't do a "isAdmin" property for users. Rather, create one or
> more groups that are marked as isAdmin and then add the users to that
> group. This is basically how it is done nowadays in linux via the
> /etc/sudoers file where a group "admin" is marked as being special and
> the sudo command checks if the user is in the group admin.
>
> 2. The permissions, I don't really understand it. Why are they in each
> group? I understand that each worksheet has a list of users (owners)
> with permissions and groups with permissions, but at the group level I
> don't get it. What does db.groups.users.{ perms: 1 or 0 } actually do?
> I think, just a list of usernames is enough.
>
> Do you know ACLs [1]? Maybe compare your mix of usernames and
> permissions with the approach over there and do it this way!
> Therefore, I would attach such an ACL list to each worksheet.

Harald, any chance you could create some example MongoDB documents
that illustrate the use of ACLs? This would be very helpful.

> 3. Worksheets reference to a collection of cells? Each document has a
> 4mb limit ... I know, that's a lot and it will probably never be hit,
> but if there is some crazy long output it might happen.

Two comments:

  * the 4MB limit is officially going to be raised to 16MB soon.

  * all this database stuff is aimed (in my mind) mainly to be used by
large notebook server deployments like sagenb.org, which will have say
100,000 users. Having a <=16MB limit per worksheet is a really good
idea no matter what, even if mongodb didn't enforce it. So I have no
problem with having such a limit in our database (per worksheet).
We really really don't want people trivially making 1 terabyte
worksheets (right now with sagenb.org, it would be possible for
somebody to do that!).

> Second,
> updates on worksheets only happen on the cell level, never on the
> whole document. I know, mongodb has the ability to update a part of a
> document via the update command, but I think it's easier to have a
> collection of all cells and reference to them.

I'm not sure. If you read the mongodb documentation/books, the layout
Alex chose (all cells in a single document) is repeatedly recommended
there as the way to go. Updating parts of documents with mongodb is
very robust, in my experience. Also, the data locality (having all
the cells in the same document) is evidently a big efficiency win.

> But still, when a cell is updated, only its "out" field is modified.

Its "in" field can also be modified, right, e.g., when you modify the
input? And somebody might even modify the type (why not?).

> Therefore, I propose to define a worksheet as a list of cells [or
> later and more advanced, also as a nested list of worksheets ... i.e.
> to make it possible to reference to another worksheet, to do "dynamic"
> embeddings, sections/parts, meta-worksheet-documents ...].

A worksheet can't be defined to be a list of cells, since there is
lots of other metadata, e.g., the title, owner, etc. One can
certainly have references to other worksheets, etc., with Alex's
proposed schema, right? (via the _id field).

> Additionally, I could also envision cases, where each of those cells
> get additional permissions, e.g. a "lock" that adds a "all: ---"
> permission, so that nobody is able to edit a cell. (editing
> permissions is of course still possible by the users and users in the
> associated groups if they are allowed)

That would already fit fine with Alex's proposal. It would be good to
add it as an example to his document though. It's just another
key:value in one of the cells.
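
To make that concrete, here is a rough sketch in plain Python (no
database involved) of a worksheet with embedded cells, one of which
carries a lock. The "cells" and "lock" key names, the boolean lock
value, and the set_cell_input helper are assumptions for illustration
only, not Alex's actual schema; Harald's "all: ---" permission is
simplified here to a flag.

```python
# Hypothetical worksheet document. Only "_id", "in" and "out" come from the
# discussion; "cells", "lock" and the sample values are assumptions.
worksheet = {
    "_id": "ws-example",          # only the top-level document carries an _id
    "title": "Example worksheet",
    "owner": "alex",
    "cells": [
        {"type": "code", "in": "2 + 2", "out": {"stdout": "4"}},
        {"type": "code", "in": "factor(2010)", "out": {}, "lock": True},
    ],
}


def set_cell_input(ws, index, new_input):
    """Change the "in" field of one embedded cell unless it is locked."""
    cell = ws["cells"][index]
    if cell.get("lock", False):
        raise PermissionError("cell %d is locked" % index)
    cell["in"] = new_input


set_cell_input(worksheet, 0, "3 + 3")      # allowed: cell 0 has no lock
try:
    set_cell_input(worksheet, 1, "1 + 1")  # refused: cell 1 is locked
except PermissionError as err:
    print(err)
```

Cells are addressed by their position in the list, so they need no
"_id" of their own.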

Alex, I don't think you should use an _id field in the individual
cells though. They aren't complete mongodb documents themselves, so
they don't have to have an "_id" field, and if they do, it isn't
treated specially like the _id of a complete mongodb document (which
is forced to be unique, etc.). Thus using _id could be misleading.

>
>
> 4. something trivial, instead of
> out: [{ t:"stdout", data: "..."} , {t:"stderr", data: "..."}]
> please just do
> out: { stdout: "...", stderr: "..." }
> Mongodb allows to list all keys in such an associative list and no
> need for this {t: "..."} thing.
> (or even better, get rid of "out" and just a stdout and stderr key is
> good enough since their relative ordering doesn't matter.)

+1 -- very good idea.

> 5. Images might probably be referenced explicitly, i.e. out: { img:
> <file-id-reference> }

There should be no actual disk-based files. The images should
themselves be stored in mongodb. It can store binary data (like
images) just fine -- mongodb's main target domain is web applications,
where multimedia data is very common.

> 6. same as 5. for data files attached to worksheets. That's probably
> what you mean with db.files anyways ... but it might be nice to know
> when attached files are no longer needed to be able to run a
> background process that removes unreferenced files.

db.files uses "GridFS", which is something mongodb provides for
storing large binary data (which can be much bigger than 4MB). I'm
fine with attached files also having a 4MB (or later 16MB) limit,
again at least for big servers like sagenb.org. Thus using GridFS
for this application (the Sage notebook) isn't (in my mind) necessary.

> h
>
>
> [1] http://linux.die.net/man/5/acl
> first, look at the "long text form", then read the algorithm for checking it.
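
Going back to point 4, a small sketch of what moving from the list
form of "out" to the keyed form might look like, again in plain
Python. The flatten_out name and the sample data are made up for
illustration. Note that chunks of the same type get concatenated, so
the interleaving between stdout and stderr is lost, which is fine
only if their relative ordering really doesn't matter.

```python
def flatten_out(chunks):
    """Turn [{"t": ..., "data": ...}, ...] into {"stdout": ..., "stderr": ...}."""
    flat = {}
    for chunk in chunks:
        # Chunks of the same type are concatenated in order of appearance;
        # the interleaving between different types is not preserved.
        flat[chunk["t"]] = flat.get(chunk["t"], "") + chunk["data"]
    return flat


# Hypothetical cell output in the original list-of-typed-chunks form.
old_out = [
    {"t": "stdout", "data": "1\n"},
    {"t": "stderr", "data": "warning\n"},
    {"t": "stdout", "data": "2\n"},
]
new_out = flatten_out(old_out)
print(sorted(new_out))  # the keys now say which streams produced output
```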
>

--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org