On Thu, Dec 30, 2010 at 5:34 AM, Harald Schilly
<harald.schi...@gmail.com> wrote:
> On Thu, Dec 30, 2010 at 13:24, Alex Leone <acle...@gmail.com> wrote:
>> Comments?  It would be good to discuss this before the upcoming bug days if
>> we are going to do anything with the notebook.

Alex -- can you also post your document to the wiki (or a link to it)?
http://wiki.sagemath.org/Notebook%20scalability

>
> Hi, I read most of it and I also read your posting on mongodb-users ;)
>
> Basically, most of it matches what I had in mind ... just a few
> notes:
>
> 1. I wouldn't add an "isAdmin" property to users. Rather, create one or
> more groups that are marked as isAdmin and then add the users to those
> groups. This is basically how it is done nowadays in Linux via the
> /etc/sudoers file, where a group "admin" is marked as being special and
> the sudo command checks whether the user is in the group admin.
>
> 2. The permissions: I don't really understand them. Why are they in each
> group? I understand that each worksheet has a list of users (owners)
> with permissions and groups with permissions, but I don't get it at the
> group level. What does db.groups.users.{ perms: 1 or 0 } actually do?
> I think just a list of usernames is enough.
>
> Do you know ACLs [1]? Maybe compare your mix of usernames and
> permissions with the approach used there and do it that way.
> Concretely, I would attach such an ACL to each worksheet.

Harald, any chance you could create some example MongoDB documents
that illustrate the use of ACLs?   This would be very helpful.
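
To make the question concrete, here is a rough sketch of what such documents might look like, written as Python dicts as PyMongo would store them. Everything here is an assumption for illustration: the field names (acl, is_admin, users, perms) and the helper functions are hypothetical, not part of Alex's document. It also folds in Harald's point 1, so admin rights come from membership in a group flagged is_admin. The check is loosely modeled on the acl(5) algorithm (owner entry, then named users, then groups, then other) but omits the mask entry and the owning-group entry.

```python
# Hypothetical documents (Python dicts, as PyMongo would store them).
# Field names are illustrative assumptions, not Alex's schema.
groups = [
    {"_id": "admin", "is_admin": True, "users": ["wstein"]},
    {"_id": "math480", "is_admin": False, "users": ["alice", "bob"]},
]

worksheet = {
    "_id": "ws1",
    "owner": "alice",
    "acl": [
        {"type": "user_obj", "perms": "rw-"},                  # the owner
        {"type": "user", "name": "carol", "perms": "r--"},     # a named user
        {"type": "group", "name": "math480", "perms": "rw-"},  # a named group
        {"type": "other", "perms": "---"},                     # everybody else
    ],
}


def groups_of(username):
    """Names of all groups the user belongs to."""
    return {g["_id"] for g in groups if username in g["users"]}


def is_admin(username):
    """Admin rights come from membership in a group flagged is_admin."""
    return any(g["is_admin"] for g in groups if username in g["users"])


def entry(acl, kind):
    """First ACL entry of the given type (user_obj and other appear once)."""
    return next(e for e in acl if e["type"] == kind)


def allowed(username, ws, perm):
    """Return True if username may perform perm ('r', 'w' or 'x') on ws.

    Simplified acl(5)-style check: owner, named user, groups, other.
    The mask entry and the owning-group entry are omitted."""
    if is_admin(username):
        return True
    acl = ws["acl"]
    if username == ws["owner"]:
        return perm in entry(acl, "user_obj")["perms"]
    for e in acl:
        if e["type"] == "user" and e["name"] == username:
            return perm in e["perms"]
    mine = groups_of(username)
    matches = [e for e in acl if e["type"] == "group" and e["name"] in mine]
    if matches:
        # Granted if any matching group entry grants it, otherwise denied.
        return any(perm in e["perms"] for e in matches)
    return perm in entry(acl, "other")["perms"]
```

With these documents, carol can read but not write ws1, bob can write it through math480, and a user in no group falls through to the "other" entry and is denied.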

> 3. Worksheets reference a collection of cells? Each document has a
> 4MB limit ... I know, that's a lot and it will probably never be hit,
> but if there is some crazy long output it might happen.

Two comments:

   * the 4MB limit is officially going to be raised to 16MB soon.

   * all this database stuff is aimed (in my mind) mainly at
large notebook server deployments like sagenb.org, which will have,
say, 100,000 users.  Having a <=16MB limit per worksheet is a really
good idea no matter what, even if mongodb didn't enforce it.  So I
have no problem with having such a limit in our database (per
worksheet).     We really, really don't want people trivially making
1-terabyte worksheets (right now, with sagenb.org, it would be possible
for somebody to do that!).
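
As a sketch of how such a cap could be checked in application code before a save (an assumption, not existing notebook code), the snippet below uses the length of the JSON encoding as a rough stand-in for the real BSON size. The constant and function names are hypothetical.

```python
import json

# Assumed per-worksheet cap, matching MongoDB's announced 16MB document limit.
MAX_WORKSHEET_BYTES = 16 * 1024 * 1024


def approx_size(doc):
    """Rough size estimate: byte length of the UTF-8 JSON encoding.
    This only approximates the real BSON size."""
    return len(json.dumps(doc).encode("utf-8"))


def check_worksheet_size(doc, limit=MAX_WORKSHEET_BYTES):
    """Raise ValueError if doc looks too big to save; otherwise return its size."""
    size = approx_size(doc)
    if size > limit:
        raise ValueError(f"worksheet is about {size} bytes; limit is {limit}")
    return size
```

JSON cannot encode binary fields such as inline images. With PyMongo 3.9 or later, len(bson.encode(doc)) gives the exact BSON size instead.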

> Second,
> updates on worksheets only happen at the cell level, never on the
> whole document. I know MongoDB can update part of a
> document via the update command, but I think it's easier to have a
> collection of all cells and reference them.

I'm not sure.  If you read the mongodb documentation/books, the way Alex
laid things out (with all cells in a single document) is repeatedly
recommended as the way to go.  In my experience, updating parts of
documents with mongodb is very robust.
Also, the data locality (having all the cells in the same document) is
evidently a big win efficiency-wise.

> But still, when a cell is updated, only its "out" field is modified.

It's "in" field can also be modified, right, e.g., when you modify the
input?  And somebody maybe even the type (why not?).
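
Changing any of those fields is a partial update using MongoDB's dot notation with an array index, so only the touched fields of that one cell are rewritten. A minimal sketch; the helper name cell_update and the "cells" field name are hypothetical:

```python
def cell_update(index, changes):
    """Build a MongoDB update document that sets only the given fields of
    the cell at position `index` in the worksheet's (hypothetical) "cells"
    array, e.g. its "in", "out" or "type"."""
    return {"$set": {f"cells.{index}.{field}": value
                     for field, value in changes.items()}}
```

With PyMongo 3.0 or later this would be applied as db.worksheets.update_one({"_id": ws_id}, cell_update(3, {"in": "factor(2010)"})). A dict is used for the changes because "in" is a Python keyword and cannot be a keyword argument.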

> Therefore, I propose to define a worksheet as a list of cells [or,
> later and more advanced, also as a nested list of worksheets ... i.e.
> to make it possible to reference another worksheet, to do "dynamic"
> embeddings, sections/parts, meta-worksheet-documents ...].

A worksheet can't be defined to be a list of cells, since there is
lots of other meta data, e.g., the title, owner, etc.

One can certainly have references to other worksheets, etc., with
Alex's proposed schema, right?  (via the _id field).
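
A sketch of how that could look, assuming a hypothetical "include" cell type that names another worksheet by its _id, with metadata (title, owner) stored next to the embedded cells. The dict store stands in for db.worksheets, and expand_cells is a hypothetical helper that inlines included worksheets while rejecting include cycles:

```python
# Hypothetical worksheet documents keyed by _id (a stand-in for db.worksheets).
# Metadata (title, owner) sits next to the embedded cells; an "include" cell
# refers to another worksheet by its _id.
store = {
    "ws-a": {
        "_id": "ws-a", "title": "Elliptic curves", "owner": "alice",
        "cells": [
            {"id": 0, "type": "code", "in": "E = EllipticCurve('11a')"},
            {"id": 1, "type": "include", "worksheet": "ws-b"},
        ],
    },
    "ws-b": {
        "_id": "ws-b", "title": "Shared setup", "owner": "bob",
        "cells": [{"id": 0, "type": "code", "in": "E.rank()"}],
    },
}


def expand_cells(ws_id, store, seen=frozenset()):
    """Return a worksheet's cells with every included worksheet inlined.
    `seen` holds the _ids on the current include path, to reject cycles."""
    if ws_id in seen:
        raise ValueError(f"cyclic include of {ws_id}")
    seen = seen | {ws_id}
    cells = []
    for cell in store[ws_id]["cells"]:
        if cell.get("type") == "include":
            cells.extend(expand_cells(cell["worksheet"], store, seen))
        else:
            cells.append(cell)
    return cells
```

Tracking only the current include path means a worksheet may be included twice side by side, while a worksheet that (directly or indirectly) includes itself raises an error.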

> Additionally, I could also envision cases where each of those cells
> gets additional permissions, e.g. a "lock" that adds an "all: ---"
> permission, so that nobody is able to edit the cell. (Editing
> permissions is of course still possible for the users and for users in
> the associated groups, if they are allowed.)

That would already fit fine with Alex's proposal.  It would be good to
add it as an example to his document though.  It's just another
key:value in one of the cells.
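
For example, a minimal sketch assuming a hypothetical per-cell "perms" key whose "all" entry uses the same rwx string style as the worksheet ACL. A cell is editable only if the worksheet-level check already grants write and the cell's own entry, if present, does not take it away:

```python
# Hypothetical cells: the lock is just one more key:value in the cell.
locked_cell = {"id": 2, "type": "code", "in": "x = 1", "perms": {"all": "---"}}
normal_cell = {"id": 3, "type": "code", "in": "x + 1"}


def cell_editable(cell, worksheet_allows_write):
    """True only if the worksheet grants write and the cell's own "all"
    entry (if any) still contains 'w'."""
    if not worksheet_allows_write:
        return False
    cell_perms = cell.get("perms", {}).get("all")
    return cell_perms is None or "w" in cell_perms
```

Removing the lock later would itself be an ordinary partial update, e.g. {"$unset": {"cells.2.perms": ""}}, subject to the worksheet-level permission check.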

Alex, I don't think you should use an _id field in the individual
cells though.  They aren't complete mongodb documents themselves, so
they don't need an "_id" field, and if they have one it isn't treated
specially like the _id of a complete mongodb document (which is
forced to be unique, etc.).   Thus using _id could be misleading.

>
>
> 4. Something trivial: instead of
> out: [{ t:"stdout", data: "..."} , {t:"stderr", data: "..."}]
> please just do
> out: { stdout: "...", stderr: "..." }
> MongoDB can list all keys in such an associative array, so there is
> no need for this {t: "..."} thing.
> (Or even better, get rid of "out" and just use stdout and stderr keys,
> since their relative ordering doesn't matter.)

+1  -- very good idea.
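
For comparison, a sketch of the conversion from the proposed list form to top-level stdout/stderr keys (Harald's second suggestion). The function name is hypothetical, and joining a stream that appears more than once is an assumption:

```python
def convert_out(cell):
    """Replace the list form
        out: [{"t": "stdout", "data": ...}, {"t": "stderr", "data": ...}]
    with top-level "stdout"/"stderr" keys, leaving other keys untouched."""
    new = {k: v for k, v in cell.items() if k != "out"}
    for part in cell.get("out", []):
        # Concatenate in case the same stream appears more than once.
        new[part["t"]] = new.get(part["t"], "") + part["data"]
    return new
```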

> 5. Images should probably be referenced explicitly, i.e. out: { img:
> <file-id-reference> }

There should be no actual disk-based files.   The images should
themselves be stored in mongodb.  It can store binary data (like
images) just fine -- mongodb's main target domain is web applications,
where multimedia data is very common.
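
A minimal sketch of storing an image inline in a cell's output, assuming hypothetical field names (img, mime, data) and an assumed per-image cap matching the current 4MB document limit. With PyMongo under Python 3, a bytes value is saved as BSON binary data, so no file on disk is involved:

```python
# Assumed cap for a single inline image (the current 4MB document limit).
MAX_IMAGE_BYTES = 4 * 1024 * 1024


def image_output(data, mime="image/png"):
    """Build a cell-output entry holding the raw image bytes inline."""
    if not isinstance(data, bytes):
        raise TypeError("image data must be bytes")
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image too large to store inline")
    return {"img": {"mime": mime, "data": data}}
```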

> 6. Same as 5. for data files attached to worksheets. That's probably
> what you mean by db.files anyway ... but it would be nice to know
> when attached files are no longer needed, so that a background
> process can remove unreferenced files.

db.files uses "GridFS" which is something mongodb provides for storing
large binary data (which can be much bigger than 4MB).

I'm fine with attached files also having a 4MB (or later 16MB) limit,
again at least for big servers like sagenb.org.    Thus using GridFS
for this application (the Sage notebook) isn't (in my mind) necessary.
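
Harald's point about cleaning up unreferenced attachments still applies if files live in their own collection. A sketch of the core of such a background sweep, assuming each worksheet lists its attachment ids under a hypothetical "attachments" key:

```python
def unreferenced_files(worksheets, file_ids):
    """Return the ids in file_ids that no worksheet lists under its
    (hypothetical) "attachments" key, in sorted order."""
    referenced = set()
    for ws in worksheets:
        referenced.update(ws.get("attachments", []))
    return sorted(set(file_ids) - referenced)
```

Against a live database, worksheets could be the cursor from db.worksheets.find({}, {"attachments": 1}) and file_ids the _id values of the files collection; a real sweep would also need to avoid deleting a file whose worksheet save is still in progress.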

> h
>
>
> [1] http://linux.die.net/man/5/acl
> First look at the "long text form", then read the algorithm for checking it.
>



-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
