[sage-devel] Re: [sage-notebook] Scalable Sage Server Architecture Proposal v0.1

William Stein Sat, 01 Jan 2011 22:43:24 -0800

On Sat, Jan 1, 2011 at 7:15 PM, Alex Leone <acle...@gmail.com> wrote:
>> Alex -- can you also post your document to the wiki (or a link to it)?
>> http://wiki.sagemath.org/Notebook%20scalability
>
>  Done.  It's in the Notes section.
>
>
>> > 1. I wouldn't do an "isAdmin" property for users. Rather, create one or
>> > more groups that are marked as isAdmin and then add the users to that
>> > group. This is basically how it is done nowadays in Linux via the
>> > /etc/sudoers file where a group "admin" is marked as being special and
>> > the sudo command checks if the user is in the group admin.
>
> This makes sense.  I thought that it would have to be a property on each
> user so that lookups would be fast, but I realize that it would probably
> just be set as a session variable when the user logs in.
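A minimal sketch of the group-based approach. It assumes hypothetical `Group` records with an `is_admin` flag, and a plain dict standing in for the web framework's session. The admin check runs once at login and its result is cached in the session, so later requests skip the group lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    # Hypothetical group record: a name, an admin flag, and member usernames.
    name: str
    is_admin: bool = False
    members: set = field(default_factory=set)


def user_is_admin(username, groups):
    """Return True if the user belongs to at least one group marked is_admin."""
    return any(g.is_admin and username in g.members for g in groups)


def log_in(username, groups, session):
    """Record the login and cache the admin check in the (dict-like) session."""
    session["username"] = username
    session["is_admin"] = user_is_admin(username, groups)
    return session


groups = [
    Group("admin", is_admin=True, members={"alice"}),
    Group("students", members={"alice", "bob"}),
]
```

Granting admin rights then means adding a user to an admin group. No per-user flag is stored.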
>
>
>> > 2. The permissions: I don't really understand them. Why are they in each
>> > group?
>
> For worksheets, I could think of a few different kinds of permissions:
> 1. viewing
> 2. editing
> 3. changing the title?
> 4. deleting the worksheet
> For groups, there aren't as many, but I thought it would be good to reuse the
> same mechanism:
> 1. adding other people to the group
> 2. changing the group name?
> 3. deleting the group
> The 'perms' number is a bit-field.  If the first bit (0b0001) is set, then
> the user has permission to do x.  If the second bit (0b0010) is set, the
> user has permission to do y, etc.
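The bit-field could be modelled with Python's `enum.IntFlag`. Mapping the bits onto the worksheet permissions listed above (viewing, editing, renaming, deleting) is an assumption made for illustration. The proposal only says that each bit grants one action.

```python
from enum import IntFlag


class WorksheetPerm(IntFlag):
    # Hypothetical bit assignment: one bit per action.
    VIEW = 0b0001
    EDIT = 0b0010
    RENAME = 0b0100
    DELETE = 0b1000


def has_perm(perms, needed):
    """True if every bit in `needed` is set in the stored `perms` integer."""
    return (WorksheetPerm(perms) & needed) == needed


# A collaborator who may view and edit, but not rename or delete.
collaborator = WorksheetPerm.VIEW | WorksheetPerm.EDIT
stored = int(collaborator)  # 0b0011 == 3, the value kept in the 'perms' field
```

Group permissions (adding members, renaming, deleting) could reuse the same mechanism with a second flag class.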
>
>
>
>> > but if there is some crazy long output it might happen.
>
> If the output gets too long, it would get saved to a separate file, just
> like the current notebook saves long output to "full_output.txt".
>
>>
>> > Second,
>> > updates on worksheets only happen on the cell level, never on the
>> > whole document. I know, mongodb has the ability to update a part of a
>> > document via the update command, but I think it's easier to have a
>> > collection of all cells and references to them.
>>
>> I'm not sure.  If you read the mongodb documentation/books, the way Alex
>> laid things out (with all cells in a single document) is repeatedly
>> recommended as the way to go.  Updating parts of documents with
>> mongodb is very robust, in my experience.
>> Also, the data locality (having all the cells in the same document) is
>> evidently a big win efficiency-wise.
>>
>> > But still, when a cell is updated, only its "out" field is modified.
>>
>> Its "in" field can also be modified, right, e.g., when you modify the
>> input?  And somebody might even change the type (why not?).
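With the cells embedded in the worksheet document, changing one cell's "in", "out" or type is a single partial update. The sketch below only builds the filter and update documents, so it needs no running server. They use MongoDB's `$set` operator and positional `$` operator and would be passed to pymongo's `update_one(filter, update)`. The field names `cells`, `id`, `in` and `type`, and the collection name, are assumptions based on this thread.

```python
def cell_update(worksheet_id, cell_id, fields):
    """Build (filter, update) documents that modify one embedded cell.

    The filter matches the worksheet and the array element whose 'id'
    equals cell_id. 'cells.$.<field>' then refers to that matched element.
    """
    query = {"_id": worksheet_id, "cells.id": cell_id}
    update = {"$set": {f"cells.$.{name}": value for name, value in fields.items()}}
    return query, update


# Change both the input and the type of cell 7, leaving the other cells alone.
query, update = cell_update("ws-42", 7, {"in": "factor(2^64 - 1)", "type": "code"})
# With pymongo this would be e.g. db.worksheets.update_one(query, update),
# where "worksheets" is a hypothetical collection name.
```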
>
> I considered both.  Here's what I was thinking about:
> 1. List references to cells that would go in a separate collection
> (db.cells):
>   a. If there were ever fine-grained revision history (e.g., see Google Docs), old
> cell contents could stay in the db (maybe as diffs), and the worksheet
> object wouldn't get huge.  But then again this could be implemented as a
> diff of the whole worksheet object or something.
> 2. Put the cells in the worksheet object (as proposed):
>   a.  Like William said, it might be better to have all the data localized.
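For option 1a, old cell contents could be kept as diffs. The sketch below is one possible encoding using the standard library's `difflib.SequenceMatcher`; the proposal does not specify one. The current input stays in the worksheet, and each revision stores only the changed spans needed to turn it back into the previous version. `make_patch` and `apply_patch` are hypothetical helpers.

```python
import difflib


def make_patch(source, target):
    """Compact patch turning `source` into `target`: only changed spans are kept."""
    matcher = difflib.SequenceMatcher(None, source, target)
    return [(i1, i2, target[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_patch(source, patch):
    """Rebuild the target text by splicing each replacement into `source`."""
    pieces, pos = [], 0
    for i1, i2, replacement in patch:
        pieces.append(source[pos:i1])  # unchanged text before this edit
        pieces.append(replacement)     # new text ('' for a pure deletion)
        pos = i2                       # skip the replaced or deleted span
    pieces.append(source[pos:])
    return "".join(pieces)


previous = "a = 1\nprint(a)\n"
current = "a = 2\nprint(a)\n"
# Keep `current` in the worksheet; store a reverse patch to recover `previous`.
undo = make_patch(current, previous)
```

The same scheme would also work on a whole serialized worksheet, which is the alternative mentioned under 1a.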
>
>
>> Alex, I don't think you should use an _id field in the individual
>> cells though.  They aren't complete mongodb documents themselves, so
>> don't have to have an "_id" field, and if they do it isn't treated
>> specially like the _id of a complete mongodb document (which is
>> forced to be unique, etc.).   Thus using _id could be misleading.
>
> This id helps keep track of cells on the client-side, and also if the cells
> get rearranged.  Perhaps just 'id' would be a better name.
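A sketch of a plain `id` field in use. It assumes ids are random hex strings from `uuid.uuid4()`; the proposal doesn't say how ids are generated. Because the id travels with the cell, the client can still find a cell after the list has been reordered. `new_cell` and `move_cell` are hypothetical helpers.

```python
import uuid


def new_cell(text=""):
    """Create an embedded cell with its own 'id' (not '_id', which mongodb reserves)."""
    return {"id": uuid.uuid4().hex, "in": text, "out": []}


def move_cell(cells, cell_id, new_index):
    """Move the cell with the given id to new_index; other cells keep their order."""
    index = next(i for i, c in enumerate(cells) if c["id"] == cell_id)
    cell = cells.pop(index)
    cells.insert(new_index, cell)
    return cells


cells = [new_cell("x = 1"), new_cell("y = 2"), new_cell("x + y")]
last_id = cells[2]["id"]
move_cell(cells, last_id, 0)
```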
>
>
>> > 4. Something trivial: instead of
>> > out: [{ t:"stdout", data: "..."} , {t:"stderr", data: "..."}]
>> > please just do
>> > out: { stdout: "...", stderr: "..." }
>> > Mongodb allows listing all keys in such an associative array, so there is
>> > no need for this {t: "..."} thing.
>> > (Or even better, get rid of "out"; just stdout and stderr keys are
>> > good enough, since their relative ordering doesn't matter.)
>>
>> +1  -- very good idea.
>
> The output from a cell is a sequence of messages (Stdout, Stderr, Stdin,
> Html, ...).  Consider the following code:
> import sys
> sys.stdout.write("out1")
> sys.stderr.write("err1")
> sys.stdout.write("out2")
> sys.stderr.write("err2")
> This would generate
> Stdout("out1")
> Stderr("err1")
> Stdout("out2")
> Stderr("err2")
> The messages need to be displayed in the order that they are produced.


A compromise between your two suggestions is:

   out: [{stdout:"out1"}, {stderr:"err1"}, {stdout:"out2"},
         {stderr:"err2"}, {image:"foo.png"}]
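The compromise keeps the order (a list) but drops the `t` key. Each element is a one-key dict whose key names the message type. The sketch below converts `(type, payload)` pairs like the Stdout/Stderr messages in Alex's example into that shape, then pulls one stream back out. The helper names are hypothetical.

```python
def to_compromise(messages):
    """[(type, payload), ...] -> [{type: payload}, ...], preserving order."""
    return [{kind: payload} for kind, payload in messages]


def stream_text(out, kind):
    """Concatenate every payload of one type, in order (e.g. all stdout)."""
    return "".join(m[kind] for m in out if kind in m)


out = to_compromise([("stdout", "out1"), ("stderr", "err1"),
                     ("stdout", "out2"), ("stderr", "err2"),
                     ("image", "foo.png")])
```

With this format, a reader has to inspect each element's key to learn its type, which gets awkward once a message needs extra fields.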


>
>
>> > 5. Images should probably be referenced explicitly, i.e., out: { img:
>> > <file-id-reference> }
>
> I was thinking that there would be a Plot(...) message, a JMol(...) message,
> etc, which would reference files.

out: [{stdout:"out1"}, {stderr:"err1"}, ..., {image:"foo.png"},
      {jmol:"foo.jmol"}, ...]

?

In some cases it might make sense to be able to specify coordinates or
other rich data:

{image:"foo.png", position:[3,7]}

This argues for making the output document have a type like you
suggested above, e.g.,

  {t:'image', data:'foo.png', position:[3,7]}
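With an explicit `t` field, extra attributes such as `position` sit alongside `data`, and a renderer can dispatch on `t`. In this minimal sketch, the handler table and the HTML it produces are hypothetical. Types without a handler fall back to plain text rather than failing.

```python
import html


def render_text(msg):
    return f"<pre>{html.escape(msg['data'])}</pre>"


def render_image(msg):
    # Optional extra field, e.g. position [3, 7]; how it is used is up to the client.
    x, y = msg.get("position", [0, 0])
    return f'<img src="{html.escape(msg["data"])}" data-x="{x}" data-y="{y}">'


RENDERERS = {
    "stdout": render_text,
    "stderr": render_text,
    "image": render_image,
}


def render(out):
    """Render an ordered list of typed output messages."""
    return [RENDERERS.get(msg["t"], render_text)(msg) for msg in out]


out = [
    {"t": "stdout", "data": "out1"},
    {"t": "image", "data": "foo.png", "position": [3, 7]},
    {"t": "jmol", "data": "foo.jmol"},
]
```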


> Currently in the notebook, any computation output is just a stream of bytes.
> But that stream contains different kinds of data: stdout, stderr, latex,
> plots, html tables, jmol plots, references to data files that the cell
> created, etc.  So why not have the computation output be that series of
> "messages"?
>  - Alex
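Treating output as a series of messages maps naturally onto a newline-delimited JSON stream between the compute process and the server. The sketch below assumes one JSON object per line, using the `t`/`data` shape from William's last example. This wire format is an assumption and is not part of the proposal.

```python
import json


def encode_messages(messages):
    """Serialize typed output messages as newline-delimited JSON."""
    return "".join(json.dumps(m, separators=(",", ":")) + "\n" for m in messages)


def decode_messages(stream):
    """Parse newline-delimited JSON back into a list of messages, preserving order."""
    return [json.loads(line) for line in stream.splitlines() if line.strip()]


messages = [
    {"t": "stdout", "data": "out1"},
    {"t": "latex", "data": "x^2"},
    {"t": "image", "data": "foo.png"},
]
wire = encode_messages(messages)
```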

That does make sense.

William



-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
