Re: [Twisted-Python] Is there pb documentation somewhere?

2014-08-05 Thread Glyph Lefkowitz

On Aug 4, 2014, at 9:47 PM, Daniel Sank  wrote:

> glyph,
> 
> >> 2. Is there a specification for the pb dialect of banana?
> >>
> > Beyond the code, no.
> 
> Ok.
> 
> > I would be happy to answer questions, but obviously I'm not super 
> > responsive :).
> > Let me know what you need.
> 
> For two personal projects, I would like to have a reasonable remote objects 
> library in python. I need something which can announce state changes to 
> clients, and receive state change requests from clients. My solution:
> 
> 1. Make server side class which can spawn Cacheables when it wants to tell 
> clients of its existence.
> 2. Give RemoteCaches to clients and use observe_* methods as appropriate.
> 3. Stuff a Viewable into the RemoteCaches so that clients can request state 
> changes.
> 
> Question #1: Is this a reasonable use of pb?

Yes.

> This all worked great until I ran into a bug. In trying to fix the bug, I 
> found that
> 
> 1. pb code is really hard to understand

Sorry about that.

> 2. exarkun thinks pb is bad and that I should implement what I need in AMP.

I really wish we would stop calling things "bad" and "good".  This isn't a 
helpful classification.  PB is adequate for a particular set of requirements.  
Those requirements are somewhat unusual, and AMP is better for a lot of 
use-cases.

It sounds to me like you are a lot more interested in 

> 3. exarkun thinks banana and jelly are reasonable.

Again, what does "reasonable" mean in this context?

Let me explain my own opinion about this.

Banana is a perfectly serviceable low-level marshaling format.  It's pretty 
efficient when compared to something like JSON, and has compression mechanisms 
which can make it even more efficient (the "dialect" support you referred to).  
The only thing about it that isn't very general is that its implementation 
(although not the protocol specification) hard-codes the PB abbreviated-string 
dialect.
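To make the "low-level marshaling format" and "dialect" points concrete, here is a minimal, standalone sketch of a Banana-style encoder. It assumes the framing as I understand it from twisted/spread/banana.py. Each value is a base-128 header (7-bit digits, least significant first, high bit clear) followed by a one-byte type marker with the high bit set, and for strings and lists the header is the length. The marker values are my recollection of that module and should be checked against it. The `vocab` argument is a hypothetical stand-in for a dialect's abbreviated-string table; the real PB table is different. Floats and the long-integer types are left out.

```python
# Type-marker bytes: my recollection of twisted/spread/banana.py; verify there.
LIST = 0x80
INT = 0x81
STRING = 0x82
NEG = 0x83
VOCAB = 0x87


def b128(n):
    """Encode a non-negative int as 7-bit digits, least significant first."""
    if n == 0:
        return b"\x00"
    digits = bytearray()
    while n:
        digits.append(n & 0x7F)  # high bit stays clear in header bytes
        n >>= 7
    return bytes(digits)


def encode(obj, vocab=None):
    """Encode ints, strings, bytes and lists in a simplified Banana-like format.

    vocab is a hypothetical dialect table mapping common strings to small ints.
    """
    vocab = vocab or {}
    if isinstance(obj, bool):
        raise TypeError("this sketch does not encode booleans")
    if isinstance(obj, int):
        # Magnitude goes in the header; the marker byte carries the sign.
        return b128(abs(obj)) + bytes([NEG if obj < 0 else INT])
    if isinstance(obj, str):
        if obj in vocab:
            # Dialect compression: send the table index instead of the text.
            return b128(vocab[obj]) + bytes([VOCAB])
        obj = obj.encode("utf-8")
    if isinstance(obj, bytes):
        return b128(len(obj)) + bytes([STRING]) + obj
    if isinstance(obj, list):
        body = b"".join(encode(item, vocab) for item in obj)
        return b128(len(obj)) + bytes([LIST]) + body
    raise TypeError("unsupported type: " + type(obj).__name__)
```

With a dialect table, a frequently repeated token such as "list" costs two bytes (for an index below 128) instead of six.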

Jelly is higher level, but more language-specific. Its specification implicitly 
encodes numerous Python implementation details, like the distinction between 
"tuple" and "list".  It also couples very tightly to your program's structure.  
This can be a real benefit when getting a protocol up and running quickly, but
it also lets you create protocols where you don't really know what the wire
format is, and where you develop hidden dependencies.  In more complex protocols
(where the "ease of getting up and running quickly" thing really starts to
shine) this attribute of Jelly can cause real difficulty in any kind of
cross-system communication: communicating with a peer from a different
language, or even in Python without access to all the protocol class
definitions from the original system, is hard because it requires
reverse-engineering.  This is where it becomes "bad".  Still, it isn't as big
a disaster security- and maintenance-wise as Pickle.  The information you
need is recorded in the code; it's just spread out, and you don't need to work
backwards from protocol dumps.  If I were going to spend some time maintaining
PB, this is where I'd focus: if the schemas were a bit more explicit, could be 
collected into one place more easily, and were all validated in advance (before 
passing deserialized objects to the application code, or serializing them 
across the wire), then these problems could be addressed without changing the 
API too much.
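As a rough, hypothetical illustration of that direction (explicit schemas, collected in one place, checked before deserialized objects reach application code), here is a small validator. Nothing here is part of Jelly or PB: `SCHEMAS`, `SchemaError` and `validate` are invented names, and the message shape (a list holding a name followed by arguments) is an assumption made for the example.

```python
# Hypothetical: every message the protocol accepts, declared in one place,
# instead of being implied by whatever classes happen to be importable.
SCHEMAS = {
    "set_position": (int, int),
    "rename": (str,),
}


class SchemaError(ValueError):
    """Raised when a decoded message does not match its declared schema."""


def validate(message):
    """Check a decoded [name, arg, ...] message; return (name, args) if valid."""
    if not isinstance(message, list) or not message:
        raise SchemaError("expected a non-empty list")
    name, args = message[0], message[1:]
    if not isinstance(name, str) or name not in SCHEMAS:
        raise SchemaError("unknown message type: %r" % (name,))
    expected = SCHEMAS[name]
    if len(args) != len(expected):
        raise SchemaError("%s takes %d arguments, got %d"
                          % (name, len(expected), len(args)))
    for position, (arg, kind) in enumerate(zip(args, expected)):
        # Exact type match, so e.g. True is not accepted where int is declared.
        if type(arg) is not kind:
            raise SchemaError("%s argument %d: expected %s, got %s"
                              % (name, position, kind.__name__,
                                 type(arg).__name__))
    return name, args
```

Because the check happens at the boundary, a peer written in another language would only need the schema table, not the original class definitions.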

PB basically just inherits all of the benefits and caveats of Jelly.  It's a 
trivial serialization of remote references to objects.
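A toy model of what "serialization of remote references" means: an object passed by reference never crosses the wire. The sender records it in a per-connection table and sends only a small ID, and the peer later names that ID when it wants a method called. `ToyBroker` and `Echo` are hypothetical, and the ['remote', luid] shape is illustrative rather than a copy of PB's wire format. The `remote_` method-name prefix follows PB's convention for remotely callable methods.

```python
class Echo:
    """Hypothetical object exported by reference."""

    def remote_echo(self, text):
        return text

    def secret(self):
        return "not remotely callable"


class ToyBroker:
    """Per-connection table of objects the peer may refer to by ID."""

    def __init__(self):
        self._objects = {}  # luid -> exported object (also keeps it alive)
        self._luids = {}    # id(obj) -> luid, so re-sending reuses the same ID
        self._next_luid = 1

    def serialize(self, obj):
        """Replace an object with a ['remote', luid] reference expression."""
        luid = self._luids.get(id(obj))
        if luid is None:
            luid = self._next_luid
            self._next_luid += 1
            self._luids[id(obj)] = luid
            self._objects[luid] = obj
        return ["remote", luid]

    def dispatch(self, luid, name, args=()):
        """Handle a peer's request to call method `name` on object `luid`."""
        obj = self._objects[luid]
        # Only methods with the remote_ prefix are reachable from the wire.
        method = getattr(obj, "remote_" + name)
        return method(*args)
```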

> Question #2: Would you recommend implementing a simplified replacement for pb 
> on top of banana/jelly, or starting over from AMP? I favor the banana/jelly 
> route because the protocol seems intrinsically flexible, but I read your blog 
> explaining why protocols like banana are bad, so I'm confused about what I 
> "should" do.

First of all, don't take my development advice as gospel.  When I write an 
article and publish it, I'm just trying to make people aware of issues they may 
not have considered; make your own decisions about how to write your own code.

(Unless your decision is to write it yourself in PHP, of course, in which case 
you are a danger to yourself and others and should be remanded to compulsory 
treatment.)

It seems like PB fits your style, and the problems with it are all tractable 
and fixable.  I am sad that you're not getting the development support you need 
to maintain it (most of all I'm sad you're not getting it from me!) but let's 
see if we can fix that.  I'll start by replying to your other email.

One thing that might speed things along is if you can help out with some code 
reviews.  We've got a _really_ long queue right now and that's making it hard 
for me to spend any focused effort in one particular area.  I'm happy to trade 
2-for-1 - if you do two code reviews, I will regard it as an immediate 
obligation for me to review a ticket you direct me to ;).

It might also help to write m

Re: [Twisted-Python] Is there pb documentation somewhere?

2014-08-05 Thread Glyph Lefkowitz

On Aug 4, 2014, at 10:07 PM, Daniel Sank  wrote:

> glyph
> 
> > I would be happy to answer questions, but obviously I'm not super 
> > responsive :).
> > Let me know what you need.
> 
> I am trying to understand jelly's serialization strategy:
> 
> 1. In t.s.jelly._Jellier, what is the meaning of persistentStore?

From the perspective of PB, you can ignore this completely.  It's effectively 
an unused feature.

There are two entry-point call-sites for jelly in PB: Broker.serialize and
Broker.unserialize.  Both explicitly pass "None" for the "persistent" argument,
"persistentStore" and "persistentLoad" respectively.

Reaching back into my dim and distant memory of the ancient past, I believe 
that the purpose of these callables was to allow you to use Jelly (and perhaps 
PB) to refer to objects in some kind of pluggable long-term storage.  The 
reason they're called "persistent" was that "ephemeral" storage was local to 
the connection, and therefore short-lived enough that we could trust that an 
in-memory Python dictionary would be both large enough and long-lived enough to 
serve it.

But if your objects live in a database, you might want to replace that
dictionary with a different backend, such as an application-provided callable
for loading objects by ID.
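A sketch of the idea behind those "persistent" hooks. `toy_serialize`, `toy_unserialize`, `DATABASE`, `Account` and the ['persistent', key] tag are all hypothetical, and this is not the signature Jelly uses for persistentStore/persistentLoad. Passing None for the hooks, as Broker does, simply turns the feature off.

```python
DATABASE = {}  # hypothetical long-term storage, keyed by a stable ID


class Account:
    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance


def store_account(obj):
    """Persistent-store hook: return a stable key for stored objects, else None."""
    if isinstance(obj, Account):
        DATABASE[obj.account_id] = obj
        return obj.account_id
    return None


def load_account(key):
    """Persistent-load hook: fetch an object back by its stable key."""
    return DATABASE[key]


def toy_serialize(obj, persistent_store=None):
    if persistent_store is not None:
        key = persistent_store(obj)
        if key is not None:
            # Refer to long-term storage instead of copying the object's state.
            return ["persistent", key]
    if isinstance(obj, list):
        return ["list"] + [toy_serialize(item, persistent_store) for item in obj]
    if isinstance(obj, (int, str)):
        return obj
    raise TypeError("toy serializer cannot handle " + type(obj).__name__)


def toy_unserialize(sexp, persistent_load=None):
    if isinstance(sexp, list):
        if sexp[0] == "persistent":
            return persistent_load(sexp[1])
        if sexp[0] == "list":
            return [toy_unserialize(item, persistent_load) for item in sexp[1:]]
        raise ValueError("unknown tag: %r" % (sexp[0],))
    return sexp
```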

Again, this was never really used, so you can probably ignore it.  (I think 
there might have been a 4X massively multiplayer video game which used it in 
2002 or so, but nothing since then that I'm aware of, especially since PB 
doesn't even have a way to pass in your own without subclassing and overriding 
'serialize'.)

> 2. In t.s.jelly._Jellier, what is the meaning of cooked? The comment here 
> doesn't make sense to me yet.

I just read the comment in _cook, and I hate my younger self right now.  
Seriously.  Screw that guy.

When you make a jelly, you have to cook the fruit first.  So part of the
metaphor here is that you are "cooking" the objects as you serialize them.

The "cooked" attribute maps object IDs (integers representing pointers, at 
least in CPython) to "dereference" jelly expressions.  It is said to be 
"cooked" at that point because you no longer need to put in the energy (I guess 
heat, in this metaphor?) to serialize the internal state.  A "dereference" 
expression is one that points at an object within the same Jelly, so this is 
not like something pointing at a remote reference.

It uses object IDs for keys and not the objects themselves because these 
objects are (since they can participate in circular references) implicitly 
mutable, and mutable objects often don't have a working __hash__ 
implementation, so we can't rely on that.
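A quick demonstration of why "cooked" is keyed by id() rather than by the object. Python's built-in mutable containers refuse to hash, and id() also keeps two equal-but-distinct objects apart. The dictionary below is only a stand-in for the real attribute, and `is_hashable` is a helper written for this example.

```python
def is_hashable(obj):
    """Return True if obj could be used directly as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


first = [1, 2]
second = [1, 2]  # equal contents, but a different object

cooked = {}  # stand-in for _Jellier.cooked: id(obj) -> back-reference
cooked[id(first)] = ["dereference", 1]

print(is_hashable(first))    # False: lists refuse to hash
print(first == second)       # True: equal by value...
print(id(second) in cooked)  # False: ...but not the same entry
```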

This happens in a weird order because an object may circularly refer to itself, 
so we prepare it and put it in the "preserved" map before actually beginning 
the serialization process of its initial state.

We also don't want to pollute the jelly output with reference IDs for every
single object that _might_ be referenced more than once; we only want to add
the ['reference'] expression if we actually refer to it twice.

If you look at this example:

>>> from twisted.spread.jelly import jelly
>>> circular = [1, 2]
>>> circular.append(circular)
>>> jelly(circular)
['reference', 1, ['list', 1, 2, ['dereference', 1]]]
>>> acyclic = [1, 2]
>>> jelly(acyclic)
['list', 1, 2]

You can see that a reference ID, '1', has been allocated for the circular
list.  The output list there would have been the thing that went into the
_Jellier's "preserved" map, keyed by the 'id' of the serialized list, and
then 'reference', 1 would have been inserted at the beginning, with the
original body appended after it.

So the steps are:

1. Here's a mutable object.  Let me remember that I've seen it, just in case I
see it again.
2. Now I'm going to recursively serialize it.
3. Oh, here it is again; I know it's the same object because it has the same
ID.  Instead of serializing it, I'll change the ['list'] into a
['reference', 1] and stick in a ['dereference', 1] here.

If we never get to step 3, we never see the ['reference'] at all, and it's as 
if this functionality didn't exist.
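The three steps can be modelled in a few dozen lines. This is a hypothetical, list-only toy that mirrors the preserved/cooked/cooker bookkeeping described in this message. It is not the real twisted.spread.jelly._Jellier, and it passes everything other than lists through untouched. It reproduces the outputs from the interactive example above.

```python
import copy


class ToyJellier:
    """List-only toy modelled on the bookkeeping described above.

    Hypothetical; not twisted.spread.jelly._Jellier.
    """

    def __init__(self):
        self.preserved = {}  # id(obj) -> the s-expression emitted for obj
        self.cooked = {}     # id(obj) -> ['dereference', n], once seen twice
        self.cooker = {}     # id(obj) -> obj, so the id cannot be recycled
        self._ref_id = 1

    def jelly(self, obj):
        if not isinstance(obj, list):
            return obj  # leaves such as ints pass through unchanged
        back_ref = self._check_mutable(obj)
        if back_ref is not None:
            return back_ref
        # Step 1: remember the object before serializing its contents.
        self.preserved[id(obj)] = []
        self.cooker[id(obj)] = obj
        # Step 2: recursively serialize it.
        sexp = ["list"] + [self.jelly(item) for item in obj]
        return self._preserve(obj, sexp)

    def _check_mutable(self, obj):
        key = id(obj)
        if key in self.cooked:
            return self.cooked[key]  # seen (and cooked) before
        if key in self.preserved:
            # Step 3: seen again, so back-reference it instead.
            self._cook(obj)
            return self.cooked[key]
        return None

    def _cook(self, obj):
        key = id(obj)
        first = self.preserved[key]
        refid = self._ref_id
        self._ref_id += 1
        # Rewrite in place: if this list is already embedded in the output,
        # that copy becomes ['reference', n, <old body>] too.
        first[:] = ["reference", refid, copy.copy(first)]
        self.cooked[key] = ["dereference", refid]

    def _preserve(self, obj, sexp):
        key = id(obj)
        if key in self.cooked:
            # Cooked while still being serialized (a cycle): fill the body
            # slot of the ['reference', n, []] placeholder.
            self.preserved[key][2] = sexp
            return self.preserved[key]
        self.preserved[key] = sexp
        return sexp


circular = [1, 2]
circular.append(circular)
print(ToyJellier().jelly(circular))
# ['reference', 1, ['list', 1, 2, ['dereference', 1]]]

shared = [1]
print(ToyJellier().jelly([shared, shared]))
# ['list', ['reference', 1, ['list', 1]], ['dereference', 1]]
```

The second case is the non-circular path: the first occurrence was already finished when the second one showed up, which is why the toy `_cook` rewrites the existing list in place rather than building a new one.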

> 3. In t.s.jelly._Jellier, what is the meaning of cooker?

The "cooker" attribute is a hack related to the use of "id" for the unique IDs.
If we used the object itself as the key (which we shouldn't do, for reasons I
mentioned above), then we could just rely on it sticking around until the end
of the 'jelly' call.  But instead, we use its 'id', which is its pointer
address, so we need to make sure that it lives on until the end of the
_Jellier's lifetime; we do that by sticking it into the "cooker" map as the value.
You'll notice that there's no store of the object itself anywhere else: in 
"cooked" the key is the ID, and the value is the serialized output value that 
Jelly is going to write out.
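The reason for the "cooker" map can be shown with weakref. An object whose only strong reference lives in such a map survives, so its id() stays reserved. Without it, the object can be freed as soon as nothing else refers to it, and CPython may then hand the same id() to a new object. `Thing` and `survives` are hypothetical names for this demonstration. Exactly when an object is freed depends on the Python implementation, so the sketch calls gc.collect() explicitly.

```python
import gc
import weakref


class Thing:
    """Stand-in for an object that exists only while being serialized."""


def survives(use_cooker):
    """Return True if a temporary object outlives its last local reference."""
    cooker = {}  # stand-in for _Jellier.cooker: id(obj) -> obj
    obj = Thing()
    probe = weakref.ref(obj)  # observes obj without keeping it alive
    if use_cooker:
        cooker[id(obj)] = obj
    del obj
    gc.collect()
    return probe() is not None


print(survives(use_cooker=True))   # True: the cooker keeps it alive
print(survives(use_cooker=False))  # False: freed, so its id may be reused
```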

If we didn't make sure the object stuck around, a different object might get 
the same ID, and that would produce spurious back-references (like, we might 
get a ['derefere