Discussion:
Simultaneous destructors
Benjamin Smedberg
2009-09-16 16:41:33 UTC
Permalink
How does IPDL deal with destructors that are fired simultaneously from both
sides, e.g. ~PBrowserStream? Even if we make one side "win the race", won't
IPDL reject the second destructor message (because it has an invalid actor
ID) and abort the process? If it doesn't abort the process, what is the
behavior (return values) of the second destructor message?

--BDS
Chris Jones
2009-09-16 16:49:26 UTC
Permalink
Post by Benjamin Smedberg
How does IPDL deal with destructors that are fired simultaneously from both
sides, e.g. ~PBrowserStream? Even if we make one side "win the race", won't
IPDL reject the second destructor message (because it has an invalid actor
ID) and abort the process? If it doesn't abort the process, what is the
behavior (return values) of the second destructor message?
Right now, IPDL treats dtors as it does any other message, and by "any
message" standards that's not a race.

But this is just a good old fashioned bug, as dtors define an implicit
transition, [alive]-->DEAD. I'll add this in soon.

Cheers,
Chris
Chris Jones
2009-09-17 15:50:04 UTC
Permalink
Post by Chris Jones
Post by Benjamin Smedberg
How does IPDL deal with destructors that are fired simultaneously from both
sides, e.g. ~PBrowserStream? Even if we make one side "win the race", won't
IPDL reject the second destructor message (because it has an invalid actor
ID) and abort the process? If it doesn't abort the process, what is the
behavior (return values) of the second destructor message?
Right now, IPDL treats dtors as it does any other message, and by "any
message" standards that's not a race.
But this is just a good old fashioned bug, as dtors define an implicit
transition, [alive]-->DEAD. I'll add this in soon.
Hmm, I was hasty in my response. On further consideration this is a
deeper problem.

There are two facets to the problem. The first mainly applies to
nominally stateless, inherently racy protocols like NPAPI. The example
you present with ~PBrowserStream forces a design decision: whether or
not to allow destructor messages to race. I would prefer to just
disallow racy destructors; I think there's a workaround for the NPAPI
(generally, crappy API) case. The workaround is two-phase destruction.
For example:

protocol PBrowserStream {
child: RequestDestruction; // first phase
parent: NotifyDestruction; // first phase
};

protocol PPluginInstance {
manages PBrowserStream;

parent: PBrowserStream;
~PBrowserStream; // second phase
};

(I'm not sure "which" NPStream the PBrowserStream models; I'm assuming
it's the one created with NPN_NewStream() and so sort of "owned" by the
child).

If the browser wants to destroy the BrowserStream, instead of directly
invoking the dtor, it destroys the "real" NPStream wrt NPAPI and then
sends "RequestDestruction." In the common case, the plugin will be idle
wrt the stream and simply send back the actual ~PBrowserStream message.

If the plugin wants to destroy a BrowserStream, it first sends
"NotifyDestruction" to warn the browser. In the common case, the
browser will be idle wrt the stream, clean up the NPStream, and send
"RequestDestruction" back to the plugin, upon which ~PBrowserStream will
be sent.

If, however, RequestDestruction and NotifyDestruction race, it causes no
problems as the browser is guaranteed to have processed
NotifyDestruction *before* ~PBrowserStream arrives. The C++
implementations of browser stream actors just have to be careful not to
double-"free" the NPStreams.

That's one option, again (to my eyes) a hack around NPAPI limitations.
Another option for this case is to *allow* racy destructor messages.
All that has to be done in IPDL is, only for protocols with racy
destructors, not to blow up when a destructor references an unknown
actor. This approach trades a safety check for racy destructors. Both
are feasible and easy, but again I lean towards two-phase destruction.

I now see that the second facet of this problem isn't worth discussing
until we resolve this first issue.

Cheers,
Chris
Benjamin Smedberg
2009-09-17 18:46:41 UTC
Permalink
Post by Chris Jones
There are two facets to the problem. The first mainly applies to
nominally stateless, inherently racy protocols like NPAPI. The example
you present with ~PBrowserStream forces a design decision: whether or
not to allow destructor messages to race. I would prefer to just
disallow racy destructors; I think there's a workaround for the NPAPI
(generally, crappy API) case. The workaround is two-phase destruction.
I don't think this is specific to crappy APIs or destructors; *any* message
could race with the destructor, really. Let's say the parent always calls
the destructor. You still have potentially parent: ~PBrowserStream() racing
with any other allowable message sent by the child, e.g. NPN_RequestRead.

What about a perfectly async API such as networking streams, with a one-way
destructor:

async protocol PNecko {
parent:
PChannel();
~PChannel();
};

async protocol PChannel {
child:
DataAvailable();
};

Isn't it possible for DataAvailable to race with ~PChannel?
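
The interleaving I have in mind:

PARENT                 CHILD
-----------------------------
DataAvailable -->
              <-- ~PChannel
(the two cross in flight, so DataAvailable arrives at the child
addressed to an actor the child has already destroyed)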

--BDS
Chris Jones
2009-09-17 20:25:26 UTC
Permalink
Post by Chris Jones
Post by Chris Jones
Post by Benjamin Smedberg
How does IPDL deal with destructors that are fired simultaneously from both
sides, e.g. ~PBrowserStream? Even if we make one side "win the race", won't
IPDL reject the second destructor message (because it has an invalid actor
ID) and abort the process? If it doesn't abort the process, what is the
behavior (return values) of the second destructor message?
Right now, IPDL treats dtors as it does any other message, and by "any
message" standards that's not a race.
But this is just a good old fashioned bug, as dtors define an implicit
transition, [alive]-->DEAD. I'll add this in soon.
Hmm, I was hasty in my response. On further consideration this is a
deeper problem.
There are two facets to the problem. The first mainly applies to
nominally stateless, inherently racy protocols like NPAPI. The example
you present with ~PBrowserStream forces a design decision: whether or
not to allow destructor messages to race. I would prefer to just
disallow racy destructors; I think there's a workaround for the NPAPI
(generally, crappy API) case. The workaround is two-phase destruction.
protocol PBrowserStream {
child: RequestDestruction; // first phase
parent: NotifyDestruction; // first phase
};
protocol PPluginInstance {
manages PBrowserStream;
parent: PBrowserStream;
~PBrowserStream; // second phase
};
(I'm not sure "which" NPStream the PBrowserStream models; I'm assuming
it's the one created with NPN_NewStream() and so sort of "owned" by the
child).
If the browser wants to destroy the BrowserStream, instead of directly
invoking the dtor, it destroys the "real" NPStream wrt NPAPI and then
sends "RequestDestruction." In the common case, the plugin will be idle
wrt the stream and simply send back the actual ~PBrowserStream message.
If the plugin wants to destroy a BrowserStream, it first sends
"NotifyDestruction" to warn the browser. In the common case, the
browser will be idle wrt the stream, clean up the NPStream, and send
"RequestDestruction" back to the plugin, upon which ~PBrowserStream will
be sent.
If, however, RequestDestruction and NotifyDestruction race, it causes no
problems as the browser is guaranteed to have processed
NotifyDestruction *before* ~PBrowserStream arrives. The C++
implementations of browser stream actors just have to be careful not to
double-"free" the NPStreams.
That's one option, again (to my eyes) a hack around NPAPI limitations.
Another option for this case is to *allow* racy destructor messages. All
that has to be done in IPDL is, only for protocols with racy
destructors, not to blow up when a destructor references an unknown
actor. This approach trades a safety check for racy destructors. Both
are feasible and easy, but again I lean towards two-phase destruction.
I now see that the second facet of this problem isn't worth discussing
until we resolve this first issue.
OK, looks like we should cover both issues together.

As Benjamin just pointed out, destructors are kind of busted
semantically right now. C'est la guerre. (Beware the easy analogy to C++!)

I see a few ways to fix them.

(1)
Make destructor messages part of the protocol of the actors they
destruct, rather than part of the manager's protocol. This would imply
that dtors can't race with any other message (including other dtors),
since dtors transition into a special _DEAD_ state. (I.e., the dtor and
another message can never commute.) For example

protocol PluginInstance {
manages BrowserStream;
parent: BrowserStream();
// no more ~BrowserStream()
}

protocol BrowserStream {
child: M1();
parent: M2();

child: delete();
// could also be ~BrowserStream()

state S1: send M1 goto S2;
state S2: recv M2 goto DYING;
state DYING: send delete;
}

This proposal adds a new keyword "delete" which is understood to mean
"the destructor message for this protocol's actors." (We could also
just reuse ~Protocol, it's just a syntax thing that I don't have strong
feelings about.)

I think this would work quite well ... except for "stateless" protocols
like some in NPAPI. For them it's impossible to declare a non-racy
destructor. They could be special-cased: for them, IPDL would stop
treating a message sent to a nonexistent actor as a "fatal" error, but
only where that error would be a spurious result of the dtor race.
(Fancier implementations are possible.)


(2)
Keep dtors as part of the manager protocol, but only allow dtors to be
uni-directional. I.e., |both: ~Subprotocol()| would be explicitly
disallowed. Then, on the side that *sends* a particular dtor, receiving
a message addressed to a nonexistent actor would no longer be fatal.
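
Concretely (hypothetical declarations, just to show what (2) would and
wouldn't accept):

protocol PPluginInstance {
manages PBrowserStream;
parent: PBrowserStream();
parent: ~PBrowserStream(); // OK under (2): one side owns the dtor
// both: ~PBrowserStream(); // rejected under (2)
};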


(3)
Do either or none of { (1), (2) }, and treat destructor messages as
special in that the IPDL actors aren't "actually" destroyed until the
other side ACKs the dtor message. That is, actors can continue to
receive messages (though not send them) until the other side replies to
the dtor message. It's not really clear what should happen when a
"half-deleted" actor receives a message. Options:
(a) IPDL automatically returns EDESTROYED or something like that
(b) IPDL invokes a special handler that decides what to do
(c) the actor is marked as DEAD, that state is visible to C++, and
it's up to the implementation to take care of itself.
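
For illustration, the sender's side of (a) might look roughly like this
(a sketch only; EDESTROYED and an nsresult-returning Send* are made up
for the example):

// hypothetical caller-side view under (3a): a send to a half-deleted
// actor fails gracefully instead of aborting the process
nsresult rv = stream->SendDataAvailable(data);
if (rv == EDESTROYED) {
  // the other side already sent the dtor; drop the data on the floor
}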


Strong opinions? I lean towards (1) plus (3a), that is, use (3a) for
stateless protocols.

Cheers,
Chris
Benjamin Smedberg
2009-10-07 17:58:05 UTC
Permalink
Post by Chris Jones
(1)
Make destructor messages part of the protocol of the actors they
destruct, rather than part of the manager's protocol. This would imply
that dtors can't race with any other message (including other dtors),
since dtors transition into a special _DEAD_ state. (I.e., the dtor and
another message can never commute.) For example
protocol PluginInstance {
manages BrowserStream;
parent: BrowserStream();
// no more ~BrowserStream()
}
protocol BrowserStream {
child: M1();
parent: M2();
child: delete();
// could also be ~BrowserStream()
state S1: send M1 goto S2;
state S2: recv M2 goto DYING;
state DYING: send delete;
}
This proposal adds a new keyword "delete" which is understood to mean
"the destructor message for this protocol's actors." (We could also
just reuse ~Protocol, it's just a syntax thing that I don't have strong
feelings about.)
I think this would work quite well ... except for "stateless" protocols
like some in NPAPI. For them it's impossible to declare a non-racy
destructor. They could be special-cased in that for them, IPDL would
not treat sending a message to a nonexistent actor as a "fatal" error,
only to the extent that doing so would cause a spurious fatal error.
(Fancier implementations are possible.)
I don't understand how the extra state helps, at least if there are any
async messages involved:

protocol Channel {
child: dataavailable();
child: datadone();
parent: cancel();
parent: delete();

state NORMAL:
send dataavailable goto NORMAL;
send datadone goto DYING;
recv cancel goto DYING;

state DYING:
send delete;
}

Let's say the parent is sending dataavailable(). The child sends cancel()
and then immediately sends delete(). Unless the parent somehow acknowledges
that there has been a state transition to DYING, you could still end up
sending a message to a dead actor. This is almost exactly how I see necko
channels being modeled.
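
Timeline-wise, the bad interleaving would be:

PARENT                 CHILD
-----------------------------
dataavailable -->
              <-- cancel
              <-- delete      (child-side actor now destroyed)
(dataavailable arrives at the child after its actor is gone)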
Post by Chris Jones
(2)
Keep dtors as part of the manager protocol, but only allow dtors to be
uni-directional. I.e., |both: ~Subprotocol()| would be explicitly
disallowed. Then, on the side that *sends* a particular dtor, receiving
a message addressed to a nonexistent actor would no longer be fatal.
* Are protocol IDs re-used after destruction?
* If so, we need to make sure that we don't reassign the protocol ID until
we're sure the other side won't be sending a racy message to it.

(If they aren't re-used, aren't we likely to run out of IDs in normal
runtime situations?)
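
(One scheme that would make re-use safe, sketched below; none of this is
what IPDL does today: tag each actor ID with a generation count, and
recycle a slot only after the dtor has been ACK'd, so a message that
raced with destruction is detectably stale rather than misdelivered.)

#include <cstdint>
#include <vector>

// Hypothetical: actor IDs carry a generation so a late message can be
// recognized and dropped instead of reaching whoever reused the slot.
struct ActorId {
  uint32_t slot;        // reusable table index
  uint32_t generation;  // bumped each time the slot is recycled
};

class ActorTable {
public:
  bool IsStale(ActorId id) const {
    return id.slot >= mGenerations.size() ||
           mGenerations[id.slot] != id.generation;
  }
  // call only once we know the other side can't reference |slot| anymore
  void Recycle(uint32_t slot) { ++mGenerations[slot]; }
private:
  std::vector<uint32_t> mGenerations;
};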
Post by Chris Jones
(3)
Do either or none of { (1), (2) }, and treat destructor messages as
special in that the IPDL actors aren't "actually" destroyed until the
other side ACKs the dtor message. That is, actors can continue to
receive messages (though not send them) until the other side replies to
the dtor message. It's not really clear what should happen when a
"half-deleted" actor receives a message. Options:
(a) IPDL automatically returns EDESTROYED or something like that
(b) IPDL invokes a special handler that decides what to do
(c) the actor is marked as DEAD, that state is visible to C++, and
it's up to the implementation to take care of itself.
Strong opinions? I lean towards (1) plus (3a), that is, use (3a) for
stateless protocols.
Right now I don't see how #1 helps, so I'm leaning towards 3a or 3c. 3c has
the advantage that on the child side IPDL methods still cannot fail, which
avoids some error-checking code.

--BDS
Chris Jones
2009-10-07 18:36:02 UTC
Permalink
Post by Benjamin Smedberg
Post by Chris Jones
(1)
Make destructor messages part of the protocol of the actors they
destruct, rather than part of the manager's protocol. This would imply
that dtors can't race with any other message (including other dtors),
since dtors transition into a special _DEAD_ state. (I.e., the dtor and
another message can never commute.) For example
protocol PluginInstance {
manages BrowserStream;
parent: BrowserStream();
// no more ~BrowserStream()
}
protocol BrowserStream {
child: M1();
parent: M2();
child: delete();
// could also be ~BrowserStream()
state S1: send M1 goto S2;
state S2: recv M2 goto DYING;
state DYING: send delete;
}
This proposal adds a new keyword "delete" which is understood to mean
"the destructor message for this protocol's actors." (We could also
just reuse ~Protocol, it's just a syntax thing that I don't have strong
feelings about.)
I think this would work quite well ... except for "stateless" protocols
like some in NPAPI. For them it's impossible to declare a non-racy
destructor. They could be special-cased in that for them, IPDL would
not treat sending a message to a nonexistent actor as a "fatal" error,
only to the extent that doing so would cause a spurious fatal error.
(Fancier implementations are possible.)
I don't understand how the extra state helps, at least if there are any
async messages involved:
protocol Channel {
child: dataavailable();
child: datadone();
parent: cancel();
parent: delete();
state NORMAL:
send dataavailable goto NORMAL;
send datadone goto DYING;
recv cancel goto DYING;
state DYING:
send delete;
}
This protocol would fail type-checking because it has a race condition
in the NORMAL state, datadone vs. cancel. The protocol is stateful, and
datadone/cancel don't adhere to the Diamond Rule.
Post by Benjamin Smedberg
Let's say the parent is sending dataavailable(). The child sends cancel()
and then immediately sends delete(). Unless the parent somehow acknowledges
that there has been a state transition to DYING, you could still end up
sending a message to a dead actor. This is almost exactly how I see necko
channels being modeled.
If this will be common, I agree that we may need to special-case dtors
more, i.e. special-case them beyond stateless protocols. One way would
be to allow racy dtor messages, and generate special code for protocols
that have racy dtors (IPDL already has that information for any state).
Post by Benjamin Smedberg
Post by Chris Jones
(2)
Keep dtors as part of the manager protocol, but only allow dtors to be
uni-directional. I.e., |both: ~Subprotocol()| would be explicitly
disallowed. Then, on the side that *sends* a particular dtor, receiving
a message addressed to a nonexistent actor would no longer be fatal.
* Are protocol IDs re-used after destruction?
Currently no, but that probably should change.
Post by Benjamin Smedberg
* If so, we need to make sure that we don't reassign the protocol ID until
we're sure the other side won't be sending a racy message to it.
Yup.
Post by Benjamin Smedberg
Post by Chris Jones
(3)
Do either or none of { (1), (2) }, and treat destructor messages as
special in that the IPDL actors aren't "actually" destroyed until the
other side ACKs the dtor message. That is, actors can continue to
receive messages (though not send them) until the other side replies to
the dtor message. It's not really clear what should happen when a
"half-deleted" actor receives a message. Options:
(a) IPDL automatically returns EDESTROYED or something like that
(b) IPDL invokes a special handler that decides what to do
(c) the actor is marked as DEAD, that state is visible to C++, and
it's up to the implementation to take care of itself.
Strong opinions? I lean towards (1) plus (3a), that is, use (3a) for
stateless protocols.
Right now I don't see how #1 helps, so I'm leaning towards 3a or 3c. 3c has
the advantage that on the child side IPDL methods still cannot fail, which
avoids some error-checking code.
I still lean towards (1) plus (3a), possibly adding to (1) the
special-case of racy dtors in stateful protocols. It seems to me that
(3c) still requires error checking, or at least |actor->IsLive()| checks,
which I don't think would save much code compared to error checks.

Cheers,
Chris
Chris Jones
2009-10-07 20:13:46 UTC
Permalink
Post by Chris Jones
Post by Benjamin Smedberg
I don't understand how the extra state helps, at least if there are any
async messages involved:
protocol Channel {
child: dataavailable();
child: datadone();
parent: cancel();
parent: delete();
state NORMAL:
send dataavailable goto NORMAL;
send datadone goto DYING;
recv cancel goto DYING;
state DYING:
send delete;
}
This protocol would fail type-checking because it has a race condition
in the NORMAL state, datadone vs. cancel. The protocol is stateful, and
datadone/cancel don't adhere to the Diamond Rule.
I was remiss in not showing how the races could be avoided in this
example. The way to avoid them is a two-phase delete like what I
described in an earlier post. For this protocol, it would look like


protocol Channel {
child:
dataavailable();
datadone(); NOTREACHED(); //<--ignore this
parent:
moar();
cancel();
delete();

state NORMAL:
send dataavailable goto SENTDATA;
send datadone goto DONE;
recv cancel goto CANCELED;

state SENTDATA:
recv moar goto NORMAL;
recv cancel goto DEAD;

state DONE:
recv cancel goto DEAD;

state CANCELED:
send dataavailable goto DEAD;
send datadone goto DEAD;

state DEAD:
recv delete goto __deleted__;

// IPDL would generate this "state" automatically, so it can be ignored
state __deleted__: send NOTREACHED goto __deleted__;
};


Speaking generally, this protocol allows the child to send |cancel()|
and the parent to send |datadone()| (the parent's equivalent to cancel()
wrt deletion) at any time.

If the child sends cancel(), then a delete sequence is initiated,
finished by the child sending the actual |delete()| message back to the
parent. The tricky part here is that if the child sends an "abrupt
cancel", a cancel from the NORMAL state, then the child must wait for
the parent to ack the cancel with dataavailable or datadone before
sending the actual |delete()| message.

Likewise, if the parent sends |datadone()|, a delete sequence is also
initiated. Like |cancel()| sent from the NORMAL state, if |datadone()|
is sent from NORMAL, then the child must ack with |cancel()| before it
can send |delete()|.
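
Tracing the abrupt-cancel case through the states above (both sides
start in NORMAL):

PARENT                        CHILD
------------------------------------
dataavailable -->                     (parent: NORMAL -> SENTDATA)
              <-- cancel              (child: NORMAL -> CANCELED; crosses in flight)
(recv cancel: SENTDATA -> DEAD)
                                      (recv dataavailable: CANCELED -> DEAD)
              <-- delete              (sent only from DEAD, so it cannot race)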

The substantial modification from the original protocol is the addition
of the SENTDATA state and the "moar()" message. This captures the
decision a child makes between wanting more data vs. cancelling. It
prevents two dataavailable() messages from being sent in a row. This
might seem strange, but we'll need this feature when the child and
parent are transferring network data through two shmem buffers. The
|moar()| message is where the child would send back to the parent a
handle to the "free" shmem buffer (i.e., the one it's not currently
using). (The parent would then fill that buffer and transfer it back to
the child using |dataavailable()|).
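
Pictorially, the handoff ping-pongs buffer ownership:

PARENT (fills)                CHILD (drains)
---------------------------------------------
owns buffer A                 owns buffer B
dataavailable(A) -->          (child starts draining A)
             <-- moar(B)      (parent starts filling B)
dataavailable(B) -->          (and so on)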

I agree that two-phase deletion is complicated, but (i) it's the price
to be paid for race-free deletion; (ii) two-phase deletion is a kind of
"IPDL pattern", like "empty subprotocols", that can be documented and
copied by other protocols.

The alternative is, as I mentioned in the previous message, allowing
|delete()| messages to be inherently racy even in stateful protocols,
having IPDL detect when |delete()| indeed races, and generating special
code to handle the races. Personally I prefer expending the effort to
declare non-racy protocols, like the one above.

Cheers,
Chris
Benjamin Smedberg
2009-10-08 14:12:52 UTC
Permalink
Post by Chris Jones
I was remiss in not showing how the races could be avoided in this
example. The way to avoid them is a two-phase delete like what I
described in an earlier post. For this protocol, it would look like
protocol Channel {
child:
dataavailable();
datadone(); NOTREACHED(); //<--ignore this
parent:
moar();
cancel();
delete();
state NORMAL:
send dataavailable goto SENTDATA;
send datadone goto DONE;
recv cancel goto CANCELED;
state SENTDATA:
recv moar goto NORMAL;
recv cancel goto DEAD;
state DONE:
recv cancel goto DEAD;
state CANCELED:
send dataavailable goto DEAD;
send datadone goto DEAD;
state DEAD:
recv delete goto __deleted__;
// IPDL would generate this "state" automatically, so it can be ignored
state __deleted__: send NOTREACHED goto __deleted__;
};
If the child sends cancel(), then a delete sequence is initiated,
finished by the child sending the actual |delete()| message back to the
parent. The tricky part here is that if the child sends an "abrupt
cancel", a cancel from the NORMAL state, then the child must wait for
the parent to ack the cancel with dataavailable or datadone before
sending the actual |delete()| message.
Well, I don't think you can use dataavailable/datadone for the ack messages
(since there may already be those messages in the async pipeline). So you'd
need an explicit "acknowledgecancel" message, right? But otherwise, this
seems to be theoretically sound.

It seems like you could apply the same idea to the mostly-stateless NPAPI
protocols, even... you'd just have a single main state for normal operation
and a secondary CANCELED state, and perhaps some logic to deal with
interleaved RPC messages.
Post by Chris Jones
The substantial modification from the original protocol is the addition
of the SENTDATA state and the "moar()" message. This captures the
decision a child makes between wanting more data vs. cancelling. It
prevents two dataavailable() messages from being sent in a row. This
might seem strange, but we'll need this feature when the child and
parent are transferring network data through two shmem buffers. The
|moar()| message is where the child would send back to the parent a
handle to the "free" shmem buffer (i.e., the one it's not currently
using). (The parent would then fill that buffer and transfer it back to
the child using |datavailable()|).
Oh, I misread the protocol. Although it might be good/necessary to use a
single shmem buffer like this, it's not strictly necessary to prevent
racing, right? Or to put it another way, is the following protocol also
race-free and allows multiple dataavailable() calls in a row?

protocol Channel {
child:
dataavailable();
datadone();
oncanceled();
parent:
cancel();
delete();

state NORMAL:
send dataavailable goto NORMAL;
send datadone goto DONE;
recv cancel goto CANCELED;

state DONE:
recv cancel goto DONE;
recv delete goto __deleted__;

state CANCELED:
send dataavailable goto CANCELED;
send datadone goto CANCELED;
send oncanceled goto DONE;
};
Post by Chris Jones
I agree that two-phase deletion is complicated, but (i) it's the price
to be paid for race-free deletion; (ii) two-phase deletion is a kind of
"IPDL pattern", like "empty subprotocols", that can be documented and
copied by other protocols.
Ok, let's try it. Hopefully we can develop some mixins/C++ helpers which
make two-phase cancellation less confusing for developers: e.g. something
which would automatically throw away dataavailable in the CANCELED state.
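
E.g. something like this, maybe (a hypothetical helper; the names are
made up):

// hypothetical: centralize the "did we already cancel?" check; a real
// mixin could wrap the generated Recv* methods to make this automatic
class ChannelChild {
public:
  enum State { NORMAL, CANCELED, DONE };

  bool RecvDataAvailable(/* data */) {
    if (DiscardIfCanceled())
      return true;             // dropped on the floor, as desired
    // ...handle data normally...
    return true;
  }

private:
  bool DiscardIfCanceled() const { return mState == CANCELED; }
  State mState = NORMAL;
};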

--BDS
Jason Duell
2009-10-09 00:49:43 UTC
Permalink
Post by Benjamin Smedberg
Is the following protocol also
race-free and allows multiple dataavailable() calls in a row?
It looks like it to me. Hopefully the IPDL compiler will be able to
always tell us for certain (?).
Post by Chris Jones
I agree that two-phase deletion is complicated, but (i) it's the price
to be paid for race-free deletion; (ii) two-phase deletion is a kind of
"IPDL pattern", like "empty subprotocols", that can be documented and
copied by other protocols.
I hope so. Right now it's starting to look like a large percentage of
the complexity of our protocols is likely to be in getting destruction
right.

There's also something gross about how much the needs of IPDL itself
are getting inserted into the state protocols. It's a bit like
writing TCP applications, and suddenly having to programmatically deal
with incoming packets after you've already called close(). Can we
really not make things more convenient?

One thing that might help a little, if we do go down this road, is the
ability to have IPDL discard msgs in certain states. For instance:
    send dataavailable goto CANCELED;
    send datadone goto CANCELED;
    send oncanceled goto DONE;
Right now I'd have to make sure my onDataAvailable method kept some
state variable (essentially duplicating IPDL's), so that I know to
simply discard a packet if I'm in the CANCELLED state. If I forget
this, I might try to blindly deliver packets to data structures
I've discarded, or something like that, and there's nothing IPDL's
compiler could do to help catch the mistake. So I propose something
like

    send dataavailable goto CANCELED DISCARD;

which tells IPDL that I know about the possibility of these msgs
arriving in this state, but that I just want them dropped on the
floor.

Jason
Chris Jones
2009-10-09 08:03:54 UTC
Permalink
Post by Jason Duell
Is the following protocol also
race-free and allows multiple dataavailable() calls in a row?
It looks like it to me. Hopefully the IPDL compiler will be able to
always tell us for certain (?).
Yes, per its rules (one-transition commutativity only).
Post by Jason Duell
Post by Chris Jones
I agree that two-phase deletion is complicated, but (i) it's the price
to be paid for race-free deletion; (ii) two-phase deletion is a kind of
"IPDL pattern", like "empty subprotocols", that can be documented and
copied by other protocols.
I hope so. Right now it's starting to look like a large percentage of
the complexity of our protocols is likely to be in getting destruction
right.
Agreed, concurrent, explicit destruction is complicated.
Post by Jason Duell
There's also something gross about how much the needs of IPDL itself
are getting inserted into the state protocols.
Why do you say that? These are all design decisions, IPDL doesn't
"need" anything, it's malleable.
Post by Jason Duell
It's a bit like
writing TCP applications, and suddenly having to programmatically deal
with incoming packets after you've already called close(). Can we
really not make things more convenient?
I'm not sure you're approaching this with the right mindset. IPDL is
*our* language, we get to define what's kosher and not. So far my
working hypothesis has been that protocols should be declared
"race-free". If making destruction race-free is too much of a burden,
we can change the rules. In fact, we're changing the rules right now,
because the original destructor design was crap :S.
Post by Jason Duell
One thing that might help a little, if we do go down this road, is the
send dataavailable goto CANCELED;
send datadone goto CANCELED;
send oncanceled goto DONE;
Right now I'd have to make sure my onDataAvailable method kept some
state variable (essentially duplicating IPDL's), so that I know to
simply discard a packet if I'm in the CANCELLED state.
No, you don't. You would use your IPDL actor's state to decide what to
do. Referencing the race-free protocol I posted in an earlier message,
the parent-side actor's (the one generating data) C++ code doesn't have
to know or care about destruction, except insofar as it has to
unregister itself from Necko notification when destroyed.

On the child side, you'd need code like:

class ChannelChild {
  bool MaybeDie() {
    if (State() == DONE)
      SendCancel();          // DONE --cancel--> DEAD
    if (State() == DEAD) {
      SendDestructor();
      return true;
    }
    return false;
  }

  bool RecvDataAvailable(...) {
    if (MaybeDie()) return true;

    // handle data normally

    if (needToCancel) {
      SendCancel();
      SendDestructor();
    }
    return true;
  }

  bool RecvDataDone(...) {
    if (MaybeDie()) return true;
    return true;
  }
};

I don't find this much of a burden on the C++ author. Again, IMHO,
writing the IPDL protocol is the hard work. Once the protocol is
written, the above pseudocode can be easily written by just looking at
the IPDL protocol (that's what I did).
Post by Jason Duell
If I forget
this, I might try to blindly deliver packets to data structures
I've discarded, or something like that, and there's nothing IPDL's
compiler could do to help catch the mistake.
Not sure what you mean here. IPDL won't let you write a racy protocol,
and if your C++ code sends a bad message from a particular state, the
IPDL-generated C++ code will blow up loudly.
Post by Jason Duell
So I propose something
like
send dataavailable goto CANCELED DISCARD;
which tells IPDL that I know about the possibility of these msgs
arriving in this state, but that I just want them dropped on the
floor.
I'm not much of a fan of this specific proposal, but I think this is an
interesting approach.

I tried to emphasize in earlier messages that we *can* allow racy
destruction sequences by changing IPDL's rules. *If* we decide that the
non-racy ideas we've bounced around so far are too complicated, and we
decide that we want to allow some messages to race no matter what, then
we can add a keyword to mark these messages as such. I don't have a
good idea for what that keyword should be ... maybe |hidden|? |data|?
|packet| (by analogy to "racy" TCP packets)? |ghost|? Anyways, in IPDL
we could do something like

protocol Channel {
//[snip]
ghost dataavailable();
//[snip]

state NORMAL:
send dataavailable goto NORMAL;
send datadone goto DONE;
recv cancel goto CANCELED;

state DONE:
recv cancel goto DEAD;

state CANCELLED:
send datadone goto DEAD;

state DEAD:
recv delete;
};

What the |ghost| specifier would mean is that *if* the message is
sent/received from/in a state that allows it, then it's processed
normally. If it's sent/received from/in a state that *doesn't* allow
it, then it's dropped. There are a few more requirements we'd want to
make of |ghost| messages, but that's the general idea.
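
In generated-code terms, roughly (a sketch; all the names here are
invented):

// hypothetical shape of the generated dispatch for a |ghost| message:
// deliver it if the current state allows it, silently drop it otherwise
bool ChannelChild::DispatchDataAvailable(const Message& msg)
{
  if (!StateAllows(Msg_DataAvailable))  // e.g. we're in CANCELLED
    return true;                        // ghost: dropped, no abort
  return RecvDataAvailable(msg);        // normal delivery
}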

That said, I think the more important decision is whether we want to
stick with the original plan of only admitting non-racy protocols.

Cheers,
Chris
Chris Jones
2009-10-09 08:39:51 UTC
Permalink
Post by Chris Jones
*If* we decide that the
non-racy ideas we've bounced around so far are too complicated, and we
decide that we want to allow some messages to race no matter what, then
we can add a keyword to mark these messages as such. I don't have a
good idea for what that keyword should be ... maybe |hidden|? |data|?
|packet| (by analogy to "racy" TCP packets)? |ghost|?
|unchecked| ?
Jason Duell
2009-10-09 18:38:13 UTC
Permalink
Post by Jason Duell
There's also something gross about how much the needs of IPDL
itself are getting inserted into the state protocols.
Post by Chris Jones
Why do you say that?  These are all design decisions, IPDL doesn't
"need" anything, it's malleable.
All I was saying is that right now IPDL imposes a certain burden on
the developer because of 1) the need to avoid sending more than one
message at a time in the same direction (for flow control and/or the
state mismatch issue), and 2) the need to do subtle two-phase
destruction. But that's probably an acceptable cost in order to get a
provably race-free protocol, which is a very nice thing to have. So
I'm still with the program, and hope I'm not sounding too grumpy :)
Post by Jason Duell
One thing that might help a little, if we do go down this road, is
the ability to have IPDL discard msgs in certain states.  For
instance: ... Right now I'd have to make sure my onDataAvailable
method kept some state variable (essentially duplicating IPDL's),
so that I know to simply discard a packet if I'm in the CANCELLED
state.
Post by Chris Jones
No, you don't.  You would use your IPDL actor's state to decide what
to do.... Writing the IPDL protocol is the hard work.  Once the
protocol is written, the above pseudocode can be easily written by
just looking at the IPDL protocol (that's what I did).
Fair enough. We may not need to bother with DISCARD or |ghost| or
whatever, then.
Post by Chris Jones
What the |ghost| specifier would mean is that *if* the message is
sent/received from/in a state that allows it, then it's processed
normally. If it's sent/received from/in a state that *doesn't*
allow it, then it's dropped.  
If we do go down this route, it seems like it would be much safer to
require the author to specify exactly which states would throw away
msgs, rather than adding a blanket "throw this away if I don't account
for it" attribute. (I.e., I still like my DISCARD proposal better).
Post by Benjamin Smedberg
Explicitly acknowledging channel messages seems like a high burden
on total throughput.
This is a potential issue. In the case of a small HTTP reply, for
instance, the parent necko will get it all in one socket read(), and
thus would essentially call OnStartRequest, OnDataAvailable, and
onStopRequest in succession. Requiring an IPDL reply for each of
these would introduce two IPC round-trip latencies to the process.
I'm not sure how much that would show up in the overall responsiveness
of the browser, but the sequence itself would possibly be orders of
magnitude more expensive, time-wise.
Post by Chris Jones
This to me looks like another reasonable proposal for making it
easier to write race-free destructors (i.e. by loosening IPDL's
race-free-protocol rules). I need to think more carefully about the
implications of modifying the particular check that makes this
protocol fail the race detector, though.
Given the potential throughput issue, this might be worth looking into
(it's also both an interesting research question, and a potential way
of making writing protocols more natural). But it doesn't look like
we necessarily need it to get started.

Jason
Robert O'Callahan
2009-10-11 22:01:15 UTC
Permalink
Post by Jason Duell
This is a potential issue. In the case of a small HTTP reply, for
instance, the parent necko will get it all in one socket read(), and
thus would essentially call OnStartRequest, OnDataAvailable, and
onStopRequest in succession. Requiring an IPDL reply for each of
these would introduce two IPC round-trip latencies to the process.
I'm not sure how much that would show up in the overall responsiveness
of the browser, but the sequence itself would possibly be orders of
magnitude more expensive, time-wise.
This seems like something we should definitely avoid.

Rob
Chris Jones
2009-10-12 23:37:55 UTC
Permalink
Post by Jason Duell
Post by Chris Jones
What the |ghost| specifier would mean is that *if* the message is
sent/received from/in a state that allows it, then it's processed
normally. If it's sent/received from/in a state that *doesn't*
allow it, then it's dropped.
If we do go down this route, it seems like it would be much safer to
require the author to specify exactly which states would throw away
msgs, rather than adding a blanket "throw this away if I don't account
for it" attribute. (I.e., I still like my DISCARD proposal better).
I don't like DISCARD because it refers to other states, not the
state/trigger at which it's written.

I'm fine with either

(1) |unchecked| specifier for messages as described earlier, only
allowed if the messages meet the conditions
(a) async
(b) never cause state changes (i.e. always self-loop); e.g. the
dataavailable message we've been discussing

or

(2) Modification of |discard|: place it at the state where the message
should be discarded. So the Channel protocol could be something like

state NORMAL:
send dataavailable goto NORMAL;
send datadone goto DONE;
recv cancel goto CANCELLED;

state DONE:
recv cancel goto DEAD;

state CANCELLED:
discard dataavailable;
send datadone goto DEAD;

state DEAD:
recv delete;

(Or thereabouts, I didn't verify this protocol.)

The caveat is that you wouldn't be able to |discard| in the
"__deleted__" state, so the shutdown sequence would still need to be
race-free except where |discard dataavailable| was concerned.
Post by Jason Duell
Post by Benjamin Smedberg
Explicitly acknowledging channel messages seems like a high burden
on total throughput.
This is a potential issue. In the case of a small HTTP reply, for
instance, the parent necko will get it all in one socket read(), and
thus would essentially call OnStartRequest, OnDataAvailable, and
onStopRequest in succession. Requiring an IPDL reply for each of
these would introduce two IPC round-trip latencies to the process.
I'm not sure how much that would show up in the overall responsiveness
of the browser, but the sequence itself would possibly be orders of
magnitude more expensive, time-wise.
OnStart/OnStop haven't entered into the discussion so far, so I'd
tentatively claim that you're arguing against a strawman. How do they
fit into the protocols that have been tossed around so far?

And you're worried about the IPC/IPDL overhead being "more expensive"
than what? The current case where OnStart/OnData/OnStop are processed
as Runnable events in a single event loop?
Post by Jason Duell
Post by Chris Jones
This to me looks like another reasonable proposal for making it
easier to write race-free destructors (i.e. by loosening IPDL's
race-free-protocol rules). I need to think more carefully about the
implications of modifying the particular check that makes this
protocol fail the race detector, though.
Given the potential throughput issue, this might be worth looking into
(it's also both an interesting research question, and a potential way
of making writing protocols more natural). But it doesn't look like
we necessarily need it to get started.
I don't think there'll be throughput issues with double-buffering, but I
agree that we don't need to solve this to get started.

Cheers,
Chris
Jason Duell
2009-10-13 00:32:43 UTC
Permalink
   discard dataavailable;
   send datadone goto DEAD;
That's fine. I'd prefer to not lose the "recv", so it would look
like

   recv discard dataavailable;

But I don't care very much.
Post by Benjamin Smedberg
Explicitly acknowledging channel messages seems like a high burden
on total throughput.
This is a potential issue.  In the case of a small HTTP reply, for
instance, the parent necko will get it all in one socket read(), and
thus would essentially call OnStartRequest, OnDataAvailable, and
onStopRequest in succession.  Requiring an IPDL reply for each of
these would introduce two IPC round-trip latencies to the process.
I'm not sure how much that would show up in the overall responsiveness
of the browser, but the sequence itself would possibly be orders of
magnitude more expensive, time-wise.
OnStart/OnStop haven't entered into the discussion so far, so I'd
tentatively claim that you're arguing against a strawman.  How do they
fit into the protocols that have been tossed around so far?
Once an HTTP channel is opened by the child process, the chrome HTTP
channel that gets created will go through the regular steps of getting
data from the network, and calling OnStartRequest/OnDataAvailable/
OnStopRequest. These functions (and the data/headers) need to be made
available in the child process. Right now my plan is to simply proxy
each OnStart/OnData/OnStop function to the child via an IPDL message
(it may be possible to glob them together into one message which is
sent to the child, but that may not be trivial, given that there may
be chains of listeners/observers that need to be called in the correct
order).

So we're looking at 3 IPC messages in a row, for a small HTTP reply
that comes in with one socket read(). One possibility is that we can
send them all in a row:

PARENT CHILD
-------------------------
OnStart --->
OnData --->
OnStop --->

We still have the overhead of 3 IPC messages, but they can be "in
flight" at the same time.

But under your proposed rule that one side can't send multiple
messages in a row in IPDL, we have to add acknowledgements to each
message:

PARENT      CHILD      PARENT      CHILD      PARENT      CHILD
-----------------------------------------------------
OnStart -->
            moar -->
                       OnData -->
                                   moar -->
                                              OnStop -->

(I hope the ASCII art is comprehensible). Here we're adding two round-
trip IPC times, because each message needs to be ACK'd before we can
send the next one. This certainly adds extra latency, and might
affect overall bandwidth as well (for larger replies with many
onDataAvailable calls), depending on the overheads involved.
And you're worried about the IPC/IPDL overhead being "more expensive"
than what?  The current case where OnStart/OnData/OnStop are processed
as Runnable events in a single event loop?
Yes. Obviously we're going to be more expensive (3 IPC calls can't be
free), but there's a potentially big difference between issuing them
in a row versus waiting for an ACK between each.
I don't think there'll be throughput issues with double-buffering, but I
agree that we don't need to solve this to get started.
I agree that double-buffering doesn't inherently cause a throughput
issue here.

Jason
Chris Jones
2009-10-13 02:12:09 UTC
Permalink
Post by Jason Duell
Post by Chris Jones
discard dataavailable;
send datadone goto DEAD;
That's fine. I'd prefer to not lose the "recv", so it would look
like
recv discard dataavailable;
But I don't care very much.
Post by Chris Jones
Post by Jason Duell
Post by Benjamin Smedberg
Explicitly acknowledging channel messages seems like a high burden
on total throughput.
This is a potential issue. In the case of a small HTTP reply, for
instance, the parent necko will get it all in one socket read(), and
thus would essentially call OnStartRequest, OnDataAvailable, and
onStopRequest in succession. Requiring an IPDL reply for each of
these would introduce two IPC round-trip latencies to the process.
I'm not sure how much that would show up in the overall responsiveness
of the browser, but the sequence itself would possibly be orders of
magnitude more expensive, time-wise.
OnStart/OnStop haven't entered into the discussion so far, so I'd
tentatively claim that you're arguing against a strawman. How do they
fit into the protocols that have been tossed around so far?
Once an HTTP channel is opened by the child process, the chrome HTTP
channel that gets created will go through the regular steps of getting
data from the network, and calling OnStartRequest/OnDataAvailable/
OnStopRequest. These functions (and the data/headers) need to be made
available in the child process. Right now my plan is to simply proxy
each OnStart/OnData/OnStop function to the child via an IPDL message
(it may be possible to glob them together into one message which is
sent to the child, but that may not be trivial, given that there may
be chains of listeners/observers that need to be called in the correct
order).
So OnStopRequest() is what bsmedberg has been referring to as
"datadone()"? (Sorry, no spreche Necko.) That's what I'm assuming below.
Post by Jason Duell
So we're looking at 3 IPC messages in a row, for a small HTTP reply
that comes in with one socket read(). One possibility is that we can
send them all in a row:
PARENT CHILD
-------------------------
OnStart --->
OnData --->
OnStop --->
We still have the overhead of 3 IPC messages, but they can be "in
flight" at the same time.
For all intents and purposes, we can treat the overhead of these 3
messages as the same as sending only 1. (The overhead should be about
the same already, and if it's not, then it's easy to optimize away.)
Post by Jason Duell
But under your proposed rule that one side can't send multiple
messages in a row in IPDL, we have to add acknowledgements to each
PARENT      CHILD      PARENT      CHILD      PARENT      CHILD
-----------------------------------------------------
OnStart -->
            moar -->
                       OnData -->
                                   moar -->
                                              OnStop -->
(I hope the ASCII art is comprehensible). Here we're adding two round-
trip IPC times, because each message needs to be ACK'd before we can
send the next one. This certainly adds extra latency, and might
affect overall bandwidth as well (for larger replies with many
onDataAvailable calls), depending on the overheads involved.
Right, this is an argument against a strawman.

Assuming I understand OnStart/OnStop correctly, then in IPDL as it
exists now your protocol might look something like

protocol NeckoChannel {
child:
OnStart(); OnData(); OnStop();
parent:
Moar(); Cancel();

// start in OPENING after creating channel
state OPENING:
send OnStart goto NORMAL;

state NORMAL:
send OnData goto SENT;
send OnStop goto STOPPED;
recv Cancel goto CANCELLED;

state SENT:
recv Moar goto NORMAL;
recv Cancel goto DEAD;

state STOPPED:
recv Cancel goto DEAD;

state CANCELLED:
send OnData goto DEAD;
send OnStop goto DEAD;

state DEAD:
recv delete;
};

In which case the timing diagram for your small HTTP response scenario
would be

PARENT      CHILD      PARENT      CHILD
-----------------------------------------
OnStart -->
OnData -->
            Moar -->
                       OnStop -->
                                   Cancel -->
                                   delete -->

It appears that we have an "extra" roundtrip with Moar/Stop. But
remember that the important performance measure here is critical path to
user-visible response. Speaking abstractly, the critical path for the
small HTTP response case is

network --> [buf = read(socket)] --> IPC --> parse/layout/draw

In this case, having the Moar/OnStop round-trip *doesn't* affect the
critical path, because the content process is parsing/laying out/drawing
the HTTP response while Moar/OnStop fly back and forth.

But wait, there's more! ;)

We could also *optimize away* the Moar/OnStop round trip, assuming
that Necko already knows there's no more data available (i.e. response's
done or socket's closed) when it sends the first OnData() message.
Assuming this knowledge is available, then with the new message signature

async OnData(bool isMoreData, ...);

then we can do

PARENT             CHILD
------------------------
OnStart -->
OnData(false) -->
                   Cancel -->
                   delete -->

without race conditions and with no "extra" round trips. I should add
that this is somewhat of a moot point performance-wise, because removing
the "extra" Moar/OnStop doesn't alter the critical path affecting
user-visible responsiveness.
Post by Jason Duell
Post by Chris Jones
And you're worried about the IPC/IPDL overhead being "more expensive"
than what? The current case where OnStart/OnData/OnStop are processed
as Runnable events in a single event loop?
Yes. Obviously we're going to be more expensive (3 IPC calls can't be
free), but there's a potentially big difference between issuing them
in a row versus waiting for an ACK between each.
Yes, 3 consecutive "overlapping" async messages vs. 3 send/recv pairs
would be a huge difference. Like I wrote above, we can assume that the
former case is "free" compared to sending only 1 message.
Post by Jason Duell
Post by Chris Jones
I don't think there'll be throughput issues with double-buffering, but I
agree that we don't need to solve this to get started.
I agree that double-buffering doesn't inherently cause a throughput
issue here.
We could work this out in a timing diagram if you wish. The important
issue again is critical path to user-visible response, and double
buffering ensures that IPC isn't the bottleneck: the chrome process
fills a network buffer while the content process drains another, so
except in exotic cases, either network or parse/layout/draw will be the
bottleneck (common case, probably the latter).

Cheers,
Chris
Chris Jones
2009-10-13 02:29:40 UTC
Permalink
Post by Chris Jones
But wait, there's more! ;)
We could also *optimize away* the Moar/OnStop round trip, assuming
that Necko already knows there's no more data available (i.e. response's
done or socket's closed) when it sends the first OnData() message.
Assuming this knowledge is available, then with the new message signature
async OnData(bool isMoreData, ...);
Actually, I forgot that IPDL already has linguistic support for this
scenario (though it's not currently used). You could write

async OnData(...);

as you would otherwise, but then

//...
state NORMAL:
send OnData goto SENT, STOPPED;

which means that sending OnData can effect either a transition to SENT
*or* STOPPED; the C++ code must choose. Since C++ must choose the next
state, the IPDL compiler would generate a method like

SendOnData(..., State next)

and in the small HTTP response case when Necko knows there are no more
packets to deliver, it could invoke

SendOnData(..., STOPPED)

after which the content process would *only* be allowed to send Cancel()
then delete().

Another alternative would be to define a new |OnLastData()| message that
transitions to STOPPED.

async OnLastData(...);

state NORMAL:
send OnData goto SENT;
send OnLastData goto STOPPED;

and send OnLastData() in the case above where you would have sent
OnData(..., STOPPED).

Neither materially changes anything, but both of these ways feel a
little cleaner to me.

Cheers,
Chris
Chris Jones
2009-10-13 22:23:21 UTC
Permalink
Post by Chris Jones
Post by Jason Duell
So we're looking at 3 IPC messages in a row, for a small HTTP reply
that comes in with one socket read(). One possibility is that we can
PARENT CHILD
-------------------------
OnStart --->
OnData --->
OnStop --->
We still have the overhead of 3 IPC messages, but they can be "in
flight" at the same time.
For all intents and purposes, we can treat the overhead of these 3
messages as the same as sending only 1. (The overhead should be about
the same already, and if it's not, then it's easy to optimize away.)
I just did two basic tests of IPC latency on my machine. The first was
a ping/pong test that measured the time of

Parent              Child
-------------       ------------
start timer
SendPing
                    RecvPing
                    SendPong
RecvPong
stop timer

On my machine, this round-trip takes about 200 microseconds.

The second test was a multi-ping/pong, measuring

Parent              Child
-------------       ------------
start timer
SendPing
SendPing
SendPing
SendPing
SendPing
                    RecvPing
                    SendPong
RecvPong            RecvPing
                    SendPong
RecvPong            RecvPing
                    SendPong
RecvPong            RecvPing
                    SendPong
RecvPong            RecvPing
                    SendPong
RecvPong
stop timer

(The stuff after the SendPing's happens concurrently in an undefined
order, that diagram is for illustration purposes only.)

On my machine, this takes about 400 microseconds.
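
(For reference, the protocol under test is trivial; something along
these lines:)

async protocol PTest {
child: Ping();
parent: Pong();
};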

So I'm comfortable sticking by the assertion that sending 3 consecutive,
overlapping async messages is basically free compared to sending only 1.

Cheers,
Chris
Chris Jones
2009-10-09 07:31:20 UTC
Permalink
Post by Benjamin Smedberg
Post by Chris Jones
The substantial modification from the original protocol is the addition
of the SENTDATA state and the "moar()" message. This captures the
decision a child makes between wanting more data vs. cancelling. It
prevents two dataavailable() messages from being sent in a row. This
might seem strange, but we'll need this feature when the child and
parent are transferring network data through two shmem buffers. The
|moar()| message is where the child would send back to the parent a
handle to the "free" shmem buffer (i.e., the one it's not currently
using). (The parent would then fill that buffer and transfer it back to
the child using |dataavailable()|).
Oh, I misread the protocol. Although it might be good/necessary to use a
single shmem buffer like this, it's not strictly necessary to prevent
racing, right?
I was imagining a double-buffering scheme in which, roughly speaking, the
parent fills a buffer at the same time the child is draining another
buffer. Then when the parent fills its buffer, it sends it to the child
with OnDataAvailable() (transferring "ownership"). Upon receiving that
buffer, the child sends back its drained buffer to the parent with
Moar(), and starts draining the buffer it had just received while the
parent starts filling the buffer it received in Moar().

But in this case, regardless of the data transfer scheme, the Moar()
message /is/ necessary to prevent racing. See below.
Post by Benjamin Smedberg
Or to put it another way, is the following protocol also
race-free and allows multiple dataavailable() calls in a row?
protocol Channel {
child:
dataavailable();
datadone();
oncanceled();
parent:
cancel();
delete();
state NORMAL:
send dataavailable goto NORMAL;
send datadone goto DONE;
recv cancel goto CANCELED;
state DONE:
recv cancel goto DONE;
recv delete goto __deleted__;
state CANCELED:
send dataavailable goto CANCELED;
send datadone goto CANCELED;
send oncanceled goto DONE;
};
No, this is a racy protocol. BTW, I should have mentioned in the last
message that you can run these protocols through IPDL to check whether
they pass the race detector. (I have a script called "ipdlc" that
basically does |exec python $srcdir/ipdl.py $*|).

The reason this protocol is racy is exactly because it specifies that
the parent can send an indefinite number of dataavailable() messages in
a row. The child can send cancel() at any time during that indefinitely
long sequence of dataavailable() messages, and if it does,
dataavailable() will arrive at the child in a state where it's not allowed.
Post by Benjamin Smedberg
Post by Chris Jones
I agree that two-phase deletion is complicated, but (i) it's the price
to be paid for race-free deletion; (ii) two-phase deletion is a kind of
"IPDL pattern", like "empty subprotocols", that can be documented and
copied by other protocols.
Ok, let's try it. Hopefully we can develop some mixins/C++ helpers which
make two-phase cancellation less confusing for developers: e.g. something
which would automatically throw away dataavailable in the CANCELED state.
IMHO specifying a protocol that successfully implements two-phase
deletion is the most complicated task. The C++ code is actually pretty
simple, see my upcoming response to jduell's post.
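For example, a helper along the lines bsmedberg suggests might look
roughly like this (hypothetical names, only a sketch):

#include <cstdio>
#include <string>

enum class ChannelState { NORMAL, CANCELED, DONE };

// Hypothetical CRTP mixin: swallows dataavailable() once this side has
// locally entered CANCELED, so the derived actor never sees the race.
template <class Derived>
class AutoDiscardWhenCanceled {
protected:
  ChannelState mState = ChannelState::NORMAL;

public:
  bool RecvDataAvailable(const std::string& data) {
    if (mState == ChannelState::CANCELED) {
      return true;    // we already sent cancel(); drop the stale data
    }
    return static_cast<Derived*>(this)->HandleDataAvailable(data);
  }
  void Cancel() { mState = ChannelState::CANCELED; }
};

struct MyChannelChild : AutoDiscardWhenCanceled<MyChannelChild> {
  bool HandleDataAvailable(const std::string& data) {
    std::printf("got %zu bytes\n", data.size());
    return true;
  }
};

int main() {
  MyChannelChild c;
  c.RecvDataAvailable("hello");   // delivered
  c.Cancel();
  c.RecvDataAvailable("late");    // silently discarded
}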

Cheers,
Chris
Chris Jones
2009-10-09 08:21:02 UTC
Permalink
Post by Chris Jones
Post by Benjamin Smedberg
Or to put it another way, is the following protocol also
race-free and allows multiple dataavailable() calls in a row?
protocol Channel {
    dataavailable();
    datadone();
    oncanceled();
    cancel();
    delete();

state NORMAL:
    send dataavailable goto NORMAL;
    send datadone goto DONE;
    recv cancel goto CANCELED;

state DONE:
    recv cancel goto DONE;
    recv delete goto __deleted__;

state CANCELED:
    send dataavailable goto CANCELED;
    send datadone goto CANCELED;
    send oncanceled goto DONE;
};
No, this is a racy protocol. BTW, I should have mentioned in the last
message that you can run these protocols through IPDL to check whether
they pass the race detector. (I have a script called "ipdlc" that
basically does |exec python $srcdir/ipdl.py $*|).
The reason this protocol is racy is exactly because it specifies that
the parent can send an indefinite number of dataavailable() messages in
a row. The child can send cancel() at any time during that indefinitely
long sequence of dataavailable() messages, and if it does,
dataavailable() will arrive at the child in a state where it's not allowed.
Sorry, this isn't quite right. The protocol is indeed racy per IPDL's
rules, but the reason it's racy is that it allows the parent's and
child's states to get out of sync for more than one "transition". I
don't see any cases in which a message would arrive in a state where
it's not allowed.

This to me looks like another reasonable proposal for making it easier
to write race-free destructors (i.e. by loosening IPDL's
race-free-protocol rules). I need to think more carefully about the
implications of modifying the particular check that makes this protocol
fail the race detector, though. The basic problem is that this protocol
as written would allow the parent's and child's states to get out of sync
for an indefinite number of messages, which doesn't seem desirable in
general.
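Concretely, the allowed interleaving looks like this:

PARENT                       CHILD
----------------------------------
dataavailable -->
dataavailable -->       <-- cancel    (child enters CANCELED)
dataavailable -->                     (parent is still in NORMAL, and
...                                    can stay there for an unbounded
                                       number of sends before it
                                       processes cancel)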

Cheers,
Chris
Benjamin Smedberg
2009-10-09 13:10:47 UTC
Permalink
Post by Chris Jones
Sorry, this isn't quite right. The protocol is indeed racy per IPDL's
rules, but the reason it's racy is that it allows the parent's and
child's states to get out of sync for more than one "transition". I
don't see any cases in which a message would arrive in a state where
it's not allowed.
This to me looks like another reasonable proposal for making it easier
to write race-free destructors (i.e. by loosening IPDL's
race-free-protocol rules). I need to think more carefully about the
implications of modifying the particular check that makes this protocol
fail the race detector, though. The basic problem is that this protocol
as written would allow the parent's and child's states to get out of sync
for an indefinite number of messages, which doesn't seem desirable in
general.
Oh, I figured it wasn't a "race" until the states which were out of sync
actually caused disallowed messages... it seems to me that we want the
behavior where the two sides can have different states as long as all the
possible messages remain legal.

It does require that you compute not only the transitions from the current
state, but also the transitions from any future states which may be
asynchronously entered while the other side still thinks you're in a prior
state.

Explicitly acknowledging channel messages seems like a high burden on total
throughput.

--BDS
Chris Jones
2009-10-12 23:40:21 UTC
Permalink
Post by Benjamin Smedberg
Post by Chris Jones
Sorry, this isn't quite right. The protocol is indeed racy per IPDL's
rules, but the reason it's racy is that it allows the parent's and
child's states to get out of sync for more than one "transition". I
don't see any cases in which a message would arrive in a state where
it's not allowed.
This to me looks like another reasonable proposal for making it easier
to write race-free destructors (i.e. by loosening IPDL's
race-free-protocol rules). I need to think more carefully about the
implications of modifying the particular check that makes this protocol
fail the race detector, though. The basic problem is that this protocol
as written would allow the parent's and child's states to get out of sync
for an indefinite number of messages, which doesn't seem desirable in
general.
Oh, I figured it wasn't a "race" until the states which were out of sync
actually caused disallowed messages... it seems to me that we want the
behavior where the two sides can have different states as long as all the
possible messages remain legal.
It does require that you compute not only the transitions from the current
state, but also the transitions from any future states which may be
asynchronously entered while the other side still thinks you're in a prior
state.
I'd prefer not to make this change until we have a dire need. I think
it will make protocols harder to reason about, and I like the
|unchecked| and |discard| proposals better than this.
Post by Benjamin Smedberg
Explicitly acknowledging channel messages seems like a high burden on total
throughput.
I disagree, if you're referring to the double-buffering scheme I was
referring to earlier.

Cheers,
Chris
Benjamin Smedberg
2009-10-14 13:32:13 UTC
Permalink
Post by Chris Jones
I'd prefer not to make this change until we have a dire need. I think
it will make protocols harder to reason about, and I like the
|unchecked| and |discard| proposals better than this.
Explicitly acknowledging messages has additional costs beyond just the IPC
latency and ping-pong times. Most important is that we currently have
programming models where we don't have to wait to fire multiple
onDataAvailable events. Since we can't completely change necko, the channel
parent class will have to accept "immediate" calls to
onDataAvailable/onStopRequest and create a queue for them. This is a fair
bit of complexity in the channel classes that we could avoid.
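Concretely, the parent would need something like this queue (hypothetical
C++ sketch, not real necko code):

#include <queue>
#include <string>
#include <utility>

class ChannelParentQueue {
  std::queue<std::string> mPending;   // data necko handed us mid-flight
  bool mAwaitingAck = false;

public:
  // Called by necko whenever it has data; must never block or fail.
  void OnDataAvailable(std::string data) {
    if (mAwaitingAck) {
      mPending.push(std::move(data)); // can't send yet: queue it
      return;
    }
    Send(std::move(data));
  }

  // Called when the child's ack (the moar() message) arrives.
  void RecvMoar() {
    mAwaitingAck = false;
    if (!mPending.empty()) {
      Send(std::move(mPending.front()));
      mPending.pop();
    }
  }

private:
  void Send(std::string data) {
    mAwaitingAck = true;
    (void)data;   // stands in for SendOnDataAvailable(data) on the actor
  }
};

int main() {
  ChannelParentQueue q;
  q.OnDataAvailable("part 1");   // sent immediately
  q.OnDataAvailable("part 2");   // queued until the child acks
  q.RecvMoar();                  // ack arrives; "part 2" goes out
}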

In addition, every time you require a ping pong you're not just dealing with
the IPC latency: you're also dealing with the latency of the event loop in
both processes.

It seems to me that we'd be better off putting more work into IPDL to allow
parent and child to do out-of-sync state transitions. We can prove that the
possible set of transitions is still theoretically sound, and at the same
time reduce the burden on protocol implementers.

--BDS
Chris Jones
2009-10-19 18:45:22 UTC
Permalink
Post by Benjamin Smedberg
Post by Chris Jones
I'd prefer not to make this change until we have a dire need. I think
it will make protocols harder to reason about, and I like the
|unchecked| and |discard| proposals better than this.
Explicitly acknowledging messages has additional costs beyond just the IPC
latency and ping-pong times. Most important is that we currently have
programming models where we don't have to wait to fire multiple
onDataAvailable events. Since we can't completely change necko, the channel
parent class will have to accept "immediate" calls to
onDataAvailable/onStopRequest and create a queue for them. This is a fair
bit of complexity in the channel classes that we could avoid.
I'm not sure about this added complexity. Can't it be avoided by having
Necko "queue" OnDataAvailable events by appending the data to an
out-buffer? This would hide the IPC impl from Necko producers in the
parent process and consumers in the child process.
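Roughly, I'm imagining something like this (hypothetical sketch), where
the per-event queue degenerates into one growing out-buffer:

#include <string>

class CoalescingChannelParent {
  std::string mOutBuffer;       // everything necko handed us, not yet sent
  bool mAwaitingAck = false;

public:
  void OnDataAvailable(const std::string& data) {
    mOutBuffer += data;         // necko never waits; we just append
    MaybeSend();
  }

  void RecvMoar() {             // the child acked the previous send
    mAwaitingAck = false;
    MaybeSend();
  }

private:
  void MaybeSend() {
    if (!mAwaitingAck && !mOutBuffer.empty()) {
      std::string toSend;
      toSend.swap(mOutBuffer);  // stands in for SendOnDataAvailable(toSend)
      mAwaitingAck = true;
    }
  }
};

int main() {
  CoalescingChannelParent p;
  p.OnDataAvailable("part 1");  // sent immediately
  p.OnDataAvailable("part 2");  // buffered while we await the ack
  p.OnDataAvailable("part 3");  // appended to the same buffer
  p.RecvMoar();                 // one send now carries parts 2+3
}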

I think in the long run we might want something like this anyway, to
avoid spamming the content process with a lot of smallish buffers. But
I would agree that this is a latency/throughput trade-off that might be
hard to get right. It seems worthwhile for us to make a simple
mathematical model of network latency and throughput in Mozilla. Do we
have something like this already?
Post by Benjamin Smedberg
In addition, every time you require a ping pong you're not just dealing with
the IPC latency: you're also dealing with the latency of the event loop in
both processes.
Sure, but double buffering (in common cases) will automatically adjust
itself so that IPC/event-loop latency isn't a bottleneck.
Post by Benjamin Smedberg
It seems to me that we'd be better off putting more work into IPDL to allow
parent and child to do out-of-sync state transitions. We can prove that the
possible set of transitions is still theoretically sound, and at the same
time reduce the burden on protocol implementers.
We could certainly do this, IPDL has all the information it needs. The
problem is that this could allow actor state to diverge permanently,
which I think would make protocols harder to understand.

If we end up needing to add this kind of feature (it's still not clear to
me that we need it), then IMHO, jduell's |discard| proposal is the best
solution to the problem --- we keep the simplicity of "entangled" state,
and the programmer has to explicitly acknowledge that he understands the
race conditions and wants IPDL to drop the messages that would cause bad
transitions.

Cheers,
Chris
Jason Duell
2009-10-21 20:52:29 UTC
Permalink
OK, Chris, you've convinced me that the extra 'moar' messages (and
round-trip latency) are only needed when multiple OnDataAvailable events
need to happen. And, as you suggest, these latencies are not likely to be
noticeable given that parsing, etc. will be underway.

So performance doesn't seem to be a big consideration here, at least for
the necko case. I still have usability concerns.
Post by Benjamin Smedberg
Explicitly acknowledging messages has additional costs beyond just the IPC
latency and ping-pong times. Most important is that we currently have
programming models where we don't have to wait to fire multiple
onDataAvailable events. Since we can't completely change necko, the channel
parent class will have to accept "immediate" calls to
onDataAvailable/onStopRequest and create a queue for them. This is a fair
bit of complexity in the channel classes that we could avoid.
I'm not sure about this added complexity.  Can't it be avoided by having
Necko "queue" OnDataAvailable events by appending the data to an
out-buffer?  
Um, that /is/ the additional complexity :) Yes, it sounds like nothing
much in theory, but it's certainly going to be more work, and might be
subtle (we currently report to listeners when OnDataAvailable is fired:
would we consolidate the notifications, too? Firebug might get less
accurate info about network activity this way, for instance).
Post by Benjamin Smedberg
It seems to me that we'd be better off putting more work into IPDL to allow
parent and child to do out-of-sync state transitions. We can prove that the
possible set of transitions is still theoretically sound, and at the same
time reduce the burden on protocol implementers.
We could certainly do this, IPDL has all the information it needs.  The
problem is that this could allow actor state to diverge permanently,
which I think would make protocols harder to understand.
I understand that this is a concern in theory, but am wondering how
likely it is in practice. What's the "permanent" divergence here, and why
is it so bad if we still detect/handle all transitions correctly? If
we're talking about a trade-off between the protocol being a bit more
subtle (but still provably correct), versus a "simpler" protocol (only in
the sense that divergence isn't possible: but I still need to add
application-orthogonal messages like "moar" to make it correct, i.e. it's
not unambiguously "simpler" IMHO) that makes me do more work to
accommodate IPDL, I'd much rather choose the first.
If we end up needing to add this kind of feature (it's still not clear to me that
we need it), then IMHO, jduell's |discard| proposal is the best solution to
the problem --- we keep the simplicity of "entangled" state, and the
programmer has to explicitly acknowledge that he understands the race
conditions and wants IPDL to drop the messages that would cause bad
transitions.
A nit: my "discard" proposal was just to explicitly drop messages
when it's
known that they aren't needed (i.e. an OnDataAvailable arriving after
the child
has already Cancel'ed the the channel). I'm not sure what you mean by
"causing
bad state transitions", as the example given doesn't transition state
(we start
CLOSED and stay CLOSED).

Jason
Chris Jones
2009-11-09 23:36:12 UTC
Permalink
Post by Jason Duell
OK, Chris, you've convinced me that the extra 'moar' messages (and
round-trip latency) are only needed when multiple OnDataAvailable events
need to happen. And, as you suggest, these latencies are not likely to be
noticeable given that parsing, etc. will be underway.
Yes, that's one side of the coin. The other side is when the network is the
bottleneck; in that case the critical path (--network --> [chrome]
--IPC--> [content]) is the same with or without double buffering.
Post by Jason Duell
Post by Chris Jones
Post by Benjamin Smedberg
Explicitly acknowledging messages has additional costs beyond just the IPC
latency and ping-pong times. Most important is that we currently have
programming models where we don't have to wait to fire multiple
onDataAvailable events. Since we can't completely change necko, the channel
parent class will have to accept "immediate" calls to
onDataAvailable/onStopRequest and create a queue for them. This is a fair
bit of complexity in the channel classes that we could avoid.
I'm not sure about this added complexity. Can't it be avoided by having
Necko "queue" OnDataAvailable events by appending the data to an
out-buffer?
Um, that /is/ the additional complexity :) Yes, it sounds like nothing
much in theory, but it's certainly going to be more work, and might be
subtle (we currently report to listeners when OnDataAvailable is fired:
would we consolidate the notifications, too? Firebug might get less
accurate info about network activity this way, for instance).
I can't really comment on this because I know very little about Necko
code. bsmedberg?
Post by Jason Duell
Post by Chris Jones
Post by Benjamin Smedberg
It seems to me that we'd be better off putting more work into IPDL to allow
parent and child to do out-of-sync state transitions. We can prove that the
possible set of transitions is still theoretically sound, and at the same
time reduce the burden on protocol implementers.
We could certainly do this, IPDL has all the information it needs. The
problem is that this could allow actor state to diverge permanently,
which I think would make protocols harder to understand.
I understand that this is a concern in theory, but am wondering how
likely it is in practice. What's the "permanent" divergence here, and why
is it so bad if we still detect/handle all transitions correctly? If
we're talking about a trade-off between the protocol being a bit more
subtle (but still provably correct), versus a "simpler" protocol (only in
the sense that divergence isn't possible: but I still need to add
application-orthogonal messages like "moar" to make it correct, i.e. it's
not unambiguously "simpler" IMHO) that makes me do more work to
accommodate IPDL, I'd much rather choose the first.
I can offer a hand-wavy argument about why I think that state divergence
makes protocols harder to reason about.

protocol Diverges {
state START:
    send M1 goto S1;    // if M1 and M2 cross in flight, the side that
    recv M2 goto S2;    // sent M1 lands in S1, the other side in S2

state S1:
    send M1 goto S1;
    recv M2 goto S1;    // both messages remain legal in both branches,

state S2:
    send M1 goto S2;
    recv M2 goto S2;    // so the two sides never re-converge
};

is a protocol that allows state to permanently diverge. When reasoning
about this protocol, I need to keep the "branch" of the divergence in
mind, and consider two paths (states synced, states not). Now, I could
take this same pattern, recursively graft it into S1 and S2, and now I
have two more branch points. So there are 6 stable states in the
protocol, and my C++ code has to keep track of two levels of possibly
different paths taken through parent/child. Then recursively graft ...

IOW, divergence can make reasoning about protocols a path-sensitive
operation, and my brain doesn't like those. Maybe this is a little too
abstract, but that's where I'm coming from.
Post by Jason Duell
Post by Chris Jones
If we end up needing to add this kind of feature (it's still not clear to me that
we need it), then IMHO, jduell's |discard| proposal is the best solution to
the problem --- we keep the simplicity of "entangled" state, and the
programmer has to explicitly acknowledge that he understands the race
conditions and wants IPDL to drop the messages that would cause bad
transitions.
A nit: my "discard" proposal was just to explicitly drop messages
when it's
known that they aren't needed (i.e. an OnDataAvailable arriving after
the child
has already Cancel'ed the the channel). I'm not sure what you mean by
"causing
bad state transitions", as the example given doesn't transition state
(we start
CLOSED and stay CLOSED).
I think we're saying the same thing. I understood |discard| to be a
mechanism that allows you to tell the compiler exactly what you wrote
above, that a message is no longer needed.

More concretely, here's what I had in mind. If you run the IPDL
compiler on this protocol,


protocol Channel {
child:
    dataavailable(); datadone();
parent:
    cancel(); delete();

state NORMAL:
    send dataavailable goto NORMAL;
    send datadone goto DONE;
    recv cancel goto CANCELED;

state DONE:
    recv cancel goto DEAD;

state CANCELED:
    send datadone goto DEAD;

state DEAD:
    recv delete goto DEAD; // ignore |goto DEAD| for now
};


it will complain thusly


/home/cjones/ipdl/Channel.ipdl:10: error: in protocol `Channel' state
`NORMAL', trigger `dataavailable' potentially races (does not commute)
with `cancel'
Specification is not well typed.


With |discard|, you could tell IPDL that "Hey, if I the child receives
dataavailable() in CANCEL, I want to pretend like it never happened.
Drop dataavailable() on the floor." You would communicate this thusly

...
state CANCELED:
    discard send dataavailable;
    send datadone goto DEAD;
...

and the IPDL type checker would shut up about that particular race
condition, and your C++ code would never see dataavailable() in the
CANCELED state. Does that make more sense?
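Here's a toy model of what the generated dispatch would do (hypothetical
names; the real compiler output would differ):

#include <cstdio>

enum class State { NORMAL, CANCELED, DEAD };
enum class Msg { DataAvailable, DataDone };

// A message typed |discard| in the current state is dropped before the
// user's Recv* handler is ever invoked.
struct ChannelChild {
  State state = State::CANCELED;     // pretend we already sent cancel()

  bool Dispatch(Msg m) {
    if (state == State::CANCELED && m == Msg::DataAvailable) {
      return true;                   // |discard|: drop it on the floor
    }
    return Recv(m);                  // everything else dispatches normally
  }

  bool Recv(Msg m) {
    std::printf("handler ran for message %d\n", static_cast<int>(m));
    return true;
  }
};

int main() {
  ChannelChild child;
  child.Dispatch(Msg::DataAvailable);  // silently discarded
  child.Dispatch(Msg::DataDone);       // delivered: CANCELED allows datadone
}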

Cheers,
Chris
Benjamin Smedberg
2009-11-11 15:26:48 UTC
Permalink
Post by Chris Jones
Post by Jason Duell
Um, that /is/ the additional complexity :) Yes, it sounds like nothing
much in theory, but it's certainly going to be more work, and might be
subtle (we currently report to listeners when OnDataAvailable is fired:
would we consolidate the notifications, too? Firebug might get less
accurate info about network activity this way, for instance).
I can't really comment on this because I know very little about Necko
code. bsmedberg?
It seems desirable to me that we do not require acknowledging every message:
when the channel has data available, it sends that data along and doesn't
cache it.

I slightly prefer the divergent-path approach rather than the discard
approach, for one main reason: it means that we don't have to implement
special handling for actors which have already been "destructed" but are
still receiving messages.

I don't think my opinion is strong enough to *insist* on the divergent-path
approach, though. If we do go with the discard-bad-messages approach, I'd
like a very solid approach towards zombie actor IDs.

In any case, I'm also concerned with getting a theoretically sound approach
implemented for plugin destructor races for 1.9.3.

--BDS
Benjamin Smedberg
2009-11-11 15:56:02 UTC
Permalink
Post by Benjamin Smedberg
I slightly prefer the divergent-path approach rather than the discard
approach, for one main reason: it means that we don't have to implement
special handling for actors which have already been "destructed" but are
still receiving messages.
I don't think my opinion is strong enough to *insist* on the divergent-path
approach, though. If we do go with the discard-bad-messages approach, I'd
like a very solid approach towards zombie actor IDs.
In any case, I'm also concerned with getting a theoretically sound approach
implemented for plugin destructor races for 1.9.3.
I discovered another edge-case that needs consideration: when you pass an
actor as a parameter, it must be in a state that does not race with destruction.

--BDS
Chris Jones
2009-11-19 20:22:55 UTC
Permalink
Post by Benjamin Smedberg
Post by Benjamin Smedberg
I slightly prefer the divergent-path approach rather than the discard
approach, for one main reason: it means that we don't have to implement
special handling for actors which have already been "destructed" but are
still receiving messages.
I don't think my opinion is strong enough to *insist* on the divergent-path
approach, though. If we do go with the discard-bad-messages approach, I'd
like a very solid approach towards zombie actor IDs.
In any case, I'm also concerned with getting a theoretically sound approach
implemented for plugin destructor races for 1.9.3.
I discovered another edge-case that needs consideration: when you pass an
actor as a parameter, it must be in a state that does not race with destruction.
This is a great point, I thought about this a while back. It's
checkable (a finite-state problem modulo RPC stacks), but there are two
issues: (i) we need more powerful machinery, in the form of the SPIN
model checker; and (ii) the way we've been passing actors around so far,
Msg(fooActor), will result in a 100% false-positive rate because
|fooActor| syntax means "fooActor may be in any state." We need to
start passing actors as |fooActor:STATE| before cross-protocol checks
would be useful. (That feature is slightly broken in IPDL right now,
but easy to fix.)

Cheers,
Chris
Chris Jones
2009-11-19 21:08:48 UTC
Permalink
Post by Chris Jones
Post by Benjamin Smedberg
Post by Benjamin Smedberg
I slightly prefer the divergent-path approach rather than the discard
approach, for one main reason: it means that we don't have to implement
special handling for actors which have already been "destructed" but are
still receiving messages.
I don't think my opinion is strong enough to *insist* on the
divergent-path
approach, though. If we do go with the discard-bad-messages approach, I'd
like a very solid approach towards zombie actor IDs.
In any case, I'm also concerned with getting a theoretically sound approach
implemented for plugin destructor races for 1.9.3.
I discovered another edge-case that needs consideration: when you pass an
actor as a parameter, it must be in a state that does not race with destruction.
(ii) the way we've been passing actors around so far,
Msg(fooActor), will result in a 100% false-positive rate because
|fooActor| syntax means "fooActor may be in any state."
That's actually not entirely true ... more specifically, it would result
in a positive (error) any time an actor was passed from A-->B when a
dtor might possibly go from B-->A. I imagine the false positive rate in
this situation would be high.

Cheers,
Chris
Chris Jones
2009-11-19 20:20:26 UTC
Permalink
Post by Benjamin Smedberg
Post by Chris Jones
Post by Jason Duell
Um, that /is/ the additional complexity :) Yes, it sounds like nothing
much in theory, but it's certainly going to be more work, and might be
subtle (we currently report to listeners when OnDataAvailable is fired:
would we consolidate the notifications, too? Firebug might get less
accurate info about network activity this way, for instance).
I can't really comment on this because I know very little about Necko
code. bsmedberg?
when the channel has data available, it sends that data along and doesn't
cache it.
Hmm, OK. From my limited perspective of not knowing necko code, I still
think that the burden of queueing DataAvailable() compares favorably with
relaxing IPDL checks. But if we really want them to be immediately
forwarded, we'll need either |discard| or divergent state, see below.
Post by Benjamin Smedberg
I slightly prefer the divergent-path approach rather than the discard
approach, for one main reason: it means that we don't have to implement
special handling for actors which have already been "destructed" but are
still receiving messages.
You mean for stateless protocols? We don't need this special handling
for necko channels; the destruction sequence isn't racy under either
explicit ACK or |discard|.
Post by Benjamin Smedberg
I don't think my opinion is strong enough to *insist* on the divergent-path
approach, though. If we do go with the discard-bad-messages approach, I'd
like a very solid approach towards zombie actor IDs.
Divergent state doesn't help inherently racy protocols, because by
definition those are stateless protocols. I haven't fully worked out
the details of keep-alive for stateless protocols, but the basic idea
would be to keep the actor alive on the side(s) sending __delete__ until
the other side ACKs the delete, regardless of async/sync/RPC __delete__.
(The ACK is the part I haven't fully worked out.)

Cheers,
Chris
Chris Jones
2009-11-19 21:34:03 UTC
Permalink
Post by Chris Jones
Post by Benjamin Smedberg
I slightly prefer the divergent-path approach rather than the discard
approach, for one main reason: it means that we don't have to implement
special handling for actors which have already been "destructed" but are
still receiving messages.
You mean for stateless protocols? We don't need this special handling
for necko channels; the destruction sequence isn't racy under either
explicit ACK or |discard|.
... or divergent state, I forgot to add. All three of these proposals
were made explicitly to avoid racy deletion.

Cheers,
Chris

Jason Duell
2009-11-11 23:26:20 UTC
Permalink
Post by Chris Jones
I can offer a hand-wavy argument about why I think that state divergence
makes protocols harder to reason about.
is a protocol that allows state to permanently diverge.  When reasoning
about this protocol, I need to keep the "branch" of the divergence in
mind, and consider two paths (states synced, states not).  Now, I could
take this same pattern, recursively graft it into S1 and S2, and now I
have two more branch points.  So there are 6 stable states in the
protocol, and my C++ code has to keep track of two levels of possibly
different paths taken through parent/child.  Then recursively graft ...
Perhaps we could set a limit on how many such divergences a protocol is
allowed to have. We certainly don't need to support the sort of example
you're giving.
Post by Benjamin Smedberg
when the channel has data available, it sends that data along and doesn't
cache it.
I agree with bsmedberg here. But I can start by seeing how much of a
pain it is to cache data if we can't modify IPDL (or do it in time: it
hopefully won't be long).
Post by Chris Jones
I understood |discard| to be a mechanism that allows you to tell the compiler
exactly what you wrote above, that a message is no longer needed.
Your proposal for |discard| sounds fine.

Jason
Chris Jones
2009-11-19 20:40:41 UTC
Permalink
Post by Jason Duell
Post by Chris Jones
I can offer a hand-wavy argument about why I think that state divergence
makes protocols harder to reason about.
is a protocol that allows state to permanently diverge. When reasoning
about this protocol, I need to keep the "branch" of the divergence in
mind, and consider two paths (states synced, states not). Now, I could
take this same pattern, recursively graft it into S1 and S2, and now I
have two more branch points. So there are 6 stable states in the
protocol, and my C++ code has to keep track of two levels of possibly
different paths taken through parent/child. Then recursively graft ...
BTW, I think a better way of saying this is "with divergent state,
reasoning about the protocol becomes an O(n^2) operation in the number
of states, rather than O(n)", since you have to track pairs of possible
parent/child states instead of a single shared one.
Post by Jason Duell
Perhaps we could set a limit on how many such divergences a protocol is
allowed to have. We certainly don't need to support the sort of example
you're giving.
Seems like a slippery slope to me, since that limit would have to
monotonically increase with time.
Post by Jason Duell
Post by Benjamin Smedberg
when the channel has data available, it sends that data along and doesn't
cache it.
I agree with bsmedberg here. But I can start by seeing how much of a
pain it is to cache data if we can't modify IPDL (or do it in time: it
hopefully won't be long).
I think it would be an interesting experiment to try. Probably only
need to spec out the code, short of writing it, to see how much of a
pain queueing dataavailable would be.

Cheers,
Chris