A minimal timing and synchronization model

After having spent some time surveying issues related to managing the flow of time in programming, we are ready to propose a minimal timing and synchronization model of our own, built on a very small number of constructs and primitives, but intended to be expressive enough to represent a wide variety of patterns while being simple enough to be relatively easy to implement and extend in a variety of different environments. Our model is based on logical time, instantaneous computations, and cooperative scheduling of lightweight threads, or fibers. This presentation will be very abstract but it will be followed shortly by the concrete description of a practical browser-based implementation; it is a also snapshot of the current state of the model, which is still under development.

I. Time and computations. We chose to represent time as a discrete set of instants t. An instant can be definite, indefinite, or unresolved (this terminology is borrowed from SMIL). Resolved (definite or indefinite) values should form a total order and have an additive and a multiplicative operation, so we chose to use non-negative real numbers for definite values, extended with +∞ for indefinite instants. Unresolved values are just unique values distinct from resolved values.

A duration is a value d defined as the difference between two instants. Durations are also definite, indefinite, or unresolved. When resolved, durations are never negative, that is, durations are only defined between a later instant and an earlier instant. When the later instant is indefinite, then the duration is indefinite; when the later instant is unresolved, then the duration is also unresolved.

Following the synchronous model of the ChucK and Esterel programming languages, we loosely define a computation as an operation, or sequence of operations, that can be perceived as running uninterrupted within an instant. This can be as simple as a single CPU instruction like adding or branching, or calling an arbitrarily complex function, itself calling itself or other functions, in a high-level programming language. Whatever the actual complexity of the computation, its duration is always zero; it may produce effects during execution, and result in a value or an error. An ordered sequence of computations C can be associated with an instant: every computation c ∈ C is executed at the associated time t.

By ordering computations within an instant, it is possible to order all computations that happen at a definite time: first by instant, then by order within the sequence of computations. Computations associated with an indefinite or unresolved time are never executed, but an unresolved value may eventually be resolved as the effect of another computation; if the resolved value is an instant that is already associated with a sequence of computations, the computations at the the previously unresolved time are appended at the end of the sequence, maintaining the ordering of computations.

II. Synchronizing computations. Based on these concepts of time and computations, we can now define synchronization mechanisms to manage the ordering and timing of computations. A fiber is an ordered sequence of zero or more instructions, where an instruction is either a computation or one of three synchronization primitives (Delay, Spawn, Join). A fiber can be scheduled to begin in a given instant; after a fiber begins, its instructions are executed in order, and the fiber ends when its last instruction does.

Delay allows a fiber to suspend execution (i.e., wait) for a given duration before executing its next instruction. Waiting for an indefinite or unresolved duration means that execution is suspended indefinitely. An unresolved duration may eventually become resolved and a delay duration may change during execution.

Spawn allows a fiber to create a new child fiber scheduled to begin in the same instant. Fibers are organized in trees: a fiber may spawn multiple children, and may have a single parent (the fiber that they were spawned from). While the child fiber begins in the same instant that it was spawned, spawning itself is an instantaneous computation with no duration, so the parent fiber keeps executing, possibly spawning other fibers, before the child fibers actually begin. Child fibers that were spawned in a given instant t do begin as soon as the parent fiber is suspended or ends, in the order in which they were spawned.

Join is the counterpart to Spawn and allows a fiber to wait until all of its children have ended. If no child fiber was previously spawned, then joining is instantaneous and the fiber resumes execution immediately. Otherwise, the parent fiber is suspended for an unresolved duration. When a fiber ends, it gets removed from the list of children of its parent (if any). If there is a parent and it is joining, the parent is notified and given the opportunity to execute a computation immediately (i.e., in the same instant, and before any other computation scheduled in that instant) to handle the effect of that child joining. When the last child has ended, the join ends and the parent fiber, which has no children anymore, resumes execution immediately.

Fibers provide the basis for sequential execution of computations. Delay allows for asynchronous execution. Spawn and Join are the building blocks for concurrent execution: they often go together, as a fiber can spawn multiple children to execute subtasks concurrently then join to wait for all subtasks to finish before resuming; but it is possible to spawn a child fiber and not join, so that both fibers keep running independently.

A cyclops is trying out
pairs of glasses handed to him by a shop assistant. — “Hmmm... I don’t know... there’s something not quite right with these too...”

Error handling is another kind of synchronization mechanism. A fiber executes its instructions one after the other as long as no error occurs. If a computation ends in error, then the fiber resumes execution but is failing, and every subsequent instruction is treated as a no-op computation, with no effect and no duration. This results in a failing fiber ending as soon as an error occurs; the parent fiber is then notified, and if it is joining, may decide how to handle the error from the child fiber (e.g., by failing itself, ignoring the error, initiating a fallback, &c.) The reason why we still go through the motions of executing the instructions of a failing fiber instead of simply ending right there and then is that it is possible to protect instructions from failure: a protected instruction will still execute even if its fiber is failing, providing an opportunity for error handling and recovery. Note that protection is not an instruction or computation, but a property of an instruction.

Errors happen during computations, but it is also possible that a fiber may fail another suspended fiber, as we will see below. In that case, the failing fiber resumes execution immediately, unless the instruction that caused the fiber to be suspended in the first place (a Delay or a Join) was protected. If that instruction was a Join and was not protected, the child fibers fail as well, so that the parent eventually resumes execution. This behaviour can be used for cancelling other fibers, usually children, by using a special Cancel error (an error that can be distinguished from all other possible erros).

For example, a timeout can be implemented by spawning two fibers. The first fiber can execute any series of instructions. The second fiber only has one Delay with a definite duration. The parent fiber then joins, and as soon as one child fiber ends, cancels the other one by making it fail with a Cancel error. If the first fiber ends before the timeout delay expires, then the timeout is cancelled and execution continues normally: the task completed before the timeout expired. If on the other hand the delay ends first, then the task was still in progress and is cancelled, and execution can resume with an error or a default value.

Because cancelling is just failing a fiber, using protected instructions judiciously leads to different patterns. In the previous timeout scenario, protecting the Delay in the second fiber means that whatever task is undertaken by the first fiber, execution will always take the same amount of time: if a result is provided before the timeout expires, then cancelling the other fiber has no effect and the join will only end when that fiber ends, but if the timeout expires before the task is complete, then the first fiber will be cancelled. Either way, the resulting duration is always that of the delay. Another use of protection is to give a fiber a chance to end gracefully even in the case of errors, including cancellation; if the task in the timeout scenario requires some cleaning up at the end (such as freeing resources), the last instructions can be protected to make sure that they are always executed, whether or not execution ends succesfully.

This article may be a little dry, so here is a 5-second racing microgame adapted from an example of using the browser implementation of the model. Click the game window to (re)start the game, then use the up and down arrow keys to move your car.

III. Scheduling computations. We have described all but one of the components of our model: instants, computations, delays, fibers, spawning and joining, and error handling. The last piece of the puzzle is the scheduler, which bridges the gap between fibers and instants, ensuring that computations from the different fibers are executed at the right time and in the right order; but also translates the logical time of the model into the actual time experienced by the user of the program, acting as the interface between the idealized environment of the model and the real world in which programs actually run. It also provides specific computations for fibers to affect other fibers, like failing a fiber (e.g., for cancellation) or changing the duration of a delay (resolving an unresolved duration, extending or shortening a resolved duration).

Our model is an example of cooperative multithreading (which is one reason why the term fiber was chosen in opposition to thread, which is generally associated with a preemptive model). Scheduling is relatively straightforward: once a fiber is scheduled to begin at a given instant, start executing the computations on that fiber until it explictly yields, which happens when encountering a Delay or a Join, or it ends. The scheduler simply needs to keep a queue of fibers ordered by the time at which they are scheduled to begin or resume; when the first instant happens, the fiber at the head of the queue is removed and it starts executing. If the fiber yields, it is put back in the queue at a new instant (following a delay with a definite duration), or stored in a pool of fibers suspended indefinitely; the scheduler can then move to the next fiber in the queue.

Actually running a program using this model requires chosing a timescale for durations; for instance, if our application has a graphical user interface, we may want to think of durations at the millisecond scale. A Delay of 500 would then translate to half a second of physical time; a fiber scheduled at t = 7777 would begin executing 7.777s after the main fiber began. Note that the timescale does not need to fixed from one execution to the next, or even through one execution. We will discuss time manipulation more thoroughly in a forthcoming article, but one aspect that we can already mention is that the scheduler is not a clock: it may be driven by a running clock, but that clock may be paused and resumed; and the clock could be bypassed altogether in favor of of feeding time values to the scheduler at an arbitrary rate.

This model is designed to be as small as possible in order to keep complexity in check, while still making it possible for interesting emergent behaviours to emerge; this was achieved by removing any feature that can be realized using the existing primitives. For example, we can think of asynchronous function calls as a sequence of three instructions: a computation that initiate the call, returning a future (or promise, or some similar value) synchronously; a Delay with an unresolved duration; and a computation that handles the eventual value or error. The scheduler manages the actual asynchronous function call and resolves the duration of the delay when it completes. DOM events in a browser environment can be implemented in a very similar fashion (setting up the listener and resuming when an event notification is received). These features may need to be provided by a practical implementation, but using the model ensures that they fit consistently with the actual primitives.

Another example of a feature that may seem fundamental but can also be completely defined in terms of the primitives of the model is repetition. We can define a new Repeat instruction as a pair of Spawn and Join: first, we spawn a new fiber for the body of the loop; then join. Recall that a parent fiber may execute a computation when a child fiber ends; here, that computation consists of spawning the child fiber one more time. Spawning the fiber again prevents the join from ending, since it still has a child fiber, and the same fiber will repeat over and over again.

Interestingly, our model makes it easy to limit the duration of a loop (e.g., by using a timeout as discussed above), but not the number of iterations! This can be addressed by modifying Repeat to keep track of the number of times the child fiber was spawned, and evaluate a predicate (i.e., any computation resulting in a boolean value) before spawning, and only spawn if the predicate is true. If the fiber is not spawned, then the Join ends immediately as there is no child fiber, and the loop ends. Adding this test before even the first iteration means that we can also use Repeat to conditionally spawn any fiber, keeping with our minimalisitic mindset. ⚄⚃