A minimal timing and synchronization model

Logo

After having spent some time surveying issues related to managing the flow of time in computing, we now propose a minimal timing and synchronization model of our own, built on a very small number of constructs and primitives, intended to be expressive enough to represent a wide variety of patterns, while being simple enough to be relatively easy to implement in a variety of environments. Our model is based on logical time, instantaneous computations, and cooperative scheduling of lightweight threads. This presentation will be very abstract but it will be followed by the concrete description of practical browser-based implementation.

I. Time and computations. We chose to represent time as a discrete set of instants t, where t can be definite, indefinite, or unresolved (this terminology is borrowed from SMIL). A definite time is a non-negative real number t ∈ ℝ+ (or [0, +∞[), an indefinite time is +∞ (here we use the extended real numbers line for resolved times), and an unresolved time is a unique value u from an infinite set U of unresolved values.

A duration is a value d ∈ [0, +∞] ∪ U that is defined as the difference between two instants. Durations are also definite, indefinite, or unresolved. When resolved, durations are never negative, that is durations are always defined between a later instant and an earlier instant. When the later instant is indefinite, then the duration is indefinite; when the later instant is unresolved, then the duration is also unresolved.

Following the synchronous model of the ChucK and Esterel programming languages, we loosely define a computation as an operation or sequence of operations that is executed uninterrupted within an instant. This can be as simple as a single CPU instruction like adding or branching, or calling an arbitrarily complex function in a high-level programming language. Whatever the actual complexity of the computation, its duration is always zero; it may produce effects during execution, and result in a value or an error. An ordered sequence of computations C can be associated with an instant, meaning that every computation cC is executed at the associated time t.

By ordering computations within an instant, it is therefore possible to order all computations that happen at a definite time: first by instant, then by order within the sequence of computations. Computations associated with an indefinite or unresolved time are never executed, but an unresolved value may eventually be resolved as the effect of another computation. If the resolved value is an instant that is already associated with a sequence of computations, the computations at the the previously unresolved time are appended at the end of the sequence, maintaining the ordering of computations.

II. Synchronizing computations. Based on these concepts of time and computations, we can now define synchronization tools to manage the ordering and timing of computations. A fiber is an ordered sequence of zero or more instructions, where an instruction is either a computation or one of three synchronization primitives (Delay, Spawn, and Join). A fiber can be scheduled to begin at a given time; its instructions are executed in order until the fiber ends at the time when its last instruction does.

Delay is a primitive that allows a fiber to suspend execution, i.e., wait, for a given duration, before executing its next instruction. Waiting for an indefinite or unresolved duration means that execution is suspended indefinitely, but unresolved durations may become resolved so execution may eventually resume.

Spawn is a primitive that allows a fiber to create a new child fiber and schedule to begin in the same instant. Fibers are thus structured as trees, as a fiber may spawn multiple children, and may have a single parent (the fiber that they were spawned from). While the child fiber begins in the same instant that it was spawned, spawning itself is instantaneous, so the parent fiber keeps executing, possibly spawning other fibers, before the child fibers actually begin. Child fibers that were spawned in a given instant t do begin as soon as the parent fiber is suspended or ends, in the order in which they were spawned. Since Spawn has no duration, it can be considered a regular computation.

Join is a primitive that allows a fiber to wait until all of its children have ended. If no child fiber was previously spawned, then joining is instantaneous and the fiber resumes execution immediately. Otherwise, the parent fiber is suspended for an unresolved duration. When a fiber ends, it gets removed from the list of children of its parent (if any). If there is a parent and it is joining, it is then notified and given the opportunity to execute a computation immediately (i.e., in the same instant, and before any other computation scheduled in that instant) to handle the effect of the child joining. When the last child has ended, the join ends and the parent fiber, which has no children anymore, resumes execution immediately.

Fibers provide the basis for sequential execution of computations. Delay allows for asynchronous execution. Spawn and Join are the building blocks for concurrent execution: they often go together, as a fiber can spawn multiple children to execute subtasks concurrently then join to wait for all subtasks to finish before resuming; but it is possible to spawn a child fiber and not join, so that both fibers keep running independently.

Error handling is another synchronization mechanism. A fiber executes its instructions one after the other as long as no error occurs. If a computation ends in error, then the fiber resumes execution but is failing, and every subsequent instruction is treated as a special Nop computation, which has no effect and no duration. In effect, a failing fiber ends as soon as an error occurs; the parent fiber is then notified, and if it is joining, may decide how to handle the error from the child fiber (e.g., by failing itself, or by ignoring the failure).

The reason why we still go through the motions of executing the instructions of a failing fiber instead of simply ending right there is that it is possible to protect instructions from failure. A protected instruction will still execute even if its fiber is failing, which provides an opportunity for error handling and recovery. It is also possible that an error occurs while a fiber is suspended (as we will see next), in which case execution of the fiber resumes in the instant that the error occurs in a failing state, unless the instruction that led to the suspension (a Delay or Join) was protected.

While errors, like computations, are left purposefully underspecified, the model defines a specific Cancel error (distinct from all other possible errors) which can be used to prevent a fiber from continuing execution. For example, a timeout can be implemented by spawning two fibers; the first fiber can execute a series of instructions that is expected to have a non-zero duration, while the second fiber only has a Delay for a definite duration. The parent fiber then joins after having spawned these two child fibers, and as soon as one fiber ends, cancels the other one by making it fail with a Cancel error. If the first fiber ends before the timeout delay expires, then the timeout is cancelled and execution continues normally. If on the other hand the delay expires first, then the task in progress is cancelled and execution can resume with an error or a default value.

Because cancelling is just failing a fiber, using protected instructions judiciously allow various behaviours. In the previous timeout scenario, protecting the Delay in the second fiber means that whatever task is undertaken by the first fiber, execution will always take the same amount of time: if a result is provided before the timeout expires, then cancelling the other fiber has no effect and the join will only end when that fiber ends. But if the timeout expires before the task is complete, then the first fiber will be cancelled. Another use of protection is to give a fiber a chance to end gracefully even in the case of error such as cancellation; if the task in this scenario requires some cleaning up at the end, the last instructions can be protected to make sure that they are always executed, whether or not execution ends succesfully.

This article may be a little dry, so here is a 5-second racing microgame made using the model (source code). Use the up and down arrow keys to move your car.

III. Scheduling computations. We have described all of the components of our model: instants, computations, delays, fibers, spawning and joining, and error handling. The last piece of the puzzle is the scheduler, which has two main responsibilities: first, to bridge the gap between fibers and instants, ensuring that computations from the different fibers are executed at the right instant and in the right order; second, to translate the logical time of the model into the actual time experienced by the user of the program, acting as the interface between the idealized environment of the model and the real world in which programs actually run.

The model being described here is an example of cooperative multithreading (which is one reason why the term fiber was chosen in opposition to thread, which is generally associated with a preemptive model). Scheduling is relatively straightforward: once a fiber is scheduled to begin at a given instant, start executing the computations on that fiber until it explictly yields, which happens when encountering a Delay or a Join, or it ends. Therefore the scheduler simply needs to keep a priority queue of fibers, where the priority is the instant at which the fiber is scheduled to begin or resume; when the first instant happens, the first fiber is removed from the queue and starts executing. If the fiber yields, it can be put back into the queue at a new instant (following a dealy with a definite duration), and the scheduler can move to the next fiber in the queue.

Spawning a fiber adds the new fiber to the front of the queue (but after its siblings that were spawned in the same instant, so that they run in the order in which they were spawned); similarly, joining the last child fiber puts the parent fiber back at the front of the queue as it resumes execution as soon as its children have ended. The whole system can be initialized by scheduling a main fiber at instant 0, which acts as the entry point to the model, just like the main function of many programming languages.

Actually running a program using this model requires chosing a timescale for durations; for instance, if our application has a graphical user interfaces, we may want to think of durations in milliseconds; a Delay of 500 would then translate to half a second of physical time; a fiber scheduled at t = 7777 would begin executing 7.777s after the main fiber began. Note that the timescale does not need to fixed from one execution to the next, or even through one execution. (We will discuss time manipulation more thoroughly in a future article).

This model is designed to be as small as possible, which was achieved by removing any feature that can be realized using the existing primitives. For example, we can think of asynchronous function calls as a sequence of three instructions: a computation that initiate the call, which returns a future (or promise, or some similar value) synchronously; a Delay with an unresolved duration; and a computation to handle the eventual value or error. The scheduler manages the actual call and resolves the duration of the delay when the call completes. DOM events in a browser environment can be implemented in a very similar fashion (setting up the listener, waiting, and resuming with the actual event). These features do need to be provided by a practical implementation, and do require some effort, but using the model ensures that they fit consistently.

Another example of a feature that may seem fundamental but can also be completely defined in terms of the primitives of the model is repetition. We can define a new Repeat instruction as a pair of Spawn and Join: first, we spawn a new fiber for the body of the loop; then join. Recall that a parent fiber may execute a computation when a child fiber ends; here, that computation consists of spawning the child fiber one more time. Spawning the fiber again prevents the join from ending, since it still has a child fiber, and the same fiber will repeat over and over again.

Interestingly, our model makes it easy to limit the duration of a loop (e.g., by using a timeout as discussed above), but not the number of iterations! This can be addressed by modifying Repeat to keep track of the number of times the child fiber was spawned, and evaluate a predicate (i.e., any computation resulting in a boolean value) before spawning, and only spawn if the predicate is true. If the fiber is not spawned, then the Join ends immediately as there is no child fiber, and the loop ends. Adding this test before even the first iteration means that we can also use Repeat to conditionally spawn any fiber, keeping with our minimalisitic mindset. We will discuss this in more detail when we talk about how to implement the model. ⚁⚃