Next: 2 Code Samples Up: HAFTA Checkpoint Library Architecture Previous: Contents   Contents

1 Basic Program Structure

The Checkpoint Library provides a program with two levels of structure: the sequence level and the checkpoint level.

1.1 Sequences

A sequence consists of an associative-array-style data structure containing a list of functions that the sequence will call. Each set of functions in the list, known as a node, contains a normal function, a rollback function, and a policy function.

A sequence is invoked similarly to the way one might invoke a single function, except that it is done through the API of the checkpoint library.

1.1.1 The important features of the sequence structure

The progression through the nodes in the sequence is linear or branching. It is not a dependency map.

The nodes are anonymous and uniform. There is no need to differentiate between setup, running, or teardown functions.

The nodes are self-directing. The execution sequence is indicated to the checkpoint library by the policy function, which is described below.

Sequences are independent and nestable. Inter-sequence dependencies are left to the domain of the developer's code.

When the system transitions away from a node, that node is pushed onto a stack. This allows a node's policy function to roll back to a previous node should it determine that an error can't be corrected in the context of the current node.

Errors encountered during the execution of the sequence are reported by the developer's code to the checkpoint library. The checkpoint library then acts upon those errors by consulting the policy function.
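The node and stack structure described above might be sketched as follows. This is only an illustration: every name (`demo_node`, `demo_sequence`, `demo_push`, `demo_pop`), the argument order, the `int` return types, and the fixed stack depth are assumptions, not the library's actual declarations, and a plain indexed array stands in for the associative array.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical, for illustration only. */
struct demo_sequence;

typedef int (*demo_normal_fn)(void *user_data, struct demo_sequence *seq);
typedef int (*demo_rollback_fn)(void *user_data, struct demo_sequence *seq,
                                int checkpoint);
typedef int (*demo_policy_fn)(void *user_data, struct demo_sequence *seq,
                              int event, int policy_data);

/* One node: a uniform triple of functions. */
struct demo_node {
    demo_normal_fn   normal;    /* required */
    demo_rollback_fn rollback;  /* NULL if the node changes no program state */
    demo_policy_fn   policy;    /* NULL to use the default policy */
};

#define DEMO_MAX_DEPTH 16       /* assumed limit, chosen for the sketch */

struct demo_sequence {
    const struct demo_node *nodes;   /* node table */
    size_t node_count;
    size_t stack[DEMO_MAX_DEPTH];    /* indices of completed nodes */
    size_t depth;                    /* number of entries on the stack */
};

/* Record that a node has been left. Returns 0, or -1 if the stack is full. */
int demo_push(struct demo_sequence *seq, size_t node_index)
{
    assert(seq != NULL);
    if (seq->depth == DEMO_MAX_DEPTH)
        return -1;
    seq->stack[seq->depth++] = node_index;
    return 0;
}

/* Fetch the most recently completed node so it can be rolled back.
 * Returns 0 and stores the index, or -1 if no node is left. */
int demo_pop(struct demo_sequence *seq, size_t *node_index)
{
    assert(seq != NULL && node_index != NULL);
    if (seq->depth == 0)
        return -1;
    *node_index = seq->stack[--seq->depth];
    return 0;
}
```

Because completed nodes are popped in the reverse of the order they were pushed, rolling back to a previous node always undoes the most recent work first.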

1.1.2 The normal function

Each node in the sequence must have a normal function. This function contains the code that is called when the previous node requests the invocation of that node.

The normal function is called with 2 arguments: the user data pointer and the sequence object.

The user data pointer is a value passed to the sequence when it is invoked. This argument is the same across all function calls in the sequence. This pointer has no specific meaning to the checkpoint library; it should be used to point to a data area for the sequence to use during its execution.

The sequence object is used by the checkpoint library calls in the developer's code to determine which sequence the calls are being made from. The developer should not modify or use any data in this structure. The object should merely be passed around to the checkpoint library functions.

The normal function can exit in one of 3 ways.

1.1.3 The rollback function

Each node in the sequence that modifies the program state must have a rollback function. If the node does not modify the program state, this function can be excluded.

The rollback function is called when the policy function requests the node be rolled back, either to be retried, or to continue rolling back to the previous node.

The rollback function is called with 3 arguments: the user data pointer, the sequence object, and the checkpoint value.

The checkpoint value is maintained by the normal function and the rollback function. It is designed to keep track of which resources have been successfully allocated. It is further described in Section 1.2 of this document.

When the rollback function is called, it is expected to undo the changes the normal function made to the program state.

The rollback function can exit in one of 3 ways.

1.1.4 The policy function

Each node in the sequence has a policy function. If no policy function is provided, a default policy function implemented by the checkpoint library will be used.

The policy function is called in 4 cases:

The developer's policy function can handle all, some, or none of these conditions. It must be designed to fall through to the default policy function on unhandled or unknown event types.

When called, the policy function is expected to do one of 3 things:

The policy function must not do anything except direct the flow of control of the sequence.

The policy function is called with 4 arguments: the user data pointer, the sequence object, the event, and policy_data.

The user data pointer and sequence object parameters are identical to those in the normal and rollback functions.

The event parameter is a value indicating what prompted the invocation of the policy function: for example, the rollback function failing, or the normal function exiting successfully.

The policy_data argument is the value passed by the call that prompted the invocation of the policy function. For example, doing return HC_NormalFail(sequence, 5) will present '5' in the policy_data argument.
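A developer's policy function that handles one condition and falls through to the default for everything else might look like the sketch below. The event codes, the `demo_decision` return value, and `demo_default_policy` are hypothetical stand-ins so the example compiles on its own; in real code the policy would instead return through the library's own calls such as `HC_RollbackPrev(sequence)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event and action codes; the library's real values
 * are not listed in this section. */
enum demo_event {
    DEMO_EV_NORMAL_OK,
    DEMO_EV_NORMAL_FAIL,
    DEMO_EV_ROLLBACK_FAIL,
    DEMO_EV_ROLLBACK_REQUESTED
};

enum demo_action {
    DEMO_ACT_NORMAL,            /* stands in for HC_Normal()          */
    DEMO_ACT_ROLLBACK_CURRENT,  /* stands in for HC_RollbackCurrent() */
    DEMO_ACT_ROLLBACK_PREV      /* stands in for HC_RollbackPrev()    */
};

struct demo_decision {
    enum demo_action action;
    int next_node;              /* meaningful only for DEMO_ACT_NORMAL */
};

/* Simplified stand-in for the library's default policy. */
struct demo_decision demo_default_policy(void *user_data, void *sequence,
                                         enum demo_event event,
                                         int policy_data)
{
    struct demo_decision d = { DEMO_ACT_ROLLBACK_CURRENT, 0 };
    (void)user_data;
    (void)sequence;
    if (event == DEMO_EV_NORMAL_OK) {
        d.action = DEMO_ACT_NORMAL;
        d.next_node = policy_data;
    }
    return d;
}

#define DEMO_ERR_FATAL 5        /* hypothetical error code from the node */

/* Developer policy: only directs control flow, and defers to the
 * default policy for every event it does not recognize. */
struct demo_decision demo_my_policy(void *user_data, void *sequence,
                                    enum demo_event event, int policy_data)
{
    assert(sequence != NULL);
    if (event == DEMO_EV_NORMAL_FAIL && policy_data == DEMO_ERR_FATAL) {
        /* Retrying cannot help: give up on this node at once. */
        struct demo_decision d = { DEMO_ACT_ROLLBACK_PREV, 0 };
        return d;
    }
    return demo_default_policy(user_data, sequence, event, policy_data);
}
```

Here a normal function that reported `5` would be rolled back to the previous node immediately, while any other failure keeps the default retry behavior.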

When the default policy function is invoked indicating that the normal function has exited successfully, it will do return HC_Normal(sequence, policy_data), which invokes the node in the sequence indicated by the policy_data provided by the just-exited normal function.

When the default policy function is invoked indicating that the normal or rollback functions exited unsuccessfully, it will do return HC_RollbackCurrent(sequence) up to 5 times to request that the current node be rolled back and retried.

After the 5th failure in the normal function, the default policy function will do return HC_RollbackPrev(sequence), which will invoke the rollback function of this node, and then proceed to roll back the previously completed node.

After the 5th failure in the rollback function, the default policy function will call HC_Panic() to terminate the program.

If the current node has no rollback function, the default policy function will not attempt to roll back or retry the node, and will immediately default to rolling back the previous node.

When the default policy function is invoked indicating that the next node has requested that this node be rolled back, it will proceed the same as if the normal function had exited unsuccessfully.
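The default policy rules above can be condensed into a small decision function. This is a sketch of one reading of the text, not the library's implementation: the names are hypothetical, the fifth failure (rather than the sixth) is taken as the point where retries stop, and a rollback request from the next node is assumed to share the normal-failure counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event and action codes for the sketch. */
enum demo_event {
    DEMO_EV_NORMAL_OK,
    DEMO_EV_NORMAL_FAIL,
    DEMO_EV_ROLLBACK_FAIL,
    DEMO_EV_ROLLBACK_REQUESTED  /* the next node asked for a rollback */
};

enum demo_action {
    DEMO_ACT_NORMAL,            /* stands in for HC_Normal()          */
    DEMO_ACT_ROLLBACK_CURRENT,  /* stands in for HC_RollbackCurrent() */
    DEMO_ACT_ROLLBACK_PREV,     /* stands in for HC_RollbackPrev()    */
    DEMO_ACT_PANIC              /* stands in for HC_Panic()           */
};

#define DEMO_MAX_FAILURES 5

/* Per-node bookkeeping assumed to be kept by the library. */
struct demo_node_state {
    int has_rollback;       /* nonzero if the node has a rollback function */
    int normal_failures;
    int rollback_failures;
};

enum demo_action demo_default_policy(struct demo_node_state *st,
                                     enum demo_event event)
{
    assert(st != NULL);
    switch (event) {
    case DEMO_EV_NORMAL_OK:
        return DEMO_ACT_NORMAL;

    case DEMO_EV_ROLLBACK_FAIL:
        if (++st->rollback_failures >= DEMO_MAX_FAILURES)
            return DEMO_ACT_PANIC;
        return DEMO_ACT_ROLLBACK_CURRENT;

    case DEMO_EV_NORMAL_FAIL:
    case DEMO_EV_ROLLBACK_REQUESTED:
        /* Without a rollback function the node cannot be retried. */
        if (!st->has_rollback)
            return DEMO_ACT_ROLLBACK_PREV;
        if (++st->normal_failures >= DEMO_MAX_FAILURES)
            return DEMO_ACT_ROLLBACK_PREV;
        return DEMO_ACT_ROLLBACK_CURRENT;
    }
    return DEMO_ACT_PANIC;  /* unknown event code */
}
```

Under these assumptions a node with a rollback function is retried after each of its first four failures and abandoned on the fifth.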


1.2 Checkpoints

Checkpoints are a facility for allowing the rollback of a partially completed function. Checkpoints are a strictly linear enumeration of the resources allocated by the programmer's code.

The value of the last checkpoint set by a particular normal or rollback function is stored, then presented to the rollback function so that the programmer's rollback code can know what resources it needs to free and/or deconfigure.

1.2.1 The important features of the checkpoint structure

The progression through the checkpoints is strictly linear, and as such is 100% predictable. Branching is handled in the scope of the sequence structure, not the checkpoints.

The checkpoint system is designed to be unobtrusive and lightweight. It does not directly affect the flow of execution.

1.2.2 Details

During the operation of the normal function, one or more resources may need to be allocated. After each successful allocation, the developer's code marks it with a call to the checkpoint library's HC_Checkpoint().

If, for any reason, the normal function has an error during its operation, the checkpoint library will call the rollback function, if one is available, and provide it with the last value that was passed to HC_Checkpoint(). Using this value, the rollback function can determine which resources need to be deconfigured or deallocated to restore the program and sequence state to what it was before the normal function was called.

The intent of this design is for the rollback function to be structured as a single switch statement whose cases list the checkpoint values in reverse order and fall through, so that entering at the last checkpoint reached undoes that allocation and every earlier one in turn.
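A minimal sketch of this pattern follows. The checkpoint values, the `demo_state` structure, and the `fail_at` parameter used to force a failure are all hypothetical, and a plain struct field stands in for the value that the real code would record with HC_Checkpoint().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical checkpoint values, numbered in allocation order. */
enum { CP_NONE = 0, CP_BUFFER = 1, CP_TABLE = 2, CP_LOG = 3 };

struct demo_state {
    char *buffer;
    int  *table;
    char *log;
    int   checkpoint;   /* stand-in for the value given to HC_Checkpoint() */
};

/* Body of a normal function: allocate three resources and mark a
 * checkpoint after each success. `fail_at` forces the allocation for
 * that checkpoint to fail (0 means no forced failure).
 * Returns 0 on success, -1 on failure. */
int demo_normal(struct demo_state *s, int fail_at)
{
    assert(s != NULL);
    s->checkpoint = CP_NONE;

    if (fail_at == CP_BUFFER || (s->buffer = malloc(64)) == NULL)
        return -1;
    s->checkpoint = CP_BUFFER;

    if (fail_at == CP_TABLE || (s->table = malloc(16 * sizeof *s->table)) == NULL)
        return -1;
    s->checkpoint = CP_TABLE;

    if (fail_at == CP_LOG || (s->log = malloc(128)) == NULL)
        return -1;
    s->checkpoint = CP_LOG;

    return 0;
}

/* Body of a rollback function: enter at the last checkpoint reached
 * and fall through, releasing resources in reverse order. */
void demo_rollback(struct demo_state *s, int checkpoint)
{
    assert(s != NULL);
    switch (checkpoint) {
    case CP_LOG:
        free(s->log);
        s->log = NULL;
        /* fall through */
    case CP_TABLE:
        free(s->table);
        s->table = NULL;
        /* fall through */
    case CP_BUFFER:
        free(s->buffer);
        s->buffer = NULL;
        /* fall through */
    case CP_NONE:
    default:
        break;
    }
}
```

If the third allocation fails, the last checkpoint is `CP_TABLE`, so the switch skips the `CP_LOG` case and releases only the table and the buffer.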


2003-01-03