The common database designs all have certain properties regarding throughput, latency and client interdependence.
In this model, client processes must follow a periodic access model. At a client-defined time interval, each client requests the current values for all data points of interest and then performs its computation on that new data set. If the client needs to respond to changes in data, it must still read its entire data set and synthesize the data change events itself. This requires that the client maintain two copies of the data, one current and one previous, so that the comparison can be made. This model is very inefficient: not only must the client read and scan the entire data set, but it must also pay the price of message passing for each data item. One optimization that helps reduce the message passing overhead is to request many data items per message to the database.
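The poll-and-compare cycle described above can be sketched as follows. This is a minimal illustration, not a real database API: `read_points` is an assumed stand-in for the per-item requests to the database, and the client carries its previous copy forward between polls.

```python
def poll_once(read_points, previous):
    """One polling cycle: read the full data set and synthesize change
    events by comparing against the previous copy.

    read_points: callable returning {name: value} for all points of
    interest (a hypothetical interface; the real database API would
    involve one or more request messages).
    """
    current = read_points()
    events = {name: value for name, value in current.items()
              if previous.get(name) != value}
    return current, events

# Usage: the client must keep both copies across polls.
previous = {}
current, events = poll_once(lambda: {"temp": 21.5, "valve": 1}, previous)
# On the first poll every point appears as a "change".
```

Note that the entire data set is read and scanned on every cycle even when `events` turns out to be empty, which is the inefficiency the text describes.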
Since all clients must continuously poll the database for new information, the system design must include a determination of the polling rates for all clients. High polling rates cause high message traffic and large drains on the CPU. Many cooperating clients tend to compete for the CPU resources. Reducing the polling rates of some clients in order to accommodate the others normally solves this competition. This polling rate determines the latency of a client. Latencies in this model are typically quite high. Since the polling rates of the clients must be "tuned" against the steady state system, it is difficult to know what the system performance will be without actually implementing it. If the system is subsequently modified, these tuned polling rates may no longer apply. The system in this scenario can be fragile with respect to change.
A message passing database will usually be more costly for the posting of bulk data. The process that posts the data must transmit each data value as an individual item, or accumulate several data changes into a single message for transmission. When many points change at the posting process, more than one message is usually necessary. This is somewhat mitigated by the fact that only changed values need to be transmitted. Several changes posted in a single message can propagate to the receiving clients as a single message, reducing messages again while preserving the bulk nature of the data to some extent.
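The batching optimization described above amounts to packing as many changed points as possible into each message. A minimal sketch, where `max_per_message` is an assumed transport limit rather than anything specified in the text:

```python
def batch_changes(changes, max_per_message=32):
    """Pack a list of point changes into as few messages as possible.

    changes: list of (name, value) pairs that changed at the poster.
    max_per_message: assumed capacity of one message (hypothetical).
    Returns a list of messages, each a list of changes.
    """
    return [changes[i:i + max_per_message]
            for i in range(0, len(changes), max_per_message)]
```

Posting 70 changed points this way costs three messages rather than seventy, and the receiving side can forward each batch onward as a single message, preserving the bulk nature of the data.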
A message passing database will usually be slightly less efficient than a shared memory database for posting point data, as the message pass will take slightly longer than locking and unlocking a semaphore. Depending on the number of clients contending for the data, however, the time spent waiting to lock the semaphore may actually exceed the time spent passing a message.
Some processes are poorly suited to a periodic database. These are the data-driven processes; a good example is a user interface. In a user interface, the latency of update to the screen may be permitted to be high, but the latency of control is not. When a user presses a button on the screen, the user expects an immediate response from the control system. Some control systems can tolerate long latencies here, but many cannot. Similarly, if there is feedback to the screen, then the acceptable latency of response on the user interface is very low. This implies high polling rates, and therefore high CPU requirements. It is generally bad policy to have the user interface competing with the rest of a control system for resources. Many control programs also function better in a data-driven environment. For example, an alarm monitor process should respond as soon as the alarm condition occurs; a data change event achieves the minimum delay here.
A data logger is another example where a data driven model is necessary. In the periodic model, short-duration events can be lost entirely, as they occur and disappear between polling cycles. In that case, a logger becomes a sampler on the data stream, giving approximate movement of the data point rather than accurate movement. In a data driven logger, each change of data value can be significant. The logger accurately captures short-duration and high-activity events. During periods of low activity, the logger will produce low output, thereby saving disk space.
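The sampling-versus-event distinction above can be made concrete with a small sketch. The value stream and the polling period below are invented for illustration; the point is that a pulse shorter than the polling period vanishes from the periodic log but is fully captured by the data-driven one.

```python
def periodic_log(samples, period):
    """Sampler: record the value only at each poll instant.
    A short pulse that rises and falls between polls is never seen."""
    return [samples[t] for t in range(0, len(samples), period)]

def data_driven_log(samples):
    """Event logger: record (time, value) at every change of value.
    Quiet periods produce no output at all."""
    log, last = [], object()          # sentinel: first value always logged
    for t, value in enumerate(samples):
        if value != last:
            log.append((t, value))
            last = value
    return log
```

With samples `[0, 0, 1, 0, 0, 0]` and a polling period of 3, the periodic log records only zeros, while the data-driven log captures the pulse at t=2 and its return to zero at t=3.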
A message passing database de-couples the clients from the data representation. Each client can maintain its own representation internally, and interacts through a well-defined message interface. Changes in the database itself will not affect the client unless the message contents must be modified.
In general, there is very little to recommend a passive message-based database over other choices. It performs poorly in those areas where a shared memory database does well, and performs poorly in those areas where an active message passing database does well.
In this model, the data is made available in a shared memory segment. Each client is again periodically driven, performing its processing at a pre-set time interval. If the processing is associated with a periodic computation, such as a PID controller, then this can be a good match. The data is available with very low overhead, and is current at the instant that the control algorithm is performed. It is, however, necessary for the client to lock the shared memory region with a semaphore. If there are many clients using the data, then any one client may need to wait for all other clients to finish before it is allowed to run. This can be alleviated somewhat by forcing all clients to make a local copy of the shared memory before performing their processing. In this case, the shared memory is only locked for a brief period during the copy. Unfortunately, if the control algorithm also writes values back to the shared memory, the client must lock, write and unlock the shared memory for each point that it changes. It is not acceptable for the client to simply re-write all of shared memory from its local copy since other processes may also have written to different portions of the data set. This leaves the writer of the client with the choice of a single lock, processing the data in place, or a lock-write-unlock sequence for every point written. The former is easier to code, and guarantees immediate access to the shared memory, but scales very poorly as more clients are added. Access latency can be the sum of the processing times for all other clients plus the polling interval for the current client.
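The copy-then-compute pattern above, with a lock-write-unlock sequence for each written point, can be sketched as follows. This is a single-process illustration only: `threading.Lock` stands in for the semaphore, a dictionary stands in for the shared memory segment, and the proportional-controller step is an invented example of a periodic computation.

```python
import threading

lock = threading.Lock()                            # stands in for the semaphore
shared = {"pv": 10.0, "sp": 12.0, "out": 0.0}      # stands in for the segment

def run_cycle():
    # Lock only briefly, to snapshot the segment into a local copy.
    with lock:
        local = dict(shared)

    # Compute outside the lock, so other clients are not blocked
    # (here, an invented proportional control step).
    out = 0.5 * (local["sp"] - local["pv"])

    # Write back one point at a time: lock, write, unlock per point.
    # Rewriting the whole segment from the local copy would clobber
    # points that other processes changed since the snapshot.
    with lock:
        shared["out"] = out
```

The alternative of holding the lock for the whole cycle is simpler but, as the text notes, makes every other client wait for the full processing time.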
If the client is performing event processing, it must keep a previous copy of the shared memory segment in order to perform a comparison between the current value and the previous value. If the client is also making a local copy of the data, this means that there are two local copies per client in addition to the master data set. The client then synthesizes events in exactly the same way as for the passive message-passing model.
Since this model uses a periodic scan technique, it suffers from the same limitations as with the passive message passing model. Polling rate "tuning" must still occur, and latency issues still apply. Clients that are poorly suited to the polling model (those that respond to data change events) will still behave poorly. The big advantage of shared memory versus a message-based database is the reduction in CPU cost to perform each poll. This alters the point of overload in the system, but does not alleviate any of the architectural drawbacks of a polled database. Processes such as GUIs may perform poorly, and data loggers may not produce correct results at all.
A shared memory database does not necessarily preserve the time ordering of data changes. Time order can only be recovered by the client making two passes through the data set at each poll: on the first pass the set of changed values is collected, and on the second pass this changed subset is sorted into time order. Only then can time-ordered events be synthesized. This can eliminate some of the CPU savings over a message-based database.
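The two-pass scan can be sketched as below. A per-point timestamp is assumed to be stored alongside each value; this is an assumption the scheme requires, since plain shared memory records no ordering information of its own.

```python
def synthesize_ordered_events(current, previous):
    """Recover time-ordered change events from a shared memory snapshot.

    current, previous: {name: (value, timestamp)} snapshots, where the
    per-point timestamp is an assumed field written by the poster.
    """
    # Pass 1: collect the subset of points whose value changed.
    changed = [(name, value, ts) for name, (value, ts) in current.items()
               if previous.get(name, (None, None))[0] != value]
    # Pass 2: sort the changed subset into time order.
    changed.sort(key=lambda event: event[2])
    return changed
```

Even with this, a point that changed twice between polls still yields a single event, so the recovered ordering is approximate at best.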
A shared memory database couples the clients very tightly to the data representation. Generally the client is linked against a library that encapsulates the data organization in the shared memory. If any change is made in the data organization, all clients must be re-compiled and re-linked against this library. The opportunity for severe damage due to version mismatches between the client and the database is high. This reduces the effectiveness of field maintenance, as the entire system must be upgraded at once, and usually increases application development times.
Shared memory does not generally work across a network, so expansion to multiple CPUs or geographically separate process areas is generally precluded. There are very expensive hardware solutions to shared memory on a network. These require synchronization delays as network traffic occurs, and will perform a certain amount of ad-hoc conflict resolution where multiple hosts alter the same data.
In general, a passive shared memory architecture is appropriate for small data sets where the scan time on the memory is small or for processes where computation is intended to be periodic and no scan is necessary. Shared memory databases are a poor choice for networked data sharing systems.
In this model, a message passing database offers a publish/subscribe service in which clients register an interest in part or all of the data set, and the database subsequently informs the clients when any of those points have changed. This transmission occurs asynchronously, meaning that the client must be prepared to handle unsolicited messages sent from the database as data values change.
In an active model, the client is normally able to be entirely passive until a data change event occurs. This means that so long as the data is not changing, the client is consuming no CPU resources. When data changes, only those clients interested in that particular change will require CPU resources. The time between a data change and the client being informed will typically be very short. The amount of processing performed by the client is limited to the processing required for that data point only.
If a client intends to be periodic in its computation, it collects the data changes as they occur into local storage, and performs its processing when a timer event occurs. The amount of time to accept and store the data change messages is largely wasted compared to a shared memory model, but guarantees that when the timer occurs the local data set will be current.
If more than one change is transmitted for a point before a client responds, the asynchronous message passing mechanism will queue these changes, and all changes will be transmitted to the client. In a system where there are sufficient resources in the steady state, a momentary burst of data will not cause a loss of events at the client.
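The publish/subscribe behaviour described in the preceding paragraphs can be sketched as a minimal model. The class and method names are invented for illustration, and per-client queues stand in for the operating system's asynchronous message delivery.

```python
from collections import defaultdict, deque

class ActiveDatabase:
    """Sketch of an active message passing database: clients register
    interest in points and receive queued change messages (hypothetical
    interface; queues model asynchronous delivery)."""

    def __init__(self):
        self.values = {}
        self.subscribers = defaultdict(list)   # point name -> client queues

    def subscribe(self, name, queue):
        """Register a client's interest in one data point."""
        self.subscribers[name].append(queue)

    def post(self, name, value):
        """Accept a posted value; only actual changes propagate."""
        if self.values.get(name) != value:
            self.values[name] = value
            for queue in self.subscribers[name]:
                queue.append((name, value))    # queued: bursts are not lost
```

Posting the same value twice generates a single event, while a rapid burst of distinct values queues one message per change, so no event is lost at the client.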
An active message passing database demonstrates the same behaviour for bulk and point data as does the passive message passing database. The active database will, however, be able to pass on the bulk nature of its input data to its clients where more than one data change per message is received. Further, an active message passing database will offer better performance for point data than a shared memory database will.
An active message passing database provides the same de-coupling of client and database as with a passive message passing design. Client tasks can be modified independently from the database, providing a high degree of flexibility for post-installation maintenance.
If a client intends to be data driven, this database model will produce the lowest CPU requirement and the lowest response latency of any of the designs. This model is also the only one that will produce correct behaviour from a data logger during periods of high activity. User interface processes will be most responsive, and will demonstrate the least control latency. Where it is necessary to share user interfaces, control or data acquisition on a network, a message passing database is usually necessary. A message passing database is likely to be less efficient for tasks performing periodic computation.
An active shared memory database is really an optimization on a passive shared memory database. In this model, client tasks register an interest in changes in all or a portion of the data set. When any change occurs in that portion of the data set, the database informs the client through a simple trigger. This trigger generally carries no information beyond its own existence, since to carry actual data would make the database into a message passing database. There is usually only a single trigger for each client, or a single trigger per database subset per client. It should always be possible to find a trigger mechanism that is more efficient than an asynchronous message pass on a target operating system. In the extreme, the database could provide a trigger for each data point for each client. This is entirely impractical for large data sets as most operating systems impose a limit on the number and differentiation of triggers.
When a client receives a trigger, it knows to scan a portion of the shared memory appropriate to the trigger. At this point, the processing appears identical to the passive shared memory model. Since the trigger is shared among multiple data points, and multiple data points may have changed value, the data subset related to the trigger must be scanned for changes. If the client is data driven, then data change events must be synthesized. The time order of multiple changes cannot be preserved with a single-pass scan, and it is not possible to determine whether more than one change took place to a single value between the time that the trigger was received and the time that it was processed.
It is possible to include a "change" bit in shared memory for each data point for each client, but this implies either a dynamic shared memory organization to accommodate increasing numbers of clients, or an entirely static runtime environment in which the maximum number of clients is known. If more than one change occurs before the client processes the data, all but the last change will be lost. Even to detect that a loss has occurred, a second bit must be held per data point per client.
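The change-bit scheme and its loss-detection extension can be sketched as follows. The fixed client count reflects the static runtime environment the text describes; the class and method names are invented for illustration.

```python
class ChangeBits:
    """Per-point, per-client 'change' and 'overrun' bits, assuming a
    fixed maximum number of clients (hypothetical sketch)."""

    def __init__(self, n_points, n_clients):
        self.changed = [[False] * n_clients for _ in range(n_points)]
        self.overrun = [[False] * n_clients for _ in range(n_points)]

    def mark(self, point):
        """Poster sets the change bit for every client on a write."""
        for client in range(len(self.changed[point])):
            if self.changed[point][client]:
                # A prior change was never consumed: that change is now
                # lost, and only the overrun bit records the fact.
                self.overrun[point][client] = True
            self.changed[point][client] = True

    def consume(self, point, client):
        """Client clears its bits; returns True if a change was lost."""
        was_overrun = self.overrun[point][client]
        self.changed[point][client] = False
        self.overrun[point][client] = False
        return was_overrun
```

Note that even when the overrun bit reports a loss, the lost intermediate value itself is unrecoverable; only the last value survives in shared memory.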
An active shared memory database exhibits many of the same limitations as the passive shared memory database. It does not inherently offer time ordering, does not function on a network, still requires a scan of some portion of the data set, and will potentially produce incorrect results for such processes as data loggers.
An active shared memory database performs essentially as a passive shared memory database except that the latencies and CPU requirements are usually reduced at each client. Since a client does not need to access the shared memory until a change occurs, it can wait passively for a trigger from the database. At this time, all clients will perform their processing simultaneously, possibly causing contention for the shared memory area. It is important that all clients maintain at least one local copy of the data to keep the response of the system high.
An active shared memory database imposes a tight coupling between the database and the clients, exactly as with the passive shared memory database. This reduces post-installation flexibility and generally increases application development times.
In general, an active shared memory database is an improvement over a passive shared memory database for data driven clients. It will behave identically to a passive shared memory database for periodic clients. All of the limitations associated with shared memory databases apply here, with the exception that latencies can be reduced. An active shared memory database is appropriate for well-partitioned data sets where scan times can be kept small, and for clients that are periodic in nature. It is not appropriate for use in a networked environment, where time order is important, or where every data change is considered significant.