6. Design Considerations

A number of common situations should be considered when choosing a data access model.

  1. It is often possible to model shared memory so that posting bulk data is very simple (essentially nothing more than a semaphore lock and a memory copy). However, by modeling the shared memory to mirror the input data organization, the database forces its clients to interpret the raw data, and requires them to maintain knowledge of how the data is organized in the physical I/O hardware. This is a very inflexible approach, and should be discouraged. A data sharing mechanism should provide sufficient abstraction of the data that changing such things as the physical I/O hardware or engineering unit conversion will not affect the users of that data.

  2. In the message-passing model, the usefulness of a polled database is very limited. One of the primary advantages of the messaging model is the ability to express data changes as events. By polling the database this advantage is lost. Polling may work well for periodic tasks, but is a very poor model for data driven tasks. A polled shared memory system requires less CPU time for the poll due to efficient memory reads, but the net result is similar. As the number of data points in the system grows, the time spent in polling will also grow. If a polled system is required, then a shared memory model will result in lower CPU usage.

  3. Partitioning the memory into sections, where each section pertains to a particular process area or type of data, can reduce polling time in a shared memory system. Each data area can be individually locked with a semaphore. This can be very effective where tasks only require one data area at a time. If tasks require more than one data area, the order in which semaphores are locked is critical, since two tasks locking the same semaphores in opposite orders can deadlock. The need to enforce a semaphore lock ordering substantially reduces flexibility in the code and application, and requires that all tasks be aware of the global ordering, even if they do not require all of the data. This causes a degree of coupling among client tasks that is generally considered to be undesirable.

  4. A data driven system is characterized by events associated with a change of data value. In a message passing model, the data item and value can be explicitly identified in the message, so that the client does not need to perform a sweep of all or part of the database. In the shared memory model, the data item and value are not identified in the signal sent to the client; the client must discover which data has changed. For small data sets the polling time is negligible, but for larger data sets a data reduction scheme must be employed to make this practical. Data reduction schemes consist of data set partitioning and per-client flagging. The flagging mechanism helps the client reduce the search space, but does not tell the client how many data items have changed. At best this acts as a constant divisor on the search time required. As data set sizes become larger, the shared memory model becomes less desirable. As the number of partitions increases to accommodate larger data sets, the per-partition, per-client resources required for notification overwhelm the system, resulting in an artificial limit on the partitioning available.

  5. When using a data driven shared memory architecture, the data source must signal all clients of the shared memory partition in which a change has occurred, potentially including clients that are not in fact interested in that particular change. This causes unnecessary processing to search the data set. If the data set can be perfectly partitioned, so that any client using data from a partition uses all of the data in that partition, then this effect is eliminated. This may be practical for very small data sets, but is effectively impossible for large data sets with many clients with differing data requirements.

  6. When using a data driven shared memory architecture with a point data source, every data change will cause a signal to the clients. This signal subsequently causes a search of the data set in each client. Depending on the speed of data changes coming from the point data source, this wasted time could be prohibitive. This scenario is almost guaranteed to be more efficient using a message passing mechanism.

  7. When a task is performing periodic activity, it typically expects to have a consistent data set - a data set where all values are related in time. This will be true in either the polled or the data driven shared memory cases. In a message passing system, a separate trigger value must be used to signal the task that all data in the data set has now been transmitted.

  8. When a data driven system is used for periodic control, data pertaining to the control algorithm is being constantly supplied to the client, and essentially ignored until the next processing time arrives. If the data is being delivered at a higher frequency than the processing time, unnecessary message passing occurs.

  9. In a polled system, the latency of response to a value change is determined by the polling rate, and is stochastic. If the polling rate is low, then latency is high. This can result in delayed event response, and, if the data rate is higher than the polling rate, missed data. Mechanisms to detect that data has been missed can be added, at the cost of extra complexity and hard limits on the number of clients, but it is not possible to determine what has been missed.

  10. Since shared memory systems offer the data to all processes in their own address space, it is tempting for a process to simply lock the shared memory, perform some activity, and then unlock the shared memory. This can have strongly detrimental effects on overall system performance, since all other tasks must wait for the lock holder to finish its processing before they can begin. This problem can be solved by having each client task create a private copy of the data set, keeping the common data locked only long enough to perform the copy. During processing, each task must lock and write the global data set once for each value to be changed. It is not feasible for a task to copy its private data set back to the global data set, as other tasks may have modified other portions of that global data in the meantime. This is further aggravated if a bulk data source may overwrite the entire global data set at any time. Separating the data set into read-only and write-only data can reduce this problem, but imposes very strong limitations on system design and data usage.

  11. While both shared memory and message passing respond similarly under steady state conditions with sufficient resources, they respond differently to burst flooding conditions. When the burst data rate exceeds the processing time available, shared memory architectures will lose data changes. Message passing systems typically include a queue mechanism that allows the system to work through a burst of activity with a lower likelihood of missing values (corresponding to filling the queue). Missed values on analog points are often considered harmless, but missed values on digital points may not be. In addition, the data generated during a burst of activity is commonly associated with a process upset, and is more likely to be of interest than data in the steady state. This may be the worst time to miss data.

  12. Shared memory cannot grow as more storage is required. Once created, a shared memory segment is fixed in size. This implies that the data set is completely known, or is maximally bounded, before the application starts, and all of the space for the maximum data set is allocated even if it is not used. A message passing database can generally grow without pre-set bounds. The implementation need not make assumptions about minimum or maximum sizes. While schemes involving allocation of extra shared memory segments are possible, they are extremely involved, and rarely if ever used in practice.

  13. Most shared memory implementations use fixed-length structures to represent the data internally. This works well for integer and floating point data types, but makes string information difficult to deal with. In addition, data such as tag names and description fields must be pre-allocated and limited in length. The use of variable-length data fields requires the implementation of a private heap management system using a portion of the shared memory. Again, shared memory must be over-allocated in order to ensure that it does not run out. Memory fragmentation issues apply, suggesting a garbage collection mechanism with stop-and-copy features. Access to these components is more complex as a result.
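The global lock ordering discussed in point 3 can be sketched as follows. This is a minimal illustration using Python threading locks in place of semaphores; the partition names and the ordering itself are assumptions made for the example, not part of any particular implementation.

```python
import threading

# Hypothetical partition locks; names and ordering are illustrative only.
LOCKS = {"analog": threading.Lock(),
         "digital": threading.Lock(),
         "alarms": threading.Lock()}
LOCK_ORDER = ["analog", "digital", "alarms"]  # global order every task must obey

def acquire_partitions(names):
    """Acquire the requested partition locks in the global order.

    If every task sorts its requests this way, no two tasks can ever hold
    locks in opposite orders, so the deadlock described above cannot occur.
    """
    ordered = sorted(names, key=LOCK_ORDER.index)
    for name in ordered:
        LOCKS[name].acquire()
    return ordered

def release_partitions(ordered):
    for name in reversed(ordered):
        LOCKS[name].release()

# A task that needs two areas still acquires them in the global order.
held = acquire_partitions(["digital", "analog"])
release_partitions(held)
```

Note that even a task using only two partitions must know the full global ordering, which is precisely the coupling among client tasks that point 3 describes.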
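The partitioning and per-client flagging scheme of point 4 might be sketched as below. The partition count, hashing scheme, and tag names are assumptions for illustration; the point to notice is that a flag only narrows the sweep to a partition, leaving the client to search it for the actual changes.

```python
# Sketch of data set partitioning with per-client change flags (point 4).
NUM_PARTITIONS = 8
data = {}                      # the shared data set: tag -> value

def partition_of(tag):
    # Illustrative assignment of tags to partitions.
    return hash(tag) % NUM_PARTITIONS

class Client:
    def __init__(self):
        # One flag per partition, per client: this is the notification
        # resource that grows with both partition count and client count.
        self.dirty = [False] * NUM_PARTITIONS
        self.snapshot = {}     # the client's last known values

clients = []

def write_point(tag, value):
    data[tag] = value
    p = partition_of(tag)
    for c in clients:          # the source must flag every client
        c.dirty[p] = True

def scan_changes(client):
    """Sweep only the flagged partitions.  The flag acts as a constant
    divisor on search time; it says neither which items changed nor how many."""
    changed = []
    for p in range(NUM_PARTITIONS):
        if client.dirty[p]:
            client.dirty[p] = False
            for tag, value in data.items():
                if partition_of(tag) == p and client.snapshot.get(tag) != value:
                    client.snapshot[tag] = value
                    changed.append(tag)
    return changed
```

As the data set grows, either the sweep inside each partition grows, or NUM_PARTITIONS (and with it the per-client flag storage) must grow, which is the artificial limit the point describes.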
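The consistent-set trigger of point 7 might look like this minimal sketch in a message passing system. The trigger marker, tag names, and message format are assumptions made for the example.

```python
from queue import Queue

TRIGGER = ("__END_OF_SET__", None)   # hypothetical end-of-scan marker

def collect_consistent_set(q):
    """Accumulate values until the trigger message signals that a complete,
    time-consistent data set has been transmitted."""
    values = {}
    while True:
        msg = q.get()
        if msg == TRIGGER:
            return values
        tag, value = msg
        values[tag] = value

q = Queue()
for msg in [("PT101", 12.5), ("TT102", 88.0), TRIGGER]:
    q.put(msg)
scan = collect_consistent_set(q)   # {"PT101": 12.5, "TT102": 88.0}
```

The periodic task then operates on `scan` as a snapshot whose values are all related in time, matching what the polled or data driven shared memory cases provide implicitly.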
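The private-copy discipline of point 10 can be sketched as follows; the data set contents and tag names are illustrative assumptions.

```python
import threading

# Illustrative global data set guarded by a single lock.
shared_data = {"PT101": 12.5, "TT102": 88.0}
shared_lock = threading.Lock()

def snapshot():
    """Hold the lock only long enough to copy; process the copy unlocked,
    so other tasks are not blocked for the duration of the computation."""
    with shared_lock:
        return dict(shared_data)

def write_value(tag, value):
    """Write results back one value at a time.  A task must never copy its
    whole private set back, since other tasks may have changed other values
    in the meantime."""
    with shared_lock:
        shared_data[tag] = value

local = snapshot()               # the long computation runs on this copy
result = local["PT101"] * 2.0
write_value("FC103", result)
```

The cost of this discipline is one lock acquisition per value written back, which is exactly the per-value locking described in the point.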
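The burst behaviour of point 11 can be illustrated by comparing a last-value cell (standing in for a shared memory location) with a queue (standing in for a message passing channel); the tag name and values are assumptions for the example.

```python
from queue import Queue

last_value = {}   # shared-memory analogue: each write overwrites the last
q = Queue()       # message-passing analogue: each change is queued

def post(tag, value):
    last_value[tag] = value
    q.put((tag, value))

# A burst of digital transitions, 0 -> 1 -> 0, arriving faster than the
# client can run.
for v in (0, 1, 0):
    post("MOTOR_RUN", v)

# The shared-memory reader sees only the final value; the 1 is lost.
# The queued reader can later drain every transition in order.
drained = []
while not q.empty():
    drained.append(q.get())
```

For an analog point the lost intermediate value may not matter; for a digital point such as `MOTOR_RUN`, the missed transition is exactly the event of interest.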

Copyright 1995-2002 by Cogent Real-Time Systems, Inc.