November 25, 2011

Active and Passive Data Store

By default, all data is stored in-memory to achieve high speed data access. However, not all data is expected to be read or updated frequently and needs to reside in-memory, as this increases the required amount of main memory unnecessarily. This so-called historic or passive data can be stored in a specific passive data storage based on less expensive storage media, such as SSDs or hard disks, still providing sufficient performance for possible accesses at lower cost. The dynamic transition from active to passive data is supported by the database, based on custom rules defined as per customer needs.We define two categories of data stores: active and passive.

We refer to active data when it is accessed frequently and updates are expected, e.g. access rules. In contrast, we refer to passive data when this data either is not used frequently and neither updated nor read. Passive data is purely used for analytical and statistical purposes or in exceptional situations where specific investigations require this data. For example, tracking events of a certain pharmaceutical product that was sold five years ago can be considered as passive data.

Why is this feasible? Firstly, from the business perspective, the pharmaceutical is equipped with a best-before data of two years after its manufacturing date, i.e. even when the product is handled now, it is no longer allowed to sell it. Secondly, the product was sold to a customer four years ago, i.e. it left the supply chain and is typically already used within this timespan. Therefore, the probability that details about this certain pharmaceutical are queried is very low. Nonetheless, the tracking history needs to be conserved by law regulation, e.g. to prove the used path within the supply chain or when selling details are analyzed for building a new long-term forecast based on historical data. This example gives an understanding about active and passive data. Furthermore, introducing the concept of passive data comes with the advantage to reduce the amount of data, which needs to be accessed in real-time, and to enable archiving. As a result, when data is moved to a passive data store it consumes no longer fast accessible main memory and frees hardware resources. Dealing with passive data stores involves the need for a memory hierarchy from fast, but expensive to slow and cheap. A possible storage hierarchy is given by: memory registers, cache memory, main memory, flash storages, solid state disks, SAS hard disk drives, SATA hard disk drives, tapes, etc. As a result, rules for migrating data from one store to another needs to be defined, we refer to it as aging strategy or aging rules. The process of aging data, i.e. migrating it from a faster store to a slower one, is considered as background tasks, which occurs on regularly basis, e.g. weekly or daily. Since this process involves reorganization of the entire data set, it should be processed during times with low data access, e.g. during nights or weekends.

Please also see our podcast on this technology concept.

No comments:

Post a Comment