December 20, 2011

SQL Interface on Columns and Rows

Business operations have very diverse access patterns. They include read-mostly queries of analytical applications and write-intensive transactions of daily business. Further, all variants of data selects are present including point selects (e.g., details of a specific product) and range selects, retrieving sets of data (e.g., from a specified period like sales overview per region of a specific product for the last month).

Column and row-oriented storage in HANA provides the foundation to store data according to its frequent usage patterns in column or in row-oriented manner to achieve optimal performance. Through the usage of SQL, that supports column as well as row-oriented storage, the applications on top stay oblivious to the choice of storage layout.

Please also see our podcast on this technology concept.

December 19, 2011

Analytics on Historical Data

Analytics on historical data
All enterprise data has a lifespan: depending on the application, a datum might be expected to be changed or updated in the future in many cases, in few cases, or never. In financial accounting, for example, all data that is not from the current year plus all open items from the previous year can be considered 'historic data', since they may no longer be changed.

In HANA, historical data is instantly available for analytical processing from solid state disk (SSD) drives. Only active data is required to reside in-memory permanently.

Please also see our podcast on this technology concept.

November 25, 2011

Active and Passive Data Store

By default, all data is stored in-memory to achieve high speed data access. However, not all data is expected to be read or updated frequently and needs to reside in-memory, as this increases the required amount of main memory unnecessarily. This so-called historic or passive data can be stored in a specific passive data storage based on less expensive storage media, such as SSDs or hard disks, still providing sufficient performance for possible accesses at lower cost. The dynamic transition from active to passive data is supported by the database, based on custom rules defined as per customer needs.We define two categories of data stores: active and passive.

We refer to active data when it is accessed frequently and updates are expected, e.g. access rules. In contrast, we refer to passive data when this data either is not used frequently and neither updated nor read. Passive data is purely used for analytical and statistical purposes or in exceptional situations where specific investigations require this data. For example, tracking events of a certain pharmaceutical product that was sold five years ago can be considered as passive data.

Why is this feasible? Firstly, from the business perspective, the pharmaceutical is equipped with a best-before data of two years after its manufacturing date, i.e. even when the product is handled now, it is no longer allowed to sell it. Secondly, the product was sold to a customer four years ago, i.e. it left the supply chain and is typically already used within this timespan. Therefore, the probability that details about this certain pharmaceutical are queried is very low. Nonetheless, the tracking history needs to be conserved by law regulation, e.g. to prove the used path within the supply chain or when selling details are analyzed for building a new long-term forecast based on historical data. This example gives an understanding about active and passive data. Furthermore, introducing the concept of passive data comes with the advantage to reduce the amount of data, which needs to be accessed in real-time, and to enable archiving. As a result, when data is moved to a passive data store it consumes no longer fast accessible main memory and frees hardware resources. Dealing with passive data stores involves the need for a memory hierarchy from fast, but expensive to slow and cheap. A possible storage hierarchy is given by: memory registers, cache memory, main memory, flash storages, solid state disks, SAS hard disk drives, SATA hard disk drives, tapes, etc. As a result, rules for migrating data from one store to another needs to be defined, we refer to it as aging strategy or aging rules. The process of aging data, i.e. migrating it from a faster store to a slower one, is considered as background tasks, which occurs on regularly basis, e.g. weekly or daily. Since this process involves reorganization of the entire data set, it should be processed during times with low data access, e.g. during nights or weekends.

Please also see our podcast on this technology concept.