October 31, 2011

Single and Multi-Tenancy

To achieve the highest level of operational efficiency, the data of multiple customers can be consolidated onto a single HANA server. Such consolidation is key when HANA is provisioned in an on-demand setting, a service SAP plans to provide in the future. Multi-tenancy makes HANA accessible to smaller customers at lower cost, as a benefit of this consolidation.
Already today, HANA is equipped with the technology to enable such consolidation while ensuring that customers sharing a server do not contend for critical resources, and that the customers' data is stored reliably and with high availability at the hosting site.

October 27, 2011

Lightweight Compression

Compression is the process of reducing the amount of storage space needed to represent a given set of information. Typically, a compression algorithm exploits redundancy in the available information to reduce the required storage. Compression algorithms differ most in the amount of time required to compress and decompress a certain piece of the information, or all of it. More complex compression algorithms sort and analyze the input data to achieve the highest possible compression ratio. For in-memory databases, compression is applied to reduce the amount of data transferred along the memory channels between main memory and the CPU. However, the more complex the compression algorithm, the more CPU cycles it takes to decompress the data during query execution. As a result, in-memory databases strike a trade-off between compression ratio and performance by using so-called lightweight compression algorithms.

An example of a lightweight compression algorithm is dictionary compression, in which every value occurrence is replaced by a fixed-length encoded value. This algorithm has two major advantages for in-memory databases: first, it reduces the amount of required storage; second, predicate evaluation can be performed directly on the compressed data. As a result, query execution becomes even faster with in-memory databases.
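The idea can be sketched in a few lines of Python. This is an illustrative toy, not HANA's actual implementation; the function names and the use of plain integers as fixed-length codes are assumptions for the example.

```python
def dictionary_compress(column):
    """Replace each value in the column by a fixed-length integer code.

    Returns the dictionary (value -> code) and the list of codes.
    """
    dictionary = {}
    codes = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return dictionary, codes


def scan_equals(dictionary, codes, predicate_value):
    """Evaluate an equality predicate directly on the compressed data:
    look up the predicate's code once, then compare small integers only,
    without decompressing the column."""
    code = dictionary.get(predicate_value)
    if code is None:
        return []  # the value never occurs in the column
    return [row for row, c in enumerate(codes) if c == code]


column = ["Berlin", "Potsdam", "Berlin", "Walldorf", "Berlin"]
dictionary, codes = dictionary_compress(column)
print(codes)                                   # [0, 1, 0, 2, 0]
print(scan_equals(dictionary, codes, "Berlin"))  # rows [0, 2, 4]
```

Note that the scan never touches the original strings: the predicate is translated into the code domain once, and the remaining work is integer comparison, which is exactly why predicate evaluation on compressed data is cheap.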

Please also see our podcast on this technology concept.

October 24, 2011

Multi-core and Parallelization

In contrast to hardware development until the early 2000s, today's processing power no longer scales in terms of processing speed, but in the degree of parallelism. Modern system architectures provide server boards with up to eight separate CPUs, where each CPU has up to twelve separate cores. This tremendous amount of processing power should be exploited as far as possible to achieve the highest possible throughput for transactional and analytical applications. For modern enterprise applications, it becomes imperative to reduce the amount of sequential work and to develop the application in a way that allows work to be easily parallelized.

Parallelization can be achieved at a number of levels in the application stack of enterprise systems, from the application running on an application server down to query execution in the database system. As an example of application-level parallelism, consider the following: incoming queries need to be processed by EPCIS (Electronic Product Code Information Services) repositories in parallel to meet response-time thresholds. Processing multiple queries can be handled by a multi-threaded application, i.e. the application does not stall when dealing with more than one query. Threads are a software abstraction that must be mapped to physically available hardware resources; a CPU core can be thought of as a single worker on a construction site. If each query can be mapped to a single core, the system's response time is optimal.

Query processing also involves data processing, i.e. the database needs to be queried in parallel, too. If the database is able to distribute the workload across multiple cores, a single system works optimally. If the workload exceeds the physical capacities of a single system, multiple servers or blades need to be involved in work distribution to achieve optimal processing behavior. From the database perspective, partitioning datasets supports parallelization, since multiple cores across servers can be involved in data processing.

Please also see our podcast on this technology concept.