October 27, 2011

Lightweight Compression

Compression is the process of reducing the amount of storage space needed to represent a given set of information. Typically, a compression algorithm exploits redundancy in the data to reduce the required storage. The biggest difference between compression algorithms is the amount of time required to compress and decompress some or all of the information. More complex compression algorithms sort and analyze the input data extensively to achieve the highest compression ratio. For in-memory databases, compression is applied to reduce the amount of data transferred along the memory channels between main memory and the CPU. However, the more complex the compression algorithm, the more CPU cycles it takes to decompress the data during query execution. As a result, in-memory databases choose a trade-off between compression ratio and performance by using so-called lightweight compression algorithms.

An example of a lightweight compression algorithm is dictionary compression. In dictionary compression, every occurrence of a value is replaced by a fixed-length encoded value. This algorithm has two major advantages for in-memory databases: First, it reduces the amount of required storage. Second, it is possible to perform predicate evaluation directly on the compressed data. As a result, query execution becomes even faster with in-memory databases.
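The idea can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular database; all names here are hypothetical. Each distinct column value is assigned a fixed-length integer code, and an equality predicate is evaluated by translating the constant into its code once and then comparing integers instead of strings:

```python
def dictionary_compress(column):
    """Replace each value with a fixed-length integer code.

    Returns the dictionary (value -> code) and the encoded column.
    """
    dictionary = {}
    codes = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return dictionary, codes


def select_equals(dictionary, codes, constant):
    """Evaluate an equality predicate directly on the compressed column.

    The constant is looked up in the dictionary once; the scan then
    compares only the integer codes, never the original values.
    """
    code = dictionary.get(constant)
    if code is None:
        return []  # constant does not occur in the column
    return [pos for pos, c in enumerate(codes) if c == code]


column = ["Berlin", "Potsdam", "Berlin", "Hamburg", "Berlin"]
dictionary, codes = dictionary_compress(column)
print(codes)                                       # [0, 1, 0, 2, 0]
print(select_equals(dictionary, codes, "Berlin"))  # [0, 2, 4]
```

Note that the scan touches only small integer codes, which is exactly why less data moves across the memory channels and why the predicate can be evaluated without decompressing the column first.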

Please also see our podcast on this technology concept.
