Columnar Database
While most databases distribute information in rows, columnar databases operate differently. Here, the data is structured in columns. What’s the point of this? Read on and find out what advantages these columnar databases offer.
How are Columnar Databases Structured?
Typically, databases are structured row by row. For each entry, the database management system (DBMS) creates a row. The fields containing the respective information are therefore listed one after the other. Relational databases in particular are based on this principle. Columnar databases flip this system over. With this system, each column represents an entry: the data for each entry is accordingly arranged vertically (and not horizontally as is the case for row-based variants).
An illustrative example of the row-based system is shown below:
Number | Last name | First name | Key |
---|---|---|---|
1 | Skywalker | Luke | 3FN-Z768 |
2 | Kenobi | Obi-wan | 7TR-K345 |
3 | Organa | Leia | 8NN-R266 |
The columnar database essentially turns this table on its side:
Number | 1 | 2 | 3 |
---|---|---|---|
Last name | Skywalker | Kenobi | Organa |
First name | Luke | Obi-wan | Leia |
Key | 3FN-Z768 | 7TR-K345 | 8NN-R266 |
On the hard drive itself, however, the data appears one-dimensional: it’s displayed one piece after the other. For row-based databases, this looks as follows:
1, Skywalker, Luke, 3FN-Z768; 2, Kenobi, Obi-wan, 7TR-K345; 3, Organa, Leia, 8NN-R266
Columnar databases also save the information one piece after the other. But this alternative orientation results in a different data sequence:
1, 2, 3; Skywalker, Kenobi, Organa; Luke, Obi-wan, Leia; 3FN-Z768, 7TR-K345, 8NN-R266
There are other alternatives to the conventional model besides columnar databases. NoSQL databases or their subcategory document stores as well as the related key-value databases are used increasingly often. Graph databases are also becoming more popular since this model is able to map highly networked data very effectively.
Applications of Columnar Databases
Relational systems based on rows are primarily used when lots of transactions have to be performed quickly. Writing, changing and deleting entries works very well with relational databases. Columnar databases are particularly useful when analyzing large volumes of data.
In many applications—for example, in research and many other fields—data is analyzed on a continuous basis. This is much faster with columnar DBs. The reason for this is that fewer hard drive accesses are required. A category’s data is found close to each other. To read and analyses a dataset, only one block needs to be loaded. The entire database doesn’t need to be read in order to gather up the distributed information.
Advantages and Disadvantages of a Columnar Database
Columnar database management systems play to their strengths when analyzing large volumes of data—such as big data. Since hard drive access constitutes the respective bottleneck when reading any database, and this access is more efficient with a column-oriented DBMS, the columnar variant is really effective here.
For transactional applications, the accesses typically work in a different way. In these cases, new information has to be distributed across the entire database. A columnar database would perform this task more slowly than the conventional model.
Another advantage of columnar databases is the possibility of compressing. Data in one column is always of the same type, for example, a string or an integer. Since all entries of a type are found close together, they can be compressed more efficiently.
The Most Well-Known Columnar Databases
Columnar database management systems have been in use for a relatively long time, but the number of available implementations is still limited. The demand for relational databases is simply greater. Nonetheless, a few systems have become established.
- Amazon Redshift: As part of Amazon Web Services (AWS), Redshift offers a column-based data warehouse for big data.
- MariaDB ColumnStore: The open-source DBMS MariaDB (fork of MySQL) offers a combination of a columnar and relational database with ColumnStore.
- SAP HANA: The SAP development platform likewise uses a combination of a relational and columnar database.
- Apache Cassandra: The free software is built on Apache Hadoop and is compiled in Java.
- MonetDB: This open-source software was developed with a special focus on data mining.