In-Memory Databases
Digital communication generates large amounts of data. This is a great opportunity for companies that work with big data. However, the more data a company has to work with, the greater the challenge becomes to recognize connections and patterns in it. IT solutions and systems are ever more in demand to support companies in evaluating the huge amounts of information they receive. Traditional databases are no longer sufficient to store, retrieve, and process extremely large collections of data. When classic databases reach their limits, in-memory databases can be of use.
What are in-memory databases?
An in-memory database (sometimes abbreviated to IMDB) is a database management system that stores its data collections directly in the working memory of one or more computers. Using RAM has a key advantage: access speeds are significantly faster, so stored data is available very quickly when needed.
The working memory, also called RAM (random access memory), contains all programs, program parts, and any data required for running these programs. When the computer is switched off, all this temporarily stored data is lost.
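To make this tangible, here is a minimal sketch using Python's built-in sqlite3 module, which can keep an entire database in RAM. The table and values are invented for illustration; this only demonstrates the principle, not any particular product:

```python
import sqlite3

# ":memory:" tells SQLite to keep the entire database in RAM
# instead of writing it to a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, country TEXT)")
conn.execute("INSERT INTO customers VALUES ('name 1', 'city 1', 'country 1')")

# Reads are served directly from working memory.
print(conn.execute("SELECT * FROM customers").fetchall())

# Once the connection is closed (or the process ends), the data is
# gone -- exactly the volatility described above.
conn.close()
```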
How do in-memory databases work?
In-memory databases store large amounts of data and provide a range of analysis results. But how exactly does big data storage work, and what kind of technology makes it possible?
How your data is stored
When storing data with an in-memory database, a distinction is made between column-oriented and row-oriented data storage; some database systems use both methods. Row-oriented databases arrange the values of each data record together in one row. For example, if the values “name, city, and country” are stored, the data would be arranged as follows: name 1, city 1, country 1, name 2, city 2, country 2. In column-based storage, the data is grouped by field instead: name 1, name 2, city 1, city 2, country 1, country 2.
The storage format for column-based data storage is called columnar format. By storing values of the same type together, the system can compress the data more effectively, which reduces storage space and transmission times. Column-based storage also improves the analysis performance of in-memory databases, since a query only needs to read the specific columns it is interested in rather than all of them. This form of data evaluation is called columnar projection.
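A short Python sketch illustrates the difference between the two layouts and the idea of columnar projection; the records and field names follow the example above and are purely illustrative:

```python
# Two records with the fields name, city, and country.
records = [
    ("name 1", "city 1", "country 1"),
    ("name 2", "city 2", "country 2"),
]

# Row-oriented storage: each record's values sit together.
row_store = [value for record in records for value in record]
# -> ['name 1', 'city 1', 'country 1', 'name 2', 'city 2', 'country 2']

# Column-oriented storage: all values of one field sit together.
column_store = {
    "name":    [r[0] for r in records],
    "city":    [r[1] for r in records],
    "country": [r[2] for r in records],
}

# Columnar projection: an analysis that only needs the "country"
# column reads just that list and ignores everything else.
print(column_store["country"])
```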
Technology for big data storage
The concept of in-memory databases is nothing new. The foundations of the technology were developed as early as the mid-1980s. However, the IT systems of the time lacked the required processing capacity, so these early concepts could not be put into practice. Modern developments such as data warehousing, 64-bit technology, and multi-core processors finally made it possible for in-memory databases to be put to use.
- In-memory databases usually form part of a data warehouse. These database systems collect and compress data from various sources, store it for the long term, and prepare it for analysis.
- With 64-bit technology, it is possible to increase the capacity of the main memory up to the terabyte range (see the arithmetic sketch after this list). As a result, in-memory DBs have grown in size.
- With multi-core processors, multiple processor cores work in one chip, resulting in better processing performance and higher data throughput, i.e. the net data volume transferred per unit of time.
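The jump in addressable memory can be checked with a little arithmetic. The figures below are theoretical address-space limits, not what any concrete system supports:

```python
# A 32-bit address can distinguish at most 2**32 bytes -> 4 GiB of RAM.
print(2**32 / 2**30, "GiB")   # 4.0 GiB

# A 64-bit address can in theory distinguish 2**64 bytes (16 EiB),
# which is far beyond the terabyte range mentioned above.
print(2**64 / 2**40, "TiB")   # 16777216.0 TiB
```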
What are the operating steps of an in-memory database?
Recurring, identical processes take place while an in-memory database is running. An in-memory database backs up data in the following way (a minimal sketch follows the list):
- Starting the database: when the database is started, the system loads the entire dataset from the hard disk into the working memory. This means that no data has to be loaded from disk while the database is running.
- Readjusting data: the database reviews and adjusts data frequently, so that it remains up to date when data changes.
- Transaction log backups: current changes are recorded in transaction logs. If an error occurs, the logged changes can be replayed to restore the database to its last consistent state before the error. This process is called "roll-forward".
- Data processing: data is processed according to the ACID principle (atomicity, consistency, isolation, and durability), as it is in traditional databases. The acronym ACID describes the ideal properties of processes in database management systems.
- Database replication: this step continuously copies data from the database to another computer or server as a backup.
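The following minimal sketch illustrates the write path described above with an invented key-value store: the dataset is loaded into RAM at startup, and every change is appended to a transaction log before it is applied. The file names and log format are assumptions made purely for illustration, not how any particular product works:

```python
import json
import os

DATA_FILE = "snapshot.json"   # hypothetical snapshot file on disk
LOG_FILE = "changes.log"      # hypothetical transaction log

class InMemoryStore:
    def __init__(self):
        # Starting the database: load the entire dataset from disk
        # into RAM, so no data has to be fetched while running.
        self.data = {}
        if os.path.exists(DATA_FILE):
            with open(DATA_FILE) as f:
                self.data = json.load(f)

    def set(self, key, value):
        # Transaction log backup: record the change before applying
        # it, so the database can be rolled forward after a crash.
        with open(LOG_FILE, "a") as log:
            log.write(json.dumps({"op": "set", "key": key, "value": value}) + "\n")
        self.data[key] = value   # the actual write happens in RAM

store = InMemoryStore()
store.set("name 1", "city 1")
```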
Pros and cons of in-memory databases
The key advantage of in-memory databases is that accessing data is much faster than with a traditional database. However, the same design is also the root of their greatest disadvantage: data cannot be stored permanently in RAM. Here is a comparison of the advantages and disadvantages of in-memory databases.
Pros of in-memory databases
The biggest advantage of using in-memory databases is the significantly higher access speed resulting from the use of RAM, which also leads to quicker data analysis. However, it's not only the reduced fetch time that optimizes data analysis: in-memory DBs make it possible to evaluate structured and unstructured data from virtually any system. Until now, companies and software solutions have been faced with the challenge of storing and processing large amounts of unstructured data, such as texts, images, or audio and video files.
By using distributed data infrastructures, unstructured data can be stored in an in-memory database in which several processing units (computers, processors, etc.) work on a common task in parallel, distributing it across different server clusters. This results in higher storage capacity, faster processing, and better transfer speeds for the unstructured data.
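As a rough, simplified illustration of this parallelism, the sketch below splits a counting task across local worker processes; in a real distributed in-memory database, the workers would be separate servers in a cluster, and the shards shown here are invented sample data:

```python
from multiprocessing import Pool

# Hypothetical shards of unstructured text data, each of which
# would live on a different cluster node in a real deployment.
shards = [
    "some text stored on node one",
    "more text stored on node two",
    "even more text on node three",
]

def count_words(shard):
    # Each processing unit works on its part of the common task.
    return len(shard.split())

if __name__ == "__main__":
    with Pool() as pool:
        # Distribute the shards to the workers in parallel,
        # then combine the partial results.
        print(sum(pool.map(count_words, shards)))
```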
Cons of in-memory databases
The use of RAM means faster access on the one hand, but it also brings a key disadvantage: the stored data is only temporary. If the computer system crashes, all temporarily stored data is lost. To combat this, the following methods have been established (a recovery sketch follows this list):
- Snapshot files: at specific moments, such as at routine intervals or before the system shuts down, the current version of the database is saved. An important criticism of this approach is that any data added after the last so-called snapshot would be lost in the case of a crash; depending on how long each interval is, this could be a lot of data.
- Transaction log backups: recording changes in transaction logs is an integrated security mechanism. Used in combination with the snapshot process, the transaction log can help restore a system after a crash.
- Replication: most in-memory databases can store an exact copy of the database on a conventional hard disk. In the event of a failure, this stored copy can be accessed.
- Non-volatile RAM: non-volatile RAM (NVRAM) keeps files available for retrieval even after the system has been restarted.
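Combining the snapshot and transaction log methods, crash recovery can be sketched as follows. The file names and log format are invented for illustration and mirror the write-path sketch shown earlier:

```python
import json
import os

DATA_FILE = "snapshot.json"   # hypothetical snapshot file
LOG_FILE = "changes.log"      # hypothetical transaction log

def recover():
    # 1. Restore the last snapshot. Without a transaction log,
    #    everything newer than the snapshot would be lost.
    data = {}
    if os.path.exists(DATA_FILE):
        with open(DATA_FILE) as f:
            data = json.load(f)

    # 2. Roll forward: replay every logged change made since the
    #    snapshot to reach the last consistent state.
    if os.path.exists(LOG_FILE):
        with open(LOG_FILE) as log:
            for line in log:
                entry = json.loads(line)
                if entry["op"] == "set":
                    data[entry["key"]] = entry["value"]
    return data

print(recover())
```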
Another disadvantage of RAM use is that the computer itself no longer has as much working memory available for other tasks. Grid computing can be a solution to this limitation: it connects many different computers, and for a computer to join this network, specialized software must be installed on it. By merging the unused capacities, a virtual, high-performance computer is created.
In-memory databases vs. traditional databases
A database is generally understood to be a collection of information that is available electronically. Traditional databases only store structured data, meaning clearly organized and defined data fields in concrete data records. The data records are arranged in tables, where each data field represents a different attribute and is named accordingly. The big data movement pushed this model to its limits: its weaknesses lie in the storage and processing of large amounts of data, as well as in its lack of adaptability. Unstructured data, such as images and documents in natural language, cannot be stored or evaluated in such databases.
| | In-memory database | Traditional database |
|---|---|---|
| Data types | Structured and unstructured | Structured |
| Access speed | Real time | Slow |
| Data security | Unsafe (volatile storage) | Safe (persistent storage) |
When would an in-memory DB make sense for a business?
Once you have looked at the advantages and disadvantages of in-memory databases and compared them to traditional databases, you can consider which database management system (DBMS) is suitable for your company. If you work with big data, the decision has already been made for you: only an in-memory DB will suit your needs. However, an in-memory database may also be the right choice in other cases.
An in-memory database is the right DBMS for you if:
- You have collected large amounts of data
- You need fast and frequent access to your data
- Your existing database management systems or servers are overloaded
- Data persistence is not your highest priority
- The possibility of some data loss is workable for you
Examples of in-memory databases
Among the best-known in-memory databases are SAP HANA and Oracle TimesTen. If your company is looking for enterprise software with a wide range of functions, the SAP and Oracle solutions are the most common. Both database management systems are designed for maximum performance. In the following, we will look at what distinguishes them and what their practical application in a company would look like.
SAP HANA (high performance analytic appliance)
The SAP HANA (high performance analytic appliance) in-memory database is a combination of hardware and software. The software was specially developed by SAP, while the hardware (servers) comes from ten different manufacturers. Unlike other in-memory databases, SAP HANA doesn't store data only temporarily; instead, it keeps the data permanently in working memory and secures it using transaction logs.
Handling transaction and analysis processing in a common database makes it possible to process information in real time. SAP HANA can be implemented on an enterprise server as well as in the cloud, which reduces the potential strain on your company's IT structure. In addition, the costs of previous data management methods are minimized.
Oracle TimesTen
The Oracle database has a lot in common with SAP HANA. Data processing also takes place in real time, and the application can run on a server or as a cloud service. In contrast to the SAP database, the Oracle TimesTen software and hardware both originate from Oracle itself.
As such, TimesTen is a purely “Oracle” application. The resulting advantage for the user is that, in case of an error, you can act internally; your company is not dependent on different hardware and software companies. Oracle does not store the collected data exclusively in memory: data that doesn't require high-performance processing can be stored on a hard disk or a flash disk.
In-memory database comparison
The functions of SAP HANA and Oracle TimesTen are largely the same. Both databases offer the following benefits:
- Faster data processing
- Company reorientation by means of innovative applications
- Increased flexibility, activity, and adaptability
The following table summarizes the differences and similarities between the two in-memory databases:
| | Oracle TimesTen | SAP HANA |
|---|---|---|
| Data storage | Storage of data in working memory, on hard disk, and on flash disk | Permanent storage of data in working memory |
| Software and hardware | Software and hardware both from Oracle | Software from SAP, with hardware from different manufacturers |
| Installation | Hardware and cloud services | Hardware and cloud services |
| Database | Column-based database | Column-based database |
| Enterprise Information Management (EIM) | Optimization of business processes through various data management functions | Optimization of business processes through various data management functions |
| Big data | Developed for big data | Developed for big data |
| Data processing | Real time | Real time |
| Data analysis | Takes place in the database | Takes place in the database |
Challenges you may face with in-memory databases
As digitalization progresses, the already huge amounts of data will continue to grow. The developers of in-memory databases are therefore faced with having to continually develop existing systems further. The following tasks need to be addressed:
- The collection of data from a growing number of sources
- Simplifying IT structures while at the same time reducing response times and speeding up analysis
- How to gain further insights from data analysis, and support companies in their decision-making processes
- The development of applications that are even more geared to the challenges of constant digital change
Summary
In-memory databases have established themselves as a successful technology for storing and processing data. They enable companies that handle large amounts of data to analyze big data as quickly as possible and to access it at any time. However, in-memory databases can only be used effectively if data storage does not rely exclusively on memory, and standardized systems for data backup are integrated into the in-memory database processes.