InnoDB: Why Store Data in One File?

InnoDB: Why Store Data in One File?

Profile picture for user LukasVileikis
Lukas Vileikis
25 March 2021

If you’ve found yourself working with MySQL for quite some time, you probably began to notice some patterns. One of those patterns is that MySQL has a couple of storage engines under its hood - one of the most popular choices is InnoDB. InnoDB is widely known to store all of its data and indexes inside of one file - ibdata1. In this blog post, we will look into this phenomena a little bit deeper.

Where and Why Does InnoDB Store Data?

As already mentioned above, the InnoDB engine usually stores its data and indexes in a file called ibdata1. The ibdata1 file can be found in the /var/lib/mysql directory. The file is a system tablespace for the entire InnoDB infrastructure - here’s how it looks like from the inside (image by Percona’s CTO Vadim Tkachenko):

InnoDB Internals

When the InnoDB file per table option is disabled, ibdata1 normally houses many classes of information including:

  • The data of InnoDB tables
  • The indexes of InnoDB tables
  • Multiversioning Concurrency Control (MVCC) Data - MVCC ensures that if someone is reading from a database simultaneously as someone else is writing data to the database users will still see a consistent piece of data.
  • Rollback segments - in other words, the storage area consisting of the undo logs.
  • Undo tablespace - the undo tablespace consists of undo logs. Undo logs exist within undo log segments which are contained within rollback segments.
  • Table metadata.

That means InnoDB’s ibdata1 stores all data and indexes of all tables running the InnoDB storage engine - we will now look how to clean up the InnoDB infrastructure.

Cleaning Up the InnoDB Infrastructure

If you want to clean up the entire infrastructure and make the file store only metadata related to InnoDB tables, you can do that too:

  1. Take a copy of all of your MySQL databases.
  2. Remove all of your MySQL databases excluding mysql and information_schema.
  3. Flush all remaining transactional changes from the log files by running SET GLOBAL innodb_fast_shutdown=0; and shut down MySQL.
  4. Modify my.cnf - add the innodb_file_per_table option, switch the InnoDB flushing method to O_DIRECT (in comparison with, for example, O_DSYNC, O_DIRECT is more data consistent, but slower - this setting is the recommended setting on Linux machines - bear in mind that O_DIRECT is not available on Windows machines) and make the innodb_log_file_size 25% of your innodb_buffer_pool_size.
  5. Delete all of the files associated with InnoDB (ibdata1 and ib_logfile* files), then restart MySQL - once you do, these files will be recreated.

Now you can re-import your files into MySQL. Once you do, ibdata1 will still grow but it will only contain table metadata because each InnoDB table will exist outside of ibdata1.

Can I Run OPTIMIZE TABLE?

If you’re familiar with how MySQL operates, you might know that you can also use an OPTIMIZE TABLE statement to reorganize the physical storage of table data and associated index data to reduce storage space and optimize performance - you might be wondering if you can run OPTIMIZE TABLE in InnoDB in such a scenario? You can, but it is worth noting that running OPTIMIZE TABLE in such a scenario could only make matters worse because contiguous data and index pages might be appended to the file. Enabling innodb_file_per_table would help a little bit - if you ran OPTIMIZE TABLE query on a table with innodb_file_per_table enabled, it would produce .frm and .ibd files inside of MySQL’s datadir (e.g for a table called demo_table you would have demo_table.frm and demo_table.ibd), but bear in mind that the data dictionary entry for every table would remain inside of the InnoDB tablespace (the ibdata1 file) - if you want to make your InnoDB tablespace store only metadata related to MySQL, follow the steps outlined above.

Backing Up Your Data with Backup Ninja

InnoDB usually stores data in one file. With that being said, you have far more than one option when dealing with backups with Backup Ninja! Backup Ninja can help you schedule your backups, schedule all kinds of backups (e.g incremental or partial backups) and do a whole bunch of other things!

Backup Ninja is not limited to MySQL either - the technologies that can be used in conjunction with the tool include MySQL, PostgreSQL, MariaDB, MongoDB, Percona and TimescaleDB. Backup Ninja also lets you store your backups on-premises or to the cloud provider of your choice.

Alongside with performing database backups, Backup Ninja can also help you to backup your files and directories while at the same time providing you with an enhanced restore feature to support individual file restoration:

Scheduling Backups

If you feel the need to back up your data, make sure to give it a try.

Summary

InnoDB is one of those storage engines in MySQL that uses one file to store a bunch of different things - while dealing with this file can get pretty complex, it really isn’t as daunting as it might seem. If you want to make your ibdata1 file associated with the InnoDB engine only store metadata related to the engine, consider cleaning up the entire InnoDB tablespace. If you want to know how to back up your data with Backup Ninja, consider giving it a try too - do note that Backup Ninja offers a free subscription tier that allows you to use one backup per agent per day and one backup restore agent per day too.

Tags