MySQL Storage Engines

One of the greatest things about MySQL, other than being free, widely supported and fast, is the flexibility of choosing different storage engines for different tables.

Overview

MyISAM:
The default engine. No transactions support, average data reliability. Offers great performance for read heavy applications. Most web services and data warehousing applications use MyISAM heavily.

HEAP:
All in-memory. Very fast for data retrieval, however due to being stored only in memory – all data is lost on shutdown. Great for temporary tables.

Archive:
Used for storing large amounts of data without indexes in a small footprint.

Merge:
Collection of MyISAM tables logically merged together to provide a single view.

InnoDB:
Transaction-safe storage engine, best suited for write heavy environments thanks to row-level locking. Offers good built-in recovery and solid data reliability. InnoDB engine was acquired by Oracle on 2005.

NDB:
A clustered engine – data is automatically split and replicated across several machines, a.k.a data nodes. Best suited for applications that require high performance lookups with the highest possible degree of uptime and availability. Originally designed by Ericsson for the Telco market, NDB offers the highest levels of data reliability (99.999%). NDB works well in read heavy environments. For write heavy environments with multiple concurrent writes, consider InnoDB.

The biggest disadvantadge of NDB, is that by design your entire database must fit in memory. If your database size times 2 is too big to fit in memory, NDBCluster is not for you.To make it easier to follow the unique characteristics of each storage engine, I created this magic
quadrant diagram:

 

 

Examples:

Below are some examples of using the best storage engine for different tasks:

Search Engine – NDBCluster

Web stats logging – Flat file for the logging with an offline processing demon processing and writing all stats into InnoDB tables.

Financial Transactions – InnoDB

Session data – MyISAM or NDBCluster

Localized calculations – HEAP

Dictionary – MyISAM

Important notes about MyISAM tables:

1. Your tables will get corrupted eventually! Plan accordingly.

Tar the entire database directory daily and setup MySQL replication to a slave for an up-to-the-minute live backup.

2. Turn on auto-repair by adding this flag to your my.cnf file:
myisam-recover=backup,force Or consider running a check-all-tables-and-email-me cronjob daily: See our MySQL Table Maintenance automation.

3. Super fast for read (select) operations.

4. Concurrent writes lock the entire table. Switch everything to offline processing where you can, to serialize writes without taking the database down. (Offline processing is golden and applies to all table types)

Important notes about HEAP/Memory tables:

While this type of table offers super fast retrieval, it only works well for small temporary tables. If you try to load too much data into a Memory table, MySQL will start swapping information to disk and then you lose the benefits of an all-memory storage.

Important notes about InnoDB tables:

1. ACID transactions support. Row-level locking (compared to table level locking with MyISAM) means faster concurrent writes.

2. Doing a “SELECT Count(*) FROM table” without specifying any indexes is very slow on InnoDB and requires a full table scan. (With MyIsam this operation doesn’t cost anything because MyIsam stores an internal record counter with each table).

If you need to “SELECT COUNT(*)” often on InnoDB tables, create MySQL insert/delete triggers that will increment/decrement a counter whenever records are added or deleted from the table.

3. Backup:

Doing a tar/rsync backup where you simply copy all files is not possible with InnoDB.

MySQLDump backup is too slow with InnoDB. (If you insist on using it, turn on these flags: –opt –compress)

The only viable fast backup option, which can also be used to populate new slave machines, is InnoDB Hot Backup.

4. Recovery:

InnoDB has built-in recovery that works 99% of the times automatically. Never try to move .frm or .ibd files around as a way of “helping” the database to recover. If the built-in recovery doesn’t work, switch to your slave server and restore the primary from backup.

5. LOAD DATA INFILE is too slow with InnoDB. Consider using MyIsam tables for LOAD DATA operations.

6. InnoDB is less forgiving than MyIsam when it comes to queries on non indexes. InnoDB is going to “School” you into ensuring every single query and update statement runs on an index. Issue no index queries and you’ll pay dearly in execution time.

7. Never ever change my.cnf INnoDB log file size while the database is running. You’ll corrupt the log sequence number beyond repair.

8. To maximize InnoDB MySQL database performance, start with these my.cnf settings:

innodb_open_files = 500
innodb_file_per_table
innodb_buffer_pool_size = 250M
innodb_flush_log_at_trx_commit = 2
innodb_thread_concurrency =8
innodb_lock_wait_timeout = 500
interactive_timeout = 20
back_log = 75
table_cache = 300
thread_cache = 32
thread_concurrency = 8
wait_timeout = 30
connect_timeout = 10

9. If InnoDB crashes and the built-in recovery mechanism is unable to roll-back transactions, your database will not start. This is very important to understand. With MyISAM, even if a table gets corrupted, you can still start the database and everything will work normally. InnoDB will simply refuse to start until you restore the entire database from backup. Make sure you understand this principle and backup religiously.

Scalability:

Every successful web application eventually outgrows the capacity and throughput of a single database machine.

At that point you typically have two options – Replication or NDBCluster.

As always the choice depends on the needs of your application.

For read-heavy environments, use an NDBCluster or setup replication for n MyISAM slave read-only machines.

For write-heavy environments, InnoDB on an active/passive replication setup is typically the best choice. You may also want to experiment with an NDBCluster. An NDBCluster is generally going to be slower than InnoDB in
write-heavy environments, but it offers a higher level of availability.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: