Notes from the
Maximum Velocity MySQL
talk by Jay Pipes

On Tuesday, 3 April 2007, I attended a three-hour class entitled, "Maximum Velocity MySQL" in Omaha, NE. The class was taught by Jay Pipes, MySQL Community Relations Manager and co-author of Pro MySQL. From the advertisement, the class covered:

  • How to diagnose MySQL database performance bottlenecks using a variety of tools.
  • How to identify and FIX problem SQL code.
  • Best practices for writing streamlined, efficient SQL code that takes advantage of MySQL's strengths.
  • About advanced index concepts.
  • How to build an effective indexing strategy.
  • How to choose appropriate storage engines for different types of data.
  • How to adjust MySQL server parameters to squeeze the most performance out of your application.
  • How to plan for growth, in your code and in your index strategy.
  • How you can receive automatic preemptive advice & code for improving performance!

Slides for this presentation are here (680KB PDF). Below are the notes that I scribbled during the presentation. There were enough of them and the class was good enough that I figured I'd benefit from typing them up.

Benchmarking

Profiling

  • Diagnose bottlenecks on a running system
  • For OLAP (lots of reads), memory is usually the bottleneck
  • For OLTP (online transaction processing), I/O is usually the bottleneck
  • turn on slow query logging in my.cf
  • Despite its name, the log_long_format variable lists any query that doesn't use an index
  • mysqldumpslow dumps the entire DB/table by selecting 1 row at a time

EXPLAIN command

  • Prepend "explain" before any "select Shows execution plan chosen by mysql for a query
  • Type of "all" means the entire table was searched.
  • Sequential reading of all data is often faster than pulling lots of indexed rows, so an available index may be ignored.
  • Type of "index"
    • Selecting only columns which are in an index is faster even when pulling all rows, becasue it can scan the much smaller index entries rather than the much larger row data.
    • Look for "Extra: using index"
  • Type of "range" (like "between", "in", and <> in "where" clause)
    • Very old and mature optimizer
    • Very fast run time if where column is indexed, otherwise searches all rows
    • "Extra: using where" means all rows were retrieved, then filtered

Schema

  • If table contains lots of columns which are infrequently used, split the less used fields into a separate table to reduce the foootprint of each row and improve the memory cache (buffer pool) utilization.
  • Updates to a table invalidate all query cache rows which reference that table, so split static and dynamic fields into separate tables to improve cache hit rate for static data.
  • "select count(*) from table" in innodb requires a full table scan, but is instantaneous in MyISAM.
  • "engine=memoryinit_sql script.
  • heap_table_size sets the size of memory tables - 32MB is default.
  • To query count of innodb databases, you can create a memory table that stores the number of records in an innodb table. This is filled at boot time by the init_sql script and then updated by the app when the table is modified.
  • INFORMATION_SCHEMA is new in MySQL 5.0.
  • When creating indices, you can determine their usefulness by checking the selectability: distinct(*) / count(*)
  • Rather than storing IP addresses as 15-character strings, store them as 4-byte integers. Use the builtin functions INET_ATON() and INET_NTOA() to convert IP strings to ints.
  • Never group by a non-indexed column.
  • Autoincrement primary keys don't lock MyISAM tables when updated like most other updates do.
  • Use the "archive" storage engine for web logs. Archive is as fast as MyISAM, and compresses its data using gzip, but it can't be modified once inserted. To delete old entries, use partitioning in v5.1 to split the data into buckets, then periodically drop the entire bucket of old rows.

SQL Coding

  • Inner join on a temp table is much faster than correlated subqueries.
  • Applying a built-in function on a field prevents the use of any index that may be on that field.
  • A like match with a fixed prefix (LIKE "abc%") uses the "range" type and is quite fast.
  • The CURRENT_DATE() function is dynamic and therefore doesn't allow storage of results in the query cache. It's faster to hardcode today's date as obtained from the calling language.
  • Doing a postfix match (LIKE "%abc") requires a full text search. It's faster to create a new field storing the reversed field using the REVERSE() function and do an index search on that (LIKE CONCAT(REVERSE(pattern), '%')). You can insert the calculated reversed field on the fly with triggers.
  • Unlike most DB's, stored procedures in MySQL are cached per connection thread, not globally. If the connection is kept open for lots of requests, they work well. For web apps that close the connection after a few queries, stored procs are inefficient.
  • PHP
    • Avoid persistent connections, as individual connections from PHP to MySQL are 10-100 times faster than to other DB's (has to do with the driver).
    • The mysqli driver is faster than the mysql driver when using prepared statements.
    • Use PDO v5.1 (not to be confused with MySQL 5.1)
    • Using prepared statements avoids SQL injection and is faster.
    • Serve static web content using a light weight HTTP server like lighthttpd, not a heavy weight server like Apache.
    • "Smarty" is a PHP templating system

Server Tuning

  • "show status" or "show status like 'Qc%'" shows various server stats.
  • When setting server variables at startup, be aware of which variables are global and which are per thread. Setting a huge buffer per thread will kill the server.
  • Adding more RAM is the best way to boost performance via hardware upgrades.
  • MyISAM variaables
    • key_buffer_size
    • key_block_unused=0 means you've run out
    • table_cache shows simultaneous open connections to each table
    • myisam_sort_buffer_size - large helps when building indices, but reduce it again when done. This setting is per connection, so you can set it dynamically after you connect, and it'll go away when your thread dies.

Misc Info

Good references for MySQL support and tuning include:

Adding "\G" to the end of a query in the mysql command line UI will present the output as row based rather than column based.

DailyWTF has curious things every day.


last updated 5 Apr 2007
Obi-Wan (obiwan@jedi.com)




Namecheap.com - Cheap domain name registration, renewal and transfers - Free SSL Certificates - Web Hosting