Notes from the
Maximum Velocity MySQL
talk by Jay Pipes
On Tuesday, 3 April 2007, I attended a three-hour class entitled, "Maximum Velocity MySQL" in Omaha, NE. The class was taught by Jay Pipes, MySQL Community Relations Manager and co-author of Pro MySQL. From the advertisement, the class covered:
- How to diagnose MySQL database performance bottlenecks using a variety of tools.
- How to identify and FIX problem SQL code.
- Best practices for writing streamlined, efficient SQL code that takes advantage of MySQL's strengths.
- About advanced index concepts.
- How to build an effective indexing strategy.
- How to choose appropriate storage engines for different types of data.
- How to adjust MySQL server parameters to squeeze the most performance out of your application.
- How to plan for growth, in your code and in your index strategy.
- How you can receive automatic preemptive advice & code for improving performance!
Slides for this presentation are here (680KB PDF). Below are the notes that I scribbled during the presentation. There were enough of them and the class was good enough that I figured I'd benefit from typing them up.
Benchmarking
- ApacheBench (ab) provides simple web app benchmarks by simulating a load on a page.
- Sysbench, mysqlslap (v5.1), and Super Smack are mysql-specific benchmark tools.
- Frameworks / harnesses reduce repetition when benchmarking -
MyBench,
JMeter/Ant -
benchmark-no-cache.sh
Profiling
- Diagnose bottlenecks on a running system
- For OLAP (lots of reads), memory is usually the bottleneck
- For OLTP (online transaction processing), I/O is usually the bottleneck
- turn on slow query logging in
my.cf
- Despite its name, the
log_long_format
variable lists any query that doesn't use an index -
mysqldumpslow
dumps the entire DB/table by selecting 1 row at a time
EXPLAIN command
- Prepend "
explain
" before any "select
Shows execution plan chosen by mysql for a query - Type of "all" means the entire table was searched.
- Sequential reading of all data is often faster than pulling lots of indexed rows, so an available index may be ignored.
- Type of "index"
- Selecting only columns which are in an index is faster even when pulling all rows, becasue it can scan the much smaller index entries rather than the much larger row data.
- Look for "
Extra: using index
"
- Type of "range" (like "between", "in", and <> in "where" clause)
- Very old and mature optimizer
- Very fast run time if where column is indexed, otherwise searches all rows
- "
Extra: using where
" means all rows were retrieved, then filtered
Schema
- If table contains lots of columns which are infrequently used, split the less used fields into a separate table to reduce the foootprint of each row and improve the memory cache (buffer pool) utilization.
- Updates to a table invalidate all query cache rows which reference that table, so split static and dynamic fields into separate tables to improve cache hit rate for static data.
- "
select count(*) from table
" in innodb requires a full table scan, but is instantaneous in MyISAM. - "
engine=memory
init_sql script. -
heap_table_size
sets the size of memory tables - 32MB is default. - To query count of innodb databases, you can create a memory table
that stores the number of records in an innodb table. This is
filled at boot time by the
init_sql
script and then updated by the app when the table is modified. -
INFORMATION_SCHEMA
is new in MySQL 5.0. - When creating indices, you can determine their usefulness by checking
the selectability:
distinct(*) / count(*)
- Rather than storing IP addresses as 15-character strings, store them
as 4-byte integers. Use the builtin functions
INET_ATON()
andINET_NTOA()
to convert IP strings to ints. - Never group by a non-indexed column.
- Autoincrement primary keys don't lock MyISAM tables when updated like most other updates do.
- Use the "archive" storage engine for web logs. Archive is as fast as MyISAM, and compresses its data using gzip, but it can't be modified once inserted. To delete old entries, use partitioning in v5.1 to split the data into buckets, then periodically drop the entire bucket of old rows.
SQL Coding
- Inner join on a temp table is much faster than correlated subqueries.
- Applying a built-in function on a field prevents the use of any index that may be on that field.
- A like match with a fixed prefix (
LIKE "abc%"
) uses the "range" type and is quite fast. - The
CURRENT_DATE()
function is dynamic and therefore doesn't allow storage of results in the query cache. It's faster to hardcode today's date as obtained from the calling language. - Doing a postfix match (
LIKE "%abc"
) requires a full text search. It's faster to create a new field storing the reversed field using theREVERSE()
function and do an index search on that (LIKE CONCAT(REVERSE(
pattern), '%')
). You can insert the calculated reversed field on the fly with triggers. - Unlike most DB's, stored procedures in MySQL are cached per connection thread, not globally. If the connection is kept open for lots of requests, they work well. For web apps that close the connection after a few queries, stored procs are inefficient.
- PHP
- Avoid persistent connections, as individual connections from PHP to MySQL are 10-100 times faster than to other DB's (has to do with the driver).
- The
mysqli
driver is faster than themysql
driver when using prepared statements. - Use PDO v5.1 (not to be confused with MySQL 5.1)
- Using prepared statements avoids SQL injection and is faster.
- Serve static web content using a light weight HTTP server
like
lighthttpd
, not a heavy weight server like Apache. - "Smarty" is a PHP templating system
Server Tuning
- "
show status
" or "show status like 'Qc%'
" shows various server stats. - When setting server variables at startup, be aware of which variables are global and which are per thread. Setting a huge buffer per thread will kill the server.
- Adding more RAM is the best way to boost performance via hardware upgrades.
- MyISAM variaables
-
key_buffer_size
-
key_block_unused=0
means you've run out -
table_cache
shows simultaneous open connections to each table -
myisam_sort_buffer_size
- large helps when building indices, but reduce it again when done. This setting is per connection, so you can set it dynamically after you connect, and it'll go away when your thread dies.
-
Misc Info
Good references for MySQL support and tuning include:
-
Optimizing Linux Performance by Philip Ezollt - Advanced PHP Programming by George Schlossnagle
- Pro MySQL by Jay Pipes
- www.mysqlperformanceblog.com is written by a guy who used to be a MySQL developer
- XAPRB.com
- PlanetMySQL.org
- Forge.MySQL.com/worklog/ contains the MySQL roadmap
- IRC channel (freenode)
#mysql
- The MySQL Reference Manual
Adding "\G
" to the end of a query in the mysql command line
UI will present the output as row based rather than column based.
DailyWTF has curious things every day.
last updated 5 Apr 2007 Obi-Wan (obiwan@jedi.com)
Links on this page to Amazon are part of an affiliate program that helps keep Jedi.com operational. Thank you for your support!