Maximum Velocity MySQL

Notes from the
Maximum Velocity MySQL
talk by Jay Pipes

On Tuesday, 3 April 2007, I attended a three-hour class entitled, "Maximum Velocity MySQL" in Omaha, NE. The class was taught by Jay Pipes, MySQL Community Relations Manager and co-author of Pro MySQL. From the advertisement, the class covered:

How to diagnose MySQL database performance bottlenecks using a variety of tools.
How to identify and FIX problem SQL code.
Best practices for writing streamlined, efficient SQL code that takes advantage of MySQL's strengths.
About advanced index concepts.
How to build an effective indexing strategy.
How to choose appropriate storage engines for different types of data.
How to adjust MySQL server parameters to squeeze the most performance out of your application.
How to plan for growth, in your code and in your index strategy.
How you can receive automatic preemptive advice & code for improving performance!

Slides for this presentation are here (680KB PDF). Below are the notes that I scribbled during the presentation. There were enough of them and the class was good enough that I figured I'd benefit from typing them up.

Benchmarking

ApacheBench (ab) provides simple web app benchmarks by simulating a load on a page.
Sysbench, mysqlslap (v5.1), and Super Smack are mysql-specific benchmark tools.
Frameworks / harnesses reduce repetition when benchmarking - MyBench, JMeter/Ant - benchmark-no-cache.sh

Profiling

Diagnose bottlenecks on a running system
For OLAP (lots of reads), memory is usually the bottleneck
For OLTP (online transaction processing), I/O is usually the bottleneck
turn on slow query logging in my.cf
Despite its name, the log_long_format variable lists any query that doesn't use an index
mysqldumpslow dumps the entire DB/table by selecting 1 row at a time

EXPLAIN command

Prepend "explain" before any "select Shows execution plan chosen by mysql for a query
Type of "all" means the entire table was searched.
Sequential reading of all data is often faster than pulling lots of indexed rows, so an available index may be ignored.
Type of "index" Selecting only columns which are in an index is faster even when pulling all rows, becasue it can scan the much smaller index entries rather than the much larger row data. Look for "Extra: using index"
Type of "range" (like "between", "in", and <> in "where" clause) Very old and mature optimizer Very fast run time if where column is indexed, otherwise searches all rows "Extra: using where" means all rows were retrieved, then filtered



Schema


 If table contains lots of columns which are infrequently used, split
	the less used fields into a separate table to reduce the foootprint
	of each row and improve the memory cache (buffer pool) utilization.
 Updates to a table invalidate all query cache rows which reference that
	table, so split static and dynamic fields into separate tables to
	improve cache hit rate for static data.
 "select count(*) from table" in innodb requires a full
	table scan, but is instantaneous in MyISAM.
 "engine=memoryinit_sql script.
 heap_table_size sets the size of memory tables - 32MB is
	default.
 To query count of innodb databases, you can create a memory table
	that stores the number of records in an innodb table.  This is
	filled at boot time by the init_sql script and then
	updated by the app when the table is modified.
 INFORMATION_SCHEMA is new in MySQL 5.0.
 When creating indices, you can determine their usefulness by checking
	the selectability:  distinct(*) / count(*)
 Rather than storing IP addresses as 15-character strings, store them 
	as 4-byte integers.  Use the builtin functions INET_ATON()
	and INET_NTOA() to convert IP strings to ints.
 Never group by a non-indexed column.
 Autoincrement primary keys don't lock MyISAM tables when updated
	like most other updates do.
 Use the "archive" storage engine for web logs.  Archive is
	as fast as MyISAM, and compresses its data using gzip, but it
	can't be modified once inserted.  To delete old entries,
	use partitioning in v5.1 to split the data into buckets, then
	periodically drop the entire bucket of old rows.


SQL Coding


 Inner join on a temp table is much faster than correlated subqueries.
 Applying a built-in function on a field prevents the use of any index
	that may be on that field.
 A like match with a fixed prefix (LIKE "abc%") uses the
	"range" type and is quite fast.
 The CURRENT_DATE() function is dynamic and therefore doesn't
	allow storage of results in the query cache.  It's faster to
	hardcode today's date as obtained from the calling language.
 Doing a postfix match (LIKE "%abc") requires a full text
	search.  It's faster to create a new field storing the reversed
	field using the REVERSE() function and do an index search
	on that (LIKE CONCAT(REVERSE(pattern), '%')).
	You can insert the calculated reversed field on the fly with triggers.
 Unlike most DB's, stored procedures in MySQL are cached per connection
	thread, not globally.  If the connection is kept open for lots of
	requests, they work well.  For web apps that close the connection
	after a few queries, stored procs are inefficient.
 PHP
	
	 Avoid persistent connections, as individual connections from PHP
		to MySQL are 10-100 times faster than to other DB's (has to
		do with the driver).
	
 The mysqli driver is faster than the
		mysql driver when using prepared statements.
	
 Use PDO v5.1 (not to be
		confused with MySQL 5.1)
	
 Using prepared statements avoids SQL injection and is faster.
	
 Serve static web content using a light weight HTTP server
		like lighthttpd,
		not a heavy weight server like Apache.
	
 "Smarty" is a PHP templating system
	


Server Tuning


 "show status" or "show status like 'Qc%'"
	shows various server stats.
 When setting server variables at startup, be aware of which variables
	are global and which are per thread.  Setting a huge buffer per
	thread will kill the server.
 Adding more RAM is the best way to boost performance via hardware upgrades.
 MyISAM variaables
	
	 key_buffer_size
	
 key_block_unused=0 means you've run out
	
 table_cache shows simultaneous open connections to
		each table
	
 myisam_sort_buffer_size - large helps when building
		indices, but reduce it again when done.  This setting is per
		connection, so you can set it dynamically after you connect,
		and it'll go away when your thread dies.
	


Misc Info


Good references for MySQL support and tuning include:


 Optimizing Linux Performance by Philip Ezollt
 Advanced PHP Programming by George Schlossnagle
 Pro MySQL by Jay Pipes
 www.mysqlperformanceblog.com is written by a guy who used to be a MySQL developer
 XAPRB.com
 PlanetMySQL.org
 Forge.MySQL.com/worklog/ contains the MySQL roadmap
 IRC channel (freenode) #mysql
 The MySQL
	Reference Manual



Adding "\G" to the end of a query in the mysql command line
UI will present the output as row based rather than column based.



DailyWTF has curious things every day.



last updated 5 Apr 2007

Obi-Wan (obiwan@jedi.com)







Links on this page to Amazon are part of an affiliate program that helps
keep Jedi.com operational.  Thank you for your support!

Notes from theMaximum Velocity MySQLtalk by Jay Pipes

Benchmarking

Profiling

EXPLAIN command

Schema

SQL Coding

Server Tuning

Misc Info

Notes from the
Maximum Velocity MySQL
talk by Jay Pipes