Monday, September 13, 2010

Summary of SQL Antipatterns Presentation

The following is a summary of the presentation, "SQL Antipatterns - Tables and Queries That Don't Work", by Bill Karwin. Please see Bill's presentation for details about each of these:

Things not to do:

  • Comma-Separated Lists: Storing list of relations as a comma-delimited list of ids.
  • Multi-Column Attributes: Storing ids to related tables as additional columns in same table (e.g. product_id1, product_id2, product_id3)
  • Entity-Attribute-Value: Storing generic key/value data in a table (e.g. attr_name, attr_value) in cases where attributes do not have to be truly dynamic, constraints on data are not enforced by application, non-relational solutions have not been explored, etc.
  • Metadata Tribbles: Putting year (data) into database table name or other similar splits for performance that ignore ability to partition horizontally, vertically, or use dependent tables.
  • ID Required: Creating a new ID field (calling it "ID") and ignoring other ID fields that could be used as ID or composite ID or that could have more descriptive names.
  • Phantom Files: Storing links in database to external files that should be managed transactionally, backed up, and restored with the database.
  • FLOAT: Storing decimal numbers (especially money or time estimates) as inexact FLOAT data type vs. NUMERIC data type.
  • ENUM: Using ENUM datatype when the values may change.
  • Readable Passwords: Storing passwords in plain text (e.g. for purpose of providing to user upon request for password) rather than MD5 digest w/salt in application prior to storage in DB.
  • Ambiguous GROUP BY: Using GROUP BY and select some attributes in result without using functionally dependent attributes, GROUP_CONCAT with attribute, or using OUTER JOIN instead.
  • HAVING: Using HAVING clause vs. WHERE clause, not realizing order of execution for HAVING is after WHERE, SELECT, aliases, and GROUP BY (and only before ORDER BY).
  • Poor Man's Search Engine: Using LIKE with wildcards on both sides or RLIKE (regexp) (Note: index is used if wildcard only at end of LIKE, but not used if wildcard at beginning) rather than MySQL FULLTEXT index (MyISAM only), Lucene, or Sphinx Search (for example).
  • Implicit Columns: Selecting wildcard columns.
  • User-supplied SQL: Letting users supply SQL.
  • SQL Injection: Putting user-supplied data into SQL expression directly rather than filtering, parameterizing.
  • Parameter Facade: Reusing parameter entered by user for list used by application, like WHERE bug_id in LIST ( $user_supplied_parameter ) because you want to have application also supply as "1234,3456,5678". (basically same as SQL injection)
  • Pseudokey Neat Freak: Changing ID values to eliminate gaps between sequential IDs OR recycling ID values.
  • Session Coupling: Using persistent connections and reusing uncommitted transactions, changing/using connections character set, using DB session variables, using LAST_INSERT_ID.
  • Phantom Side Effects: Executing external effects (that don't obey ROLLBACK) in database triggers, stored procs, and functions, and kicking off application functionality (like emailing link with ID, etc.) without waiting on transactions to complete.
See also:

No comments: