Database Naming Conventions

Posted Wednesday, July 02, 2008 2:50 PM by Nathan Zaugg

DatabaseServer Naming conventions are like arm pits.  Everyone has them and they all stink!  Well, at least that's the perspective of pretty much every developer an DBA alike.  I will present my own personal philosophy for naming conventions on databases and hopefully spawn some discussion in the process.

Basic Principles

Consistency

As annoying as certain standards are (such as putting tbl_ before everything) it is more annoying and more difficult when there are no conventions or mixed conventions.  Being able to reliably predict the schema once the basic relational structure is understood is key to productivity.  Therefore, even if you get stuck with standards you disagree with, so long as they are consistent they will be much better than the alternative.  Unless you get to make the decision, my guess is that there are going to be some conventions that you do not agree with.

Abbreviations

It is a good idea to abbreviate, when appropriate, in the naming of objects in your database.  It may be a good idea to have a list of abbreviations that you plan to use in the database as part of your data dictionary.  However, if there is not a good, clear abbreviation for an object, don't make one up.  When in doubt, spell it out!  Especially with SQL Server where you don't have the pesky 30 char limit for tables and columns like Oracle.

Identities

Every table should have an Identity as it's primary key!  Sometime, in a future blog post I will explain why this is so critical, but suffice it to say that any table that does not have a primary key is considered by SQL Server a "heap".  If you are using something other than an Identity column for the primary key you better have a really compelling reason because it will cause major performance problems.  THERE IS NO SUCH THING AS A NATURAL KEY AND THEY SHOULD NEVER BE USED IN PLACE OF AN IDENTITY! So always use a surrogate key approach, even with join tables.

Security

Key I believe that with a good data layer like Linq to SQL there is no need for relegating all database access through stored procedures.  While it does remove some of the service area for venerability and bugs robust solutions like Linq to SQL are very limited by this approach.  You should grant specific access to tables and procs by user.  A good approach can be found on another one of my blog posts.

Object Naming

Table Names

  • If you are running a database that preserves case (like SQL Server) tables should have no prefixes and should not contain underscores "_" unless it is a join table.  Table names should also be Pascal Cased.  If you are running a database that makes all tables upper case (like Oracle) then you have little choice but to use underscores everywhere.
  • Avoid pluralizing table names (User vs Users).  This is a good idea for two reasons, first it can be confusing when doing the keys.  Do we use UserID or UsersID?  Second not all tables pluralize well (Addresses) so avoiding any plural names will keep it consistent.  It you are using Linq to SQL the designer will pluralize for you automatically.
  • Join tables should the two or three tables that they are joining together as part of the name seperated by underscores.  (ex: User_Address, User_Order). Although they are many-to-many relationship see if you can find a principle table.  Users have orders, orders do not have users, therefore the User table comes first in the name.

Column Names

  • Name the Primary Key Identity column the table name with ID.  (ex: UserID) With the possible exception of join tables, in which case just name the Identity ID.
  • Use Pascal casing (ex: EachWordsStartsCapolatized)
  • Do not use the table name as part of the column name.  If this is a shipping table don't name your columns ShippingAddress, just name it Address.
  • Do not prefix column names with the type (ex: strUserName).  It makes the database much more difficult to work with.
  • Use the correct data types.  Always use nvarchar types (unicode) rather than varchar types.  This avoids substantial complexity if you are ever requried to store non latin-based data!  Trust me, you do not want to have to deal with code pages in the database!  Also, use Date fields for dates, bit fields for boolean, etc.
  • Don't make every column nullable!  Think through what data is absolutely required.  If you want to hold "partally complete records" then I would suggest a different table or different "staging" database. 
  • Don't make a bit field nullable unless you have a great reason!
  • Try to include a TimeStamp column if you think you may have to worry about concurrency.
  • Don't prefix with anything.

Constraint & Index Names

  • Name your constraints and indexes.  With the exception of foreign key constraints they are not automatically assigned meaningful names.
  • Don't use prefixes and make light use of underscores.

Stored Procedure Names

  • Don't prefix your stored procedures!  People used to prefix them with "sp" because existing procs in the database use this convention.  It has been presumed that sp stands for system procedure and it wouldn't make any sense to use that.  Seriously, prefixes are not very helpful in the database!
  • The first part of the name of a proc should be the table name it works upon (ex: User_Insert).  If the proc works on multiple tables try to give it the name of the portion of the database this proc deals with.  For example, if it's a proc that the invoicing system uses it would be acceptable to name it Invoicing_Update, for example.
  • Don't generate procs for simple Insert, Update, Delete, and Select unless you have a policy in place for accessing data exclusively from procs.
  • Don't create any stored procedure you don't need or plan to immediately use.  At some point you will change the schema and you won't update procs your not using.  Someone may eventually want to use that proc later only to find it broken.
  • The verb in the naming convention does not have to be relegated to "Insert, Update, Delete, Select".  It should say what it does.  Just be careful that if there is another procedure that does this same thing to another table that the verbs are named the same.
  • You can add additional information to the proc name to help distinguish it from others.  (ex: User_Select_ByDate, User_Select_ByState)
  • Don't use a prefix for arguments (ex: @ArgUserID). In my experience they don't help at all and are quite annoying!

Tips & Tricks

SQL Server 2008 has a policy manager that can help create and enforce policies like naming conventions!  Regardless of using SQL 2008 be sure to keep a Data Dictionary of your database! The database is the heart and soul of your business processes and should be well documented!  There is nothing worse than an unclean database!

Nathan Zaugg

Comments

# re: Database Naming Conventions

Wednesday, July 21, 2010 4:47 PM by Nathan Zaugg

Here is the 4-1-1 on why it's important to use identity columns over "natural" keys.  Listed in order of importance.

  * It is important for the health of a database to insert data into the clustered index in an ascending order.  Background:  There are two types of indexes called a Clustered Index and a Non-Clustered index.  A clustered index is the physical ordering of the rows in the pages on the database.  A non-clusterd index references the clustered index to make it's own lists of the order of the rows, but it just keeps a reference to the clustered index.  You may only have one clustered index because you can only have one physical ordering of data.  Okay, so now to the question "Why is it bad to insert rows out of order".  The answer is that because the clustered index is physically ordered, if you have a row that goes in the middle of other rows, those other rows need to be shifted down.  Actually it doesn't bother moving the data down, it performs something called a page tear.  Because all rows are stored on 8K pages any rows behind the row that is inserted are simply copied to a new page.  This leaves both pages half empty and means your database has to perform twice as many requests to the disk.  This is a big deal!  The performance will be acceptable in development (you may never notice) but in production it will cause nothing but problems.

  * There really is no such thing as a natural key - there would be very few real-life examples of where the data is guaranteed to be unique (an imperative for being a primary key). It then takes very little time for an application to want to insert a duplicate record (violating the key) because the key was poorly selected. In fact, because of these problems, I very frequently see tables with almost ALL columns in the composite key because it takes nearly all columns to make the row unique.  So what's the point of a key at all?

  * Natural keys typically include more than a single value (see reason above).  This makes it difficult to make relationships to other tables because each column that constitutes the key will have to be present on the child table to make that relationship.

  * When you join two tables together, you typically join the tables on as key.  For the parent table it will probably be a primary key and for the child table it will be a foreign key (non-clustered index).  There are many ways the DBMS may choose to join the tables but typically it will be able to use the clustered index of the parent table and join the parent to the child.  This will use a clustered-index scan which is the fastest.  If both keys are non-clustered-indexes (because you are trying to avoid the page tearing on insert) the result is a hash-join or even a full table scan which are much slower! Also your join statements in SQL are going to get really long and ugly.

  * If your primary key is missing or is an non-clustered index, a key is generated for you anyway.  This is so it can distinguish between two different rows in the heap that have the same values and it really *needs* that row internally.  For example, if you add a non-clustered index what would it point to?  This also means that you will have two non-clustered indexes to deal with when joining tables together which hurts join performance (see point above)

  * The surrogate key gains meaning in the application and is often used as the predicate in SQL queries.

  * The table schema is more difficult to change. For example, you cannot move a field out of one table and into another when it is part of the natural key.  The integrity of the entire table is put at risk.

Having said that, you can have a primary key that is a non-clustered index and you can opt for a different column to be your clustered index.  This is a very a-typical configuration and can be helpful.  Remember, clustered index seeks/scans are far faster than using a non-clustered index.  If you constantly query on a field or set of fields that can be unique, you would see a big performance boost in those queries.  Pretty much everything else you do with that table will suffer though and you still have the problem of getting it in there in order or keeping the database defragmented.  Another potential gain is that the natural order of the rows will be the same as the clustered index.  There are situations where you would want to take advantage of these side-effects but only an experienced DBA could make a good judgment call there.