My Wonderful Life

Browse through the life of Girish Prabhu

Archive for April 7th, 2010

Common mistakes we do in SQL Server Queries.

with one comment

For the last several years I was working with several enterprise applications, obviously that makes me work on MS SQL Server more than any other application or technology. When I speak about Enterprise applications, we have lot of data – millions. So applications built on that size of data need to be very cautious about the queries and performance results of those queries.

I learned so many things and as I was noticing the patterns in SQL Server ‘human-errors’ looked same for several programmers. Smaller steps that you keep in mind can really help you write high performing application. I would like to comment on few points that I noticed to improve the performance of queries.

  1. Make sure you have Indices created for all Join operations – When the huge tables are created and joined with queries, there are possibility that the table gets ‘scanned’ several times for validating all join conditions. So if you create Index for that join field, system will use hashing and will be much faster.
  2. Avoid using cursors – Cursors are real curses. Avoid cursors and use temporary tables if joins are complex. In case you have recursive cursors, you can use Common Table Expressions. These both will give much quicker results than a cursor. 
  3. Avoid views or complicated joins for large tables – When you have a big table containing tens-of-thousands of data and several other tables, the join causes to read through main table several times for each join (even thought is having index). In this case, you can make snapshot of large table in to a small table and then do all the joins with that temp table. For example if you need to prepare reports for year 2009, then get all the data related to 2009 from the large table to the temp table and then join other tables with this temp table.
  4. Avoid update statements even for temp tables – Consider for reports you need to calculate the extended price for the materials sold. I see several people declaring a temp table and then insert the material details and cost in to the temp table. After that they do an update for calculating the extended price = unitprice * qty. Instead you can directly use this as a calculated column in the select statement itself. 
  5. Avoid Delete – Very important. Any delete (physical delete) should be done when there are no much use of the system. So I would suggest to do a batch delete in off-peak hours. When we delete the system internally does so many calculations to verify the foreign keys, validations, recalculation of indexes etc. For a large table a delete can cause deadlock situation if some one else is trying to read from this table during a delete (if delete happens to take time).
  6. Get only what you want – It is a common practice to use Select (*) even though we need just 2 or 3 columns from that table. Even if you take all columns better make it a practice to specify the columns for the select. This way in later stages, you can avoid unwanted fetches even if we add additional columns may be years later. 
  7. Do not copy what is working for another procedure – It is again another common practice that we copy an existing SQL statement or procedure to make a variant of the same (may be for some additional filtering or joins). It is always a wrong practice, I strongly recommend to build the query from scratch, that allows you to revalidate your requirements and take only what you need. 
  8. Do not read table twice – I see several programmers write queries to get the record before we do an update. In a large table this would cost you almost twice the duration.
  9. Have a primary key for all tables – If you donot keep it, better do not program SQL Server :)

There are obviously lot of other performance improvements we can implement. I welcome you if you have any more points to add to this list.

Written by girishsprabhu

April 7, 2010 at 8:19 pm

Follow

Get every new post delivered to your Inbox.