Monday, October 22, 2012

Dynamic SQL usage

One useful application of dynamic SQL is database maintenance. For example, you may want to run a maintenance process against selected databases only.

Below is the flow of using dynamic SQL to ease the maintenance:

  • Create a table that stores the database names for your system.
  • Declare a cursor, loop through all the database names and match the process flag (explained later).
  • Generate the dynamic SQL statement to be executed against the selected databases.
  • Finally, execute the dynamic SQL.

The reasons for having a table to store your database names:
  • You are not interested in every database listed in sys.databases. You only want to run the script against selected databases.
  • The database names did not follow any convention before you took over the maintenance. If you are the one managing the database server, you should enforce a standard naming convention for the database names. This allows you to query the relevant database names from sys.databases.
  • You don't want to execute the SQL statement against all your application databases. Only certain databases will run script 1, other databases will run script 2, and so on.
The design of the table that stores the database names:
  • db_name nvarchar(255) - the database name.
  • process_1 int - '1' means run maintenance stored procedure #1; '0' means ignore.
  • process_2 int - '1' means run maintenance stored procedure #2; '0' means ignore.
  • (other process flags).
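The flow above can be sketched like this. This is a minimal sketch; the table name tb_maint_db and the stored procedure name pr_maintenance_1 are assumed names for illustration:

declare @db nvarchar(255), @sql nvarchar(max)

declare cur cursor local fast_forward for
    select db_name
    from tb_maint_db
    where process_1 = 1   -- only databases flagged for process #1

open cur
fetch next from cur into @db

while @@fetch_status = 0
begin
    -- generate the statement against the selected database
    set @sql = N'exec ' + quotename(@db) + N'.dbo.pr_maintenance_1'
    exec sp_executesql @sql

    fetch next from cur into @db
end

close cur
deallocate cur

Using quotename() around the database name also protects the generated statement against unusual database names.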





Monday, October 15, 2012

Executing dynamic SQL

In MSSQL, you can execute dynamic SQL. That is, you are allowed to execute a SQL statement stored in a variable by calling the sp_executesql system stored procedure. It compiles the SQL statement at runtime and then executes it. You may also create a parameterized SQL statement so that it compiles once and runs multiple times (for example, when inserting many records within a loop).

Whenever possible, try to avoid dynamic SQL. Each time you execute a dynamic SQL statement (when it is not a repeated, parameterized call within a loop), it will be recompiled, which increases the execution time.

The advantage of this technique is that it saves you from having a permanent table to store reporting data or per-session data. It also saves you from tuning a SQL statement with a complex WHERE clause in which some of the conditions are optional.

The following sample code returns all the object IDs in the database using a dynamic SQL statement.

declare @sql nvarchar(max)

select @sql = '
    declare @tb table ( id int )

    insert into @tb (id)
    select object_id from sys.objects

    select * from @tb
'
exec sp_executesql @sql
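Since sp_executesql also accepts parameters, the statement can be compiled once and reused with different values. A minimal sketch; the tb_customer table is borrowed from a later example in this blog:

declare
  @sql nvarchar(max),
  @code nvarchar(50)

set @sql = N'
    select cust_name
    from tb_customer
    where cust_code = @cust_code'

-- the same compiled plan is reused for every call with a different @cust_code
set @code = N'A001'
exec sp_executesql @sql, N'@cust_code nvarchar(50)', @cust_code = @code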

I will discuss this in more detail in my next article.

Monday, October 8, 2012

Generating "reminders" for the user

In a mid to large scale application, you often need to run scheduled tasks, which can be scheduled through the SQL Agent. One type of scheduled task generates a reminder or alert for the user. For example, informing the collection department about overdue invoices.

Below is the strategy to implement this type of process:
  1. Create a stored procedure to scan the data.
  2. This stored procedure contains the business logic that decides which data meets the requirement, and then generates the reminder record.
  3. The reminder record is saved into tb_mail (this table stores the emails to be picked up by a C# program).
  4. Develop a program in the C# application which sends all the emails in the tb_mail table.
In case you don't want to notify the user through email, you may append the reminder record to an "alert" table (which can be the "tb_log" table in our design). Then, display the alerts upon user login.

For alerts that require immediate attention, you may send the reminder through SMS (short messaging).
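Steps 1 to 3 can be sketched as follows. This is a minimal sketch; tb_invoice and its columns are assumed names for illustration, and the tb_mail columns follow the table design described elsewhere in this blog:

create procedure pr_scan_overdue_invoice
as
begin
    set nocount on

    -- business rule: any unpaid invoice past its due date triggers a reminder
    insert into tb_mail (mail_id, send_to_email, subject, body_text, created_on, sent_status_id)
    select
        newid(),
        N'collection@example.com',
        N'Overdue invoice #' + i.invoice_no,
        N'Invoice #' + i.invoice_no + N' is overdue. Please follow up.',
        getdate(),
        0   -- '0' = pending, to be picked up by the C# mailer
    from tb_invoice i
    where i.paid_on is null
      and i.due_date < getdate()
end

Schedule this stored procedure through the SQL Agent; the C# program then drains tb_mail independently.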

Monday, October 1, 2012

Benefits of using stored procedure instead of writing codes in C#

Starting from my favorite programming language, Visual Basic 6 (that was in 1997), I have been wondering where the best place to implement the "business rules" is. After many years of moving the business rules from the programming language to the database and vice versa, I have finally decided that the best place for the business rules is in the stored procedure.

The simplest stored procedures are for SELECT, INSERT, UPDATE and DELETE. Indeed, you can do more than that. Benefits of using stored procedures:
  • Create an audit log when doing any of the CRUD operations.
  • Able to implement sophisticated control over what kind of data can be returned to the caller.
  • Easier to enhance/process the records before returning them.
  • A stored procedure's plan is compiled and cached, so it often runs faster than an ad-hoc query.
  • Able to enhance the stored procedure at any time.
  • You don't have to recompile the application if there are any changes to the rules.
  • You don't have to redistribute the application (to client workstations or the web server) if there are any changes to the rules.
Well, I know that you might not agree 100% with my argument. But, for a custom solution builder like me, we can't ask all the users to stop for a while and wait for the latest update.

Thursday, September 27, 2012

Generate a series of number

Below is a way to generate a series of numbers (10 records with values from 1 to 10) with a Common Table Expression (CTE):

with tb (ii)
as (
    select
        1 ii
    union all
        select ii + 1
        from tb
        where ii + 1 <= 10
)

select *
from tb

For a detailed explanation, check this out:

http://www.databasejournal.com/features/mssql/article.php/3502676/Common-Table-Expressions-CTE-on-SQL-2005.htm

When to use it
  • Use CTE + UNION ALL to generate a hierarchical result. For example, you have a discussion forum that allows members to post a question (a thread) with many replies (many levels of sub-threads).
  • Reduce re-typing the same SELECT statement, especially when you are aggregating the data.
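For the forum example, the hierarchical query could look like this. This is a sketch under the assumption of a tb_post table where parent_id points to the parent post (NULL for top-level threads):

with thread (post_id, parent_id, title, level)
as (
    -- anchor: the top-level threads have no parent
    select post_id, parent_id, title, 0
    from tb_post
    where parent_id is null

    union all

    -- recursive part: attach each reply to its parent, one level deeper
    select p.post_id, p.parent_id, p.title, t.level + 1
    from tb_post p
    inner join thread t on p.parent_id = t.post_id
)
select *
from thread

The level column tells you how deep each sub-thread sits, which is handy for indenting the display.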

Thursday, September 20, 2012

Reading the system settings using stored procedure/function


Stop hard-coding constants in a stored procedure. Instead, you should read the value from your "settings" table or return the constant value from a function call. In C# and other programming languages, you can declare a constant in a class and use it anywhere in the program. But in MSSQL, you can't do that.

To get the constant or setting, you may try one of these:
  • Create a function which returns the value (it could be a constant, or be read from a field in a table).
  • Create a stored procedure that returns the value through an OUTPUT parameter.
There are some factors to consider when deciding how to design the stored procedure/function:
  • Should the value be configurable by the user? If not, you may return the value from a function call. If the value is user configurable, then you need a "settings" table to store it.
  • If the value must be stored in a table, do you store it in a generic table structure (with prop_name and prop_value fields), OR do you design a "settings" table which contains only one record (with field names that reflect each setting)?
In our database design, we have a generic table that stores settings. Where values are frequently used, we design a dedicated settings table to store the user-configured values.
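The function approach can be sketched like this. A minimal sketch; the names fn_get_tax_rate and tb_settings, and the tax_rate setting itself, are assumptions for illustration (the prop_name/prop_value structure is the generic one described above):

create function fn_get_tax_rate ()
returns decimal(5, 2)
as
begin
    declare @rate decimal(5, 2)

    -- read the user-configurable value from the generic settings table
    select @rate = cast(prop_value as decimal(5, 2))
    from tb_settings
    where prop_name = N'tax_rate'

    -- fall back to a constant when the setting is missing
    return isnull(@rate, 6.00)
end

Any stored procedure can then call dbo.fn_get_tax_rate() instead of hard-coding the rate.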

Monday, September 10, 2012

Show me the runtime value

In MSSQL, you may use the PRINT statement to display some information while executing a stored procedure or a SQL batch. This is very helpful for debugging a stored procedure which contains many IF statements, or when you would like to find out the runtime value of a certain variable.

For example,

declare @cust_name nvarchar(255)

select @cust_name = cust_name
from tb_customer
where cust_code = 'A001'

print @cust_name

When to use it:
  • Your stored procedure contains many IF statements and you want to know which path was executed for different parameter values.
  • You want to find out which stored procedures were called when the main stored procedure was executed.
  • You want to find out the runtime values of the variables.
  • You want to find out where the process exited.

Monday, September 3, 2012

Enabling debugging mode within the stored procedure without using a parameter

In our previous article, we discussed that the stored procedure should contain a parameter called "@debug_mode" so that it can be debugged during the development phase. But some people might feel that it makes the stored procedure look ugly. In this article, we are going to show you another way to enable debugging during development without having this parameter.

To store the information to the current session/connection, use the following command:

   SET CONTEXT_INFO 0x1000

To retrieve the value, use the following function:

   SELECT context_info()

context_info is able to store up to 128 bytes of information. The only thing is that you must learn how to work with bitwise operators.

The first thing you have to do with context_info is to define the meaning of each "box" (hex digit). For example, suppose you need to implement 4 hidden features. Then the value will look like "0x0000", where "0x" indicates that it is a hex value (those two characters can't store any information). But you are free to define the meaning of the 4 digits after "0x":
  • 1st box - "1" means enable debugging mode. Otherwise, "0".
  • 2nd box - could be defined as "1" to log the process details.
  • 3rd box - could be defined as "1" to print the runtime values in the IDE.
  • 4th box - could be defined as "1" to roll back the process so that it won't disturb the test data.
Lastly, before you run your stored procedure, you have to call "set context_info 0x1001" to enable the debug mode and roll back the changes.

Some modification is required in your stored procedure. Note that the bitwise AND operator does not accept two varbinary operands, so the first two bytes of the context info (the four "boxes") are cast to int before the comparison:

declare
  @info varbinary(128),
  @flags int,
  @debug_mode int,
  @rollback_data int

select @info = context_info()

-- the first two bytes hold our four "boxes" (e.g. 0x1001)
select @flags = cast(substring(isnull(@info, 0x0000), 1, 2) as int)

if @flags & 0x1000 = 0x1000
  select @debug_mode = 1
else
  select @debug_mode = 0

if @flags & 0x0001 = 0x0001
  select @rollback_data = 1
else
  select @rollback_data = 0

(.. continue your process here ..)

References:
http://msdn.microsoft.com/en-us/library/ms187768.aspx
http://msdn.microsoft.com/en-us/library/ms176122.aspx

Tuesday, August 28, 2012

Allow the process to run again?

For a stored procedure that is executed by the SQL Agent (i.e., the scheduler), you might have to prevent the same stored procedure from running more than once.

Allow re-entrant runs

Not all processes are allowed to run more than once. For example, the stored procedure that is responsible for generating the monthly statement. Another example is the stored procedure that generates the exception report.

Why allow re-entrant runs

In the development phase, you might want to run the stored procedure as many times as possible so that you can fine-tune and debug the process. But once these stored procedures go live, they will run once per day, week, month or whatever predefined date.

Now, how to prevent re-entrant runs
In order to prevent the stored procedure from running more than once a day, you need a condition that checks a log table. In our example, the "tb_whatever_process_log" table keeps track of the process date.

    if exists(
        select *
        from tb_whatever_process_log
        where
            process_date = cast(convert(nvarchar, getdate(), 102) as datetime)
    )
    begin
        print 'Already done. Exit'

        set nocount off
        return
    end

    (... continue with the actual process..)

How to test this stored procedure?

Due to the re-entrance prevention condition, you will find that there is almost no way to test the stored procedure during the development phase. The easiest way to handle this is to add a new parameter to the stored procedure called "@debug_mode int = 0". In the development phase, you specify "@debug_mode = 1" to bypass the check.
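Combining the check and the parameter, the stored procedure could be sketched like this (pr_whatever_process is an assumed name; the EXISTS check is the one shown above):

create procedure pr_whatever_process
  @debug_mode int = 0
as
begin
    set nocount on

    -- skip the re-entrance check when debugging in development
    if @debug_mode = 0 and exists (
        select *
        from tb_whatever_process_log
        where process_date = cast(convert(nvarchar, getdate(), 102) as datetime)
    )
    begin
        print 'Already done. Exit'
        set nocount off
        return
    end

    -- (... continue with the actual process ...)

    set nocount off
end

In development: exec pr_whatever_process @debug_mode = 1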

Monday, August 20, 2012

How many lines should be written in a stored procedure?

I had an experience while participating in a project. While reviewing the stored procedures done by the programmers, I had problems understanding one stored procedure, not because of its complexity but because of its length. One stored procedure consisted of a few thousand lines of code. I was scratching my head and asked myself: is it necessary to write everything in one stored procedure? No. It should be broken into many stored procedures which are then called by a main stored procedure.

Our stored procedure development guideline is quite simple and straightforward. A stored procedure:

  • Must be re-testable and/or re-usable.
  • Must be written in a maintainable way - if the stored procedure handles a small process, it is easier to maintain.
  • Should contain fewer than 1,000 lines. I mean, that should be the absolute maximum.



Tuesday, August 14, 2012

How should we name the stored procedure?

As the project gets bigger and bigger, and many enhancements are made to the existing systems, you will find that the programmers must have discipline in naming the stored procedures and other database objects. Otherwise, system maintenance is going to be a painful job.

Let me explain a bit on the stored procedure naming convention:

  "pr_" + "module initial" + "sub module initial" + "process name"
  • The stored procedure name starts with "pr_" instead of "sp_", because "sp_" is reserved for MSSQL system stored procedures. Basically, "pr_" stands for "procedure".
  • "module initial" is the second section of the stored procedure name; it is the short code or initial of the module name.
  • "sub module initial" is the third section of the stored procedure name; it denotes the short code or initial of the sub-process.
  • Lastly, "process name" is the actual process implemented within the stored procedure.
For example,
  • pr_pos_rpt_sales - the module name is "pos" (i.e., Point Of Sale) and the sub module name is "rpt" (i.e., Report). "sales" means that it is a sales data related process. The name is intuitive: it simply tells you that this stored procedure handles the POS sales report generation.
  • pr_ic_process_trans - the module name is "ic" (i.e., Inventory Control), the sub module name is "process" (i.e., process related) and "trans" is short for "transaction". This stored procedure is responsible for processing the inventory transactions.
For other database objects,
  • "tb_" is for tables.
  • "fn_" is for functions.
  • "vw_" is for views.

Tuesday, August 7, 2012

How to design a reporting process that involves multiple sub-processes


One problem that you might face (if you are doing the above) is that the main stored procedure calls multiple sub-processes (i.e., other stored procedures) which generate/process some data and return it to the main process. How do you simplify the entire process?

You need the following tables in order to split the entire process into smaller tasks, which eases everything from testing to maintenance:
  • A table for storing the user parameter values - this table stores the user parameter values in a row. Each row is meant for one process session. With this table, you only have to pass the process session ID to each sub-process, and the sub-process can load the parameter values it requires. Another advantage of having this table is that you will be able to tell the actual parameter values that were passed in. In case an auditor would like to view what was queried and by whom, you may retain the records in this table.
  • A few report working tables - all the data that has been processed is stored in these tables for other modules to pick up. For example, the report writer will query these tables for the final output. The reason you need several working tables is that they allow you to query the necessary data into the first table, then run an aggregate process and store the result in another table as the final output. This is very useful when it's not straightforward to get the final output using one SELECT statement.
The stored procedure design:
  • A stored procedure that is responsible for saving the parameters into the parameter table.
  • All stored procedures must have a "sess_id" parameter (i.e., the process session ID, of UNIQUEIDENTIFIER type).
Note: please take note that the above design relies on "sess_id".

Real life example:
  • Generating a Balance Sheet report (i.e., an accounting report) which consists of multiple sections including at least fixed assets, current assets, current liabilities and capital. The raw data might be stored in different tables and require some aggregation over the historical data.
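The design above can be sketched as follows. This is a minimal sketch; the object names (tb_rpt_param, pr_rpt_save_param, pr_rpt_sub_process) and the parameter columns are assumptions for illustration:

create table tb_rpt_param (
    sess_id uniqueidentifier not null primary key,  -- one row per process session
    start_date datetime null,
    end_date datetime null,
    created_by nvarchar(255) not null,  -- who ran the report (for the auditor)
    created_on datetime not null
)

create procedure pr_rpt_save_param
    @sess_id uniqueidentifier,
    @start_date datetime,
    @end_date datetime,
    @uid nvarchar(255)
as
begin
    insert into tb_rpt_param (sess_id, start_date, end_date, created_by, created_on)
    values (@sess_id, @start_date, @end_date, @uid, getdate())
end

create procedure pr_rpt_sub_process
    @sess_id uniqueidentifier
as
begin
    -- every sub-process loads its own parameter values by session ID
    declare @start_date datetime, @end_date datetime

    select @start_date = start_date, @end_date = end_date
    from tb_rpt_param
    where sess_id = @sess_id

    -- (... process data into the report working tables ...)
end

The main process generates one NEWID() value, calls pr_rpt_save_param once, then passes the same @sess_id to every sub-process.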

Tuesday, July 31, 2012

Passing report parameters into a stored procedure

When you are designing a screen which allows the user to choose a period, multiple customer types, etc., you might have a hard time passing the values to the stored procedure. A common way to achieve this is to use an nvarchar(max) parameter to carry the multiple customer types. The simplest way to handle multiple values passed by the caller is to split the incoming string into a table variable.

For example, the user wants the sales report for the selected customer types (corporate customers, travel agents and walk-in customers) for the year 2011. Then you might have something similar to this:

exec pr_rpt_sales_by_cust_type
  @start_date = '2011-01-01',
  @end_date = '2011-12-31',
  @customer_type = 'corporate, travel agent, walkin customer'

Assume that we are using the display text to filter the data, to ease the explanation. In real life, you might have to pass in the foreign key values.

Also, you need another function or stored procedure to parse the @customer_type parameter into a table variable, which you might have to do some research on. ;)

Ok. Back to the topic. The SELECT statement will be something like this:

select customer_type, sum(sales_amount) as total_sales_amount
from vw_sales
where transaction_date between @start_date and @end_date
  and (customer_type in (
      select customer_type
      from dbo.fn_parse_param_to_table(@customer_type))
    or @customer_type = ''
  )
group by customer_type

Note: "fn_parse_param_to_table" is a function that you have to develop yourself.
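One possible implementation of that function is sketched below. This is an assumption on our side (the article leaves it as an exercise); it walks the comma separated string with CHARINDEX, which works on the SQL Server versions of that era:

create function fn_parse_param_to_table (@param nvarchar(max))
returns @result table (customer_type nvarchar(255))
as
begin
    declare @pos int, @piece nvarchar(255)

    -- walk through the comma separated string piece by piece
    while len(@param) > 0
    begin
        set @pos = charindex(',', @param)

        if @pos = 0
        begin
            set @piece = @param
            set @param = N''
        end
        else
        begin
            set @piece = left(@param, @pos - 1)
            set @param = substring(@param, @pos + 1, len(@param))
        end

        insert into @result (customer_type)
        values (ltrim(rtrim(@piece)))
    end

    return
end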

Tuesday, July 24, 2012

You need an application log for your systems

All systems require a common area to keep track of certain information. For example:
  • Exceptions/errors which were not handled properly - this eases system maintenance.
  • User login and logout - you may use this log to keep track of all security related activities. For example, append a "warning" log when someone is trying to hack the system, especially for web based systems.
  • When certain system settings are missing and the system continues to work using default values - this is very important if you are distributing your system to more than one site. For example, the system administrator forgot to set the new sales email address.
  • When the user clicks on a certain function which is handled by a stored procedure in the database - this is important because the user might have clicked it multiple times while the system appeared to have no effect. By creating a log record in this table, the system administrator will be able to tell whether the stored procedure ran or not and carry out the necessary recovery.
Table design:

CREATE TABLE tb_log (
    log_id UNIQUEIDENTIFIER NOT NULL,   --primary key
    log_type_id INT NOT NULL,           --type of log
    workstation NVARCHAR(255) NOT NULL, --the workstation name
    uid NVARCHAR(255) NOT NULL,         --the current user ID
    msg NVARCHAR(MAX) NULL,             --the details of the log
    remarks NVARCHAR(255) NULL,         --the remarks/procedure name
    is_sent INT NOT NULL,               --'1' means the log has been emailed to the admin
    app_id INT NOT NULL,                --the application ID
    module_id INT NOT NULL,             --the module ID
    modified_on DATETIME NOT NULL,      --the last updated date/time
    modified_by NVARCHAR(255) NOT NULL, --the person who performed the last update
    created_on DATETIME NOT NULL,       --the date/time when the log record was inserted
    created_by NVARCHAR(255) NOT NULL   --the user ID that created this record
);

The allowed values for the log_type_id field:
  1. Error - the system crashed at runtime and the value in the msg field contains the exception details.
  2. Information - some process has been kicked off and is running smoothly.
  3. Warning - certain system settings are missing or using default values, but the system is still able to function normally.
  4. Audit Log - the log is security related, or certain important data has been changed by the user.
  5. Critical - the system crash is critical, or someone is trying to figure out a login ID or password, and it requires immediate attention.
Add-on modules to be built on top of this log table:
  • Email the log records to the system administrator (and mark "is_sent" with the value "1"). The email can be sent on a daily or weekly basis.
  • If the log_type_id indicates a "threat" or "hacking", the log record must be sent to the system administrator immediately.
Note: for process logging, please refer to the "Tracking the progress of a long run process" article.
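A small helper stored procedure keeps the log inserts consistent across the whole system. A minimal sketch; the name pr_append_log is an assumption for illustration:

create procedure pr_append_log
    @log_type_id int,
    @msg nvarchar(max),
    @remarks nvarchar(255),
    @app_id int,
    @module_id int,
    @uid nvarchar(255)
as
begin
    set nocount on

    insert into tb_log (
        log_id, log_type_id, workstation, uid, msg, remarks,
        is_sent, app_id, module_id,
        modified_on, modified_by, created_on, created_by
    )
    values (
        newid(), @log_type_id, host_name(), @uid, @msg, @remarks,
        0, @app_id, @module_id,
        getdate(), @uid, getdate(), @uid
    )
end

For example, a "warning" (log_type_id = 3) for the missing sales email address:

exec pr_append_log 3, N'Sales email address missing; using default.', N'pr_send_report', 1, 2, N'system'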


Tuesday, July 17, 2012

Designing a custom mail process

Another important feature in every system is email notification. Email notification is useful in the following situations:
  • Workflow, such as an application that requires approval;
  • Notifying the system administrator of the current system status (such as a server health report);
  • Sending daily or weekly reports to the user;
  • An error has happened;
  • System crash reports;
  • etc.
Email notification is not limited to the above. You may apply this concept to the process logging feature mentioned in the earlier blog post, or to any other situation that requires the user's attention.

Below is the table that stores the email records. Note that we are not using the MSSQL email notification feature to send the email out. You have to rely on an external program to load the email contents and send them out. This gives you the choice of implementing this idea on any database engine.

CREATE TABLE tb_mail (
    mail_id UNIQUEIDENTIFIER NOT NULL primary key, -- stores the value of NEWID().
    send_to_email NVARCHAR(MAX) NULL,  -- stores the recipient email address.
    cc_to_email NVARCHAR(MAX) NULL,  -- stores the CC of the current email.
    subject NVARCHAR(255) NULL,  -- stores the email subject.
    body_text NVARCHAR(MAX) NULL,  -- stores the email message.
    attach_file NVARCHAR(MAX) NULL,  -- stores the attachment file name.
    created_on DATETIME NULL,  -- the record creation date/time.
    sent_status_id INT NULL,  -- the email sending status. '0'-pending, '1'-sent & '2'-failed to send
    sent_on DATETIME NULL,  -- the date/time of sending the email.
    mail_type_id INT NULL  -- the email type (optional)
);

The module that is responsible for sending the email must implement the following features:
  • Able to send out all pending emails (sent_status_id = 0), and
  • Able to send out a specific email upon request (which could be triggered through a TCP port or WCF).
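On the database side, the external mailer only needs two small statements: one to fetch the pending emails and one to mark each email after the attempt. A minimal sketch:

-- fetch all pending emails for the external program
select mail_id, send_to_email, cc_to_email, subject, body_text, attach_file
from tb_mail
where sent_status_id = 0

-- after the attempt, mark the email as sent ('1') or failed ('2')
update tb_mail
set sent_status_id = 1,
    sent_on = getdate()
where mail_id = @mail_id   -- @mail_id is supplied by the mailer program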

Tuesday, July 10, 2012

Designing a stoppable long running process

Some processes run for at least 30 minutes. The question is: how do you design a long running process so that it is stoppable and, upon restart, continues from where it left off in the previous run? This is an interesting question. Unfortunately, most of the long running processes that I have heard of are not stoppable and require manually patching the 'corrupted' output before re-running the process.

For example, consider a process that issues the monthly bill/invoice for a web hosting company. This company has more than 10,000 customers and each customer subscribes to different services.

Reasons for stopping the long running process:
  • The hard disk ran out of space and the process stopped.
  • There are some new subscriptions to be queued into the bill/invoice run.
  • The server has to be restarted due to OS (Operating System) updates or running out of memory.
  • Any other unforeseen reason.
First things first: we need a table to store the records to be processed. You might have to customize this table to cater to your needs. The batch_no field is a text field which you can design to support multi-location or multi-company billing processes.

create table tb_bill_process_log (
    bill_process_log_id uniqueidentifier not null primary key,
    sess_id uniqueidentifier not null,  -- the session ID of the process.
    batch_no nvarchar(50) not null,  -- the batch number.
    customer_id uniqueidentifier not null,  -- the customer ID.
    process_start_on datetime,  -- the process start time.
    process_end_on datetime,  -- the process completion time.
    remarks nvarchar(255),  -- the remarks set by the long running process.
    created_on datetime  -- the record creation date/time.
)

create table tb_bill_process_batch (
    bill_process_batch_id uniqueidentifier not null primary key,
    batch_no nvarchar(50) not null,  -- the batch number.
    process_restart_counter int,  -- increment this value every time the process is restarted.
    start_on datetime,  -- the first process start date/time.
    complete_on datetime, -- the process completion date/time.
    created_on datetime  -- the record creation date/time
)

The flow of the long running process:
  1. When the long running process starts, assign a batch number (stored in the batch_no field). If the batch number already exists, increment the tb_bill_process_batch.process_restart_counter value by one.
  2. Then, copy all the customers who are still subscribing to services, or whose subscription was terminated in the last month. All these customers will receive the bill/invoice. If possible, do this within a database transaction. If there are too many customers to be billed, you might have to split the customers by group or location. This reduces the time needed to lock all the necessary resources.
  3. Now the billing process starts (this is the process of calculating the charges). You need a cursor that selects the customer ID where batch_no matches and process_start_on is null.
  4. After retrieving the first customer ID from the cursor, set process_start_on to the current date/time and insert the necessary billing records. Upon completion, update process_end_on to the current date/time. In this step, you need a database transaction to ensure that all records are committed in the same scope.
  5. Then, continue with the next customer ID until all records have been processed.
  6. Finally, update the tb_bill_process_batch.complete_on field to indicate that the long running process has completed.
That's all you need to design a stoppable long running process. If you think this is not enough and would like to reduce the processing time, this is what you can do:

Develop a program, in whatever programming language that supports multi-threading, that executes the charge calculation stored procedure in multiple threads. To do this, you need to split the long running process into a few sub-processes:
  • The first stored procedure is responsible for generating a new batch number and collecting all the customer IDs that need to be billed (steps 1 and 2). This stored procedure is run by one thread only.
  • The second stored procedure is responsible for calculating the charges for each customer (steps 3 to 6). This stored procedure is run in multiple threads. In this stored procedure, you can't loop over a cursor anymore. The cursor must be replaced with a query that always returns the TOP 1 customer ID matching the batch_no where process_start_on is null.
Here you go. If you run 2 threads for the charge calculation stored procedure, the entire process time will be reduced by a little less than 50%. Please take note that it won't be exactly 50% because the process still has to lock the necessary resources.
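The "TOP 1" replacement for the cursor has to be safe against two threads picking the same customer. One common way to do this (an assumption on our side, not spelled out in the article) is to claim the row and stamp process_start_on in a single statement using locking hints:

declare @customer_id uniqueidentifier
-- @batch_no is assumed to be a parameter of the stored procedure

-- claim the next unprocessed customer; READPAST skips rows already
-- locked by other threads, UPDLOCK prevents a double pickup
update top (1) t
set process_start_on = getdate(),
    @customer_id = t.customer_id
from tb_bill_process_log t with (updlock, readpast)
where t.batch_no = @batch_no
  and t.process_start_on is null

if @customer_id is null
    return  -- nothing left to process for this batch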

Tuesday, July 3, 2012

Tracking the progress of a long run process

Have you ever wondered what the current progress of an active long run process is? The following solution will help you track a long run process in a stored procedure or even a C# program.

For example, you have a stored procedure that takes at least 30 minutes to run, and the DBA wants to be able to see the current progress.

The long run process must fulfill the following requirements:
  1. Each UPDATE/DELETE statement must affect a minimal number of records - this means you should not execute an UPDATE/DELETE statement that affects thousands of records at once. By minimizing the affected records per UPDATE/DELETE statement, you reduce the resources to be locked and allow other users a chance to access them concurrently. This will improve the overall database response time.
  2. The long run process must be broken down into many sub-processes which report their status back to the main process - all sub-processes should run within a stored procedure whenever possible. This eases testing and future maintenance/enhancement. The way the status is returned to the main process must be standardized. For example, if you have decided to use an OUTPUT parameter, then all stored procedures must follow the same way.
  3. The transaction scope must be efficient for the number of records affected - if your process is running a mail merge, you may consider a transaction for every 1,000 records. Another example: if you are running a billing process, you might consider a transaction for each bill. This requires many rounds of testing in the development or simulated live environment to confirm the appropriate numbers. In real life, the hardware specification, the number of concurrent users, the number of records in the output, etc. must be taken into consideration.

After you have done the above, you will need a new table to store the process log.

CREATE TABLE tb_process_log (
  process_log_id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
  sess_id UNIQUEIDENTIFIER,
  seq INT NULL,
  msg NVARCHAR(MAX) NULL,
  remarks NVARCHAR(255) NULL,
  created_on DATETIME NULL,
  created_by NVARCHAR(255) NOT NULL
);

Table usage
  • process_log_id - the primary key of the table, which stores a value generated by the NEWID() function.
  • sess_id - this field stores the session ID of the current process. For example, if two users are running the same process, each of them will have a different session ID. The value is generated by the NEWID() function.
  • seq - this field stores the sequence number of the log record, or the current step in the process. This field is optional.
  • msg - this field stores the details of the current process step.
  • remarks - a special remark set by the process; it is optional.
  • created_on - the current date/time when the log record is inserted.
  • created_by - the user ID that is executing the stored procedure.
 Finally, we are ready to discuss the process logging:
  1. First things first: every process should insert a log record as soon as it starts, to indicate that the stored procedure is running. This is very important because sometimes the stored procedure never even gets a chance to run before failing. With the start log in place, you can tell there was no communication error along the way.
  2. Before exiting, every stored procedure should insert a log record to indicate that the exit is intentional. This confirms that the procedure stopped by design, at the point you expected.
  3. Put the details of the current process step into the "msg" field. This lets you see the runtime values instead of looking for a way to reproduce the error or behavior. For example, "Issuing invoice to customer A.." and "Successfully issued invoice #1234 (amounting to $2,000.00) to customer A.". Just make sure the "msg" value contains sufficient information for troubleshooting.
  4. As for the "remarks" field, we normally store the stored procedure name in it. But you may use it however you like, since the "remarks" field is not critical.
  5. Make sure you have sufficient TRY..CATCH blocks within each stored procedure to track down unexpected errors and also perform cleanup.
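The steps above can be sketched as a small logging helper (a minimal sketch; the procedure name sp_write_process_log and its parameter list are illustrative, not from the original design):

```sql
-- Illustrative logging helper; names are assumptions, not from the post.
CREATE PROCEDURE sp_write_process_log
  @sess_id UNIQUEIDENTIFIER,
  @seq INT,
  @msg NVARCHAR(MAX),
  @remarks NVARCHAR(255),
  @created_by NVARCHAR(255)
AS
BEGIN
  INSERT INTO tb_process_log
    (process_log_id, sess_id, seq, msg, remarks, created_on, created_by)
  VALUES
    (NEWID(), @sess_id, @seq, @msg, @remarks, GETDATE(), @created_by);
END
```

A process would generate one session ID up front (SET @sess_id = NEWID()) and pass it to every call, so that all log records belonging to one run can be queried together.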

Tuesday, June 26, 2012

Stored procedure development guide

Below are some of our guidelines for developing stored procedures:

The stored procedure must be testable & traceable - the PRINT keyword must be used and/or sufficient log records must be created so that the system administrator can trace the process.

The stored procedure must be re-runnable - we should not assume the stored procedure will be run only once. It should be tested with the question: what if the caller runs it multiple times? What will happen? Another way of thinking about it: if we run the same stored procedure multiple times with the same parameters, does it give the same output?

The stored procedure must be broken down into small pieces that each meet the above guidelines - sometimes it's unbelievable to see a stored procedure containing 1,000+ lines. This is not a good way to develop stored procedures. The best way is to modularize the work and break it into small processes. This eases testing and also future enhancement.

Stop hard-coding constants in a stored procedure - some programmers want to complete their job as soon as possible without really caring about future maintenance. The best way to handle a constant is to save the value in a table and load it when you need it.
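For example, a constant could live in a simple key/value table (a sketch; the table name tb_app_config and its columns are assumptions for illustration):

```sql
-- Illustrative key/value configuration table (names are assumptions).
CREATE TABLE tb_app_config (
  config_key NVARCHAR(100) NOT NULL PRIMARY KEY,
  config_value NVARCHAR(255) NOT NULL
);

INSERT INTO tb_app_config (config_key, config_value)
VALUES (N'invoice_due_days', N'30');

-- Inside a stored procedure, load the value instead of hard-coding it:
DECLARE @due_days INT
SELECT @due_days = CAST(config_value AS INT)
FROM tb_app_config
WHERE config_key = N'invoice_due_days'
```

Changing the constant later then becomes a data update rather than a code change.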

Stop using a "flag" parameter to make the stored procedure run different processes - you should not use any parameter as a flag that selects which process runs within the stored procedure. For example, flag=1 returns customer records and flag=2 updates the customer's outstanding balance. This is very confusing.

Naming convention - don't name stored procedures however you like. All stored procedures must follow the agreed naming convention. In fact, the stored procedure name should contain the main module code (as a prefix) to ease future maintenance.


Wednesday, June 20, 2012

Unique value for Primary Key

I used to define primary keys with the integer data type until one day, working on a web project, I realized that using integer as the primary key data type can cause some issues.

Issue #1 - passing the primary key in the URL. For example, the primary key value for customer A is "1001". If the user wants to edit customer A's profile, you might put the primary key value in the query string so that the page can query the database and retrieve the appropriate record. This is a very bad practice because the user can change the query string and retrieve any customer profile they want.

Issue #2 - bottleneck when inserting new records. For example, you are running a system that must handle many transactions per second. Whether you are using "identity" or generating the next ID from a last-ID table, it creates a bottleneck while inserting new records. The bottleneck becomes even worse if you have long processes running in the triggers or stored procedures.

Issue #3 - multiple databases. In certain situations, you might have to run the same system in multiple locations, each with its own database. These databases need to be merged into one huge centralized database at a later time. Regardless of the merging interval, with an integer primary key you face a question: what is the appropriate range of values to assign to each location?

After trial and error with integer primary keys and nvarchar (i.e., adding an "outlet code" field to make a composite key), we found an easier way to solve the above problems: we now use the UNIQUEIDENTIFIER data type (GUID in the .Net framework).

When the UNIQUEIDENTIFIER value appears in the URL query string, it's hard for the user to guess the primary key value of another customer. The insert bottleneck also disappears because a UNIQUEIDENTIFIER value can be generated at any time by calling the NEWID() function (MSSQL). Finally, merging records from the various databases is no longer a problem, and you don't have to scratch your head looking for appropriate ranges of values to allocate.

Of course, the UNIQUEIDENTIFIER data type comes with some costs. The first is that it requires more storage space; the next is that searching takes more time. But my argument is that hard disk space is cheap and will be cheaper in the future, and search time can be reduced with proper indexing or better hardware. Another cost is index fragmentation, which slows down queries. This happens because NEWID() generates random values, causing each new record to be inserted somewhere in the middle of the table. To deal with the fragmentation, you will need to rebuild or reorganize the index periodically.
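The periodic index maintenance mentioned above can be done with ALTER INDEX (a sketch; PK_tb_customer and tb_customer are placeholder names):

```sql
-- Rebuild the clustered primary key index to remove fragmentation
-- (index and table names here are placeholders).
ALTER INDEX PK_tb_customer ON tb_customer REBUILD;

-- For lighter, online-friendly maintenance, reorganize instead:
ALTER INDEX PK_tb_customer ON tb_customer REORGANIZE;
```

As an aside, SQL Server also provides NEWSEQUENTIALID(), usable as a column default, which generates ever-increasing GUIDs and avoids much of the fragmentation in the first place.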

In our database design, the primary keys of both master and child tables are single-field UNIQUEIDENTIFIER columns. For a child table, the foreign key is not part of the primary key. The advantage is that the DBA can retrieve any record easily, because each record has its own unique value.

Lastly, in MSSQL the primary key is a clustered index by default. This means inserting a new record causes the MSSQL engine to look for the appropriate page position. Some people say this is fast and some deny it. Some DBAs even argue that, because the records are not stored in insertion order, random keys can benefit parallelism when reading records from the hard disk.

All other potential factors must be considered carefully before implementing this idea.

Technical specification:
Database size that has implemented this idea: < 10GB.
MSSQL Server: 2005/2008.
Environment: Win2003, Win2008, Win XP/7 (even in the virtual PC).

Tuesday, June 5, 2012

Raise an error that the C# can catch

Often, our programs execute stored procedures to do some processing. In the event that data is missing or violates a business rule, the stored procedure should stop the process and return a useful message to the caller.

In order for C#/ASP.net to be able to catch the exception, you have to pass appropriate values into the raiserror() function.

For example:
  raiserror('This error can be caught by C#/asp.net', 11, 1)
 
Just beware that the second parameter (the severity) must be 11 or higher, otherwise C#/ASP.net will not be able to catch it.

To capture and re-raise the current error (inside a CATCH block), use the following code:

declare @err_msg nvarchar(max),
        @err_severity int,
        @err_state int

set @err_msg = error_message()
set @err_severity = error_severity()
set @err_state = error_state()
 
raiserror(@err_msg,@err_severity,@err_state)
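Putting the pieces together, a stored procedure might wrap its work in TRY..CATCH and re-raise the error for the caller (a sketch; sp_issue_invoice and its body are illustrative, not from the original post):

```sql
-- Illustrative procedure; the name and business rule are assumptions.
CREATE PROCEDURE sp_issue_invoice
AS
BEGIN
  BEGIN TRY
    -- ... the actual process goes here ...
    -- Business-rule violation: severity >= 11 so C# receives an exception.
    RAISERROR('Customer has an overdue balance.', 11, 1)
  END TRY
  BEGIN CATCH
    DECLARE @err_msg NVARCHAR(MAX),
            @err_severity INT,
            @err_state INT
    SET @err_msg = ERROR_MESSAGE()
    SET @err_severity = ERROR_SEVERITY()
    SET @err_state = ERROR_STATE()
    -- Perform any cleanup here, then re-raise for the caller.
    RAISERROR(@err_msg, @err_severity, @err_state)
  END CATCH
END
```

On the C# side, executing this procedure via SqlCommand surfaces the message as a catchable SqlException, as described above.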

Monday, May 28, 2012

Converting the nvarchar value into date/time value

When you are importing data from a CSV file or another data source, you might need to convert an nvarchar(14) value into a date/time value. We assume the column value is in the predefined format 'yyyyMMddhhmmss' (14 characters).

The function below is very handy for converting those values into the datetime type:

create function fn_from_sys_date
(
 @dt_str as nvarchar(14)
)
returns datetime
begin

/*
9-may-12,lhw
-parse the date+time string and returns datetime type.

select 
 dbo.fn_from_sys_date('20120509174210'),
 dbo.fn_from_sys_date('20120509'),
 dbo.fn_from_sys_date('')
 

*/
 declare @result datetime,
  @dd nchar(2),
  @mm nchar(2),
  @yyyy nchar(4),
  @hh nchar(2),
  @minute nchar(2),
  @ss nchar(2)

 -- the value format is yyyymmddhhMMss
 if len(@dt_str) >= 8
 begin
  select 
   @yyyy = left(@dt_str, 4),
   @mm = substring(@dt_str, 5, 2),
   @dd = substring(@dt_str, 7, 2)
 end
 else
 begin
  -- min date.
  select 
   @yyyy = '1753',
   @mm = '01',
   @dd = '01'
 end

 if len(@dt_str) = 14
 begin 
  select
   @hh = substring(@dt_str, 9, 2),
   @minute = substring(@dt_str, 11, 2),
   @ss = substring(@dt_str, 13, 2)
 end
 else
 begin
  select 
   @hh = '00',
   @minute = '00',
   @ss = '00'
 end

 select @result = cast(@yyyy 
    + '-' + @mm 
    + '-' + @dd 
    + ' ' + @hh
    + ':' + @minute
    + ':' + @ss
   as datetime)

 return @result
end

Monday, May 21, 2012

Converting the date/time value into nvarchar

Sometimes, you have to store a date/time value in nvarchar format so that other programs can understand it more easily. You need this function to convert the date/time value into nvarchar:

create function fn_to_sys_date
(
 @dt as datetime
)
returns nvarchar(50)
begin

/*
9-may-12,lhw
-format the date value to human readable format. 

select dbo.fn_to_sys_date(getdate())

*/
  declare @result nvarchar(50)
  select @result = replace(convert(nvarchar,@dt,102), 
                           '.', 
                           '')
           + replace(left(convert(nvarchar, @dt, 114), 8), 
                            ':', 
                            '')

 return @result
end

Sunday, May 20, 2012

Getting the records within the given range

For example, you have a voucher table which stores the vouchers sold to customers. Each voucher record stores voucher_no_start and voucher_no_end in the INT data type (i.e., a voucher record represents a booklet, not a single voucher).

In order to check whether part of a voucher booklet has already been sold (if sold, there will be a record in tb_voucher), you must be able to tell whether the start/end numbers keyed in by the user overlap any existing record.

To do this, you pass in the @start/@end values (of course, you should turn the following statement into a stored procedure).

declare @start int,
 @end int

select @start = 300,
 @end = 450

select 
  voucher_no_start, voucher_no_end
from 
  tb_voucher
where 
  voucher_no_start <= @end
  and voucher_no_end >= @start

If the above query returns any rows, it means the range from @start to @end overlaps a booklet that already exists in the database.
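As suggested above, the check can be wrapped in a stored procedure (a sketch; the name sp_check_voucher_range is an assumption):

```sql
-- Illustrative wrapper; the procedure name is an assumption.
CREATE PROCEDURE sp_check_voucher_range
  @start INT,
  @end INT
AS
BEGIN
  -- Two inclusive ranges [a, b] and [@start, @end] overlap
  -- exactly when a <= @end and b >= @start.
  SELECT voucher_no_start, voucher_no_end
  FROM tb_voucher
  WHERE voucher_no_start <= @end
    AND voucher_no_end >= @start
END
```

The two-comparison overlap test covers every case (range fully inside, partially overlapping, or fully enclosing an existing booklet) without enumerating them separately.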