Tuesday, July 31, 2012

Passing report parameters into a stored procedure

When you are designing a screen that allows the user to choose a period, multiple customer types, etc., you might have a hard time passing the values to the stored procedure. The common way to achieve this is to declare the parameter as nvarchar(max) so that it can carry multiple customer types. The simplest way to handle the multiple values passed in by the caller is to split the incoming string into a table variable.

For example, the user wants to get the sales report for the selected customer types (corporate customers, travel agents and walk-in customers) for the year 2011. Then, you might have something similar to the call shown below:

exec pr_rpt_sales_by_cust_type
  @start_date = '2011-01-01',
  @end_date = '2011-12-31',
  @customer_type = 'corporate, travel agent, walkin customer'

To ease the explanation, assume that we are using the display text to filter the data. In real life, you might have to pass in the foreign key values.

Also, you need another function or stored procedure to parse the @customer_type parameter into a table variable, which might require some research on your part. ;)

OK, back to the topic. The select statement will be something like this:

select customer_type, sum(sales_amount) as total_sales_amount
from vw_sales
where transaction_date between @start_date and @end_date
  and (customer_type in (
      select customer_type
      from dbo.fn_parse_param_to_table(@customer_type))
    or @customer_type = ''
  )
group by customer_type

Note: "fn_parse_param_to_table" is a function that you have to develop yourself.
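As a minimal sketch (not a built-in function), it could be written as a table-valued function that splits the comma-separated string:

```sql
-- A sketch of the split function; on SQL Server 2016 and later you could
-- simply use the built-in STRING_SPLIT instead.
CREATE FUNCTION dbo.fn_parse_param_to_table (@param NVARCHAR(MAX))
RETURNS @result TABLE (customer_type NVARCHAR(255))
AS
BEGIN
    DECLARE @pos INT, @piece NVARCHAR(255);

    -- ensure the string ends with a delimiter so the loop picks up the last piece
    IF RIGHT(@param, 1) <> ','
        SET @param = @param + ',';

    SET @pos = CHARINDEX(',', @param);
    WHILE @pos > 0
    BEGIN
        SET @piece = LTRIM(RTRIM(LEFT(@param, @pos - 1)));
        IF @piece <> ''
            INSERT INTO @result (customer_type) VALUES (@piece);
        SET @param = SUBSTRING(@param, @pos + 1, LEN(@param));
        SET @pos = CHARINDEX(',', @param);
    END
    RETURN;
END
```

Calling dbo.fn_parse_param_to_table('corporate, travel agent, walkin customer') would then return one row per customer type, with the surrounding spaces trimmed.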

Tuesday, July 24, 2012

You need an application log for your systems

All systems require a common area to keep track of certain information. For example,
  • An exception/error that was not handled properly - this eases system maintenance.
  • User logins and logouts - you may use this log to keep track of all security related activities. For example, append a "warning" log when someone is trying to hack the system, especially for web based systems.
  • When certain system settings were missing and the system continued to work using default values - this is very important if you are distributing your system to more than one site. For example, the system administrator forgot to set the new sales email address.
  • When the user clicks on a certain function which is handled by a stored procedure in the database - this is important because the user might have clicked it multiple times with seemingly no effect. By writing a log record into this table, the system administrator will be able to tell whether the stored procedure is running or not and carry out the necessary recovery.
Table design:

CREATE TABLE tb_log (
    log_id uniqueidentifier NOT NULL,        --primary key
    log_type_id INT NOT NULL,                --type of log.
    workstation NVARCHAR(255) NOT NULL,        --the workstation name
    uid NVARCHAR(255) NOT NULL,                --the current user ID.
    msg NVARCHAR(max) NULL,                    --the details of the log
    remarks NVARCHAR(255) NULL,                --the remarks/procedure name.
    is_sent INT NOT NULL,          --if '1', the log has been emailed to the admin.
    app_id INT NOT NULL,                    --the application ID
    module_id INT NOT NULL,                    --the module ID
    modified_on DATETIME NOT NULL,            --the last updated date/time
    modified_by NVARCHAR(255) NOT NULL,    --the person who performed the last update.
    created_on DATETIME NOT NULL,     --this is the current date/time when the log record is inserted.
    created_by NVARCHAR(255) NOT NULL        --the current user ID while creating this record.
);

The allowed values for the log_type_id field:
  1. Error - this indicates that the system crashed at runtime and the value in the msg field is the exception details.
  2. Information - this indicates that some processes have been kicked off and running smoothly.
  3. Warning - this indicates that certain system settings were missing or default values were used, but the system is still able to function normally.
  4. Audit Log - this indicates that the log is security related or certain important data has been changed by the user.
  5. Critical - this indicates that the system crash is critical or that someone is trying to figure out a login ID or password; it requires immediate attention.
Add-on modules to be built on top of this log table:
  • Email the log records to the system administrator (and mark "is_sent" with a value of "1"). The email can be sent on a daily or weekly basis.
  • If the log_type_id indicates that it is a "threat" or "hacking" attempt, the log record must be sent to the system administrator immediately.
Note: for process logging, please refer to the "Tracking the progress of a long run process" article.
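A small helper procedure makes the log easy to append from anywhere in the database code. This is a sketch only; the procedure name and parameter defaults are assumptions:

```sql
-- A sketch of a logging helper for tb_log; pr_log_append is an assumed name.
CREATE PROCEDURE pr_log_append
    @log_type_id INT,            -- 1=Error, 2=Information, 3=Warning, 4=Audit Log, 5=Critical
    @msg NVARCHAR(MAX),
    @remarks NVARCHAR(255) = NULL,
    @app_id INT = 0,
    @module_id INT = 0
AS
BEGIN
    INSERT INTO tb_log (log_id, log_type_id, workstation, uid, msg, remarks,
                        is_sent, app_id, module_id,
                        modified_on, modified_by, created_on, created_by)
    VALUES (NEWID(), @log_type_id, HOST_NAME(), SUSER_SNAME(), @msg, @remarks,
            0, @app_id, @module_id,              -- is_sent starts at 0 (not emailed yet)
            GETDATE(), SUSER_SNAME(), GETDATE(), SUSER_SNAME());
END
```

For example, a failed stored procedure could call EXEC pr_log_append 1, 'Division by zero in billing run.', 'pr_rpt_sales_by_cust_type'.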


Tuesday, July 17, 2012

Designing a custom mail process

Another important feature in every system is email notification. Email notification is useful in the following situations:
  • Workflow such as the application that requires approval;
  • Notifying the system administrator on the current system status (such as server health report);
  • Sending daily or weekly reports to the user;
  • An error has happened;
  • System crash report;
  • etc.
Email notification is not limited to the situations above. You may apply this concept to the process logging feature mentioned in an earlier post, or to any other situation that requires the user's attention.

Below is the table to store the email records. Note that we are not using the built-in MSSQL email notification feature to send the email out; instead, you have to rely on an external program to load the email contents and send them. This gives you the choice of implementing this idea on any database engine.

CREATE TABLE tb_mail (
    mail_id UNIQUEIDENTIFIER NOT NULL primary key, -- stores the value of NEWID().
    send_to_email NVARCHAR(MAX) NULL,  -- stores the recipient email address.
    cc_to_email NVARCHAR(MAX) NULL,  -- stores the CC of the current email.
    subject NVARCHAR(255) NULL,  -- stores the email subject.
    body_text NVARCHAR(MAX) NULL,  -- stores the email message.
    attach_file NVARCHAR(MAX) NULL,  -- stores the attachment file name.
    created_on DATETIME NULL,  -- the record creation date/time.
    sent_status_id INT NULL,  -- the email sending status. '0'-pending, '1'-sent & '2'-failed to send
    sent_on DATETIME NULL,  -- the date/time of sending the email.
    mail_type_id INT NULL  -- the email type (optional)
);

In the module that is responsible for sending the email, you must implement the following features:
  • Able to send out all pending emails (sent_status_id = 0), and
  • Able to send a specific email out upon request (which could be triggered through a TCP port or WCF).
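Queueing an email then becomes a plain INSERT, and the external sender marks the row after delivery. The address and subject below are made-up sample values:

```sql
-- A sketch of queueing an email record for the external sender to pick up.
DECLARE @mail_id UNIQUEIDENTIFIER = NEWID();

INSERT INTO tb_mail (mail_id, send_to_email, cc_to_email, subject,
                     body_text, attach_file, created_on, sent_status_id)
VALUES (@mail_id, 'admin@example.com', NULL, 'Daily sales report',
        'Please find the report below.', NULL, GETDATE(), 0);  -- '0' = pending

-- After the external program has delivered the email successfully:
UPDATE tb_mail
SET sent_status_id = 1,  -- '1' = sent
    sent_on = GETDATE()
WHERE mail_id = @mail_id;
```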

Tuesday, July 10, 2012

Designing a stoppable long running process

Some processes run for at least 30 minutes. The question is how to design a long running process so that it can be stopped and, upon restart, continue from where it left off in the previous run. This is an interesting question. Unfortunately, most of the long running processes that I have heard of are not stoppable and require manually patching the 'corrupted' output before re-running the process.

For example, consider the process that issues the monthly bill/invoice for a web hosting company. This company has more than 10,000 customers and each customer subscribes to different services.

Reasons for stopping the long running process:
  • The hard disk is running out of space and the process has stopped.
  • There are some new subscriptions to be queued into the bill/invoice run.
  • The server has to be restarted due to OS (Operating System) updates or running out of memory.
  • Any other unforeseen reason.
First things first: we need a table to store the records to be processed. You might have to customize this table so that it caters for your needs. The batch_no field is a text field which you can design to support multi-location or multi-company billing processes.

create table tb_bill_process_log (
    bill_process_log_id uniqueidentifier not null primary key,
    sess_id uniqueidentifier not null,  -- this is the session ID of the process.
    batch_no nvarchar(50) not null,  -- this is the batch number.
    customer_id uniqueidentifier not null,  -- this is the customer ID.
    process_start_on datetime,  -- the process start time.
    process_end_on datetime,  -- the process completion time.
    remarks nvarchar(255),  -- the remarks set by the long running process.
    created_on datetime  -- the record creation date/time.
)

create table tb_bill_process_batch (
    bill_process_batch_id uniqueidentifier not null primary key,
    batch_no nvarchar(50) not null,  -- the batch number.
    process_restart_counter int,  -- increment this value every time the process is restarted.
    start_on datetime,  -- the first process start date/time.
    complete_on datetime, -- the process completion date/time.
    created_on datetime  -- the record creation date/time
)

The flow of the long running process:
  1. When the long running process starts, assign a batch number (stored in the batch_no field). If the batch number has already been created, increment the tb_bill_process_batch.process_restart_counter value by one.
  2. Then, copy all the customers who are still subscribing to the services or whose subscription was terminated in the last month. All these customers will receive the bill/invoice. If possible, do this within a database transaction. If there are too many customers to be billed, you might have to split the customers by group or location. This reduces the time needed to lock all the necessary resources.
  3. Now the billing process starts (this is the process of calculating the charges). You need a cursor that selects the customer IDs matching the batch_no where process_start_on is null.
  4. After retrieving the first customer ID from the cursor, set process_start_on to the current date/time and insert the necessary billing records. Upon completion, update process_end_on to the current date/time. In this step, you need a database transaction to ensure that all records are committed in the same scope.
  5. Then, continue with the next customer ID until all records have been processed.
  6. Finally, update the tb_bill_process_batch.complete_on field to indicate that the long running process has completed.
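Steps 3 to 5 could be sketched like this; the charge calculation procedure name (pr_bill_calc_charges) and the sample @batch_no value are assumptions:

```sql
DECLARE @batch_no NVARCHAR(50) = '201207';  -- assigned in step 1
DECLARE @customer_id UNIQUEIDENTIFIER;

DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT customer_id
    FROM tb_bill_process_log
    WHERE batch_no = @batch_no
      AND process_start_on IS NULL;  -- skips customers already handled before a restart

OPEN cur;
FETCH NEXT FROM cur INTO @customer_id;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- step 4: mark the start, bill the customer, mark the end
    UPDATE tb_bill_process_log
    SET process_start_on = GETDATE()
    WHERE batch_no = @batch_no AND customer_id = @customer_id;

    BEGIN TRANSACTION;
        EXEC pr_bill_calc_charges @customer_id;  -- assumed billing procedure

        UPDATE tb_bill_process_log
        SET process_end_on = GETDATE()
        WHERE batch_no = @batch_no AND customer_id = @customer_id;
    COMMIT TRANSACTION;

    -- step 5: continue with the next customer
    FETCH NEXT FROM cur INTO @customer_id;
END
CLOSE cur;
DEALLOCATE cur;
```

Because the cursor only selects rows where process_start_on is null, re-running this block after a stop simply resumes with the unprocessed customers.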
That's all you need to design a stoppable long running process. If you think this is not enough and would like to reduce the processing time, here is what you can do:

Develop a program in any programming language that supports multi-threading and execute the charge calculation stored procedure in multiple threads. To do this, you need to split the long running process into a few stored procedures:
  • The first stored procedure is responsible for generating a new batch number and collecting all the customer IDs that need to be billed (steps 1 and 2). This stored procedure will be run by 1 thread only.
  • The second stored procedure is responsible for calculating the charges for each customer (steps 3 to 6). This stored procedure will be run in multiple threads. In this stored procedure, you can't loop through a cursor anymore. The cursor must be replaced with a query that always returns the TOP 1 customer ID matching the batch_no where process_start_on is null.
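One way to sketch that replacement is a single UPDATE so that each thread atomically claims one unprocessed customer; the UPDLOCK/READPAST hints are SQL Server specific and let concurrent threads skip rows that are already locked by another thread:

```sql
DECLARE @batch_no NVARCHAR(50) = '201207';  -- passed in by the caller
DECLARE @customer_id UNIQUEIDENTIFIER;

-- claim the next unprocessed customer in one atomic statement
UPDATE TOP (1) t
SET process_start_on = GETDATE(),
    @customer_id = t.customer_id
FROM tb_bill_process_log t WITH (UPDLOCK, READPAST, ROWLOCK)
WHERE t.batch_no = @batch_no
  AND t.process_start_on IS NULL;

IF @customer_id IS NULL
    PRINT 'Nothing left to process for this batch.';
```

Each worker thread repeats this claim-then-calculate cycle until no customer ID comes back.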
There you go. If you run 2 threads for the charge calculation stored procedure, the entire processing time will be reduced by up to 50%. Please take note that it won't be exactly a 50% reduction because the process needs to lock the necessary resources.

Tuesday, July 3, 2012

Tracking the progress of a long run process

Have you ever wondered about the current progress of an active long run process? The following solution will help you track a long run process in a stored procedure or even a C# program.

For example, you have a stored procedure that takes at least 30 minutes to run, and the DBA wants to be able to see its current progress.

The long run process must fulfill the following requirements:
  1. Each UPDATE/DELETE statement must affect a minimal number of records - this means you should not execute an UPDATE/DELETE statement that affects thousands of records. By minimizing the affected records per UPDATE/DELETE statement, you reduce the resources to be locked and allow other users a chance to access them concurrently. This will improve the overall database response time.
  2. The long run process must be broken down into many sub-processes which report their status back to the main process - all sub-processes should run within a stored procedure whenever possible. This will ease testing and future maintenance/enhancement. The way the status is returned to the main process must be standardized. For example, if you have decided to use an OUTPUT parameter, then all stored procedures must follow the same convention.
  3. The transaction scope must be efficient for the number of records affected - if your process is running a mail merge, you may consider a transaction for every 1,000 records. Another example: if you are running a billing process, you might consider a transaction for each bill. This requires many rounds of testing in the development or simulated live environment to confirm the appropriate numbers. In real life, the hardware specification, the number of concurrent users, the number of records in the output, etc. must be taken into consideration.

After you have done the above, you will need a new table to store the process log.

CREATE TABLE tb_process_log (
  process_log_id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
  sess_id UNIQUEIDENTIFIER,
  seq INT NULL,
  msg NVARCHAR(MAX) NULL,
  remarks NVARCHAR(255) NULL,
  created_on DATETIME NULL,
  created_by NVARCHAR(255) NOT NULL
);

Table usage
  • process_log_id - this is the primary key of the table which stores the value generated by NEWID() function.
  • sess_id - this field stores the session ID of the current process. For example, if two users are running the same process, then, both of them will have different session ID. The value is generated by NEWID() function.
  • seq - this field stores the sequence number of the log record or the current step in the process. This field is optional.
  • msg - this field stores the details of the current process step.
  • remarks - this is the special remarks set by the process and it is optional.
  • created_on - this is the current date/time when the log record is inserted.
  • created_by - this is the user ID who is executing the stored procedure.
Finally, we are ready to discuss the process logging:
  1. First things first: every time a process starts, it should insert a log record to indicate that the stored procedure has started running. This is very important because sometimes the stored procedure fails without even having a chance to run. By having the process start log, you will be able to tell that there is no communication error in between.
  2. Before exiting any stored procedure, it should insert a log record to indicate that the exit is intentional. This ensures that the exit is by design, or stops where you are expecting.
  3. Put the current process step details into the "msg" field. This allows you to find out the runtime values instead of looking for a way to reproduce the error or behavior. For example, "Issuing invoice to customer A.." and "Successfully issued invoice #1234 (amounting to $2,000.00) to customer A.". You just have to ensure that the "msg" value contains sufficient information for troubleshooting.
  4. As for the "remarks" field, we normally use it for the stored procedure name. But you may use it any way you like since the "remarks" field is not critical.
  5. Make sure that you have sufficient TRY..CATCH blocks within each stored procedure to track down unexpected errors and also perform cleanup.
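A minimal sketch combining points 1, 2 and 5; the procedure name and pr_process_log_append (an assumed helper that inserts one row into tb_process_log) are not from any library:

```sql
CREATE PROCEDURE pr_issue_invoices
    @sess_id UNIQUEIDENTIFIER  -- generated by the caller with NEWID()
AS
BEGIN
    DECLARE @err NVARCHAR(MAX);

    BEGIN TRY
        -- point 1: log the start before doing anything else
        EXEC pr_process_log_append @sess_id, 1, 'pr_issue_invoices started.';

        -- ... the actual sub-processes run here, each appending its
        -- runtime values into the msg field ...

        -- point 2: log the intentional exit
        EXEC pr_process_log_append @sess_id, 99, 'pr_issue_invoices completed.';
    END TRY
    BEGIN CATCH
        -- point 5: clean up and record the unexpected error
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        SET @err = ERROR_MESSAGE();
        EXEC pr_process_log_append @sess_id, -1, @err;
    END CATCH
END
```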