ASP.Net, .Net, BizTalk, Tech and Life - Francois Malgreve - Bangkok

Friday, August 8, 2008

How to purge BizTalkMsgDb?

If the BizTalk Message Box database (BizTalkMsgDb) grows too large, performance of the database subsystem will reduce. As a rule of thumb, the BizTalkMsgDb should never grow larger than 5 Gb for large systems with long running transactions. For high-volume environment with no long running transactions, the BizTalkMsgDb size should be much smaller than 5Gb.

Would your message box have already grown too large and the system become unresponsive, you might want to clean up the BizTalkMsgDb database manually but keep in mind that:
  • The BizTalk Server must be taken down during the procedure.
  • All existing messages will be purged and lost.

There is an article in the MSDN documentation that explains in details how to manually purge data from the Message Box Database.

To make a long story short, here is step by step summary:
  • Stop ALL BizTalk service host instances from the Services console.
  • Restart IIS by running IISRESET from the command prompt if you are running any adapters in isolated hosts (for example HTTP, SOAP, or WCF).
  • Execute the stored procedure bts_CleanupMsgbox on your message box database. If the stored procedure does not exist, create it by running the sql script “msgbox_cleanup_logic.sql” found in the directory “<BizTalk installation directory>\Schema\”.
    Important note: If you are running BizTalk Server 2006, you first have to get an updated version of the Msgbox_cleanup_logic.sql file to (re)create the bts_CleanupMsgbox stored procedure at http://support.microsoft.com/kb/924715. The hotfix is available for download directly without having to contact MS. If you are running BizTalk Server 2006 R2 then the updated version of the Msgbox_cleanup_logic.sql file is already installed and you do not need to download the hotfix.
  • Execute the stored procedure bts_PurgeSubscriptions on your message box database. If the stored procedure does not exist, you can create it by inspecting the sql script “msgboxlogic.sql” found in the directory “<BizTalk installation directory>\Schema\”. Copy paste only the part of the script that creates this specific stored procedure and run it.

As deleting data in a database does not reduce the size the database files on the disk; you need to shrink the database files if you want to reduce its physical size. There are 2 simple ways to do it:
  • Through SQL Server Management Studio, right click on the BizTalkMsgDb database, click on Tasks > Shrink > Database
  • Through T-SQL, using the DBCC SHRINKDATABASE command: DBCC SHRINKDATABASE (BizTalkDTADb);

There is also another useful command to truncate the database logfile, would it be necessary:
BACKUP LOG BizTalkDTADb WITH TRUNCATE_ONLY

Microsoft does not support this procedure on production systems simply because this will purge production data. So, as long as you keep in mind that all existing messages will be purged, it is safe to run it on a production environment.

Labels: ,

Thursday, July 31, 2008

How to Handle Unhandled Exception

When an unhandled exception occurs in the .Net Framework run-time 1.0 & 1.1, the Windows Process will terminate only if the exception occurred on the main application thread. Would an unhandled exception occur in a thread other than the main application thread, it will be caught by the runtime and therefore NOT cause the application to terminate. This means that unhandled exceptions in child threads disappear silently without anyone being able to know about it, implying that bugs are silently swallowed by the runtime without anyone being notified about it. The dangerous scenario is that a user might believe that a unit of work has executed successfully while it is actually not the case. This is in my opinion something not desirable and has been addressed in later version of the .Net Framework.

Indeed, from the .Net Framework 2.0 onwards, unhandled exceptions occurring on ANY thread (thus including any child threads being background threads or not) shuts down the application running the particular thread.

When the runtime terminates an application because of an unhandled exception, it writes an entry in the Windows Event Log which looks like the following:

EventType clr20r3, P1 processname, P2 1.0.0.0, P3 485f85f0, P4 system, P5 2.0.0.0, P6 471ebf0d, P7 3832, P8 bf, P9 system.componentmodel.win32, P10 NIL.

“processname” being the name of the .Net executable.


1. How to resolve unhandled exceptions? The System.AppDomain.UnhandledException event.

If an unhandled exception occurs in a windows application (multi-threaded or not), a windows service named winservice.exe for example, the runtime will terminate the service and write an Event Log entry such as:
“EventType clr20r3, P1 winservice.exe, P2 1.0.0.0, P3 485f85f0, P4 system, P5 2.0.0.0, P6 471ebf0d, P7 3832, P8 bf, P9 system.componentmodel.win32, P10 NIL.”

While it is good to be notified that a service crashed, the information supplied is rather cryptic and has very little added value. It would be nicer to have a way to log unhandled exceptions with a more meaningful message so that it is possible to get enough information to be able to fix the source code and re-deploy the application.

Conveniently, the .Net Framework provides the System.AppDomain.UnhandledException event. This event fires whenever an application domain unloads because of an unhandled exception. Note that registering for the event does not cause unhandled exceptions to be "handled". It simply notifies you that an exception has gone unhandled, so that it is possible to take some action such as logging the exception message and stack trace and eventually do some clean up before the application dies. The Exception is still unhandled and so still causes the application to shut down.

The discussion in this article is not a solution on how to handle exceptions. Exception handling is a strategy on how to handle exceptions and is a different discussion. This article explains how to act with unhandled exceptions so that details about the exception can be logged so that a proper solution can be found to resolve the issue/bug.

Ideally, all exceptions should be handled in the source code with try-catch-finally blocks so that unhandled exceptions do not occur. Nevertheless, there are always unforeseen cases and developer mistakes which, in my opinion, justify caring for unhandled exceptions across the board.

Here is an example on how to register for the event with an event handler that log exception details in the Windows Event Log:

// This event provides notification of uncaught exceptions. Write this in the entry point of your program, like in the OnStart() method of a Windows Service.
AppDomain currentDomain = AppDomain.CurrentDomain;
currentDomain.UnhandledException += new UnhandledExceptionEventHandler(UnhandledExceptionHandler);
static void UnhandledExceptionHandler(object sender, UnhandledExceptionEventArgs args)
{
Exception ex = (Exception)args.ExceptionObject;
EventLog.WriteEntry("WinService.exe", "Unhandled Exception caught: " + ex.Message + ex.StackTrace, EventLogEntryType.Error);
}


2. Note for Windows Form Applications - Application.ThreadException event.

2.1 Application.ThreadException event

For Windows Form Applications, there is another event that is raised when an unhandled exception occurs, the System.Windows.Forms.Application.ThreadException. Nevertheless, this event fires only when unhandled exceptions happen in Window Forms threads (UI threads). This means when an exception is thrown from code that was ultimately called as a result of a Windows Message. Windows Messages are emitted by keyboard hits, mouse clicks or "paint" messages,... in short, nearly all code in a typical Windows Forms application.

While this works perfectly, it lulls one into a false sense of security that all exceptions will be caught by the central exception handler:
  • Exceptions thrown on worker threads are a good example of exceptions not caught by Application.ThreadException.
  • Exceptions thrown by the code inside the Main method of the Windows Forms application, including the main form's constructor, executes before the Windows message loop begins and so is another example of exceptions that do not raise the Application.ThreadException event.

In this case, we said the Application.ThreadException event handler to be a “central exception handler” because it is still possible for the application to keep running when this event is raised, depending on what kind of logic is implemented in the handler.

As a reminder, Worker Threads are threads:
  • Created manually: Thread.Start()
  • Created by the ThreadPool: ThreadPool.QueueUserWorkItem()
  • Created by any kind of asynchronous call which internally uses a thread pool thread to execute: Delegate.BeginInvoke(), BeginXXX()

One must attach a handler to the Application.ThreadException event before instantiating the main form of the application by calling Application.Run(). Also, because this is a static event, you must detach the event handler(s) when the application is disposed or memory leaks will result.

The Application.ThreadException as a default event handler which behaves in the following way:
  • If an unhandled exception occurs in the main application thread, the default exception handler catches it and terminates the application.
  • If an exception occurs in a thread other than the main application thread, the thread exits, but the application continues to run.


2.2 Application.SetUnhandledExceptionMode

It is possible to instruct the application whether it should catch all unhandled exceptions thrown by Windows Forms components and terminate the application, or whether it should raise an event so that an event handler can be implemented; the event handler could halt execution and expose the unhandled exception to the user. This is setting is done by using the application configuration file or the Application.SetUnhandledExceptionMode() method.
It is possible to instruct the application whether it should catch all unhandled exceptions thrown by Windows Forms components and terminate the application, or whether it should raise an event so that an event handler can be implemented; the event handler could halt execution and expose the unhandled exception to the user. This is setting is done by using the application configuration file or the Application.SetUnhandledExceptionMode() method.
  • UnhandledExceptionMode.ThrowException never route exceptions to the Application.ThreadException event handler and so the default event handler will terminate the application when an unhandled exception occurs as explained earlier.
  • UnhandledExceptionMode.CatchException always route exceptions to the Application.ThreadException event handler.

Again, as a reminder, the Application.ThreadException event handler is only for unhandled exception occurring on UI threads and so the SetUnhandledExceptionMode() method affects only the way unhandled exceptions coming from UI threads are treated, it does not affect how non UI threads unhandled exceptions are treated. System.AppDomain.UnhandledException event handlers will always be called when non UI threads unhandled exceptions occur.


2.3 Sample

Hereunder is a code sample on how to register to the Application.ThreadException event in a Windows Form Application. As said earlier the event handler will only be called for unhandled exceptions occurring on the UI thread. Thus, the code sample also has an event handler for unhandled exception on non-UI threads by registering to the System.AppDomain.UnhandledException event:

static class Program
{
///
/// The main entry point for the application.
///

[STAThread]
static void Main()
{
// Add the event handler for handling UI thread exceptions to the event.
Application.ThreadException += new ThreadExceptionEventHandler(Form1_UIThreadException);

// Set the unhandled exception mode to force all Windows Forms errors on UI threads
// to go through our handler regardless of application settings.
Application.SetUnhandledExceptionMode(UnhandledExceptionMode.CatchException);

// Add the event handler for handling non-UI thread exceptions to the event.
AppDomain.CurrentDomain.UnhandledException +=
new UnhandledExceptionEventHandler(CurrentDomain_UnhandledException);

// Default generated code by Visual Studio for WinForms
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
Application.Run(new Form1());
}
}

You can find a zip file here containing a complete Visual Studio 2008 solution demonstrating both type of unhandled exception event handler and their effect.


3. Note for ASP.NET applications - Application_Error in Global.asax.

Would an unhandled exception occurs in an ASP.NET application on a worker thread, the runtime will terminate the worker process (w3wpp.exe) and write an Event Log entry such as:
“EventType clr20r3, P1 w3wp.exe, P2 6.0.3790.1830, P3 42435be1, P4 app_web_7437ep-9, P5 0.0.0.0, P6 433b1670, P7 9, P8 a, P9 system.exception, P10 NIL.”

In ASP.Net, there is an Application_Error method in the Global.asax file that can be implemented so that an action can be taken when an unhandled exception occurs (logging the exception details, for example). Again, treating an unhandled exception in such a way is not handling an exception (that should be done in the try-catch-finally block in the code), the web application will still crash and display an error message on the browser but it lets the programmer have a way to log information about the exception and eventually do some other clean-up tasks.

Nevertheless, similarly to Windows Forms Application, the Application_Error will be called only for unhandled exception that occurred on the main thread. Unhandled exceptions occurring on worker threads will not be caught by the Application_Error event handler in the global.asax file. If we want to be notified of unhandled exception occurring on worker threads, we need to register an event handler to the System.AppDomain.UnhandledException event, as we did for Windows Services and Forms Applications. The event handler should at least log the exception and eventually do some clean tasks.

Microsoft has a KB on how to implement an event handler for the System.AppDomain.UnhandledException event within an HTTPModule.

You can find here a zip file with a Visual Studio 2008 solution using the HTTPModule defined in the KB. I registered the HTTP module in the web.config the following way:

<system.web>
<httpModules>
<add name="UnhandledExceptionModule" type="WebMonitor.UnhandledExceptionModule, UnhandledExceptionModule, Version=1.0.0.0,
Culture=neutral, PublicKeyToken=51d169497dc4a81e, processorArchitecture=MSIL"/>
</httpModules>
</system.web>

The only non trivial attribute is the “PublicKeyToken”. The Public Key Token is a 64-bit hash of the public key corresponding to the private key used to sign the assembly. A .Net assembly is said to have a strong name when it is signed.

There are 2 ways to retrieve the Public Key Token from a strongly named assembly:
  • The lazy way is to add the assembly in the GAC and go to C:\WINDOWS\assembly, one of the column showed in Windows Explorer is the Public Key Token. Only strongly named assemblies can be stored in the GAC.
  • Run the command line “sn -T UnhandledExceptionModule.dll” which prints the Public Key token of a strongly named assembly.

Contrary to the instructions given in the KB, I did not GAC either NGEN the HTTP module; I just referenced the HTTPModule project from the web project so that the HTTP module library is automatically copied to the web application bin folder. All assemblies in the bin folder are automatically found by the run-time so that is not necessary to GAC all third party libraries.

Labels: ,

Thursday, June 26, 2008

SQL Server and .Net Data Type mapping

Today I would like to post a table listing the corresponding .Net Type to use with each SQL Data Type. This comes in handy when writing ADO.NET code.

The following link lists Microsoft SQL Server data types and their equivalent data type in .Net.
To be more exact it lists:
  • Every SQL Server data types
  • Their equivalent in the Common Language Runtime (CLR) for SQL Server in the System.Data.SqlTypes namespace
  • Their native CLR equivalents in the Microsoft .NET Framework

SQL Server Data Types and Their .NET Framework Equivalents

I have also copy-pasted the SQL Server and .Net Data Type mapping table hereunder for my own convenience:

SQL Server data typeCLR data type (SQL Server)CLR data type (.NET Framework)
varbinary SqlBytes, SqlBinary Byte[]
binary SqlBytes, SqlBinary Byte[]
varbinary(1), binary(1) SqlBytes, SqlBinary byte, Byte[]
image None None
varchar None None
char None None
nvarchar(1), nchar(1) SqlChars, SqlString Char, String, Char[]
nvarchar SqlChars, SqlString SQLChars is a better match for data transfer and access, and SQLString is a better match for performing String operations. String, Char[]
nchar SqlChars, SqlString String, Char[]
text None None
ntext None None
uniqueidentifier SqlGuid Guid
rowversion None Byte[]
bit SqlBoolean Boolean
tinyint SqlByte Byte
smallint SqlInt16 Int16 (short in C#)
int SqlInt32 Int32 (int in C#)
bigint SqlInt64 Int64 (long in C#)
smallmoney SqlMoney Decimal (decimal in C#)
money SqlMoney Decimal (decimal in C#)
numeric SqlDecimal Decimal (decimal in C#)
decimal SqlDecimal Decimal (decimal in C#)
real SqlSingle Single (float in C#)
float SqlDouble Double (double in C#)
smalldatetime SqlDateTime DateTime
datetime SqlDateTime DateTime
sql_variant None Object
User-defined type(UDT) None Same class that is bound to the user-defined type in the same assembly or a dependent assembly.
table None None
cursor None None
timestamp None None
xml SqlXml None


Regarding CLR Data Types, Data Types are basically divided in 3 categories:

Furthermore, 3 types of CLR Data Type structures exist to hold numeric values:

See Data Types (C# Reference) for a complete reference and correspondance between CLR types and C# types.

Thursday, June 19, 2008

T-SQL Weighted Round Robin Algorithm

In a previous post I was discussing of ways to load balance MSMQ messages to a group of servers by implementing a Weighted Round Robin Algorithm in a software module.

I chose to implement the load balancing feature in the database for 2 main reasons:
  • The application is heavily database driven; stored procedures have to run to retrieve the data that will be used to build each MSMQ message. Therefore, the cost of an additional statement or stored procedure to retrieve the MSMQ path the message should be sent to is much less than the existing cost.
  • All the application configuration information is also database driven. This is because our organization uses custom tools for administrators so that they can configure servers and applications centrally. In the case at hand, they need to be able to modify queue names, machine relative weights as well as to add and remove machines from the load balanced group of server (the cluster).

The Weighted Round Robin algorithm is implemented as a stored procedure returning the next MSMQ path a message should be sent to. The stored procedure could be bundled or merged with the other stored procedures that have to run before each MSMQ message is sent, so that it would only be one more parameter coming back from the database. This parameter will tell the .Net code to which queue the current MSMQ message build has to be sent to.


T-SQL Weighted Round Robin Algorithm Implementation.

Each machine of a cluster should receive a percentage of messages which is relative to its weight:
%age = ((weight of machine) / (sum of weights for all the machines in the cluster)) * 100
Moreover, the distribution of messages to each machine should be function of the weight in real time, not only in average.

To satisfy these conditions, one way to implement the weighted round robin algorithm and which suits well T-SQL capabilities is to calculate a ratio between the relative number of messages sent to a particular machine of a cluster and the relative weight of that machine within the cluster. The machine having the higher ratio will be the next machine a MSMQ message should be sent to in respect with its relative weight. An ORDER BY clause will easily implement that in T-SQL.

First of all, we need to create a table where we will hold the different server parameters:
Column nameData typePurpose
server_name[nvarchar](20)Machine name
cluster[char](10)Cluster name in which the machine belongs to
queue_path[nvarchar](100)MSMQ path
number_requests_sent[float]Number of request sent to this machine so far
weight_factor[float]Weight factor of this machine
enabled_status[bit]Says if the machine is active or inactive in the cluster

Here is the SQL script to create the table:

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[msmq_machines](

[server_name] [nvarchar](20) NOT NULL,

[cluster] [char](10) NOT NULL,

[queue_path] [nvarchar](100) NOT NULL,

[number_requests_sent] [float] NOT NULL CONSTRAINT [DF_msmq_machines_number_requests_sent] DEFAULT ((0)),

[weight_factor] [float] NOT NULL CONSTRAINT [DF_msmq_machines_weight_factor] DEFAULT ((100)),

[enabled_status] [bit] NOT NULL,

CONSTRAINT [PK_msmq_machines] PRIMARY KEY CLUSTERED

(

[server_name] ASC,

[cluster] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]


GO

SET ANSI_PADDING OFF



Second we need to create a Stored Procedure that will actually implement the algorithm:

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE PROCEDURE [dbo].[GetLoadBalancedQueueName]

@cluster as char(10),

@serverName as nvarchar(20) out,

@queuePath as nvarchar(100) out

AS

DECLARE @totalClusterRequestSent as float

DECLARE @totalWeight as float

SELECT @totalClusterRequestSent = sum(number_requests_sent)+1, @totalWeight = sum(weight_factor)

FROM dbo.msmq_machines

WHERE cluster = @cluster

AND enabled_status = 1

SELECT top 1 @serverName = server_name, @queuePath = queue_path

FROM dbo.msmq_machines as A

WHERE A.cluster = @cluster

AND A.enabled_status = 1

order by (number_requests_sent/@totalClusterRequestSent)/(weight_factor/@totalWeight), weight_factor desc

UPDATE dbo.msmq_machines

SET number_requests_sent = number_requests_sent + 1

WHERE server_name = @serverName

AND cluster = @cluster

Note on the T-SQL implementation:
  • In the definition of the table msmq_machines, I have chosen data types to be float instead of integers (as they actually will hold only integers values) so that no data type conversion is needed when calculating ratios.
  • @totalClusterRequestSent is calculated as the sum of number_requests_sent + 1 so that if the sum of number_requests_sent is 0 (zero), no division by zero error occurs.

In addition of this, an SQL job will have to run on a regular basis to re-initialize the number_requests_sent column to 0 (zero) so that the value never overflows the data type.


T-SQL weighted Round Robin algorithm Testing

We can test if the algorithm works as expected by creating 3 machines in the msmq_machines table and give them different weights:
MSMQ Load Balancing Weighted Round Robin Configuration Table
If we send 100 messages to the cluster “Bangkok”:
  • DEV1 should receive (100 / 350) * 100 = 28.57 -> 29 messages
  • DEV2 should receive (200 / 350) * 100 = 57.14 -> 57 messages
  • DEV3 should receive (50 / 350) * 100 = 14.28 -> 14 messages


We can create an SQL script that calls the Stored Procedure 100 times and see if the result is what is expected.

declare @counter int

set @counter = 0

while @counter < 100

begin

set @counter = @counter + 1

DECLARE @server as nvarchar(20)

DECLARE @msmq as nvarchar(100)

EXEC [dbo].[GetLoadBalancedQueueName] 'Bangkok' , @serverName = @server out, @queuePath = @msmq out

print @server

print @msmq

end


Opening the msmq_machines table, we can see that the expected number of messages has been sent to each server of the cluster proving that the implementation works. MSMQ Load Balancing Weighted Round Robin Configuration Table Result

Labels:

Tuesday, June 17, 2008

Load Balancing MSMQ messages

I am currently writing a database-driven .Net application which needs to send MSMQ messages load balanced across a variable number of machines.

The case scenario is the following:

A server needs to send MSMQ messages to servers A, B and C (we choose 3 MSMQ message recipients for the purpose of this example). The original idea was to put a network load balancer (NLB) between the machine sending MSMQ messages and the recipients.

MSMQ Messages Load Balancing with Network Load Balancer (NLB)
If the "MSMQ Sender" machine pictured above sends 75 MSMQ messages, the goal is for machines A, B and C to receive 25 messages each. As we know that a network load balancer distributes load on a connection basis, we were hoping that messages would be load-balanced if the .Net code was creating a new System.Messaging.MessageQueue object for each MSMQ messages sent to the NLB. To be more precise, we were hoping that a new connection to the NLB would be created each time we send a MSMQ message with a new instance of the MessageQueue object.

This was pure speculation and a quick test proved that it did not hold; using a simple network load balancer, all the messages were pushed to a single destination server. This happens because connections are re-used within the MSMQ Windows Service, regardless of how the .Net code is written (as a reminder, the .Net class MessageQueue is just a wrapper around the MSMQ Windows service).

Because traffic is load-balanced on a connection basis (not on a message basis) and because the same TCP connection is re-used by the MSMQ service, the NLB forwards all the traffic to the same destination machine. Would the server send 100 MSMQ messages to the NLB, 100 messages would be forwarded to the same target machine as they are all sent using the same underlying TCP connection.

As we have very little control over the way connections are managed within the MSMQ Windows Service, we had to part away with this simple NLB solution and implement an ad-hoc solution.
We chose to implement the load balancing feature in the .Net application itself. The only addition needed to the existing software is a way to configure it so that it can send MSMQ messages to different queue paths. A simple isolated class, module or piece of code could easily do that.

Server Load Balancing: Algorithms

Before choosing how to implement the solution, let’s have a look at different ways (algorithms) on how to implement load balancing.

  • Random Allocation
    In a random allocation, the traffic (MSMQ messages in our case) is assigned to any server picked randomly among the group of destination servers. In such a case, one of the server may be assigned many more requests to process while the other servers are sitting idle. However, on average, each server gets an approximately equal share of the load due to the random selection.
    Pros: Simple to implement.
    Cons: Can lead to overloading of one server or more while under-utilization of others.
  • Round-Robin Allocation
    In a round-robin algorithm, the traffic is sent to the destination server on a rotating basis. The first request is allocated to a server picked randomly from the group of destination server. For subsequent requests, the algorithm follows the circular order destination servers are listed. Once a server is assigned a request, the server is moved to the end of the list and the next server is chosen for the following request. This keeps all the servers equally assigned.
    Pros: Better than random allocation because the requests are equally divided among the available servers in an orderly fashion.
    Cons: The round robin algorithm is not good enough for load balancing if technical specification of the servers part of the destination group differs greatly (making that the load each server can handle differs greatly).
  • Weighted Round-Robin Allocation
    Weighted Round-Robin is an advanced version of the Round-Robin that takes in account server capability. In case of a weighted round-robin, one can assign a weight to each server in the destination group. For example, if the server group consists of 2 servers and that one server is capable of handling twice as much load as the other, the powerful server gets twice the weight factor. In such a case, the application would assign two requests to the powerful server for each request assigned to the weaker one. In effect, a server with more weight will receive load proportionally to their weight factor.
    Pros: Takes care of the capacity of the servers in the group.
    Cons: Does not consider advanced load balancing requirements such as processing time for each individual request.

We have chosen the Weighted Round Robin algorithm as it is the most advantageous load balancing algorithm to implement easily.
We have chosen to implement the algorithm in T-SQL, but it could be easily implemented in a singleton class in any language such as C# or Java. I will explain why and how in a next blog post: T-SQL Weighted Round Robin Algorithm.

Labels: ,

Friday, May 23, 2008

SQL Server Execution Plan Tutorial

Here is a great article written by Grant Fritchey on the basics of analyzing execution plans on SQL Server. It is actually the first chapter of his book on the topic. Yes, a whole book on SQL Server Execution Plans! I haven’t read the whole book but it should probably reveal itself very interesting for DBAs.

Most of us are familiar with the graphical representation of execution plans but Grant Fritchey shows us how to also get execution plans in an XML format and how they actually give more information that their graphical counterpart.

XML executions plans can be saved into files (.sqlplan) and can be viewed either in their XML form or in a graphical form. Being able to save execution plans in a file is very convenient as it makes it easy to share it with DBAs and peers to ask them about their opinion on a slow running query.

Note that on my machine I actually had to edit the .sqlplan file as the first line of the file was a piece of text reading “Microsoft SQL Server 2005 XML Showplan”. This made that the XML file was not well-formed and so could not be loaded by SQL Server Management Studio - this might be fixed in a later Service Pack or hotfix. Once I removed the line of text, I could successfully open the .sqlplan file in SQL Server Management Studio and see the graphical representation of the execution plan. I could also open the file in a text or XML editor and thus see the execution plan in a textual manner. I think that both approaches can prove themselves being complimentary.

The article also shows us how to collect XML execution plans through SQL Server 2005’s Profiler tool. Once execution plans are collected from the server they can be analyzed by DBAs and developers. This can reveal very useful when you want to profile and analyze activities on production environment where you can’t play with data as freely as on a development environment.

For my own sake, I have summarized the introductory materials on the topic hereunder. You might choose to not read it as it is anyway based on the article written by Grant Fritchey.


Execution Plan Introduction.

When you are tuning T-SQL code for performance on SQL Server, the most important information available to you is the Execution Plan. It tells you what kind of JOIN operations and other algorithms are executed as well as what indexes are used. This kind of information will reveal being crucial on poorly performing queries that you need to optimize.

Please keep in mind that this discussion is focused on DML T-SQL.

DML stands for Data Manipulation Language and is aimed at fetching or manipulating data. Basically, it is any SELECT, INSERT, UPDATE and DELETE statement.

DDL stands for Data Definition Language and is aimed at defining data structures. Basically, it is any CREATE, DROP and ALTER statement. DDL statements do not need any query optimization because there is always only 1 way to execute those statements. For example, there is only 1 way to create a table or an index.

When executing a T-SQL query, the T-SQL code is interpreted into instructions understandable by the Database engine. The Database engine is made of multiple processes/sub-engines but 2 are of particular interests regarding Execution Plans: the Relational Engine and the Storage Engine.

Note that in the context of this text, a Process does NOT mean a Windows Process but rather has the more generic meaning of a collection of instructions processing some data. It can be seen as a software module or component.


The Relational Engine.

The Relational Engine is responsible for 3 processes which are of interest in our study:

  • The Parser. The Parser receives the T-SQL query as input and outputs a parse tree or query tree. The parse tree represents the logical steps necessary to execute the query.
  • The Algebrizer. The Algebrizer receives the parse tree from the Parser process as input and resolves all datatypes, names, aliases and synonyms. The output is binary information called the query processor tree.
  • The Query Optimizer. The query optimizer is a piece of software that models the way in which the database relational engine works. Using the query processor tree together with the statistics it has about the data and applying the model, the Query Optimizer works out heuristically what it thinks will be the optimal way to execute the query – that is, it generates an optimized estimated execution plan.


The Storage Engine.

The Storage Engine will execute the estimated execution plan except if it judges that it should be modified. It could be the case if:
  • The estimated execution plan exceeds the threshold for parallel execution.
  • The statistics used to generate the plan are out of date.
  • The estimated execution plan is invalid (for example it creates temp table and so contains DDL statement)

The final execution plan, called the actual execution plan is what is actually executed by the Storage Engine. It might or might not be the same as the estimated execution plan.
Note that generally, there won't be any differences between the esti¬mated and actual execution plans.


Execution Plan Cache.


As it is expensive for the Server to generate execution plans, SQL Server will keep and reuse plans wherever possible. As they are created, plans are stored in a section of memory called the Plan Cache.

Once the estimated execution plan is created, and before it gets passed to the storage engine, the optimizer compares this estimated plan to actual execution plans that already exist in the Plan Cache. This reuse avoids the overhead of creating actual execution plans. This is obviously beneficial for large and complex queries but also for simple queries which could potentially be called very often (hundreds or thousands of time).

Execution Plans can be removed from the cache in the following scenarios:
  • Memory is required by the system.
  • The "age" of the plan (its time-to-live) has reached zero.
  • The plan isn't currently being referenced by an existing connection.

Also, cached execution plans might not be reused if an execution plan needs to be recompiled. Certain events and actions can cause a plan to be recompiled. Grant Fritchey enumerates those events in his article.

Labels:

Tuesday, May 6, 2008

New Features in Visual Studio 2008

In this post I will go through some Visual Studio 2008 tools and features I found interesting.


1. Unit Test Tool

In VS 2008, Unit Testing is facilitated through a unit test class code generator.

Unit Testing is the act of having a piece of code which only purpose is to test another piece of code, this code being part of the end product. It is a particularly tedious task, so having a unit test code generator is very handy.

A Unit Test class is used to test a class that is part of the software being built. If every classes of a software has a matching class used to unit test it, all the code will be unit tested. Unit testing is the lowest level of Quality Assurance, it does not test the software as a whole neither on its external functionalities but rather makes sure that any testable piece of code part of the software is behaving as it should.

You should consider doing Unit Testing as:
  • It creates test units to check if the code is producing expected results.
  • It improves code coverage. Code coverage is a number telling how many percent of the code has actually been tested. The higher value the more confident we can be in the quality of the code (as it means that a large part of the code returns expected values).

To generate a Unit Test class in VS 20008, simply right-click on any class definition and select the Create Unit Tests option to call the unit test generator tool.

Visual Studio 2008 - Create Unit Test Class
This will create a Unit Test class in a separated project dedicated for Unit Testing, this way your Unit Test code and the actual code are separated in different projects and so code is neatly kept separated.

Note that if you try to generate a Unit Test Class for a class that has private or internal modifiers, VS2008 will add a special InternalsVisibleTo attribute in your original project so that your Unit Test project (and only that one) has access to all private, internal and protected methods and classes of the original project containing the classes you want to unit test. This means that the attribute is not added at the class level but at the project level in the AssemblyInfo.cs file.

Moreover, as it can be seen hereunder, only the Unit Test project (here called CalculatorTest) will have access to internal classes and methods:

[assembly: System.Runtime.CompilerServices.InternalsVisibleTo("CalculatorTest")]

Once the Unit Test classes are generated, you will have to go into them, inspect the code and set input values that will be used to call the methods that need to be tested and also set the expected result value that should be returned by the method. This way, the actual value returned can be compared with the expected value to decide if the test was conclusive or not. There are TODO sections declared in the generated code so that you can easily locate where to set test values.

In the example hereunder, values are set for the length and width of the rectangle and the variable expected contains the pre-calculated area of the rectangle so that we can check if the output of the method is equal to what should be returned. If the value returned by the method is not what is expected, that means that the unit test fail and that there is a bug in the tested method. Visual Studio will clearly show in the test results what unit test succeeded or failed.

Lastly, do not forget to comment out the Assert.Inconclusive() method call.

Visual Studio 2008 - Initialize parameter values for Unit Test
The Test project creates a vsmdi file in the solution item folder, named after the solution name, Calculator.vsdmi for a Visual Studio solution called Calculator.

To actually test some or all the unit test methods, open the vsdmi file, select the methods that you want to test, right-click on the list of methods and select Run Checked Tests.

Visual Studio 2008 - Choose Method To Unit Test
Once the test ran, the result will be displayed showing if the test succeeded, failed or was inconclusive.

Visual Studio 2008 - Unit Test Result Window

2. Object Test Bench

If you do not have the time to create full blown Unit Test but still want to test some of your classes and/or methods, there is a quick and dirty way to do it by using the Object Test Bench tool. The tool can call static methods on classes and create object instances of classes so that instance methods can be called ad-hoc. Methods can thus be tested very simply, in a similar way as when you chose to execute a Stored Procedure in SQL Server Management Studio and that a window pops up to let you input the SP parameters.

To use the Object Test Bench, you need first to create Class Diagram of your project. To do so, click on the Class View of your Visual Studio project, select the root namespace of your project, right-click on it and choose the View Class Diagram option.

Visual Studio 2008 - Generate Class Diagram From Source Code
Once the class diagram is created you can call the Object Test Bench tool by either calling a static method on the class or one of the class constructor so that you will be able to call an instance methods after the object is instanciated.

In my case I want to test the Add() method to check if my Calculator object correctly adds numbers. To do so I first instanciate a Calculator object which opens the Object Test Bench window under the class diagram. After the object is created in memory and appears in the Object Test Bench window, I can chose to call any method of the object. I will thus dynamically call the Add() method through the Visual Studio IDE and check if the method returns a correct result.

Visual Studio 2008 - Start Object Test Bench Tool - Create Object Instance
Visual Studio 2008 - Start Object Test Bench Window
A window will pop up so that you can give values to the input parameters.

Visual Studio 2008 - Object Test Bench Invoke Method And Set Input Parameters
Once the parameters are entered and you click on OK, a pop up window will display the result. You can choose to save the result in a variable so that you can re-use it later. For example you could re-use the variable as a parameter to call another method that you want to test-bench. This way, in a few clicks, you can create a bunch of objects and then re-use them later to call methods that take complex-type objects as parameter.

Visual Studio 2008 - Object Test Bench Invoke Method Result Window

3. Generate Method Stub (only for C#)

It is a code generation feature that creates methods before they exist; the method created is based on the call to that method. Once the call to the method is made, Visual Studio 2008 IntelliSense will give you the option to generate a method stub matching the call to that method, with the matching input parameters and return type.

Visual Studio 2008 - Generate Method Stub
The generated method will be created in the matching class and its stub implementation will simply throw a NotImplementedException.
Visual Studio 2008 - Generate Method Stub Result
Note that this code generation feature is only available for C#.


4. Refactoring Tools (C# only)

Refactoring is making changes to a body of code in order to improve its internal structure without changing its external behavior.

- Martin Fowler

It is useful concept to make code cleaner and more understandable/readable.

A typical refactoring case is to break up a lengthy method into separate methods. To do so, you can highlight a piece of a code, right-click on it (or go to the Refactor menu of Visual Studio) and choose Extract Method. This will generate a method containing the highlighted code as well as calling the generated method from the original location.

Refactor menu in Visual Studio 2008:

Visual Studio 2008 - Refactor Menu
Refactoring code by creating a method to shorten the original code, it is the action of extracting a method from a piece of code.

Visual Studio 2008 - Refactoring A Method
The example here above will create a method containing the highlighted code and replace the original code by a call to the generated method (which I called WriteLogToConsole):
Visual Studio 2008 - Refactoring, Generated Method

5. The .Net Framework Source Code

It is possible to go through the source code of the .Net Framework while debugging. Here are the steps I did to make it work:

1. Install the hotfix KB 944899 - Visual Studio 2008 performance decreases when you step through source code that you downloaded from Reference Source Server.

2. Configure Visual Studio debugger to be able to step in the .Net Framework Source Code:
  • Uncheck Enable Just My Code (Managed only).
  • Check Enable source server support.

Visual Studio 2008 - Configuring Visual Studio For Framework Source Code Debugging
3. Configure the Symbols part of the Visual Studio debugger options so that Visual Studio know where to download the .Net Framework debugging symbol (.pdb files) and source code.
  • Set the Symbol file (.pdb) location to be: http://referencesource.microsoft.com/symbols
  • Set a Cache location folder where the .Net Framework pdb and source code files will be stored. Make sure it is a location that your account has read/write access to. For example, a folder under your user hive.
  • Clear the Search the above locations only when symbols are loaded manually if you want that Visual Studio automatically download symbols and source code while you step in .Net Framework code (F11 shortcut key). Note that if your project is big and references many libraries, downloading all the debugging symbols will be slow at the first debug. If you prefer to load symbols only when needed, keep that box checked. You will then have to download debugging symbols and source code on demand by right-clicking the appropriate dll in the stacktrace and choose Load Symbol.

Here is how I configured my Visual Studio:

Visual Studio 2008 - Configuring Visual Studio Framework Source Code Symbol Location
Visual Studio 2008 is now all set to debug and step in .Net Framework Source Code!

While debugging, we can now see that the debugger call stack contains detailed file and line number information for the .NET Framework classes and methods:

Visual Studio 2008 - Debugger Call Stack
Example of use:

In the following screenshot, I stepped in a line of code that calls the ToString() method on a Double type, this makes that the mscorlib pdb file is downloaded as well as the source code for the Double structure so that I can actually debug into the Double type and see its implementation as written by the .Net team. That is something I find really cool and I think has been missing for a long time!

Visual Studio 2008 - Step In .Net Framework Source Code

Modules Window:

While you are debugging, you can bring up the Modules Wwndow by hitting the ALT+CTRL+U keys. This window shows all the dll loaded by the debugger and let you see which dll has debug information loaded and which does not. You can manually load debugging symbols from that window by right clicking on the library you want to load the symbols for and select the Load Symbols option.

Visual Studio 2008 - Debugger Modules Window

List of assemblies currently available at the time of writing for symbol/source loading:
  • Mscorlib.dll
  • System.dll
  • System.Data.dll
  • System.Drawing.dll
  • System.Web.dll
  • System.Web.Extensions.dll
  • System.Windows.Forms.dll
  • System.XML.dll
  • WPF (UIAutomation*.dll, System.Windows.dll, System.Printing.dll, System.Speech.dll, WindowsBase.dll, WindowsFormsIntegration.dll, Presentation*.dll, some others)
  • Microsoft.VisualBasic.dll

For reference, here is a lengthier blog post by Shawn Burke with more information regarding .Net Framework Source Code debugging.


6. SQL Metal

SQL Metal is used to help implementing LINQ to SQL scenarios. It is a command-line utility (sqlmetal.exe).

SQL Metal can:
  • Generate source code and mapping attributes or a mapping file from a database.
  • Generate an intermediate database markup language file (.dbml) for customization from a database.
  • Generate code and mapping attributes or a mapping file from a .dbml file.

What is a mapping file?
A mapping file is an XML file to specify mapping between the data model of the database and the object model of the .Net code. It keeps the mapping code out of the application code which helps in keeping the code cleaner and leaner. Moreover, since it is XML (like any other .config file) it can be changed without having to rebuild the application code.

Check out MSDN documentation for more information about SQLMetal.


7. Visual Studio 2008 Product Comparison

Great comparison between the different functionalities available on each edition of Visual Studio 2008.
http://msdn2.microsoft.com/en-us/vstudio/products/cc149003.aspx

Labels: