SQL Server and .Net Data Type mapping

Today I would like to post a table listing the corresponding .Net Type to use with each SQL Data Type. This comes in handy when writing ADO.NET code.

The following link lists Microsoft SQL Server data types and their equivalent data type in .Net.
To be more exact it lists:

  • Every SQL Server data types
  • Their equivalent in the Common Language Runtime (CLR) for SQL Server in the System.Data.SqlTypes namespace
  • Their native CLR equivalents in the Microsoft .NET Framework

SQL Server Data Types and Their .NET Framework Equivalents

I have also copy-pasted the SQL Server and .Net Data Type mapping table hereunder for my own convenience:

SQL Server data type CLR data type (SQL Server) CLR data type (.NET Framework)
varbinary SqlBytes, SqlBinary Byte[]
binary SqlBytes, SqlBinary Byte[]
varbinary(1), binary(1) SqlBytes, SqlBinary byte, Byte[]
image None None
varchar None None
char None None
nvarchar(1), nchar(1) SqlChars, SqlString Char, String, Char[]
nvarchar SqlChars, SqlString SQLChars is a better match for data transfer and access, and SQLString is a better match for performing String operations. String, Char[]
nchar SqlChars, SqlString String, Char[]
text None None
ntext None None
uniqueidentifier SqlGuid Guid
rowversion None Byte[]
bit SqlBoolean Boolean
tinyint SqlByte Byte
smallint SqlInt16 Int16 (short in C#)
int SqlInt32 Int32 (int in C#)
bigint SqlInt64 Int64 (long in C#)
smallmoney SqlMoney Decimal (decimal in C#)
money SqlMoney Decimal (decimal in C#)
numeric SqlDecimal Decimal (decimal in C#)
decimal SqlDecimal Decimal (decimal in C#)
real SqlSingle Single (float in C#)
float SqlDouble Double (double in C#)
smalldatetime SqlDateTime DateTime
datetime SqlDateTime DateTime
sql_variant None Object
User-defined type(UDT) None Same class that is bound to the user-defined type in the same assembly or a dependent assembly.
table None None
cursor None None
timestamp None None
xml SqlXml None

Regarding CLR Data Types, Data Types are basically divided in 3 categories:

Furthermore, 3 types of CLR Data Type structures exist to hold numeric values:

See Data Types (C# Reference) for a complete reference and correspondance between CLR types and C# types.

Multi-Threading in C# tutorial

I have read the Threading in C# tutorial written by Joseph Albahari.

It is by far the most complete, yet easy to read tutorial I have read so far on threading in .Net. It can also serve as introductory material for anyone unexperienced with multi-threading concepts. Most .Net / C# books usually have a chapter or two on the topic but most often just scratch the surface of multi-threading in .Net.

I highly recommend anyone to read his paper.

T-SQL Weighted Round Robin Algorithm

In a previous post I was discussing of ways to load balance MSMQ messages to a group of servers by implementing a Weighted Round Robin Algorithm in a software module.

I chose to implement the load balancing feature in the database for 2 main reasons:

  • The application is heavily database driven; stored procedures have to run to retrieve the data that will be used to build each MSMQ message. Therefore, the cost of an additional statement or stored procedure to retrieve the MSMQ path the message should be sent to is much less than the existing cost.
  • All the application configuration information is also database driven. This is because our organization uses custom tools for administrators so that they can configure servers and applications centrally. In the case at hand, they need to be able to modify queue names, machine relative weights as well as to add and remove machines from the load balanced group of server (the cluster).

The Weighted Round Robin algorithm is implemented as a stored procedure returning the next MSMQ path a message should be sent to. The stored procedure could be bundled or merged with the other stored procedures that have to run before each MSMQ message is sent, so that it would only be one more parameter coming back from the database. This parameter will tell the .Net code to which queue the current MSMQ message build has to be sent to.

T-SQL Weighted Round Robin Algorithm Implementation.

Each machine of a cluster should receive a percentage of messages which is relative to its weight:
%age = ((weight of machine) / (sum of weights for all the machines in the cluster)) * 100
Moreover, the distribution of messages to each machine should be function of the weight in real time, not only in average.

To satisfy these conditions, one way to implement the weighted round robin algorithm and which suits well T-SQL capabilities is to calculate a ratio between the relative number of messages sent to a particular machine of a cluster and the relative weight of that machine within the cluster. The machine having the higher ratio will be the next machine a MSMQ message should be sent to in respect with its relative weight. An ORDER BY clause will easily implement that in T-SQL.

First of all, we need to create a table where we will hold the different server parameters:

Column name Data type Purpose
server_name [nvarchar](20) Machine name
cluster [char](10) Cluster name in which the machine belongs to
queue_path [nvarchar](100) MSMQ path
number_requests_sent [float] Number of request sent to this machine so far
weight_factor [float] Weight factor of this machine
enabled_status [bit] Says if the machine is active or inactive in the cluster

Here is the SQL script to create the table:

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[msmq_machines](

[server_name] [nvarchar](20) NOT NULL,

[cluster] [char](10) NOT NULL,

[queue_path] [nvarchar](100) NOT NULL,

[number_requests_sent] [float] NOT NULL CONSTRAINT [DF_msmq_machines_number_requests_sent]DEFAULT ((0)),

[weight_factor] [float] NOT NULL CONSTRAINT [DF_msmq_machines_weight_factor]DEFAULT ((100)),

[enabled_status] [bit] NOT NULL,

CONSTRAINT [PK_msmq_machines] PRIMARY KEY CLUSTERED

(

[server_name] ASC,

[cluster] ASC

)WITH (PAD_INDEX= OFF, STATISTICS_NORECOMPUTE= OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS= ON, ALLOW_PAGE_LOCKS= ON) ON [PRIMARY]

) ON [PRIMARY]


GO

SET ANSI_PADDING OFF

Second we need to create a Stored Procedure that will actually implement the algorithm:

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE PROCEDURE [dbo].[GetLoadBalancedQueueName]

@cluster as char(10),

@serverName as nvarchar(20) out,

@queuePath as nvarchar(100) out

AS

DECLARE @totalClusterRequestSent as float

DECLARE @totalWeight as float

SELECT @totalClusterRequestSent = sum(number_requests_sent)+1, @totalWeight = sum(weight_factor)

FROM dbo.msmq_machines

WHERE cluster = @cluster

AND enabled_status = 1

SELECT top 1 @serverName = server_name, @queuePath = queue_path

FROM dbo.msmq_machines as A

WHERE A.cluster = @cluster

AND A.enabled_status = 1

order by (number_requests_sent/@totalClusterRequestSent)/(weight_factor/@totalWeight), weight_factor desc

UPDATE dbo.msmq_machines

SET number_requests_sent = number_requests_sent + 1

WHERE server_name = @serverName

AND cluster = @cluster

Note on the T-SQL implementation:

  • In the definition of the table msmq_machines, I have chosen data types to be float instead of integers (as they actually will hold only integers values) so that no data type conversion is needed when calculating ratios.
  • @totalClusterRequestSent is calculated as the sum of number_requests_sent + 1 so that if the sum of number_requests_sent is 0 (zero), no division by zero error occurs.

In addition of this, an SQL job will have to run on a regular basis to re-initialize the number_requests_sent column to 0 (zero) so that the value never overflows the data type.

T-SQL weighted Round Robin algorithm Testing

We can test if the algorithm works as expected by creating 3 machines in the msmq_machines table and give them different weights:
MSMQ Load Balancing Weighted Round Robin Configuration Table
If we send 100 messages to the cluster “Bangkok”:

  • DEV1 should receive (100 / 350) * 100 = 28.57 -> 29 messages
  • DEV2 should receive (200 / 350) * 100 = 57.14 -> 57 messages
  • DEV3 should receive (50 / 350) * 100 = 14.28 -> 14 messages

We can create an SQL script that calls the Stored Procedure 100 times and see if the result is what is expected.

declare @counter int

set @counter = 0

while @counter < 100

begin

set @counter = @counter + 1

DECLARE @server as nvarchar(20)

DECLARE @msmq as nvarchar(100)

EXEC [dbo].[GetLoadBalancedQueueName] ‘Bangkok’ , @serverName = @server out, @queuePath = @msmq out

print @server

print @msmq

end

Opening the msmq_machines table, we can see that the expected number of messages has been sent to each server of the cluster proving that the implementation works. MSMQ Load Balancing Weighted Round Robin Configuration Table Result

Load Balancing MSMQ messages

I am currently writing a database-driven .Net application which needs to send MSMQ messages load balanced across a variable number of machines.

The case scenario is the following:

A server needs to send MSMQ messages to servers A, B and C (we choose 3 MSMQ message recipients for the purpose of this example). The original idea was to put a network load balancer (NLB) between the machine sending MSMQ messages and the recipients.

MSMQ Messages Load Balancing with Network Load Balancer (NLB)
If the “MSMQ Sender” machine pictured above sends 75 MSMQ messages, the goal is for machines A, B and C to receive 25 messages each. As we know that a network load balancer distributes load on a connection basis, we were hoping that messages would be load-balanced if the .Net code was creating a new System.Messaging.MessageQueue object for each MSMQ messages sent to the NLB. To be more precise, we were hoping that a new connection to the NLB would be created each time we send a MSMQ message with a new instance of the MessageQueue object.

This was pure speculation and a quick test proved that it did not hold; using a simple network load balancer, all the messages were pushed to a single destination server. This happens because connections are re-used within the MSMQ Windows Service, regardless of how the .Net code is written (as a reminder, the .Net class MessageQueue is just a wrapper around the MSMQ Windows service).

Because traffic is load-balanced on a connection basis (not on a message basis) and because the same TCP connection is re-used by the MSMQ service, the NLB forwards all the traffic to the same destination machine. Would the server send 100 MSMQ messages to the NLB, 100 messages would be forwarded to the same target machine as they are all sent using the same underlying TCP connection.

As we have very little control over the way connections are managed within the MSMQ Windows Service, we had to part away with this simple NLB solution and implement an ad-hoc solution.
We chose to implement the load balancing feature in the .Net application itself. The only addition needed to the existing software is a way to configure it so that it can send MSMQ messages to different queue paths. A simple isolated class, module or piece of code could easily do that.

Server Load Balancing: Algorithms

Before choosing how to implement the solution, let’s have a look at different ways (algorithms) on how to implement load balancing.

  • Random Allocation
    In a random allocation, the traffic (MSMQ messages in our case) is assigned to any server picked randomly among the group of destination servers. In such a case, one of the server may be assigned many more requests to process while the other servers are sitting idle. However, on average, each server gets an approximately equal share of the load due to the random selection.
    Pros: Simple to implement.
    Cons: Can lead to overloading of one server or more while under-utilization of others.
  • Round-Robin Allocation
    In a round-robin algorithm, the traffic is sent to the destination server on a rotating basis. The first request is allocated to a server picked randomly from the group of destination server. For subsequent requests, the algorithm follows the circular order destination servers are listed. Once a server is assigned a request, the server is moved to the end of the list and the next server is chosen for the following request. This keeps all the servers equally assigned.
    Pros: Better than random allocation because the requests are equally divided among the available servers in an orderly fashion.
    Cons: The round robin algorithm is not good enough for load balancing if technical specification of the servers part of the destination group differs greatly (making that the load each server can handle differs greatly).
  • Weighted Round-Robin Allocation
    Weighted Round-Robin is an advanced version of the Round-Robin that takes in account server capability. In case of a weighted round-robin, one can assign a weight to each server in the destination group. For example, if the server group consists of 2 servers and that one server is capable of handling twice as much load as the other, the powerful server gets twice the weight factor. In such a case, the application would assign two requests to the powerful server for each request assigned to the weaker one. In effect, a server with more weight will receive load proportionally to their weight factor.
    Pros: Takes care of the capacity of the servers in the group.
    Cons: Does not consider advanced load balancing requirements such as processing time for each individual request.

We have chosen the Weighted Round Robin algorithm as it is the most advantageous load balancing algorithm to implement easily.
We have chosen to implement the algorithm in T-SQL, but it could be easily implemented in a singleton class in any language such as C# or Java. I will explain why and how in a next blog post: T-SQL Weighted Round Robin Algorithm.

SQL Server Execution Plan Tutorial

Here is a great article written by Grant Fritchey on the basics of analyzing execution plans on SQL Server. It is actually the first chapter of his book on the topic. Yes, a whole book on SQL Server Execution Plans! I haven’t read the whole book but it should probably reveal itself very interesting for DBAs.

Most of us are familiar with the graphical representation of execution plans but Grant Fritchey shows us how to also get execution plans in an XML format and how they actually give more information that their graphical counterpart.

XML executions plans can be saved into files (.sqlplan) and can be viewed either in their XML form or in a graphical form. Being able to save execution plans in a file is very convenient as it makes it easy to share it with DBAs and peers to ask them about their opinion on a slow running query.

Note that on my machine I actually had to edit the .sqlplan file as the first line of the file was a piece of text reading “Microsoft SQL Server 2005 XML Showplan”. This made that the XML file was not well-formed and so could not be loaded by SQL Server Management Studio – this might be fixed in a later Service Pack or hotfix. Once I removed the line of text, I could successfully open the .sqlplan file in SQL Server Management Studio and see the graphical representation of the execution plan. I could also open the file in a text or XML editor and thus see the execution plan in a textual manner. I think that both approaches can prove themselves being complimentary.

The article also shows us how to collect XML execution plans through SQL Server 2005’s Profiler tool. Once execution plans are collected from the server they can be analyzed by DBAs and developers. This can reveal very useful when you want to profile and analyze activities on production environment where you can’t play with data as freely as on a development environment.

For my own sake, I have summarized the introductory materials on the topic hereunder. You might choose to not read it as it is anyway based on the article written by Grant Fritchey.

Execution Plan Introduction.

When you are tuning T-SQL code for performance on SQL Server, the most important information available to you is the Execution Plan. It tells you what kind of JOIN operations and other algorithms are executed as well as what indexes are used. This kind of information will reveal being crucial on poorly performing queries that you need to optimize.

Please keep in mind that this discussion is focused on DML T-SQL.

DML stands for Data Manipulation Language and is aimed at fetching or manipulating data. Basically, it is any SELECT, INSERT, UPDATE and DELETE statement.

DDL stands for Data Definition Language and is aimed at defining data structures. Basically, it is any CREATE, DROP and ALTER statement. DDL statements do not need any query optimization because there is always only 1 way to execute those statements. For example, there is only 1 way to create a table or an index.

When executing a T-SQL query, the T-SQL code is interpreted into instructions understandable by the Database engine. The Database engine is made of multiple processes/sub-engines but 2 are of particular interests regarding Execution Plans: the Relational Engine and the Storage Engine.

Note that in the context of this text, a Process does NOT mean a Windows Process but rather has the more generic meaning of a collection of instructions processing some data. It can be seen as a software module or component.

The Relational Engine.

The Relational Engine is responsible for 3 processes which are of interest in our study:

  • The Parser. The Parser receives the T-SQL query as input and outputs a parse tree or query tree. The parse tree represents the logical steps necessary to execute the query.
  • The Algebrizer. The Algebrizer receives the parse tree from the Parser process as input and resolves all datatypes, names, aliases and synonyms. The output is binary information called the query processor tree.
  • The Query Optimizer. The query optimizer is a piece of software that models the way in which the database relational engine works. Using the query processor tree together with the statistics it has about the data and applying the model, the Query Optimizer works out heuristically what it thinks will be the optimal way to execute the query – that is, it generates an optimized estimated execution plan.

The Storage Engine.

The Storage Engine will execute the estimated execution plan except if it judges that it should be modified. It could be the case if:

  • The estimated execution plan exceeds the threshold for parallel execution.
  • The statistics used to generate the plan are out of date.
  • The estimated execution plan is invalid (for example it creates temp table and so contains DDL statement)

The final execution plan, called the actual execution plan is what is actually executed by the Storage Engine. It might or might not be the same as the estimated execution plan.
Note that generally, there won’t be any differences between the esti¬mated and actual execution plans.

Execution Plan Cache.

As it is expensive for the Server to generate execution plans, SQL Server will keep and reuse plans wherever possible. As they are created, plans are stored in a section of memory called the Plan Cache.

Once the estimated execution plan is created, and before it gets passed to the storage engine, the optimizer compares this estimated plan to actual execution plans that already exist in the Plan Cache. This reuse avoids the overhead of creating actual execution plans. This is obviously beneficial for large and complex queries but also for simple queries which could potentially be called very often (hundreds or thousands of time).

Execution Plans can be removed from the cache in the following scenarios:

  • Memory is required by the system.
  • The “age” of the plan (its time-to-live) has reached zero.
  • The plan isn’t currently being referenced by an existing connection.

Also, cached execution plans might not be reused if an execution plan needs to be recompiled. Certain events and actions can cause a plan to be recompiled. Grant Fritchey enumerates those events in his article.