T-SQL Weighted Round Robin Algorithm

In a previous post I was discussing of ways to load balance MSMQ messages to a group of servers by implementing a Weighted Round Robin Algorithm in a software module.

I chose to implement the load balancing feature in the database for 2 main reasons:

  • The application is heavily database driven; stored procedures have to run to retrieve the data that will be used to build each MSMQ message. Therefore, the cost of an additional statement or stored procedure to retrieve the MSMQ path the message should be sent to is much less than the existing cost.
  • All the application configuration information is also database driven. This is because our organization uses custom tools for administrators so that they can configure servers and applications centrally. In the case at hand, they need to be able to modify queue names, machine relative weights as well as to add and remove machines from the load balanced group of server (the cluster).

The Weighted Round Robin algorithm is implemented as a stored procedure returning the next MSMQ path a message should be sent to. The stored procedure could be bundled or merged with the other stored procedures that have to run before each MSMQ message is sent, so that it would only be one more parameter coming back from the database. This parameter will tell the .Net code to which queue the current MSMQ message build has to be sent to.

T-SQL Weighted Round Robin Algorithm Implementation.

Each machine of a cluster should receive a percentage of messages which is relative to its weight:
%age = ((weight of machine) / (sum of weights for all the machines in the cluster)) * 100
Moreover, the distribution of messages to each machine should be function of the weight in real time, not only in average.

To satisfy these conditions, one way to implement the weighted round robin algorithm and which suits well T-SQL capabilities is to calculate a ratio between the relative number of messages sent to a particular machine of a cluster and the relative weight of that machine within the cluster. The machine having the higher ratio will be the next machine a MSMQ message should be sent to in respect with its relative weight. An ORDER BY clause will easily implement that in T-SQL.

First of all, we need to create a table where we will hold the different server parameters:

Column name Data type Purpose
server_name [nvarchar](20) Machine name
cluster [char](10) Cluster name in which the machine belongs to
queue_path [nvarchar](100) MSMQ path
number_requests_sent [float] Number of request sent to this machine so far
weight_factor [float] Weight factor of this machine
enabled_status [bit] Says if the machine is active or inactive in the cluster

Here is the SQL script to create the table:

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[msmq_machines](

[server_name] [nvarchar](20) NOT NULL,

[cluster] [char](10) NOT NULL,

[queue_path] [nvarchar](100) NOT NULL,

[number_requests_sent] [float] NOT NULL CONSTRAINT [DF_msmq_machines_number_requests_sent]DEFAULT ((0)),

[weight_factor] [float] NOT NULL CONSTRAINT [DF_msmq_machines_weight_factor]DEFAULT ((100)),

[enabled_status] [bit] NOT NULL,

CONSTRAINT [PK_msmq_machines] PRIMARY KEY CLUSTERED

(

[server_name] ASC,

[cluster] ASC

)WITH (PAD_INDEX= OFF, STATISTICS_NORECOMPUTE= OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS= ON, ALLOW_PAGE_LOCKS= ON) ON [PRIMARY]

) ON [PRIMARY]


GO

SET ANSI_PADDING OFF

Second we need to create a Stored Procedure that will actually implement the algorithm:

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE PROCEDURE [dbo].[GetLoadBalancedQueueName]

@cluster as char(10),

@serverName as nvarchar(20) out,

@queuePath as nvarchar(100) out

AS

DECLARE @totalClusterRequestSent as float

DECLARE @totalWeight as float

SELECT @totalClusterRequestSent = sum(number_requests_sent)+1, @totalWeight = sum(weight_factor)

FROM dbo.msmq_machines

WHERE cluster = @cluster

AND enabled_status = 1

SELECT top 1 @serverName = server_name, @queuePath = queue_path

FROM dbo.msmq_machines as A

WHERE A.cluster = @cluster

AND A.enabled_status = 1

order by (number_requests_sent/@totalClusterRequestSent)/(weight_factor/@totalWeight), weight_factor desc

UPDATE dbo.msmq_machines

SET number_requests_sent = number_requests_sent + 1

WHERE server_name = @serverName

AND cluster = @cluster

Note on the T-SQL implementation:

  • In the definition of the table msmq_machines, I have chosen data types to be float instead of integers (as they actually will hold only integers values) so that no data type conversion is needed when calculating ratios.
  • @totalClusterRequestSent is calculated as the sum of number_requests_sent + 1 so that if the sum of number_requests_sent is 0 (zero), no division by zero error occurs.

In addition of this, an SQL job will have to run on a regular basis to re-initialize the number_requests_sent column to 0 (zero) so that the value never overflows the data type.

T-SQL weighted Round Robin algorithm Testing

We can test if the algorithm works as expected by creating 3 machines in the msmq_machines table and give them different weights:
MSMQ Load Balancing Weighted Round Robin Configuration Table
If we send 100 messages to the cluster “Bangkok”:

  • DEV1 should receive (100 / 350) * 100 = 28.57 -> 29 messages
  • DEV2 should receive (200 / 350) * 100 = 57.14 -> 57 messages
  • DEV3 should receive (50 / 350) * 100 = 14.28 -> 14 messages

We can create an SQL script that calls the Stored Procedure 100 times and see if the result is what is expected.

declare @counter int

set @counter = 0

while @counter < 100

begin

set @counter = @counter + 1

DECLARE @server as nvarchar(20)

DECLARE @msmq as nvarchar(100)

EXEC [dbo].[GetLoadBalancedQueueName] ‘Bangkok’ , @serverName = @server out, @queuePath = @msmq out

print @server

print @msmq

end

Opening the msmq_machines table, we can see that the expected number of messages has been sent to each server of the cluster proving that the implementation works. MSMQ Load Balancing Weighted Round Robin Configuration Table Result

2 Replies to “T-SQL Weighted Round Robin Algorithm”

  1. Hey,

    This was most interesting. I chanced upon this post while googling for Load Balancing techniques. My problem is bit different. I am trying to design a system where not only load balancing is a concern but also order of message processing. Let me explain my problem. May be your expertise might help with the solution.

    I have thousands of messages coming into a Oracle Advanced Queue ( AQ ) from another queue ( MQ ).

    These are trade messages and each message has an unique order id.If there are corrections or cancels for the order, the order id will be the same but differs from its parent in the sequeuence. So you can say order id and sequence number forms unique key for the message.

    A simple program listens to MQ and gets the above message and inserts it into AQ along with a listenerId . This program knows
    before hand how many listeners are involved ( lets say 3 ) and assigns the result of MOD([OrderId],[Number of Listeners]) to the message.

    So a message with OrderId,Sequeuence 1234,1 will be assigned listerner id MOD(1234,3) = 1

    1235,1 = 2
    1236,1 = 0
    1235,2 = 2
    1234,2 = 1
    1234,3 = 1
    1234,4 = 1
    1234,5 = 1
    1234,6 = 1

    and so on. In this it makes sure same lisenter will process all the messages for a same order. However the flaw in this round robin logic is, if there are too many messages for a same order Id ( that is many corrections for a same order) then all these will be queued up for the same lisenter and other two will be relatively free. In the above example if Lisenter 1 is processing messages related to orderId 1234 and then if another message comes with Id 1237, then that too will be assigned to same listner though other two are relatively free.

    Have you implemented something which will solve my problem? The program that assigns lisenterId should be smart enough to distribute the load but also make sure order of processing is intact.

    Another way I thought of is having a map which contains orderid and listner acting on that order. Program has to look at the map every time it assigns the listner id and if parent order is being processed by any listner, then assign the same listner to the child message other pick one which is having least work. But looking at the map ( database ) for every message might become expensive.

    Thanks
    Prashanth

  2. Hi,

    Thanks for your comment.

    The modulus operator is a good idea for classic round-robin, especially when you need to have a specific message familly (in your case defined by the OrderId) to a specific destination.

    I have 2 different advises I can give you:

    1. Will an order really have so many more revisions that one server might end up being completly overloaded while the others are sitting iddle? Probably not. Also, I guess that revisions on orders are pretty random and so, in average, should be distributed across orders assigned to all the destination servers. If my asumptions conform with your business scenario, your implementation is already good enough, and you might have very small spikes once in a while on your destination servers, but over time, on average the servers will execute the same amount of work and so have to same load. Because all the servers will in average receive the same number of orders as well as the approximatly the same numbers or revision.
    On a site note, all of your servers should be able to handle a spike of traffic once in a while for a short duration. If they can’t then it means that your servers are already overloaded and you need to either tune your application or upgrade the hardware. Anyway, I do not think that 1 orderId will keep having revisions while no new orders are made so that 1 server is 100% CPU usage while the 2 others are doing nothing.
    So in short, I think that on average and in real-life applications, your load balancing algorithm is already most likely a good enough approximation.

    2. Would your business scenario be so extreme that my theory in the first paragraph is not valid, then you will just have to add some logic after calculating the modulus.
    2.1. Would the current message be have a sequenceId different than 1 (if 1 is the first sequence for a particular order), then no additional calculation will be made after you calculate the result of the modulus operation as the message has to go to the same server as previous messages for that order.
    2.2. Would the current message have a sequenceId of 1, you will have to check the number of requests sent to each server and override the result returned by the modulus calculaton would you deam that the result of the modulus points to a server which received too many requests compared to the others.

    Hope this helps,

    Francois Malgreve.

Leave a Reply

Your email address will not be published. Required fields are marked *


*