Windows Process, .Net Application Domain and 2 GB limit on 32-bit Windows

A few weeks ago I heard a colleague say that “all .Net applications run in the same runtime (CLR), so if you start 10 separate .Net applications, they all share a single 2 GB limit on 32-bit Windows”. This is of course not true, and it gave me the idea to blog about the 2 GB limit on 32-bit systems, Windows Processes, .Net applications and the concept of the .Net Application Domain.

2 GB limitation on Windows 32-bit.

32-bit Operating Systems are capped by the number of unique addresses a pointer can hold. On 32-bit processors, only 2^32 distinct addresses can exist; if every one of them were used, that would represent 4 GB of memory. On a Windows Operating System, memory addresses do not map 1-to-1 onto the physical memory of your hardware; otherwise you would be stuck with a maximum of 4 GB of addressable memory for the whole machine. That 4 GB would also have to hold all the I/O address space and kernel memory, leaving much less actual memory for programmers to use.
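The arithmetic behind the 4 GB figure is easy to check. This quick Python sketch assumes byte-addressable memory and uses "GB" to mean 2^30 bytes, as elsewhere in this post:

```python
# A 32-bit pointer can hold 2**32 distinct values.
ADDRESS_BITS = 32
distinct_addresses = 2 ** ADDRESS_BITS

# Assuming byte addressing, each address names exactly one byte.
GIB = 1024 ** 3
print(distinct_addresses)          # 4294967296
print(distinct_addresses // GIB)   # 4
```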

This is why, when we talk about memory, it is important to distinguish between physical memory (the RAM on the motherboard) and the Virtual Memory accessible through the Virtual Address Space. Strictly speaking, Virtual Memory is not the same as Virtual Address Space, and there are ways to use Virtual Memory without using the Virtual Address Space. I will not go into these details; the important thing to remember is that Windows has a complex memory management system that lets the O.S. as a whole use much more than 4 GB. Its inner workings are not for the faint-hearted and are of little interest to most .Net programmers living in the managed world.

Check this blog post for a primer on memory management on Windows Operating System.

When 32-bit Windows starts a program, it creates a 32-bit process using 32-bit pointers, so the process has a maximum of 4 GB of addressable memory.
Windows assigns the process a Virtual Address Space of 4 GB (2^32 bytes) split in two: 2 GB of user-mode virtual address space and 2 GB of kernel-mode virtual address space. The user-mode virtual address space is the “memory” (to be correct, the virtual address space) available for your program to use.
This 2 GB user-mode virtual address space limit is what is commonly called the 2 GB memory limit on 32-bit Windows.

/3GB switch on 32-bit Windows

The /3GB switch changes the way the 4 GB virtual address space is split up: with it, the split is 3 GB of user-mode virtual address space and 1 GB of kernel-mode virtual address space. Using this option is nevertheless not recommended, as it can trigger unexpected bugs in drivers and other kernel-mode components that expect 2 GB of kernel virtual address space to be available (not that a driver would ever need 2 GB, but an older driver might expect addresses from 0x80000000 to 0xFFFFFFFF to be available).
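To see where the 0x80000000 boundary comes from, here is a simplified Python model of the split. It assumes user mode occupies the low addresses and kernel mode the rest, and it ignores the reserved regions Windows carves out of both ranges; the function name is hypothetical:

```python
GIB = 1024 ** 3
ADDRESS_SPACE = 2 ** 32  # 4 GB of virtual addresses for a 32-bit process


def split_address_space(user_gib: int) -> dict:
    """Return (first, last) address ranges for user and kernel mode.

    Simplified model: user mode gets the low addresses, kernel mode the rest.
    """
    boundary = user_gib * GIB
    if not 0 < boundary < ADDRESS_SPACE:
        raise ValueError("user-mode size must be between 0 and 4 GB")
    return {
        "user": (0x00000000, boundary - 1),
        "kernel": (boundary, ADDRESS_SPACE - 1),
    }


for label, user_gib in (("default", 2), ("/3GB", 3)):
    ranges = split_address_space(user_gib)
    user_lo, user_hi = ranges["user"]
    kern_lo, kern_hi = ranges["kernel"]
    print(f"{label:8} user 0x{user_lo:08X}-0x{user_hi:08X}  kernel 0x{kern_lo:08X}-0x{kern_hi:08X}")
# default  user 0x00000000-0x7FFFFFFF  kernel 0x80000000-0xFFFFFFFF
# /3GB     user 0x00000000-0xBFFFFFFF  kernel 0xC0000000-0xFFFFFFFF
```

With the default split the kernel starts at 0x80000000; with /3GB it starts at 0xC0000000, which is why a driver assuming the old boundary can break.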
See here and here for other problems that can arise when using the /3GB switch.

AWE

AWE does not give more virtual address space to a process. AWE stands for Address Windowing Extensions and is a Microsoft API (Application Programming Interface) that allows a 32-bit application to access more physical memory than it has virtual address space.
AWE enables programs to reserve physical memory as non-paged memory and then to dynamically map portions of that memory into the program’s working set. This enables memory-intensive programs, such as large database systems, to reserve large amounts of physical memory for data without that data being paged in and out of a paging file.
To be clear, AWE is only available to programs that actually call the AWE API; it is not an OS switch that can be turned on or off for any program.
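The windowing idea can be illustrated with a toy Python model. This is not the AWE API itself (the real Win32 calls include AllocateUserPhysicalPages and MapUserPhysicalPages); the class and method names are hypothetical and only show how a small virtual window can be pointed at different parts of a larger pool of resident physical pages:

```python
class AweWindowModel:
    """Toy model of AWE-style windowing (hypothetical, not the Win32 API).

    A small window of virtual pages is mapped onto a larger pool of physical
    pages that stay resident; remapping the window exposes a different part
    of the pool without paging anything to disk.
    """

    def __init__(self, physical_pages: int, window_pages: int):
        if window_pages > physical_pages:
            raise ValueError("the window cannot be larger than the physical pool")
        self.physical = [0] * physical_pages   # one value per physical page
        self.window_pages = window_pages
        self.mapping = [None] * window_pages   # window slot -> physical page

    def map_window(self, first_page: int) -> None:
        """Point the whole window at a contiguous run of physical pages."""
        last = first_page + self.window_pages
        if first_page < 0 or last > len(self.physical):
            raise IndexError("mapping falls outside the physical pool")
        self.mapping = list(range(first_page, last))

    def _page(self, slot: int) -> int:
        page = self.mapping[slot]
        if page is None:
            raise RuntimeError("window slot is not mapped")
        return page

    def write(self, slot: int, value: int) -> None:
        self.physical[self._page(slot)] = value

    def read(self, slot: int) -> int:
        return self.physical[self._page(slot)]


awe = AweWindowModel(physical_pages=8, window_pages=2)
awe.map_window(0)
awe.write(0, 11)      # stored in physical page 0
awe.map_window(6)
awe.write(0, 66)      # same window slot, now physical page 6
awe.map_window(0)
print(awe.read(0))    # 11: physical page 0 kept its data while unmapped
```

The process only ever addresses two pages' worth of window, yet it reaches all eight physical pages, which is the point of AWE.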

Windows 64-bit

On 64-bit Windows there is no 2 GB limit: the user-mode virtual address space limit is 8 TB. See here for reference.
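To put the 8 TB figure in perspective, a short Python calculation compares it with the 32-bit default (again using binary units):

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

user_32 = 2 * GIB   # default user-mode space of a 32-bit process
user_64 = 8 * TIB   # user-mode space of a 64-bit process, per the figure above

print(user_64 == 2 ** 43)   # True
print(user_64 // user_32)   # 4096
```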

Windows Process and Runtime Host

A Windows process is an instance of a program executing on Windows. A process contains the executable code and data inside the memory reserved for it by the Operating System. At least one thread, and in most cases several, executes instructions within the process.

Any program running on Windows actually runs within a process. If you open 2 instances of Notepad, you can see 2 processes running notepad.exe under the Processes tab of the Windows Task Manager.

The concept of a Process exists for two main reasons:

  • To enable multitasking (time sharing): the processes a CPU is running switch between the running and waiting states very quickly, giving the end-user the illusion that all processes are running at the same time. This brings multitasking as well as scalability.
  • To provide boundaries between running programs, so that a process cannot peek into another one and erroneous code inside a process cannot corrupt areas outside of that process (so that a process cannot crash another one). This brings security and stability.

The isolation between processes is achieved by giving each process its own virtual address space, which belongs to exactly one process and no other.
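A simplified Python model of this isolation: each process owns a private page table, so the same virtual address resolves to different physical frames in different processes, and a process can only reach memory through its own table. The class is a hypothetical illustration, not how the Windows memory manager is implemented:

```python
PAGE_SIZE = 4096


class ToyProcess:
    """A process with a private page table: virtual page number -> physical frame."""

    _next_frame = 0  # frame allocator shared by the whole toy "machine"

    def __init__(self, name: str):
        self.name = name
        self.page_table = {}

    def touch(self, virtual_address: int) -> int:
        """Translate an address, allocating a fresh physical frame on first use."""
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn not in self.page_table:
            self.page_table[vpn] = ToyProcess._next_frame
            ToyProcess._next_frame += 1
        return self.page_table[vpn] * PAGE_SIZE + offset


a = ToyProcess("notepad.exe #1")
b = ToyProcess("notepad.exe #2")
addr = 0x00400000             # the same virtual address in both processes
pa = a.touch(addr)
pb = b.touch(addr)
print(hex(pa), hex(pb), pa != pb)
```

Because there is no way to name a physical frame except through one's own page table, process `a` simply has no address that reaches `b`'s memory.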

Runtime Host

.Net applications are compiled into CIL (Common Intermediate Language, formerly called MSIL, Microsoft Intermediate Language), and are then JITed (Just-In-Time compiled) by the CLR (Common Language Runtime) into instructions directly understandable by the CPU (native code).
Here is an illustration of this two-step compilation process:


This means that .Net applications are not native Win32 applications and so cannot be executed directly by the Operating System. As any application running on Windows has to run in a Windows Process, a Windows Process called a Runtime Host actually executes (hosts) the .Net application. The Runtime Host first loads the CLR DLL (a native Windows library, i.e. unmanaged code), which in turn loads the .Net application (managed code), JIT compiles it and runs it. The process thus effectively hands control of running the application over to the CLR.

The .Net Framework ships with several Runtime Hosts, including ASP.NET and the Shell. The Shell runs all Windows-type applications (Windows Forms, Windows Service or Console App).

We can see that this concept actually adds a new layer between the .Net application and the Operating System. This layer, implemented by the CLR, is generically called a Virtual Machine and has OS-like features. It is an abstraction layer between the .Net application and the Operating System. As with Java, this permits any .Net Application to run on any Operating System as long as there is a CLR implemented for that OS.

2 GB limit for .Net applications

As the Runtime Host is a Windows Process, the .Net applications run by a Runtime Host are subject to the 2 GB barrier on 32-bit Windows. Nevertheless, every Runtime Host has its own, separate 2 GB virtual address space limit. So if you launch 2 instances of a .Net application, each appearing as a separate process in Task Manager, each one has its own 2 GB limit.
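The difference between the colleague's claim and reality can be modeled in a few lines of Python. The class is hypothetical and tracks only reserved address space; the point is that each process checks allocations against its own 2 GB budget, so ten instances do not compete for one shared 2 GB:

```python
GIB = 1024 ** 3
USER_MODE_LIMIT = 2 * GIB  # per process, 32-bit Windows default


class HostProcess:
    """Toy model of a Runtime Host process tracking its own user-mode budget."""

    def __init__(self, pid: int):
        self.pid = pid
        self.reserved = 0

    def reserve(self, size: int) -> bool:
        """Reserve address space in this process only; other processes are unaffected."""
        if self.reserved + size > USER_MODE_LIMIT:
            return False
        self.reserved += size
        return True


hosts = [HostProcess(pid) for pid in range(10)]
# Every one of the 10 instances can reserve 1.5 GB: the limits are not pooled.
results = [h.reserve(3 * GIB // 2) for h in hosts]
print(all(results))  # True
```

Had the limit been shared, the second 1.5 GB reservation would already have failed.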

.Net Application Domain

An Application Domain is the CLR equivalent of an Operating System process. Just as the Windows OS brings logical and physical isolation between Windows applications through the use of Processes, a single Runtime Host Windows Process can run several isolated .Net applications through the use of Application Domains. As explained before, Windows isolates processes by assigning a separate virtual address space to each process. In the .Net world, memory is actively managed by the CLR, so the CLR can make sure that memory addresses are not shared between Application Domains, effectively isolating the different Application Domains running in the same Runtime Host.

When a Runtime Host starts a .Net application, the CLR will create a default Application Domain to run the .Net application. As multiple Processes can run on a single OS, multiple Application Domains can run within the same Runtime Host.

An Application Domain is cheaper to create than a Windows Process and has relatively less overhead to maintain. It is thus more efficient to isolate .Net applications through Application Domains than through Windows Processes. Application Domains are sometimes referred to as lightweight processes but, strictly speaking, they are NOT processes.

To summarize, here is a list of advantages of having Application Domains within a Runtime Host Process (which are for most of them similar to the advantages of having Processes within an Operating System):

  • An Application Domain is a more lightweight means of providing isolation between .Net applications than Processes.
  • A .Net application in an Application Domain can be stopped without affecting the state of another application running in a separate Application Domain.
  • A crash in an Application Domain will affect neither other Application Domains nor the Runtime Host Process hosting the Application Domains.
  • Configuration information is part of an Application Domain scope, not the process’ scope.
  • Each Application Domain can have different security access levels assigned to them, all within the same Runtime Host Process.
  • Code in one .Net Application Domain cannot directly access memory in another Application Domain. If two .Net applications need to communicate across Application Domains, they need to use .Net Remoting to do so. In .Net 1.x, Remoting between processes had to go through the TCP or HTTP channel, which was expensive because the TCP/IP stack was involved. In .Net 2.0, .Net Remoting supports named pipes through the IPC channel, which is much more efficient. WCF in .Net 3.x supports named pipes as well.
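To make the list above more concrete, here is a hypothetical Python model of one Runtime Host process containing several domains. It is not the CLR API (in .Net the real type is System.AppDomain, created with AppDomain.CreateDomain and removed with AppDomain.Unload); it only illustrates per-domain configuration, unloading one failed domain without touching the others, and the fact that domains exchange copies of data rather than sharing objects:

```python
import copy


class ToyDomain:
    def __init__(self, name, config):
        self.name = name
        self.config = dict(config)   # configuration is scoped to the domain
        self.state = {}              # objects living in this domain
        self.loaded = True


class ToyRuntimeHost:
    """One OS process hosting several isolated domains (toy model)."""

    def __init__(self):
        self.domains = {}

    def create_domain(self, name, config):
        self.domains[name] = ToyDomain(name, config)
        return self.domains[name]

    def run(self, name, work):
        """Run work inside a domain; in this model a failure unloads that domain only."""
        domain = self.domains[name]
        try:
            work(domain.state)
        except Exception:
            self.unload(name)

    def unload(self, name):
        self.domains[name].loaded = False
        self.domains[name].state.clear()

    def send(self, source, target, key):
        """Cross-domain call: the value is copied (marshaled), never shared."""
        value = self.domains[source].state[key]
        self.domains[target].state[key] = copy.deepcopy(value)


host = ToyRuntimeHost()
host.create_domain("AppA", {"culture": "en-US"})
host.create_domain("AppB", {"culture": "fr-FR"})
host.run("AppA", lambda s: s.update(orders=[1, 2]))
host.send("AppA", "AppB", "orders")
host.run("AppB", lambda s: s["orders"].append(3))   # changes AppB's copy only
host.run("AppA", lambda s: 1 / 0)                   # simulated crash in AppA
print(host.domains["AppA"].loaded, host.domains["AppB"].loaded)  # False True
print(host.domains["AppB"].state["orders"])                      # [1, 2, 3]
```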

List of BizTalk prerequisites redistributable CAB files

While I was installing BizTalk Server 2006 the other day, I decided to compile a list of links to the redistributable BizTalk prerequisite CAB files so that I do not have to search for them the next time I install BizTalk Server.

The BizTalk prerequisites CAB file is a compilation of all the prerequisites that are needed to install and run BizTalk Server.
The redistributable CAB file differs by Windows version (Windows XP, Vista, Server 2003, 64-bit or 32-bit) and language (English, French, …). So, when choosing the prerequisite CAB file, make sure it matches your Windows Operating System version and language.

When installing BizTalk Server, the setup procedure gives you the choice to either automatically install the prerequisites from the web or to automatically install the prerequisites from a redistributable CAB file.
Choosing the redistributable CAB file option speeds up the installation process and is necessary if the BizTalk server has limited or no access to the internet.

BizTalk Server 2006

For BizTalk Server 2006, the links to the redistributable prerequisites CAB files are listed in the BizTalk Server 2006 Installation and Upgrade Guides. The guides are MS Word documents containing links to all the prerequisites CAB files for all supported versions of the Windows Operating System.

BizTalk Server 2006 R2

For BizTalk Server 2006 R2, the links to the redistributable prerequisites CAB files can be found in the BizTalk Server 2006 R2 Installation and Upgrade Guides. Here again the guides are a bunch of different MS Word documents containing links to all the prerequisites CAB files for all supported versions of the Windows Operating System.

As BizTalk Server 2006 R2 is the current release of BizTalk Server 2006, the links to the prerequisites CAB files can also directly be found on the MSDN documentation site:
http://msdn2.microsoft.com/en-us/library/aa578652.aspx

For information, here is the list of all system requirements for BizTalk 2006 R2: http://download.microsoft.com/documents/australia/biztalk/post_event/BizTalkServer2006R2_SystemRequirements.pdf

BizTalk 64-bit high CPU utilization

I started to deploy BizTalk Server 2006 64-bit edition on Windows Server 2003 64-bit in our production environment and we noticed unusually high CPU utilization by the BizTalk host processes.

The symptoms were twofold:

1. The BizTalk 64-bit servers suffered from high CPU usage (one server had its CPU usage constantly stuck at 100%). This happened even when the load on the server was low, in fact much lower than on other BizTalk Server 32-bit machines running the same applications on similar hardware.
The reason for the servers’ high CPU usage was that some BizTalk host processes had their CPU usage stuck at a high value even when the BizTalk application(s) run by the host processed very few messages or none at all. In fact, the CPU usage for those hosts stayed stuck at a high value even when the BizTalk application(s) run by the host were stopped!

2. Even when the CPU utilization was showing 100%, the server was still processing messages as fast as you would expect from a server under little load. No messages were queued for processing and no orchestrations were dehydrated!

These two points suggested to me that nothing was wrong with BizTalk Server itself (by that I mean the BizTalk code and the BizTalk installation), since the same code was running fine on 32-bit servers and the performance monitor readings did not match the actual behavior.

I requested help from Microsoft Support and they pointed me to KB 943165, which has a hotfix solving the high CPU usage problem I encountered on BizTalk Server 64-bit edition.
This hotfix fixes a problem introduced by security bulletin MS07-040, a security update for the .Net Framework that causes CPU usage spikes on BizTalk Server 64-bit (the problem only affects the 64-bit version of the .Net Framework). For information, Microsoft Support told me that the problem is suspected to be caused by an infinite loop in CounterManager.RunCacheThread when a System.Threading.ThreadAbortException is raised.

I should also add that I could not find a Windows update directly related to bulletin MS07-040 in the list of updates installed on the server running BizTalk Server 64-bit. Nevertheless, my Microsoft Support contact told me that the version of mscorwks.dll (found in c:\WINDOWS\Microsoft.NET\Framework64\v2.0.50727) has a timestamp showing that it is later than the hotfixes issued with bulletin MS07-040, and that the problem should therefore also occur with any later version of the DLL (since it is a code change from that version onwards that introduced the problem).

To conclude, if you encounter the symptoms I listed above, you will want to check KB 943165 and contact Microsoft Product Support to see whether this hotfix applies to your case. They will then email you the details to download the hotfix.

I installed this hotfix on all my BizTalk Server 2006 64 bit machines and all of them are running smoothly now 🙂

Turn off tracking globally in BizTalk Server 2006

The BizTalk Tracking (BizTalkDTADb) database grows in size as BizTalk Server processes data on your system. If the size of the BizTalk Tracking database causes poor disk performance or fills up the disk subsystem, you can manually purge the data from the Tracking database. See my previous post about how to purge and maintain the BizTalkDTADb database.
If you repeatedly have issues with the BizTalk tracking database, you may want to configure BizTalk to no longer collect tracking information. This is possible by turning off global tracking for the whole BizTalk Server.

Here is the procedure to turn off tracking globally for BizTalk server 2006 and 2006 R2:

  • Open SQL Server Management Studio and connect to the database server hosting the BizTalk Management Database, BizTalkMgmtDb.
  • Expand the BizTalkMgmtDb database, expand Tables, right-click the adm_Group table, and then click Open Table.
  • In the GlobalTrackingOption column, change the value from 1 to 0 and then press ENTER. A value of 0 turns off global tracking for the whole BizTalk server while a value of 1 turns it on.
  • Restart all your BizTalk hosts for the change to take effect.
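The table edit in these steps can also be scripted. The sketch below assumes a Python DB-API connection to BizTalkMgmtDb that uses the `?` parameter style (as pyodbc does); the function name is hypothetical, and here it is exercised against an in-memory SQLite table that mimics only the GlobalTrackingOption column of adm_Group. The host instances still need to be restarted afterwards, as in the last step:

```python
import sqlite3


def set_global_tracking(conn, enabled: bool) -> int:
    """Set GlobalTrackingOption in adm_Group (1 = on, 0 = off); return rows changed."""
    cursor = conn.cursor()
    cursor.execute(
        "UPDATE adm_Group SET GlobalTrackingOption = ?",
        (1 if enabled else 0,),
    )
    conn.commit()
    return cursor.rowcount


# Stand-in for BizTalkMgmtDb: a one-row adm_Group table with tracking on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adm_Group (GlobalTrackingOption INTEGER)")
conn.execute("INSERT INTO adm_Group VALUES (1)")
conn.commit()

print(set_global_tracking(conn, enabled=False))  # 1
print(conn.execute("SELECT GlobalTrackingOption FROM adm_Group").fetchone()[0])  # 0
```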

When Tracking is turned off, no tracking information is collected, so no information will be available in HAT (Health and Activity Tracking) anymore. When thinking about turning off Tracking globally, weigh this effect against the alternative of keeping tracking data for a shorter time.

An alternative to turning off Tracking globally is to turn it off on a per-application basis.
I think that once a BizTalk application is deployed and has been running smoothly for a while, there is no reason to keep tracking turned on at all times.
If you deploy new BizTalk applications or regularly update existing ones on your production server, you will probably want to be able to consult HAT after an application is freshly deployed. For that to be possible you have to keep global tracking on.
In that case, I think a best-practice approach is to keep Tracking turned on for a freshly deployed application for a few days, until you deem it to be running fine and no longer need to consult HAT. Once HAT is no longer needed regularly, you can use the BizTalk Server Administration Console to disable tracking for all the artifacts belonging to that application (orchestrations, ports, etc.). Should you need to turn it on again, it only takes a few minutes to configure tracking back on for the application’s artifacts.

Reference can be found at the MSDN BizTalk documentation How to Turn Off Global Tracking

BizTalk Messaging architecture

The core of the BizTalk Server product is the BizTalk Messaging subsystem, called the Message Bus. As stated in the BizTalk documentation, the Message Bus is a publisher/subscriber model: the Message Bus queries messages published into the BizTalk Message Box database, looking for messages that match a particular subscription.

The most important point to understand about the publisher/subscriber model is that publishing and subscription concepts are relative to the Database.

Here is a picture illustrating the concept:

BizTalk Server Publisher Subscriber Architecture
The Messaging infrastructure is composed of the Message Box database and of different software components, called the Messaging Components. Together, the database and the components make up the publisher/subscriber BizTalk Messaging subsystem, the Message Bus.

An important side effect of this architecture is that in BizTalk Server, messages are immutable once published (in the Message Box database). This is because more than one end-point can subscribe to the same message. If messages were mutable, some end-points might no longer match the subscription rules after the message had been modified. Having the result of a subscription query vary over time for the same message (due to changes in the message payload) would break the publisher/subscriber architecture, making the whole BizTalk product unpredictable (and so useless).
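A minimal Python sketch of this idea, under simplifying assumptions: the "Message Box" is an in-memory object, a message's context is frozen at publication, and each subscription is a predicate over that context. All names are hypothetical; this is not BizTalk's implementation, only an illustration of why fan-out to several subscribers relies on published messages being immutable:

```python
from types import MappingProxyType


class ToyMessageBox:
    def __init__(self):
        self.subscriptions = {}  # subscriber name -> predicate over the context
        self.queues = {}         # subscriber name -> messages routed to it

    def subscribe(self, name, predicate):
        self.subscriptions[name] = predicate
        self.queues[name] = []

    def publish(self, context, body):
        """Freeze the message, then route it to every matching subscriber."""
        frozen = (MappingProxyType(dict(context)), bytes(body))
        matched = [n for n, pred in self.subscriptions.items() if pred(frozen[0])]
        for name in matched:
            self.queues[name].append(frozen)  # the same immutable message, shared
        return matched


box = ToyMessageBox()
box.subscribe("SendPortA", lambda c: c.get("MessageType") == "http://www.abc.com#Order")
box.subscribe("Orchestration1", lambda c: c.get("MessageType") == "http://www.abc.com#Order")
print(box.publish({"MessageType": "http://www.abc.com#Order"}, b"<Order/>"))
# ['SendPortA', 'Orchestration1']

ctx, _ = box.queues["SendPortA"][0]
try:
    ctx["MessageType"] = "http://www.abc.com#Invoice"
except TypeError:
    print("published context is read-only")
```

Because both subscribers hold the very same frozen message, neither can change what the other sees, and re-evaluating the subscriptions later would give the same result.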

1. The Message Box.

The Message Box is a SQL Server database that stores XML messages as well as metadata related to each message. The message’s metadata is called the message context. Each metadata item (a key/value pair) of the message context is called a context property. The most important thing to know about the message context is that it holds all the information necessary for message routing (message subscription).

2. Messaging Components.

While the Message Box database is the message storage facility of the Message Bus, the messaging components are software components that actually move messages between subscribers and publishers. They receive and send messages in and out of the BizTalk Server system.

2.1 Host Services.

A BizTalk host is a logical container. It provides the ability to structure a BizTalk application into groups that can be spread across multiple processes or machines.
When you create a host in BizTalk Server, you create a logical unit in which you can run different BizTalk applications or different types of BizTalk artifacts.
For example, if your BizTalk applications are fairly small, you could create 1 host per BizTalk application you develop. On the other hand, if your applications are big, you can create different hosts to separate logical groupings of your application, such as adapters, orchestrations, ports and so on. If each host runs on a separate physical machine, this helps balance the load between processors (a sort of manual load balancing).

A host instance is simply a running instance of the host logical grouping. It runs as a Windows Service, each host instance being a separate Windows process: a separate instance of BTSNTSvc.exe (or BTSNTSvc64.exe for BizTalk Server 64-bit).

As explained before, the host’s raison d’être is to provide logical grouping units. The host instance does not itself implement the BizTalk runtime; it is a container in which the BizTalk subservices run. These subservices running inside the host instance together implement the actual runtime of the BizTalk Message Bus.

Host instances can run all BizTalk subservices or only some of them, depending on what type of BizTalk artifacts they run. To understand which subservice is used by which type of artifact, here is a list of the different subservices running inside a BizTalk host instance (note that the list of services can be found in the adm_HostInstance_SubServices table in the Management Database):

  • Caching: Service used to cache information that is loaded into the host. Examples of cached information are loaded assemblies, adapter configuration information, custom configuration information, etc.
  • End Point Manager (EPM): Go-between for the Message Agent and the Adapter Framework. The EPM hosts send/receive ports and is responsible for executing pipelines and BizTalk transformations. The Message Agent is responsible for searching for messages that match subscriptions and routing them to the EPM.
  • Tracking: Service that moves information from the Message Box to the Tracking Database.
  • XLANG/s: Host engine for BizTalk Server orchestrations.
  • MSMQT: MSMQT adapter service; serves as a replacement for the MSMQ protocol when interacting with BizTalk Server. The MSMQT protocol has been deprecated in BizTalk Server 2006 and should only be used to resolve backward compatibility issues.

2.2 Subscriptions.

In a publish/subscribe design, you have three components:

  • Publishers
  • Subscribers
  • Events

Publishers include:

  • Receive ports that publish messages that arrive in their receive locations
  • Orchestrations that publish messages when sending messages (orchestration send shape)
  • Orchestrations that start another orchestration asynchronously (start orchestration shape). On a side note, the call orchestration shape does not publish the message into the Message Box, the message is just passed as a parameter.
  • Solicit/response send ports publish messages when they receive a response from the target application or transport.

Subscriptions:

Subscription is the mechanism by which ports and orchestrations are able to receive and send messages within BizTalk server (see picture above).

A subscription is a collection of comparison statements, known as predicates, comparing the values of message context properties and the values specific to the subscription.

There are two types of subscriptions: activation and instance.

An activation subscription is one specifying that a message fulfilling a subscription should create a new instance of the subscriber when it is received. Examples of things that create activation subscriptions include:

  • Send ports with filters
  • Send ports that are bound to orchestrations
  • Orchestration receive shapes that have their Activate property set to true.

An instance subscription indicates that messages fulfilling the subscription should be routed to an already-running instance of the subscriber. Examples of things that create instance subscriptions are:

  • Orchestrations with correlated receives.
  • Request/response-style ports waiting for a response.
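The two subscription types can be sketched in Python as follows. Here a subscription is a list of (property, value) equality predicates that must all match, plus a flag telling whether a match creates a new instance or is delivered to an existing, waiting instance. The names, the `OrderId` correlation property and the equality-only predicates are simplifying assumptions; real BizTalk filters support more operators and OR groups:

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Subscription:
    subscriber: str
    predicates: list                   # [(context property, expected value), ...]
    activation: bool                   # True: a match creates a new instance
    instance_id: Optional[int] = None  # instance subscriptions: the waiting instance

    def matches(self, context: dict) -> bool:
        # Simplification: every predicate is an equality test and all must hold.
        return all(context.get(prop) == value for prop, value in self.predicates)


_new_instance_ids = count(1)  # hypothetical source of new instance ids


def route(context: dict, subscriptions: list) -> list:
    """Return (subscriber, instance id) for every subscription the context matches."""
    deliveries = []
    for sub in subscriptions:
        if sub.matches(context):
            target = next(_new_instance_ids) if sub.activation else sub.instance_id
            deliveries.append((sub.subscriber, target))
    return deliveries


subs = [
    # Receive shape with Activate = true: an activation subscription.
    Subscription("OrderOrch", [("MessageType", "http://www.abc.com#Order")], activation=True),
    # Correlated receive in running instance 7: an instance subscription.
    Subscription("OrderOrch",
                 [("MessageType", "http://www.abc.com#Ack"), ("OrderId", "42")],
                 activation=False, instance_id=7),
]
print(route({"MessageType": "http://www.abc.com#Order"}, subs))                 # [('OrderOrch', 1)]
print(route({"MessageType": "http://www.abc.com#Ack", "OrderId": "42"}, subs))  # [('OrderOrch', 7)]
```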

It is also important to know that when you define filter criteria on a send port, you are actually modifying the subscription of the port. As a reminder, filter expressions determine which messages are routed to the send port from the Message Box.

Enlisting:

The process of enlisting a port simply means that a subscription is written for that port in the Message Box. Consequently, un-enlisted ports do not have subscriptions in the Message Box.
The same is true for other BizTalk artifacts. An un-enlisted orchestration is an orchestration ready to process messages but having no way to receive messages from the Messaging Engine as no subscription is created for it yet.

The difference between an un-enlisted artifact and a stopped artifact is that, for ports and orchestrations that are enlisted but not started, any messages with matching subscription information are queued within the Message Box, ready to be processed once the artifact is started. If the port or orchestration is not enlisted, message routing fails, since no subscription is available, and the message produces a “No matching subscriptions were found for the incoming message” error in the Windows Event Log.
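The three states can be contrasted with a small, hypothetical Python model: enlisting writes a subscription, starting an artifact lets it consume messages, and a message with no subscription at all fails routing with the error quoted above. The structures are illustrative only:

```python
class ToyArtifact:
    def __init__(self, name, message_type):
        self.name = name
        self.message_type = message_type
        self.enlisted = False   # enlisted = a subscription exists in the Message Box
        self.started = False
        self.queue = []         # messages waiting in the Message Box for this artifact
        self.processed = []

    def start(self):
        self.started = True
        self.processed.extend(self.queue)  # drain what was queued while stopped
        self.queue.clear()


def publish(message_type, payload, artifacts):
    # Only enlisted artifacts have a subscription that can match.
    subscribers = [a for a in artifacts if a.enlisted and a.message_type == message_type]
    if not subscribers:
        raise LookupError("No matching subscriptions were found for the incoming message")
    for a in subscribers:
        (a.processed if a.started else a.queue).append(payload)


port = ToyArtifact("SendPortA", "http://www.abc.com#Order")
try:
    publish("http://www.abc.com#Order", "msg1", [port])  # not enlisted: routing fails
except LookupError as err:
    print(err)

port.enlisted = True                                    # enlisted but stopped
publish("http://www.abc.com#Order", "msg2", [port])
print(port.queue)        # ['msg2']
port.start()
print(port.processed)    # ['msg2']
```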

Typical port usage with an orchestration:

In an orchestration that has a send shape connected to a logical port, which is in turn bound to a physical port, the message sent by the send shape gets a TransportID context property set to a value matching the physical port’s TransportID. As the TransportID uniquely identifies the port, this mechanism ensures that the physical port will always receive the messages coming from the orchestration. This does not mean that only that port will receive the message: due to the nature of a publisher/subscriber architecture, any other port with a subscription matching the message context will also receive the message.

2.3 Messages

As said earlier, a message is more than just an XML document: it contains both data and context. To be more precise, a message is composed of context properties and zero or more message parts.

Keep in mind that message parts are not always XML documents. If the message is received through a port using the pass-through pipeline, the message can be any kind of data, including binary data. On a side note, a pass-through pipeline does not promote context properties; this makes sense, as the message is not even assumed to be XML in a pass-through pipeline, so it is not possible to evaluate XPath expressions on the message to determine the values of context properties.

As said earlier, a message is immutable once it is published. This means that once stored in the MessageBox DB, it cannot be changed. A message can nevertheless be changed outside the database: in a receive pipeline component, a message can be modified before it is published to the MessageBox, and in a send pipeline component, after it has been received from the MessageBox. An orchestration is also a typical place to create or modify a message.

2.4 Message Context Properties

Message context properties are used for the subscription mechanism (routing the message to its appropriate end point). They are defined in a property schema. At runtime, the property values are stored into a context property bag.

The property schema is associated with the message schema within BizTalk so that every inbound schema-based message has a schema and a property schema attached to it.

The property schema consists of a global property schema that every message can use by default and of an optional custom property schema which can be created to define application-specific properties. Both types of properties are essentially the same at runtime and both are stored in the context property bag.

So, both types of properties can be used by the subscription mechanism to evaluate which endpoints have a subscription matching the message. The most common subscription is based on a global property called MessageType, which combines the XML namespace of the message and the root node name, separated by a # character. For example: http://www.abc.com#RootElementName.
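The MessageType naming rule can be reproduced from an XML document's root element with Python's standard library. This is a sketch of the naming rule only, not BizTalk's disassembler; the behavior for a root element without a namespace (returning the bare root name) is an assumption of this example:

```python
import xml.etree.ElementTree as ET


def message_type(xml_text: str) -> str:
    """Build '<namespace>#<root element name>' from the document's root element."""
    root = ET.fromstring(xml_text)
    tag = root.tag                      # ElementTree form: '{namespace}LocalName'
    if tag.startswith("{"):
        namespace, local_name = tag[1:].split("}", 1)
        return f"{namespace}#{local_name}"
    # Assumption: no target namespace -> just the root element name.
    return tag


print(message_type('<RootElementName xmlns="http://www.abc.com"><A/></RootElementName>'))
# http://www.abc.com#RootElementName
```

Prefixed and default namespace declarations give the same result, since ElementTree resolves both to the `{namespace}LocalName` form.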

Using subscription to route documents to the proper endpoint is called Content Based Routing (CBR).
For information, if the message is not schema-based, there will be no MessageType property value. Such is the case for binary data messages.

Message context properties are populated by the BizTalk runtime in two places:

  • The adapter writes and promotes into the message context the properties related to the location, the adapter type, and other adapter-related properties.
  • The Receive Pipeline can write and promote properties into the message context in any of its pipeline components. Disassembling components are of particular interest because they promote the MessageType property, which is commonly used for Content Based Routing.

Property bag.

It is possible to use the BizTalk API in pipeline component code to read and write context properties. The message context is an object implementing the IBaseMessageContext interface, which extends the IBasePropertyBag interface. If you write properties in a custom pipeline component and intend to use them for routing, keep in mind that properties simply written with the Write() method are not available for routing. To make a property available for routing, you need to promote it with a different call, the Promote() method. This method writes the property and its value into the context but ALSO flags the property as promoted, which makes it available for routing.
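The difference between Write() and Promote() can be mimicked in a few lines of Python. The class below is a hypothetical stand-in for the message context, not the BizTalk API (in C#, the calls take a property name, a property namespace and a value); it only shows that both calls store the value, while only promoted properties are visible to routing:

```python
class ToyMessageContext:
    """Hypothetical stand-in for a message context (not the BizTalk API)."""

    def __init__(self):
        self._values = {}        # (namespace, name) -> value
        self._promoted = set()   # keys flagged as promoted

    def write(self, name, namespace, value):
        """Store the value only: readable, but not usable for routing."""
        self._values[(namespace, name)] = value

    def promote(self, name, namespace, value):
        """Store the value AND flag it as promoted, making it available for routing."""
        self._values[(namespace, name)] = value
        self._promoted.add((namespace, name))

    def read(self, name, namespace):
        return self._values.get((namespace, name))

    def routable(self):
        """The view a subscription sees: promoted properties only."""
        return {key: self._values[key] for key in self._promoted}


NS = "urn:hypothetical-props"   # hypothetical property schema namespace
ctx = ToyMessageContext()
ctx.write("CustomerId", NS, "C-001")
ctx.promote("Region", NS, "EMEA")
print(ctx.read("CustomerId", NS))   # C-001
print(ctx.routable())               # {('urn:hypothetical-props', 'Region'): 'EMEA'}
```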