Preface
Garbage collection and the CLR are a pain point for many developers. This article is compiled from the reference URLs (see the end of the article), with extracts taken from them. I have tried to put the information in context so that it makes sense as you read through, in other words to cover the questions that may come up in an interview. Ninety-five per cent of interviewers don't ask in depth, but it never hurts to impress them.
Why GC? Because the GC and its processes in the CLR are central to understanding the .NET environment and how it works. I once had an interviewer open with "What is GC in .NET?" and I ended up spending almost half an hour discussing it. Funny? But it’s true!
All the best...
Overview
The .NET Framework's garbage collector manages the allocation and release of memory for your application. Each time you create a new object, the common language runtime allocates memory for the object from the managed heap. As long as address space is available in the managed heap, the runtime continues to allocate space for new objects. However, memory is not infinite. Eventually the garbage collector must perform a collection in order to free some memory. The garbage collector's optimizing engine determines the best time to perform a collection, based upon the allocations being made. When the garbage collector performs a collection, it checks for objects in the managed heap that are no longer being used by the application and performs the necessary operations to reclaim their memory.
In the common language runtime (CLR), the garbage collector serves as an automatic memory manager. It provides the following benefits:
- Enables you to develop your application without having to free memory.
- Allocates objects on the managed heap efficiently.
- Reclaims objects that are no longer being used, clears their memory, and keeps the memory available for future allocations. Managed objects automatically get clean content to start with, so their constructors do not have to initialize every data field.
- Provides memory safety by making sure that an object cannot use the content of another object.
- Virtual memory can be in three states:
- Free. The block of memory has no references to it and is available for allocation.
- Reserved. The block of memory is available for your use and cannot be used for any other allocation request. However, you cannot store data to this memory block until it is committed.
- Committed. The block of memory is assigned to physical storage.
- Virtual address space can get fragmented. This means that there are free blocks, also known as holes, in the address space. When a virtual memory allocation is requested, the virtual memory manager has to find a single free block that is large enough to satisfy that allocation request. Even if you have 2 GB of free space, the allocation that requires 2 GB will be unsuccessful unless all of that space is in a single address block.
- You can run out of memory if you run out of virtual address space to reserve or physical space to commit.
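The fragmentation point above can be illustrated with a toy model. This is only a sketch: the free-block sizes are made-up numbers (in MB) and `CanAllocate` is a hypothetical helper, not how the virtual memory manager is actually implemented.

```csharp
using System;
using System.Linq;

public static class FragmentationDemo
{
    // Hypothetical helper: a request succeeds only if ONE free block is
    // large enough, no matter how much free space exists in total.
    public static bool CanAllocate(int[] freeBlocksMb, int requestMb)
    {
        return freeBlocksMb.Any(block => block >= requestMb);
    }

    public static void Main()
    {
        // Assumed layout: three separate holes that add up to 2048 MB.
        int[] holes = { 512, 1024, 512 };

        Console.WriteLine(holes.Sum());               // prints 2048
        Console.WriteLine(CanAllocate(holes, 2048));  // prints False
        Console.WriteLine(CanAllocate(holes, 1024));  // prints True
    }
}
```

The total free space is 2 GB, yet the 2 GB request fails because no single hole can hold it.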
Stack and Heap in Depth
- Managed Heap – used by applications and assemblies that run under the .NET Framework; the GC is responsible for its memory management (in this section the term “heap” means “managed heap”).
- Heap – used by unmanaged applications and assemblies outside the .NET Framework. We are not discussing it here.
- Value Type: Types that derive from System.ValueType. Examples: bool, byte, char, decimal, double, enum, float, int, long, sbyte, short, struct, uint, ulong and ushort.
- Reference Type: Types declared with the keywords in this list are reference types and inherit from System.Object. Examples: class, interface, delegate, object and string.
- Pointer: The item to be put in our memory management scheme is a Reference to a Type. A Reference is often referred to as a Pointer. We don't explicitly use Pointers; they are managed by the Common Language Runtime (CLR). A Pointer (or Reference) is different from a Reference Type: when we say something is a Reference Type, it means we access it through a Pointer. A Pointer is a chunk of space in memory that points to another space in memory. A Pointer takes up space just like anything else that we put on the Stack or Heap, and its value is either a memory address or null.
- Instruction: When compiling to managed code, the compiler translates your source code into Microsoft intermediate language (MSIL), which is a CPU-independent set of instructions that can be efficiently converted to native code. MSIL includes instructions for loading, storing, initializing, and calling methods on objects, as well as instructions for arithmetic and logical operations, control flow, direct memory access, exception handling, and other operations. Before code can be run, MSIL must be converted to CPU-specific code, usually by a just-in-time (JIT) compiler. Because the common language runtime supplies one or more JIT compilers for each computer architecture it supports, the same set of MSIL can be JIT-compiled and run on any supported architecture.
- A Reference Type always goes on the Heap - easy enough, right?
- Value Types and Pointers always go where they were declared. This is a little more complex and needs a bit more understanding of how the Stack works to figure out where items are declared.
- When our code calls a method, the thread starts executing the JIT-compiled instructions that live in the method table; it also puts the method's parameters on the thread stack.
- Then, as we go through the code and run into variables within the method, they are placed on top of the stack. Once we start executing the method, the method's parameters are placed on the stack.
- Next, control (the thread executing the method) is passed to the instructions of the method, which live in our type's method table; a JIT compilation is performed if this is the first time we are hitting the method.
- As the method executes, we need some memory for the "result" variable and it is allocated on the stack.
- All memory allocated on the stack is cleaned up by moving a pointer back to the memory address that was available when the method started, and we go down to the previous method on the stack.
- "result" variable is placed on the stack. As a matter of fact, every time a Value Type is declared within the body of a method, it will be placed on the stack.
- Space is allocated for information needed for the execution of our method on the stack (called a Stack Frame). This includes the calling address (a pointer) which is basically a GOTO instruction so when the thread finishes running our method it knows where to go back to in order to continue execution.
- Our method parameters are copied over. This is what we want to look at more closely.
- Control is passed to the JIT'ted method and the thread starts executing code. Hence, we have another method represented by a stack frame on the "call stack".
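The walkthrough above mentions a "result" variable, but the method it belongs to is not shown. A minimal sketch of a method that fits the description might look like the following; the names `AddFive` and `pValue` are assumptions made for illustration.

```csharp
using System;

public static class StackDemo
{
    // Hypothetical method matching the walkthrough: the parameter and the
    // local "result" are value types declared in the method body, so
    // conceptually they live in this call's stack frame.
    public static int AddFive(int pValue)
    {
        int result;            // space for "result" is part of the stack frame
        result = pValue + 5;   // works on the frame's own copy of the argument
        return result;         // the frame is released when the method returns
    }

    public static void Main()
    {
        Console.WriteLine(AddFive(10)); // prints 15
    }
}
```

Nothing in this method touches the managed heap, so the garbage collector has no work to do when the call returns.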
Condition for Garbage Collection
- The system has low physical memory.
- The memory that is used by allocated objects on the managed heap surpasses an acceptable threshold. This means that a threshold of acceptable memory usage has been exceeded on the managed heap. This threshold is continuously adjusted as the process runs.
- The GC.Collect method is called. In almost all cases, you do not have to call this method, because the garbage collector runs continuously. This method is primarily used for unique situations and testing.
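As a rough illustration of the third condition, the sketch below induces a collection with `GC.Collect` and uses `GC.CollectionCount` to confirm that one took place. The class and method names are hypothetical, and this is the kind of code meant for testing rather than production.

```csharp
using System;

public static class CollectDemo
{
    // Forces a full blocking collection and returns how many generation 0
    // collections were recorded while doing so.
    public static int ForceCollection()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();                    // induced collection of all generations
        GC.WaitForPendingFinalizers();   // let finalizers queued by that collection run
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("Gen 0 collections recorded: " + ForceCollection());
    }
}
```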
What is the Managed Heap and how is it related to GC?
What are the Generations in GC?
Survival and Promotions in GC
What are Ephemeral Generations and Segments in GC and how do they work?
What happens during Garbage Collection?
- A marking phase that finds and creates a list of all live objects.
- A relocating phase that updates the references to the objects that will be compacted.
- A compacting phase that reclaims the space occupied by the dead objects and compacts the surviving objects. The compacting phase moves objects that have survived a garbage collection toward the older end of the segment.
- Stack roots: Stack variables provided by the just-in-time (JIT) compiler and stack walker.
- Garbage collection handles: Handles that point to managed objects and that can be allocated by user code or by the common language runtime.
- Static data: Static objects in application domains that could be referencing other objects. Each application domain keeps track of its static objects.
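The marking phase can be pictured as a graph walk that starts at the roots. The sketch below is a toy model, not the CLR's implementation: `Node` stands in for a heap object and `Mark` for the marking phase, and both names are invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a heap object: a name plus outgoing references.
public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class MarkDemo
{
    // Returns every node reachable from the roots (the "live" set).
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var live = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!live.Add(current)) continue;   // already marked: do not walk it again
            foreach (Node next in current.References) pending.Push(next);
        }
        return live;
    }

    public static void Main()
    {
        var a = new Node("a");
        var b = new Node("b");
        var c = new Node("c");   // never referenced from a root
        a.References.Add(b);
        b.References.Add(a);     // cycle between a and b

        HashSet<Node> live = Mark(new[] { a });
        Console.WriteLine(live.Count);        // prints 2
        Console.WriteLine(live.Contains(c));  // prints False
    }
}
```

Anything left out of the returned set, such as `c` here, is what a real collector would treat as dead and reclaim.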
How are Unmanaged Resources managed in Garbage Collection?
What is Workstation and Server Garbage Collection?
- Workstation garbage collection, which is for all client workstations and stand-alone PCs. This is the default setting for the <gcServer> element in the runtime configuration schema.
- Workstation garbage collection can be concurrent or non-concurrent. Concurrent garbage collection enables managed threads to continue operations during a garbage collection.
- Starting with the .NET Framework 4, background garbage collection replaces concurrent garbage collection.
Configuring Garbage Collection; How?
Comparing Workstation and Server Garbage Collection
- In workstation garbage collection, the collection occurs on the user thread that triggered the garbage collection and remains at the same priority. Because user threads typically run at normal priority, the garbage collector (which runs on a normal priority thread) must compete with other threads for CPU time.
- Threads that are running native code are not suspended.
- Workstation garbage collection is always used on a computer that has only one processor, regardless of the <gcServer> setting. If you specify server garbage collection, the CLR uses workstation garbage collection with concurrency disabled.
- In server garbage collection, the collection occurs on multiple dedicated threads that are running at THREAD_PRIORITY_HIGHEST priority level.
- A dedicated thread to perform garbage collection and a heap are provided for each CPU, and the heaps are collected at the same time. Each heap contains a small object heap and a large object heap, and all heaps can be accessed by user code. Objects on different heaps can refer to each other.
- Because multiple garbage collection threads work together, server garbage collection is faster than workstation garbage collection on the same size heap.
- Server garbage collection often has larger size segments.
- Server garbage collection can be resource-intensive. For example, if you have 12 processes running on a computer that has 4 processors, there will be 48 dedicated garbage collection threads if they are all using server garbage collection. In a high memory load situation, if all the processes start doing garbage collection, the garbage collector will have 48 threads to schedule.
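To see which flavor a running process actually ended up with, the `GCSettings` class can be queried at run time. The following is a small sketch; the `GcModeDemo` class and `Describe` method are names made up for this example.

```csharp
using System;
using System.Runtime;

public static class GcModeDemo
{
    // Summarizes the garbage collection flavor the current process is using.
    public static string Describe()
    {
        string flavor = GCSettings.IsServerGC ? "server" : "workstation";
        return flavor + " GC, latency mode " + GCSettings.LatencyMode
             + ", " + Environment.ProcessorCount + " logical processors";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

The output depends on the machine and on the process configuration, so no fixed result is shown here.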
Concurrent Garbage Collection
What is Background Garbage Collection?
Background Server Garbage Collection
What are Weak References?
- Short - The target of a short weak reference becomes null when the object is reclaimed by garbage collection. The weak reference is itself a managed object, and is subject to garbage collection just like any other managed object. The default WeakReference constructor creates a short weak reference.
- Long - A long weak reference is retained after the object's Finalize method has been called. This allows the object to be recreated, but the state of the object remains unpredictable. To use a long reference, specify true in the WeakReference constructor. If the object's type does not have a Finalize method, the short weak reference functionality applies and the weak reference is valid only until the target is collected, which can occur any time after the finalizer is run.
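The two kinds of weak reference differ only in a constructor argument. The sketch below creates one of each and checks them while a strong reference still exists; the class and method names are hypothetical. It deliberately makes no claim about what happens after the strong reference is gone, because the timing of reclamation is up to the garbage collector.

```csharp
using System;

public static class WeakReferenceDemo
{
    // Returns true if both weak references still see their target while a
    // strong reference to it is being held.
    public static bool TargetAliveWhileReferenced()
    {
        var data = new byte[1024];

        var shortWeak = new WeakReference(data);        // short: does not track resurrection
        var longWeak = new WeakReference(data, true);   // long: tracks resurrection

        GC.Collect();   // "data" is still strongly referenced, so it survives

        bool alive = shortWeak.IsAlive
                  && longWeak.Target != null
                  && !shortWeak.TrackResurrection
                  && longWeak.TrackResurrection;

        GC.KeepAlive(data);   // keeps the strong reference alive up to this point
        return alive;
    }

    public static void Main()
    {
        Console.WriteLine(TargetAliveWhileReferenced());   // prints True
    }
}
```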
What is Latency in GC and what are the Latency Modes?
- LowLatency suppresses generation 2 collections and performs only generation 0 and 1 collections. It can be used only for short periods of time. Over longer periods, if the system is under memory pressure, the garbage collector will trigger a collection, which can briefly pause the application and disrupt a time-critical operation. This setting is available only for workstation garbage collection.
- SustainedLowLatency suppresses foreground generation 2 collections and performs only generation 0, 1, and background generation 2 collections. It can be used for longer periods of time, and is available for both workstation and server garbage collection. This setting cannot be used if concurrent garbage collection is disabled.
- The system receives a low memory notification from the operating system.
- Your application code induces a collection by calling the GC.Collect method and specifying 2 for the generation parameter.
- Keep the period of time in low latency as short as possible.
- Avoid allocating high amounts of memory during low latency periods. Low memory notifications can occur because garbage collection reclaims fewer objects.
- While in the low latency mode, minimize the number of allocations you make, in particular allocations onto the Large Object Heap and pinned objects.
- Be aware of threads that could be allocating. Because the LatencyMode property setting is process-wide, you could generate an OutOfMemoryException on any thread that may be allocating.
- Wrap the low latency code in constrained execution regions (for more information, see Constrained Execution Regions).
- You can force generation 2 collections during a low latency period by calling the GC.Collect(Int32, GCCollectionMode) method.
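The first guideline, keeping the low latency period short, is usually expressed as a try/finally block that restores the previous mode. This is a minimal sketch: `RunWithLowLatency` is a hypothetical helper, and since LowLatency is described above as workstation-only, the request may have no effect under server garbage collection.

```csharp
using System;
using System.Runtime;

public static class LatencyDemo
{
    // Runs the given action with the LowLatency mode requested, then
    // restores whatever mode was active before. Returns the mode that is
    // in effect once the helper has finished.
    public static GCLatencyMode RunWithLowLatency(Action timeCritical)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.LowLatency;
            timeCritical();   // keep this as short and allocation-free as possible
        }
        finally
        {
            GCSettings.LatencyMode = previous;   // always leave low latency again
        }
        return GCSettings.LatencyMode;
    }

    public static void Main()
    {
        GCLatencyMode after = RunWithLowLatency(() => Console.WriteLine("time-critical work"));
        Console.WriteLine("Mode after the call: " + after);
    }
}
```

Putting the restore in `finally` means an exception inside the time-critical section cannot leave the whole process stuck in low latency mode.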
Using "Using"
Garbage collection always has some impact on performance because, as you have seen, it can suspend the application's managed threads. We cannot avoid this entirely, but we can reduce the work the garbage collector has to do. One main reason a collection takes time is that the collector must make sure it never deletes an object that is still in use. At the same time, the garbage collector does not guarantee that every unused object is reclaimed as soon as it becomes unused. One way to avoid depending on the garbage collector to clean up an object's resources is the "using" statement. When control leaves the scope of a "using" statement, the object's Dispose method is called automatically, so its resources are released at a known point instead of waiting for a collection. The "using" statement has one condition: the type must implement the IDisposable interface and its Dispose method.
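A minimal sketch of the pattern follows. `TempResource` is a hypothetical type invented for this example; a real one would release a file handle, connection or similar resource inside Dispose.

```csharp
using System;

// Hypothetical disposable type used only for illustration.
public sealed class TempResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        if (IsDisposed) return;   // Dispose should be safe to call more than once
        // A real type would release its unmanaged resources here.
        IsDisposed = true;
    }
}

public static class UsingDemo
{
    // Returns true if Dispose had run by the time the using block ended.
    public static bool DisposeRanAfterBlock()
    {
        var resource = new TempResource();
        using (resource)
        {
            // Work with the resource; Dispose runs when this block exits,
            // even if an exception is thrown inside it.
        }
        return resource.IsDisposed;
    }

    public static void Main()
    {
        Console.WriteLine(DisposeRanAfterBlock());   // prints True
    }
}
```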
What is still missing is a summarized flow of how the GC works, so let’s refresh it once more and conclude this article.
Phase I: Mark
- The GC identifies live object references or application roots.
- It starts walking the roots and building a graph of all objects reachable from the roots.
- If the GC attempts to add an object already present in the graph, then it stops walking down that path. This serves two purposes. First, it helps performance significantly since it doesn’t walk through a set of objects more than once. Second, it prevents infinite loops should you have any circular linked lists of objects. Thus cycles are handled properly.
Phase II: Compact
- The garbage collector now walks through the heap linearly, looking for contiguous blocks of garbage objects (now considered free space).
- The garbage collector then shifts the non-garbage objects down in memory, removing all of the gaps in the heap.
- Moving the objects in memory invalidates all pointers to the objects. So the garbage collector modifies the application’s roots so that the pointers point to the objects’ new locations.
- In addition, if any object contains a pointer to another object, the garbage collector is responsible for correcting these pointers as well.
- Weak references
- Short weak reference table - the object which has a short weak reference to itself is collected immediately without running its finalization method.
- Long weak reference table - the garbage collector collects object pointed to by the long weak reference table only after determining that the object’s storage is reclaimable. If the object has a Finalize method, the Finalize method has been called and the object was not resurrected.
- Newly created objects tend to have short lives (generations 0 and 1).
- The older an object is, the longer it is likely to survive (generation 2).
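The generation an object currently belongs to can be observed with `GC.GetGeneration`. The sketch below records it for one surviving object across two induced collections; the class and method names are hypothetical. A sequence such as 0, 1, 2 is typical, but the exact values depend on the runtime, so none are asserted here.

```csharp
using System;

public static class GenerationDemo
{
    // Records the generation of one long-lived object before and after
    // two induced collections.
    public static int[] ObserveGenerations()
    {
        var survivor = new object();
        var seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);   // freshly allocated
        GC.Collect();
        seen[1] = GC.GetGeneration(survivor);   // survived one collection
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);   // survived two collections

        GC.KeepAlive(survivor);   // hold the strong reference until here
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" -> ", ObserveGenerations()));
        Console.WriteLine("Highest generation: " + GC.MaxGeneration);
    }
}
```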