Overview
The .NET Framework's garbage collector manages the allocation and release of memory for your application. Each time you create a new object, the common language runtime allocates memory for the object from the managed heap. As long as address space is available in the managed heap, the runtime continues to allocate space for new objects. However, memory is not infinite. Eventually the garbage collector must perform a collection in order to free some memory. The garbage collector's optimizing engine determines the best time to perform a collection, based upon the allocations being made. When the garbage collector performs a collection, it checks for objects in the managed heap that are no longer being used by the application and performs the necessary operations to reclaim their memory.
In the common language runtime (CLR), the garbage collector serves as an automatic memory manager. It provides the following benefits:
- Enables you to develop your application without having to free memory.
- Allocates objects on the managed heap efficiently.
- Reclaims objects that are no longer being used, clears their memory, and keeps the memory available for future allocations. Managed objects automatically get clean content to start with, so their constructors do not have to initialize every data field.
- Provides memory safety by making sure that an object cannot use the content of another object.
Virtual Memory
- Virtual memory can be in three states:
  - Free. The block of memory has no references to it and is available for allocation.
  - Reserved. The block of memory is available for your use and cannot be used for any other allocation request. However, you cannot store data to this memory block until it is committed.
  - Committed. The block of memory is assigned to physical storage.
- Virtual address space can get fragmented. This means that there are free blocks, also known as holes, in the address space. When a virtual memory allocation is requested, the virtual memory manager has to find a single free block that is large enough to satisfy that allocation request. Even if you have 2 GB of free space, the allocation that requires 2 GB will be unsuccessful unless all of that space is in a single address block.
- You can run out of memory if you run out of virtual address space to reserve or physical space to commit.
Stack and Heap in Depth
We need to understand the stack and the heap before we move forward with garbage collection. There are three kinds of memory regions to consider:
- Stack
- Managed Heap: used by applications and assemblies that run under the .NET Framework umbrella; the GC is responsible for its memory management (in this section the term “heap” means “managed heap”)
- Heap: used by unmanaged applications and assemblies outside the .NET Framework. We are not discussing it here.
The Stack is more or less responsible for keeping track of what's executing in our code. The Heap is more or less responsible for keeping track of our objects. The Stack is self-maintaining, meaning that it basically takes care of its own memory management. When the top box is no longer used, it's thrown out. The Heap, on the other hand, has to worry about garbage collection, which deals with how to keep the Heap clean.
What goes on the Stack and Heap?
Four main types of things go in the Stack and Heap as our code is executing: Value Types, Reference Types, Pointers, and Instructions.
- Value Type: Types that derive from System.ValueType. Examples: bool, byte, char, decimal, double, enum, float, int, long, sbyte, short, struct, uint, ulong and ushort.
- Reference Type: All the items declared with the types in this list are Reference Types and inherit from System.Object. Examples: class, interface, delegate, object and string.
- Pointer: The third item to be put in our memory management scheme is a Reference to a Type. A Reference is often referred to as a Pointer. We don't explicitly use Pointers; they are managed by the Common Language Runtime (CLR). A Pointer (or Reference) is different from a Reference Type: when we say something is a Reference Type, it means we access it through a Pointer. A Pointer is a chunk of space in memory that points to another space in memory. A Pointer takes up space just like any other thing that we're putting in the Stack and Heap, and its value is either a memory address or null.
- Instruction: When compiling to managed code, the compiler translates your source code into Microsoft intermediate language (MSIL), which is a CPU-independent set of instructions that can be efficiently converted to native code. MSIL includes instructions for loading, storing, initializing, and calling methods on objects, as well as instructions for arithmetic and logical operations, control flow, direct memory access, exception handling, and other operations. Before code can be run, MSIL must be converted to CPU-specific code, usually by a just-in-time (JIT) compiler. Because the common language runtime supplies one or more JIT compilers for each computer architecture it supports, the same set of MSIL can be JIT-compiled and run on any supported architecture.
Here are our two golden rules:
- A Reference Type always goes on the Heap. Easy enough, right?
- Value Types and Pointers always go where they were declared. This is a little more complex and needs a bit more understanding of how the Stack works to figure out where items are declared.
The Stack, as we mentioned earlier, is responsible for keeping track of where each thread is during the execution of our code (or what's been called). This means that each thread has its own stack.
- When our code makes a call to execute a method, the thread puts the method's parameters on the thread stack and starts executing the instructions that have been JIT compiled and live in the method table.
- Then, as we go through the code and run into variables within the method, they are placed on top of the stack.
- Next, control (the thread executing the method) is passed to the instructions of the method, which live in our type's method table. A JIT compilation is performed if this is the first time we are hitting the method.
- As the method executes, we need some memory for a local variable (call it "result"), and it is allocated on the stack.
- When the method finishes, all memory allocated on the stack for it is cleaned up by moving a pointer back to the memory address where the method started, and we go down to the previous method on the stack.
- The "result" variable is placed on the stack. As a matter of fact, every time a Value Type is declared within the body of a method, it will be placed on the stack.
Value Types are also sometimes placed on the Heap. Remember the rule that Value Types always go where they were declared? Well, if a Value Type is declared outside of a method, but inside a Reference Type, it will be placed within the Reference Type on the Heap. Now consider a method whose return type is a user-defined class: the returned object is a Reference Type and is stored on the heap. After the method executes, its part of the stack is cleared as stated above, but the result stays on the heap.
* Once our program reaches a certain memory threshold and we need more Heap space, our GC will kick off. The GC will stop all running threads (a FULL STOP), find all objects in the Heap that are not being accessed by the main program and delete them. The GC will then reorganize all the objects left in the Heap to make space and adjust all the Pointers to these objects in both the Stack and the Heap. As you can imagine, this can be quite expensive in terms of performance, so now you can see why it can be important to pay attention to what's in the Stack and Heap when trying to write high-performance code.
Parameters - When we make a method call, here's what happens:
- Space is allocated on the stack for the information needed to execute our method (called a Stack Frame). This includes the calling address (a pointer), which is basically a GOTO instruction, so when the thread finishes running our method it knows where to go back to in order to continue execution.
- Our method parameters are copied over. This is what we want to look at more closely.
- Control is passed to the JIT'ted method and the thread starts executing code. Hence, we have another method represented by a stack frame on the "call stack".
Passing Value Type - When we pass a value type, space is allocated on the stack and the value is copied into that new space. No new object is created on the heap; the method simply works on its own copy.
* If we have a very large value type (such as a big struct) and pass it on the stack, it can get very expensive in terms of space and processor cycles to copy it over each time. The stack does not have infinite space and, just like filling a glass of water from the tap, it can overflow.
Passing Reference Type - This is similar to passing a value type by reference: no new object is created, and only the reference to the memory location is passed. Because no duplicate copy is made, passing by reference is sometimes the more efficient way to pass a large value type such as a struct.
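The difference between the two kinds of parameter passing can be seen in a small sketch. The type and method names below (PointValue, PointRef, PassingDemo) are made up for this illustration and are not part of the .NET Framework.

```csharp
using System;

// Hypothetical types used only for this illustration.
struct PointValue { public int X; }
class PointRef { public int X; }

static class PassingDemo
{
    // Receives a copy of the struct; the caller's value is untouched.
    public static void ChangeValue(PointValue p) { p.X = 99; }

    // Receives a copy of the reference; both references point at the same heap object.
    public static void ChangeReference(PointRef p) { p.X = 99; }

    // 'ref' passes the struct by reference, so its fields are not copied.
    public static void ChangeValueByRef(ref PointValue p) { p.X = 99; }

    public static void Main()
    {
        var v = new PointValue { X = 1 };
        var r = new PointRef { X = 1 };

        ChangeValue(v);
        ChangeReference(r);
        Console.WriteLine(v.X); // 1  (only the copy was changed)
        Console.WriteLine(r.X); // 99 (the shared heap object was changed)

        ChangeValueByRef(ref v);
        Console.WriteLine(v.X); // 99 (the original struct was changed)
    }
}
```

For a large struct that is passed often, the ref form avoids copying every field on each call, at the price of letting the callee modify the caller's data.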
Conditions for Garbage Collection
Garbage collection occurs when one of the following conditions is true:
- The system has low physical memory.
- The memory that is used by allocated objects on the managed heap surpasses an acceptable threshold. This threshold is continuously adjusted as the process runs.
- The GC.Collect method is called. In almost all cases, you do not have to call this method, because the garbage collector runs continuously. This method is primarily used for unique situations and testing.
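As a rough sketch of the third condition, the following code induces a collection and uses GC.CollectionCount to confirm that one happened. The class and method names are hypothetical; GC.Collect, GC.WaitForPendingFinalizers and GC.CollectionCount are the framework calls.

```csharp
using System;

static class CollectDemo
{
    // Returns how many generation 0 collections were added by a forced GC.Collect().
    public static int ForceCollection()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();                  // induces a collection of all generations
        GC.WaitForPendingFinalizers(); // lets queued finalizers finish
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("Collections induced: " + ForceCollection());
    }
}
```

This is meant for experiments and tests only; in production code the collector's own scheduling is usually the better choice.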
What is the Managed Heap and how is it related to GC?
After the garbage collector is initialized by the CLR, it allocates a segment of memory to store and manage objects. This memory is called the managed heap, as opposed to a native heap in the operating system.
There is a managed heap for each managed process. All threads in the process allocate memory for objects on the same heap. To reserve memory, the garbage collector calls the Win32 VirtualAlloc function and reserves one segment of memory at a time for managed applications. The garbage collector also reserves segments as needed, and releases segments back to the operating system (after clearing them of any objects) by calling the Win32 VirtualFree function.
The fewer objects allocated on the heap, the less work the garbage collector has to do. When you allocate objects, do not use rounded-up values that exceed your needs, such as allocating an array of 32 bytes when you need only 15 bytes.
When a garbage collection is triggered, the garbage collector reclaims the memory that is occupied by dead objects. The reclaiming process compacts live objects so that they are moved together, and the dead space is removed, thereby making the heap smaller. The GC takes care of re-referencing the live objects and updating the root table. This ensures that objects that are allocated together stay together on the managed heap, to preserve their locality. This re-referencing breaks any functionality that uses unsafe pointers to managed objects, unless those objects are pinned.
The intrusiveness (frequency and duration) of garbage collections is the result of the volume of allocations and the amount of survived memory on the managed heap.
The heap can be considered as the accumulation of two heaps: the large object heap and the small object heap. The large object heap contains objects that are 85,000 bytes and larger. Very large objects on the large object heap are usually arrays. It is rare for an instance object to be extremely large.
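The size threshold can be observed with GC.GetGeneration, which reports large object heap allocations as belonging to the oldest generation. This is a minimal sketch; LohDemo and its method are hypothetical names, and the 100,000-byte size is just a value comfortably above the threshold.

```csharp
using System;

static class LohDemo
{
    // Allocates an array above the 85,000-byte threshold and reports its generation.
    public static int GenerationOfLargeArray()
    {
        byte[] large = new byte[100000];
        return GC.GetGeneration(large);
    }

    public static void Main()
    {
        byte[] small = new byte[1000];
        Console.WriteLine("small array generation: " + GC.GetGeneration(small));
        Console.WriteLine("large array generation: " + GenerationOfLargeArray());
        Console.WriteLine("oldest generation:      " + GC.MaxGeneration);
    }
}
```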
What are the Generations in GC?
The managed heap is organized into generations so it can handle long-lived and short-lived objects. Garbage collection primarily occurs with the reclamation of short-lived objects that typically occupy only a small part of the heap.
There are three generations of objects on the heap:
Generation 0 - This is the youngest generation and contains short-lived objects. An example of a short-lived object is a temporary variable. Garbage collection occurs most frequently in this generation.
Newly allocated objects form a new generation of objects and are implicitly generation 0 collections, unless they are large objects, in which case they go on the large object heap in a generation 2 collection. Most objects are reclaimed for garbage collection in generation 0 and do not survive to the next generation.
Generation 1 - This generation contains short-lived objects and serves as a buffer between short-lived objects and long-lived objects.
Generation 2 - This generation contains long-lived objects. An example of a long-lived object is an object in a server application that contains static data that is live for the duration of the process.
Garbage collections occur on specific generations as conditions warrant. Collecting a generation means collecting objects in that generation and all its younger generations. A generation 2 garbage collection is also known as a full garbage collection, because it reclaims all objects in all generations (that is, all objects in the managed heap).
Survival and Promotions in GC
Objects that are not reclaimed in a garbage collection are known as survivors, and are promoted to the next generation. Objects that survive a generation 0 garbage collection are promoted to generation 1; objects that survive a generation 1 garbage collection are promoted to generation 2; and objects that survive a generation 2 garbage collection remain in generation 2.
When the garbage collector detects that the survival rate is high in a generation, it increases the threshold of allocations for that generation, so the next collection gets a substantial size of reclaimed memory. The CLR continually balances two priorities: not letting an application's working set get too big and not letting the garbage collection take too much time.
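Promotion can be watched from code by asking for an object's generation before and after induced collections. The sketch below uses hypothetical names (PromotionDemo, ObserveGenerations); the exact numbers printed depend on the runtime and build settings, so treat the output as an observation rather than a guarantee.

```csharp
using System;

static class PromotionDemo
{
    // Records the generation of one surviving object across two induced collections.
    public static int[] ObserveGenerations()
    {
        object survivor = new object();
        int[] seen = new int[3];

        seen[0] = GC.GetGeneration(survivor); // freshly allocated
        GC.Collect();
        seen[1] = GC.GetGeneration(survivor); // after surviving one collection
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor); // after surviving two collections

        GC.KeepAlive(survivor); // keeps the object reachable until this point
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ObserveGenerations()));
    }
}
```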
What are Ephemeral Generations and Segments in GC and how do they work?
Because objects in generations 0 and 1 are short-lived, these generations are known as the ephemeral generations. Ephemeral generations must be allocated in the memory segment that is known as the ephemeral segment.
Each new segment acquired by the garbage collector becomes the new ephemeral segment and contains the objects that survived a generation 0 garbage collection. The old ephemeral segment becomes the new generation 2 segment. The ephemeral segment can include generation 2 objects. Generation 2 objects can use multiple segments (as many as your process requires and memory allows for).
The amount of freed memory from an ephemeral garbage collection is limited to the size of the ephemeral segment. The amount of memory that is freed is proportional to the space that was occupied by the dead objects.
What happens during Garbage Collection?
A garbage collection has the following phases:
- A marking phase that finds and creates a list of all live objects.
- A relocating phase that updates the references to the objects that will be compacted.
- A compacting phase that reclaims the space occupied by the dead objects and compacts the surviving objects. The compacting phase moves objects that have survived a garbage collection toward the older end of the segment.
* Because generation 2 collections can occupy multiple segments, objects that are promoted into generation 2 can be moved into an older segment. Both generation 1 and generation 2 survivors can be moved to a different segment, because they are promoted to generation 2. Ordinarily, the large object heap is not compacted, because copying large objects imposes a performance penalty.
The garbage collector uses the following information to determine whether objects are live:
- Stack roots: Stack variables provided by the just-in-time (JIT) compiler and stack walker.
- Garbage collection handles: Handles that point to managed objects and that can be allocated by user code or by the common language runtime.
- Static data: Static objects in application domains that could be referencing other objects. Each application domain keeps track of its static objects.
* Before a garbage collection starts, all managed threads are suspended except for the thread that triggered the garbage collection.
How are Unmanaged Resources managed in Garbage Collection?
If your managed objects reference unmanaged objects by using their native file handles, you have to explicitly free the unmanaged objects, because the garbage collector tracks memory only on the managed heap.
Users of your managed object may not dispose the native resources used by the object. To perform the clean-up, you can make your managed object finalizable. Finalization consists of clean-up actions that you execute when the object is no longer in use. When your managed object dies, it performs clean-up actions that are specified in its finalizer method.
When a finalizable object is discovered to be dead, its finalizer is put in a queue so that its clean-up actions are executed, but the object itself is promoted to the next generation. Therefore, you have to wait until the next garbage collection that occurs on that generation (which is not necessarily the next garbage collection) to determine whether the object has been reclaimed.
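A common way to combine explicit clean-up with a finalizer as a safety net looks roughly like this. NativeBuffer is a hypothetical class written for this example; Marshal.AllocHGlobal and Marshal.FreeHGlobal stand in for whatever native resource a real object would hold.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a block of unmanaged memory.
sealed class NativeBuffer : IDisposable
{
    private IntPtr _memory;

    public NativeBuffer(int bytes)
    {
        _memory = Marshal.AllocHGlobal(bytes); // unmanaged memory, not tracked by the GC
    }

    public bool IsReleased { get { return _memory == IntPtr.Zero; } }

    // Explicit clean-up path for callers that remember to dispose.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // clean-up is done, so the finalizer can be skipped
    }

    // Finalizer: runs only if Dispose was never called.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_memory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_memory);
            _memory = IntPtr.Zero;
        }
    }
}

static class FinalizerDemo
{
    public static void Main()
    {
        var buffer = new NativeBuffer(1024);
        buffer.Dispose();
        Console.WriteLine(buffer.IsReleased); // True
    }
}
```

Calling GC.SuppressFinalize in Dispose means a properly disposed object does not go through the finalization queue and the extra promotion described above.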
What are Workstation and Server Garbage Collection?
The garbage collector is self-tuning and can work in a wide variety of scenarios. The only option you can set is the type of garbage collection, based on the characteristics of the workload. The CLR provides the following types of garbage collection:
- Workstation garbage collection, which is for all client workstations and stand-alone PCs. This is the default setting for the <gcServer> element in the runtime configuration schema.
  - Workstation garbage collection can be concurrent or non-concurrent. Concurrent garbage collection enables managed threads to continue operations during a garbage collection.
  - Starting with the .NET Framework 4, background garbage collection replaces concurrent garbage collection.
- Server garbage collection, which is intended for server applications that need high throughput and scalability. Server garbage collection can be non-concurrent or background.
How to Configure Garbage Collection?
You can use the <gcServer> element of the runtime configuration schema to specify the type of garbage collection you want the CLR to perform. When this element's enabled attribute is set to false (the default), the CLR performs workstation garbage collection. When you set the enabled attribute to true, the CLR performs server garbage collection.
Concurrent garbage collection is specified with the <gcConcurrent> element of the runtime configuration schema. The default setting is enabled. Before the .NET Framework 4.5, concurrent garbage collection is available only for workstation garbage collection and has no effect on server garbage collection.
You can also specify server garbage collection with unmanaged hosting interfaces. Note that ASP.NET and SQL Server enable server garbage collection automatically if your application is hosted inside one of these environments.
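To check which mode a running process actually ended up with, the GCSettings class can be queried. This is a small sketch; GcModeDemo and Describe are hypothetical names, while GCSettings.IsServerGC and GCSettings.LatencyMode are framework properties.

```csharp
using System;
using System.Runtime;

static class GcModeDemo
{
    // Builds a short description of the garbage collection mode in effect.
    public static string Describe()
    {
        string kind = GCSettings.IsServerGC ? "server" : "workstation";
        return kind + " GC, latency mode " + GCSettings.LatencyMode;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```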
Comparing Workstation and Server Garbage Collection
Threading and performance considerations for workstation garbage collection:
- The collection occurs on the user thread that triggered the garbage collection and remains at the same priority. Because user threads typically run at normal priority, the garbage collector (which runs on a normal priority thread) must compete with other threads for CPU time.
- Threads that are running native code are not suspended.
- Workstation garbage collection is always used on a computer that has only one processor, regardless of the <gcServer> setting. If you specify server garbage collection, the CLR uses workstation garbage collection with concurrency disabled.
Threading and performance considerations for server garbage collection:
- The collection occurs on multiple dedicated threads that are running at THREAD_PRIORITY_HIGHEST priority level.
- A dedicated thread to perform garbage collection and a heap are provided for each CPU, and the heaps are collected at the same time. Each heap contains a small object heap and a large object heap, and all heaps can be accessed by user code. Objects on different heaps can refer to each other.
- Because multiple garbage collection threads work together, server garbage collection is faster than workstation garbage collection on the same size heap.
- Server garbage collection often has larger size segments.
- Server garbage collection can be resource-intensive. For example, if you have 12 processes running on a computer that has 4 processors, there will be 48 dedicated garbage collection threads if they are all using server garbage collection. In a high memory load situation, if all the processes start doing garbage collection, the garbage collector will have 48 threads to schedule.
* If you are running hundreds of instances of an application, consider using workstation garbage collection with concurrent garbage collection disabled. This will result in less context switching, which can improve performance.
Concurrent Garbage Collection
In workstation or server garbage collection, you can enable concurrent garbage collection, which enables threads to run concurrently with a dedicated thread that performs the garbage collection for most of the duration of the collection. This option affects only garbage collections in generation 2; generations 0 and 1 are always non-concurrent because they finish very fast.
Concurrent garbage collection enables interactive applications to be more responsive by minimizing pauses for a collection. Managed threads can continue to run most of the time while the concurrent garbage collection thread is running. This results in shorter pauses while a garbage collection is occurring.
To improve performance when several processes are running, disable concurrent garbage collection.
Concurrent garbage collection is performed on a dedicated thread. By default, the CLR runs workstation garbage collection with concurrent garbage collection enabled. This is true for single-processor and multi-processor computers.
Your ability to allocate small objects on the heap during a concurrent garbage collection is limited by the objects left on the ephemeral segment when a concurrent garbage collection starts. As soon as you reach the end of the segment, you will have to wait for the concurrent garbage collection to finish while managed threads that have to make small object allocations are suspended.
Concurrent garbage collection has a slightly bigger working set (compared with non-concurrent garbage collection), because you can allocate objects during concurrent collection. However, this can affect performance, because the objects that you allocate become part of your working set. Essentially, concurrent garbage collection trades some CPU and memory for shorter pauses.
What is Background Garbage Collection?
* Background garbage collection is available only in the .NET Framework 4 and later versions. In the .NET Framework 4, it is supported only for workstation garbage collection. Starting with the .NET Framework 4.5, background garbage collection is available for both workstation and server garbage collection.
In background garbage collection, ephemeral generations (0 and 1) are collected as needed while the collection of generation 2 is in progress. There is no setting for background garbage collection; it is automatically enabled with concurrent garbage collection. Background garbage collection is a replacement for concurrent garbage collection. As with concurrent garbage collection, background garbage collection is performed on a dedicated thread and is applicable only to generation 2 collections.
A collection on ephemeral generations during background garbage collection is known as foreground garbage collection. When foreground garbage collections occur, all managed threads are suspended.
When background garbage collection is in progress and you have allocated enough objects in generation 0, the CLR performs a generation 0 or generation 1 foreground garbage collection. The dedicated background garbage collection thread checks at frequent safe points to determine whether there is a request for foreground garbage collection. If there is, the background collection suspends itself so that foreground garbage collection can occur. After the foreground garbage collection is completed, the dedicated background garbage collection thread and user threads resume.
Background garbage collection removes allocation restrictions imposed by concurrent garbage collection, because ephemeral garbage collections can occur during background garbage collection. This means that background garbage collection can remove dead objects in ephemeral generations and can also expand the heap if needed during a generation 1 garbage collection.
Background Server Garbage Collection
Starting with the .NET Framework 4.5, background server garbage collection is the default mode for server garbage collection. To choose this mode, set the enabled attribute of the <gcServer> element to true in the runtime configuration schema. This mode functions similarly to background workstation garbage collection, described in the previous section, but there are a few differences. Background workstation garbage collection uses one dedicated background garbage collection thread, whereas background server garbage collection uses multiple threads, typically a dedicated thread for each logical processor. Unlike the workstation background garbage collection thread, these threads do not time out.
What are Weak References?
The garbage collector cannot collect an object in use by an
application while the application's code can reach that object. The application
is said to have a strong reference to the object.
A weak reference permits the garbage collector to collect
the object while still allowing the application to access the object. A weak
reference is valid only during the indeterminate amount of time until the
object is collected when no strong references exist. When you use a weak
reference, the application can still obtain a strong reference to the object,
which prevents it from being collected. However, there is always the risk that
the garbage collector will get to the object first before a strong reference is
re-established.
Weak references are useful for objects that use a lot of
memory, but can be recreated easily if they are reclaimed by garbage
collection.
Suppose a tree view in a Windows Forms application displays
a complex hierarchical choice of options to the user. If the underlying data is
large, keeping the tree in memory is inefficient when the user is involved with
something else in the application.
When the user switches away to another part of the
application, you can use the WeakReference class to create a weak reference to
the tree and destroy all strong references. When the user switches back to the
tree, the application attempts to obtain a strong reference to the tree and, if
successful, avoids reconstructing the tree.
To establish a weak reference with an object, you create a WeakReference using the instance of the object to be tracked. You then set the Target property to that object and set the original reference to the object to null. For a code example, see WeakReference in the class library.
You can create a short weak reference or a long weak reference:
- Short - The target of a short weak reference becomes null when the object is reclaimed by garbage collection. The weak reference is itself a managed object, and is subject to garbage collection just like any other managed object. A short weak reference is what the default WeakReference constructor creates.
- Long - A long weak reference is retained after the object's Finalize method has been called. This allows the object to be recreated, but the state of the object remains unpredictable. To use a long reference, specify true in the WeakReference constructor. If the object's type does not have a Finalize method, the short weak reference functionality applies and the weak reference is valid only until the target is collected, which can occur any time after the finalizer is run.
To establish a strong reference and use the object again, cast the Target property of a WeakReference to the type of the object. If the Target property returns null, the object was collected; otherwise, you can continue to use the object because the application has regained a strong reference to it.
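The check-then-rebuild pattern described above might look like the following sketch. WeakReferenceDemo, GetOrRebuild and BuildExpensiveText are hypothetical names; the short string built here merely stands in for data that is costly to recreate.

```csharp
using System;
using System.Text;

static class WeakReferenceDemo
{
    // Returns the cached text if it is still alive; otherwise rebuilds and re-caches it.
    public static string GetOrRebuild(WeakReference cache)
    {
        string text = cache.Target as string; // strong reference, or null if collected
        if (text == null)
        {
            text = BuildExpensiveText();
            cache.Target = text;
        }
        return text;
    }

    // Stand-in for data that is expensive to create.
    public static string BuildExpensiveText()
    {
        return new StringBuilder().Append("expensive ").Append("data").ToString();
    }

    public static void Main()
    {
        // Short weak reference: no strong reference to the text is kept here.
        var cache = new WeakReference(BuildExpensiveText());

        GC.Collect(); // the target may or may not be collected at this point

        Console.WriteLine(GetOrRebuild(cache)); // prints "expensive data" either way
    }
}
```

Whether the collection actually reclaimed the target is not something the code can rely on, which is why the null check always comes before use.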
What is Latency in GC and what are the Latency Modes?
To reclaim objects, the garbage collector must stop all the executing threads in an application. In some situations, such as when an application retrieves data or displays content, a full garbage collection can occur at a critical time and impede performance. You can adjust the intrusiveness of the garbage collector by setting the GCSettings.LatencyMode property to one of the GCLatencyMode values.
Latency refers to the time that the garbage collector intrudes in your application. During low latency periods, the garbage collector is more conservative and less intrusive in reclaiming objects. The GCLatencyMode enumeration provides two low latency settings:
- LowLatency suppresses generation 2 collections and performs only generation 0 and 1 collections. It can be used only for short periods of time. Over longer periods, if the system is under memory pressure, the garbage collector will trigger a collection, which can briefly pause the application and disrupt a time-critical operation. This setting is available only for workstation garbage collection.
- SustainedLowLatency suppresses foreground generation 2 collections and performs only generation 0, 1, and background generation 2 collections. It can be used for longer periods of time, and is available for both workstation and server garbage collection. This setting cannot be used if concurrent garbage collection is disabled.
During low latency periods, generation 2 collections are suppressed unless the following occurs:
- The system receives a low memory notification from the operating system.
- Your application code induces a collection by calling the GC.Collect method and specifying 2 for the generation parameter.
When you use LowLatency mode, consider the following guidelines:
- Keep the period of time in low latency as short as possible.
- Avoid allocating high amounts of memory during low latency periods. Low memory notifications can occur because garbage collection reclaims fewer objects.
- While in the low latency mode, minimize the number of allocations you make, in particular allocations onto the Large Object Heap and pinned objects.
- Be aware of threads that could be allocating. Because the LatencyMode property setting is process-wide, you could generate an OutOfMemoryException on any thread that may be allocating.
- Wrap the low latency code in constrained execution regions (for more information, see Constrained Execution Regions).
- You can force generation 2 collections during a low latency period by calling the GC.Collect(Int32, GCCollectionMode) method.
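One way to keep the low latency period short is to switch the mode inside a try/finally block so the previous mode always comes back. The helper below is a sketch with hypothetical names (LatencyDemo, RunTimeCritical); it uses SustainedLowLatency, and the runtime may ignore a requested mode that the current GC configuration does not support.

```csharp
using System;
using System.Runtime;

static class LatencyDemo
{
    // Runs the given work under a low latency mode and always restores the old mode.
    public static void RunTimeCritical(Action work)
    {
        GCLatencyMode oldMode = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            work(); // keep allocations here to a minimum
        }
        finally
        {
            GCSettings.LatencyMode = oldMode; // the setting is process-wide, so restore it
        }
    }

    public static void Main()
    {
        RunTimeCritical(() => Console.WriteLine("mode during work: " + GCSettings.LatencyMode));
        Console.WriteLine("mode afterwards:  " + GCSettings.LatencyMode);
    }
}
```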
Using "Using"
Garbage collection always has a performance cost because, as you have seen, it suspends other threads. We can't do away with that cost in a simple implementation, but we can reduce the work the garbage collector has to do. One main reason a collection takes time is that the collector must make sure it does not delete any object that is still in use. At the same time, the garbage collector gives no guarantee about how promptly an unused object is reclaimed. One way to lighten its load is the "using" statement. When execution leaves the scope of a "using" statement, the object's Dispose method is called automatically. Dispose releases the object's resources right away, and a well-written Dispose also tells the garbage collector that the object no longer needs finalization, which reduces the collector's work. The "using" statement has one condition: the type must implement the Dispose method of the IDisposable interface.
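As a small illustration, the sketch below shows that Dispose has been called by the time a "using" block ends. TrackedResource and UsingDemo are hypothetical names invented for this example.

```csharp
using System;

// Hypothetical resource type used only for this illustration.
sealed class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true; // a real type would release its resources here
    }
}

static class UsingDemo
{
    // Uses the resource inside a using block and returns it so the caller can inspect it.
    public static TrackedResource UseAndReturn()
    {
        TrackedResource resource = new TrackedResource();
        using (resource)
        {
            // work with the resource here
        }
        return resource; // Dispose has already been called at this point
    }

    public static void Main()
    {
        Console.WriteLine(UseAndReturn().Disposed); // True
    }
}
```

Dispose is called even if the code inside the block throws, because the compiler expands the statement into a try/finally.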
To conclude this article, let's recap how the GC works, step by step.
Garbage collection in .NET is a tracing collection; specifically, the CLR
implements a Mark/Compact collector. The algorithm consists of the two phases
described below.
Phase I: Mark
When the
garbage collector starts running, it makes the assumption that all objects in
the heap are garbage. In other words, it assumes that none of the application’s
roots refer to any objects in the heap.
- The GC identifies live object references or
application roots.
- It starts walking the roots and building a graph
of all objects reachable from the roots.
- If the GC attempts to add an object already
present in the graph, it stops walking down that path. This serves two purposes.
First, it helps performance significantly, since the GC never walks through a set
of objects more than once. Second, it prevents infinite loops should you have
any circular linked lists of objects. Thus cycles are handled properly.
Once all the
roots have been checked, the garbage collector’s graph contains the set of all
objects that are somehow reachable from the application’s roots; any objects
that are not in the graph are not accessible by the application, and are
therefore considered garbage.
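The mark phase described above can be sketched as a walk over an object graph. This is a minimal model in Python, not CLR code: the mark function, the heap dictionary, and the object names are hypothetical and exist only for illustration.

```python
def mark(roots, references):
    """Return the set of objects reachable from the roots.

    references maps each object name to the names it points at.
    """
    reachable = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in reachable:
            # Already in the graph: stop walking this path (handles cycles).
            continue
        reachable.add(obj)
        stack.extend(references.get(obj, []))
    return reachable


# Hypothetical heap: A, B and C form a cycle that a root can reach;
# D and E reference each other but no root reaches them.
heap = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A"],
    "D": ["E"],
    "E": ["D"],
}
roots = ["A"]

live = mark(roots, heap)
garbage = set(heap) - live
```

In this model D and E end up in garbage even though they point at each other, because reachability from the roots, not the mere existence of a reference, decides what is live.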
Phase II: Compact
In this phase the collector moves all the
live objects to the bottom of the heap, leaving free space at the top.
Phase II
includes the following steps:
- The garbage collector now walks through the heap
linearly, looking for contiguous blocks of garbage objects (now considered free
space).
- The garbage collector then shifts the
non-garbage objects down in memory, removing all of the gaps in the heap.
- Moving the objects in memory invalidates all
pointers to the objects. So the garbage collector modifies the application’s roots
so that the pointers point to the objects’ new locations.
- In addition, if any object contains a pointer to
another object, the garbage collector is responsible for correcting these
pointers as well.
After all
the garbage has been identified, all the non-garbage has been compacted, and
all the non-garbage pointers have been fixed up, a pointer is positioned just
after the last non-garbage object to indicate where the next
object can be added.
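The compaction steps above can be illustrated with a small sliding-compaction model. This is a sketch in Python under simplifying assumptions, not the CLR's implementation: objects are plain dictionaries, addresses are integers, and the compact function and its parameter names are hypothetical.

```python
def compact(heap, live, roots):
    """Slide live objects to the bottom of the heap and fix up pointers.

    heap  : list of {"addr", "size", "refs"} dicts in address order
    live  : set of addresses that survived the mark phase
    roots : list of addresses held by application roots
    Returns (new_heap, new_roots, next_free).
    """
    forwarding = {}      # old address -> new address
    next_free = 0
    for obj in heap:     # linear walk through the heap
        if obj["addr"] in live:
            forwarding[obj["addr"]] = next_free
            next_free += obj["size"]

    new_heap = [
        {
            "addr": forwarding[obj["addr"]],
            "size": obj["size"],
            # Correct pointers held by one object to another.
            "refs": [forwarding[r] for r in obj["refs"]],
        }
        for obj in heap
        if obj["addr"] in live
    ]
    new_roots = [forwarding[r] for r in roots]   # correct the roots
    return new_heap, new_roots, next_free


heap = [
    {"addr": 0, "size": 16, "refs": []},     # garbage
    {"addr": 16, "size": 24, "refs": [56]},  # live, points at the object at 56
    {"addr": 40, "size": 16, "refs": []},    # garbage
    {"addr": 56, "size": 8, "refs": []},     # live
]
new_heap, new_roots, next_free = compact(heap, live={16, 56}, roots=[16])
```

After the call, the two live objects sit at addresses 0 and 24, the root and the internal pointer have been rewritten to match, and next_free (32) marks where the next allocation would go.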
Finalization
.NET Framework garbage collection implicitly keeps track of the lifetime of the
objects that an application creates, but it cannot do the same for the unmanaged
resources (for example a file, a window, or a network connection) that those
objects encapsulate.
Unmanaged resources must be released explicitly once the application has
finished using them. For this purpose the .NET Framework provides the
Object.Finalize method, which the garbage collector runs on an object to clean
up its unmanaged resources before reclaiming the object's memory. Because
Finalize does nothing by default, it must be overridden when explicit clean-up
is required. The potential existence of finalizers complicates garbage
collection in .NET by adding extra steps before an object can be freed.
Whenever a new object that has a Finalize method is allocated on the heap, a
pointer to the object is placed in an internal data structure called the
finalization queue. When an object is no longer reachable, the garbage
collector considers it garbage and scans the finalization queue for pointers
to such objects. Each pointer it finds is removed from the finalization queue
and appended to another internal data structure called the F-reachable queue,
which makes the object reachable again, so it is no longer garbage. At this
point the garbage collector has finished identifying garbage. It compacts the
reclaimable memory, and a special runtime thread empties the F-reachable
queue, executing each object's Finalize method.
The next time the garbage collector is invoked, it sees that the finalized
objects are truly garbage, and the memory for those objects is simply freed.
Thus an object that requires finalization dies, then lives again (is
resurrected), and finally dies a second time. Avoid Finalize methods unless
they are genuinely required: they increase memory pressure because the memory
and resources used by the object are not released until at least two garbage
collections have run. In addition, you have no control over the order in which
Finalize methods are executed, which can lead to unpredictable results.
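The two-collection lifetime of a finalizable object can be traced with a toy model. The sketch below is written in Python and only mimics the bookkeeping described above; ToyCollector and all of its members are hypothetical names, and the run_finalizers method stands in for the runtime's finalizer thread.

```python
class ToyCollector:
    """Toy model of finalization bookkeeping; not the CLR implementation."""

    def __init__(self):
        self.heap = set()
        self.finalization_queue = []   # objects that have a Finalize method
        self.freachable_queue = []     # unreachable objects awaiting Finalize
        self.finalized = []            # log of finalizers that have run

    def allocate(self, name, has_finalizer=False):
        self.heap.add(name)
        if has_finalizer:
            self.finalization_queue.append(name)

    def collect(self, reachable):
        """Free unreachable objects and return the set that was freed."""
        # Objects waiting in the F-reachable queue count as reachable.
        garbage = self.heap - set(reachable) - set(self.freachable_queue)
        for name in list(self.finalization_queue):
            if name in garbage:
                # Needs finalization first, so it is resurrected for now.
                self.finalization_queue.remove(name)
                self.freachable_queue.append(name)
                garbage.discard(name)
        self.heap -= garbage
        return garbage

    def run_finalizers(self):
        """Empty the F-reachable queue, 'running' each finalizer."""
        while self.freachable_queue:
            self.finalized.append(self.freachable_queue.pop(0))


collector = ToyCollector()
collector.allocate("buffer")
collector.allocate("file_handle", has_finalizer=True)

freed_first = collector.collect(reachable=[])    # nothing is reachable any more
survivors = set(collector.heap)                  # file_handle was resurrected
collector.run_finalizers()
freed_second = collector.collect(reachable=[])   # now it is truly garbage
```

The first collection frees only the object without a finalizer; the finalizable object stays on the heap until its finalizer has run and a second collection takes place.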
Garbage Collection Performance Optimizations
- Weak references
- Generations
When an object is referenced only through a weak reference, the garbage
collector is free to collect it if memory is needed when a collection runs; if
the application later attempts to access the object, the access fails. To use
a weakly referenced object, the application must first obtain a strong
reference to it. If the application obtains this strong reference before the
garbage collector collects the object, the GC cannot collect it, because a
strong reference to the object now exists.
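The pattern of obtaining a strong reference before use can be shown with an analogous mechanism. The sketch below uses Python's standard weakref module rather than the .NET WeakReference class, so it illustrates the idea only; the Report class is hypothetical, and CPython reclaims the object as soon as the last strong reference disappears, whereas the CLR reclaims it at a later collection.

```python
import gc
import weakref


class Report:
    """Stand-in for an object that is expensive to rebuild."""

    def __init__(self, title):
        self.title = title


report = Report("Q3 totals")
weak = weakref.ref(report)   # weak reference: does not keep the object alive

strong = weak()              # try to obtain a strong reference
alive_while_strong = strong is not None and strong.title == "Q3 totals"

del report, strong           # drop every strong reference
gc.collect()                 # ask the collector to run
alive_after_collect = weak() is not None
```

While a strong reference exists the object stays alive; once every strong reference is gone, asking the weak reference for its target yields nothing, which is the failed access the text describes.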
The managed
heap contains two internal data structures whose sole purpose is to manage weak
references:
- Short weak reference table: a short weak reference stops tracking its object
as soon as the object becomes unreachable, before any Finalize method runs, so
it does not track resurrection.
- Long weak reference table: a long weak reference keeps tracking its object
until the garbage collector determines that the object's storage is
reclaimable, that is, after any Finalize method has been called and the object
was not resurrected.
These two tables simply contain pointers to objects allocated within the
managed heap. Initially, both tables are empty. When you create a WeakReference
object, an object is not allocated from the managed heap. Instead, an empty
slot in one of the weak reference tables is located; short weak references use
the short weak reference table and long weak references use the long weak
reference table.
Generations
Because a garbage collection cannot complete without stopping the entire
program, collections can cause arbitrarily long pauses at arbitrary times
during execution. These pauses can also prevent programs from responding to
events quickly enough to satisfy the requirements of real-time systems.
One feature
of the garbage collector that exists purely to improve performance is called
generations. A generational garbage collector takes into account two facts that
have been empirically observed in most programs in a variety of languages:
- Newly created objects tend to have short lives; these are the objects in
generations 0 and 1.
- The older an object is, the longer it is likely to survive; these are the
objects in generation 2.
Thus, as
objects “mature” (survive multiple garbage collections) in their current
generation, they are moved to the next older generation. Generation 2 is the
maximum generation supported by the runtime’s garbage collector. When future
collections occur, any surviving objects currently in generation 2 simply stay
in generation 2.
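Promotion between generations can be modelled in a few lines. This Python sketch is a simplification, not the CLR algorithm: the heap is a dictionary from hypothetical object names to generation numbers, and it assumes that every survivor of a collected generation is promoted by exactly one generation.

```python
MAX_GENERATION = 2


def collect(heap, up_to, reachable):
    """Collect generations 0..up_to and return the resulting heap.

    heap      : dict mapping object name -> generation number
    up_to     : oldest generation examined by this collection
    reachable : names still referenced by the application
    """
    result = {}
    for name, generation in heap.items():
        if generation > up_to:
            # Older generations are not examined, so the object stays put.
            result[name] = generation
        elif name in reachable:
            # Survivor: promote, but never beyond the maximum generation.
            result[name] = min(generation + 1, MAX_GENERATION)
        # Otherwise the object is unreachable garbage and is reclaimed.
    return result


heap = {"config": 0, "temp": 0}
after_gen0 = collect(heap, 0, {"config"})        # temp is reclaimed
after_gen1 = collect(after_gen0, 1, {"config"})  # config moves to gen 2
after_gen2 = collect(after_gen1, 2, {"config"})  # config stays in gen 2

# A generation 0 collection does not examine generation 2 at all,
# so an unreachable old object survives until a full collection.
partial = collect({"old": 2, "new": 0}, 0, set())
```

The last call shows the trade-off: collecting only the young generation is cheap, but garbage that has already reached generation 2 is not reclaimed until a generation 2 collection runs.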
Thus, dividing the heap into generations and collecting and compacting only
the younger generations makes the underlying garbage collection algorithm more
efficient: a collection reclaims a significant amount of space from the heap
and finishes faster than if the collector had examined the objects in all
generations.
Must Read
Reference
http://msdn.microsoft.com/en-us/magazine/bb985010.aspx