Background to DISM
What is DISM
Solaris Dynamic Intimate Shared Memory (DISM) provides shared memory that is dynamically resizable. With DISM, applications can respond to changes in memory availability by dynamically increasing or decreasing the size of optimized shared memory segments. The first major application to support DISM was Oracle9i, which uses DISM for its dynamic System Global Area (SGA) capability.
The Origins of DISM
Many applications, and especially databases, use shared memory to cache frequently-used data (the buffer cache) and for interprocess communication. Solaris provides an optimized shared memory capability known as Intimate Shared Memory (ISM), and all major databases take advantage of it.
ISM benefits
ISM offers a number of benefits over standard System V shared memory:
1. ISM shared memory is automatically locked by the kernel when the segment is created. This not only ensures that the memory cannot be paged out, it also allows the kernel to use a fast locking mechanism when doing I/O into or out of the shared memory segment, thereby saving significant CPU time.
2. Kernel virtual-to-physical memory address translation structures are shared between processes that attach to the shared memory, saving kernel memory and CPU time.
3. Large pages, supported by the UltraSPARC Memory Management Unit (MMU), are automatically allocated for ISM segments (as of Solaris 2.6 OS). Large pages can reduce the number of memory pointers by a factor of 512. This reduction in complexity translates into noticeable performance improvements, especially on systems with large amounts of memory.
4. Since memory is locked, no swap space is needed to back it, thereby saving disk space.
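The factor of 512 quoted in item 3 can be checked with simple page-size arithmetic. The sketch below assumes an 8 KB base page and a 4 MB large page (sizes that are not stated in the text above) and uses a hypothetical 8 GB segment purely for illustration.

```python
# Assumed page sizes (not stated in the text above): 8 KB base, 4 MB large.
BASE_PAGE = 8 * 1024
LARGE_PAGE = 4 * 1024 * 1024

def translation_entries(segment_bytes, page_bytes):
    """Number of page mappings needed to cover a segment (rounded up)."""
    return -(-segment_bytes // page_bytes)

segment = 8 * 1024**3  # hypothetical 8 GB shared memory segment
small = translation_entries(segment, BASE_PAGE)
large = translation_entries(segment, LARGE_PAGE)
print(small, large, small // large)
```

Under these assumptions the ratio of the two counts is 512, which matches the reduction described in item 3.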
Unfortunately, though, ISM segments cannot be resized. To change the size of an ISM database buffer cache, the database must be shut down and restarted, affecting system availability. For example, removing memory by Dynamic Reconfiguration may require shutting down database instances.
DISM overcomes this limitation. A large DISM segment can be created when the database boots, with sections of it selectively locked or unlocked as memory requirements change. Instead of the kernel automatically locking DISM memory, though, locking and unlocking is done by the application (e.g. Oracle).
DISM Performance
When DISM was first released in Solaris 8 Update 3 (1/01), it inherited a number of the benefits of ISM. In particular:
1. Memory is locked (by the application), preventing paging and allowing I/O to use fast kernel locking mechanisms.
2. Kernel virtual to physical memory address translation structures are shared between processes that attach to the DISM segment, saving kernel memory and CPU time.
In this first release, large MMU pages were not supported. For Solaris 8 systems with 8 Gbytes of memory or less, it is reasonable to expect a performance degradation of up to 10% compared to ISM, due to the lack of large page support in DISM. The actual performance impact will vary, though, according to the amount of shared memory and the frequency of access to it. Sun recommends avoiding DISM on Solaris 8 either where SGAs are greater than 8 Gbytes in size, or on systems with a typical CPU utilization of 70% or more. In general, where performance is critical, DISM should be avoided on Solaris 8. As we will see, Solaris 9 Update 2 (the 12/02 release) is the appropriate choice for using DISM with systems of this type.
Solaris 9 Update 1 (the 9/02 release) introduced large page support for DISM segments. Tests have shown that, as of this release, DISM and ISM performance are equivalent. This enhancement is significant: the availability benefits of DISM can be enjoyed without compromising performance.
Solaris 9 Update 2 (the 12/02 release) further enhances DISM: unlocked memory can be more efficiently made available to other applications by proactively returning it to the free list, rather than waiting for the page daemon to locate it. This release also introduced a number of necessary DISM-related bug fixes, making it the minimum Solaris 9 release for running DISM.
Finally, since DISM memory is not automatically locked, swap space must be allocated for the whole segment. If the configured swap space is not big enough, the system will reserve memory pages in RAM, which can cause a memory shortage. Memory reservation in RAM does not decrease freemem, but it does decrease availrmem. This remaining difference between ISM and DISM is unlikely to be a major issue, though, given the capacity of modern disk drives.
For more information on this extra swap reservation incurred when DISM is used, see SunSolve InfoDoc 80799 "Solaris[TM] Operating System: DISM double allocation of memory".
Oracle's Implementation of DISM in Oracle9i
Oracle took advantage of DISM in Oracle9i (and subsequent releases, such as Oracle10g) to implement dynamic SGA resizing. Dynamic SGA resizing allows a database administrator (DBA) to respond to changing needs by increasing or decreasing SGA memory without a database reboot. Without DISM this feature would not be available on Solaris.
Since DISM requires the application to lock memory, and since memory locking can only be carried out by applications with superuser privileges, Oracle implemented a daemon that runs with root privileges. Since Oracle does not normally run with superuser privileges, the Role Based Access Control (RBAC) features introduced in Solaris 8 were used to provide that access in the first release of Oracle9i. RBAC allows a nominated binary to run with an effective uid of root. During the Oracle install procedure, the installer is asked to run a script as root; this script established the necessary RBAC permissions. In particular, entries were added to the /etc/user_attr and /etc/security/exec_attr files. Later Oracle releases simply made the $ORACLE_HOME/bin/oradism binary setuid root to achieve the same end.
The $ORACLE_HOME/bin/oradism binary implements the new root daemon. Look for it in ps with the description ora_dism_$ORACLE_SID.
For example:
serv1% ps -aef | grep dism
root 747 1 0 13:57:26 19:42 ora_dism_custdb
oradba1 18456 18391 0 23:37:08 pts/6 0:00 grep dism
Oracle9i introduced a new init.ora parameter, sga_max_size, to activate DISM. This parameter establishes the maximum size to which the SGA can grow; it can only be modified statically (in other words, an Oracle reboot is required before any change to sga_max_size takes effect). Oracle will use DISM instead of ISM if sga_max_size is set larger than the total of the database buffers (db_cache_size in particular, since dynamic SGA resizing is not supported with the older db_block_buffers parameter), the shared pool, the redo buffers, the large pool, the Java pool, and the SGA fixed size (representing Oracle's internal requirements).
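The activation rule described above amounts to a single comparison. The following is a minimal sketch of that comparison, not Oracle's actual logic: the dictionary keys are illustrative labels for the components listed in the text, and every size is a hypothetical value chosen for the example.

```python
MB = 1024 * 1024

def uses_dism(sga_max_size, component_sizes):
    """True when sga_max_size exceeds the combined size of the SGA components."""
    return sga_max_size > sum(component_sizes.values())

# Hypothetical sizes in bytes; none of these values come from the text.
components = {
    "db_cache_size": 2048 * MB,
    "shared_pool_size": 512 * MB,
    "redo_buffers": 8 * MB,
    "large_pool_size": 64 * MB,
    "java_pool_size": 64 * MB,
    "fixed_sga": 1 * MB,  # stands in for Oracle's internal requirements
}

total = sum(components.values())
print(uses_dism(4096 * MB, components))  # headroom above the total
print(uses_dism(total, components))      # no headroom above the total
```

In this sketch, setting sga_max_size equal to the component total leaves no room to grow, so the comparison reports that ISM rather than DISM would be used.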
Once DISM has been invoked, Oracle automatically locks an amount of memory determined by the total of the elements described above (database cache, shared pool, etc.). Subsequently, the DBA can alter the size of the database buffers (db_cache_size) and the shared pool (shared_pool_size) with Oracle's alter system command. For example:
alter system set db_cache_size =
Depending on the command, Oracle then locks additional memory, subject to the upper limit imposed by sga_max_size, or releases memory for use elsewhere by the operating system. The results of alter system actions are shown in the alert log file, typically located in $ORACLE_HOME/rdbms/log/alert_$ORACLE_SID.log. An example is shown below:
CKPT: Current size = 37904 MB, Target size = 3760 MB
Fri Jun 21 21:29:50 2002
Completed checkpoint up to RBA [0x7e.2.10], SCN: 0x0000.25c3f387
CKPT: Resize completed for buffer pool DEFAULT for blocksize 2048
Fri Jun 21 21:29:54 2002
ALTER SYSTEM SET db_cache_size='3932160000' SCOPE=MEMORY;
CKPT: Begin resize of buffer pool 3 (DEFAULT for block size 2048)
What can go wrong
Oradism permissions not set up correctly
System administrators do not always install Oracle using the install script. Instead, the relevant directories may be transferred to another system using tar, cpio or a similar utility. If the /etc/user_attr and /etc/security/exec_attr files on the target system are not modified accordingly, the oradism program will not run with the correct permissions.
For later Oracle releases that made the oradism program setuid root, a transfer of the Oracle binaries to another system may not have preserved these permissions (for example, tar may have been run as the oracle user rather than as root). Also, if binaries are located on another system and mounted on the server with NFS, it is possible that the mount options needed to permit root access have not been set, with the result that the oradism setuid permissions will not be honored.
Oradism dies
If the Oracle9i oradism process dies for some reason (unlikely unless a system administrator accidentally kills it), all locked memory will be automatically unlocked. Performance will suffer accordingly.
Note, however, that oradism can be restarted automatically by Oracle10g if it dies.
SGA memory is only partially locked
If the DBA sets the database buffers, shared pool, etc. too large in the init.ora file, or later increases them too much with an alter system command, Oracle may not be able to successfully lock the whole of the requested memory. In this case, when Oracle uses that portion of the memory that is not locked, poor performance will result. Note that memory is not locked and unlocked in a single chunk; lock and unlock operations are carried out on individual memory 'granules', typically 16 Mbytes in size.
How to diagnose it
Solaris release
To determine your Solaris release, refer to the /etc/release file:
serv1% cat /etc/release
Solaris 9 12/02 s9s_u2wos_10 SPARC
Copyright 2002 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 05 November 2002
The system in the example above is running Solaris 9 Update 2 (as indicated by u2 in the s9s_u2wos_10 string).
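When many servers need checking, the update number can be pulled out of the release string programmatically. The sketch below assumes the update always appears as "_u<N>wos" inside the build token, which is inferred from the single example above rather than from any documented format.

```python
import re

def solaris_update(release_text):
    """Return the update number from a '_u<N>wos' build token, or None."""
    match = re.search(r"_u(\d+)wos", release_text)
    return int(match.group(1)) if match else None

# First line of /etc/release from the example above.
sample = "Solaris 9 12/02 s9s_u2wos_10 SPARC"
print(solaris_update(sample))
```

A result of None means the token was not found, in which case the file should be read by hand.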
SGA not locked
Systems with active use of unlocked SGA memory can be identified with either the lockstat or statit utilities. The lockstat utility ships with both Solaris 8 and 9; the statit utility is a widely-used although unsupported utility that can be downloaded from http://www.solarisdatabases.com/utilities/statit.
The lockstat utility shows mutex contention. As root, look for mutex contention in the segspt_softunlock or spt_anon_getpages functions. Two examples are included below showing the first few lines of a lockstat profile from a system running an active Oracle instance with unlocked SGA memory. The following command can be used to generate these profiles.
serv1# /usr/sbin/lockstat -A -n 200000 sleep 10
Here are the two sample profiles:
Adaptive mutex spin: 22908 events in 10.006 seconds (2290 events/sec)
Count indv cuml rcnt spin Lock Caller
---------------------------------------------------------------------------
4896 21% 21% 1.00 27532 0x3001314ec88 spt_anon_getpages+0x54
81 0% 22% 1.00 11 0x300001a1e98 sc_flush+0x1c4
78 0% 22% 1.00 2 pse_mutex+0xb0 page_unlock+0x1c
77 0% 22% 1.00 2 pse_mutex+0x368 page_unlock+0x1c
Adaptive mutex spin: 561050 events in 10.017 seconds (56011 events/sec)
Count indv cuml rcnt spin Lock Caller
---------------------------------------------------------------------------
549729 98% 98% 1.00 146 0x300085a2580 segspt_softunlock+0xc4
834 0% 98% 1.00 10685 0x300085a2578 spt_anon_getpages+0x54
715 0% 98% 1.00 12 0x300085a2580 spt_anon_getpages+0x64
The presence of such mutexes indicates that SGA memory is not locked.
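Scanning a long lockstat profile for these two callers can be automated. The following sketch assumes the seven-column row layout shown in the profiles above (Count, indv, cuml, rcnt, spin, Lock, Caller); the function name is hypothetical.

```python
SUSPECT_CALLERS = {"segspt_softunlock", "spt_anon_getpages"}

def unlocked_sga_events(lockstat_output):
    """Sum event counts per suspect caller from lockstat data rows."""
    totals = {}
    for line in lockstat_output.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[0].isdigit():
            continue  # skip headers, separators, and blank lines
        caller = fields[6].split("+")[0]  # drop the +0x... offset
        if caller in SUSPECT_CALLERS:
            totals[caller] = totals.get(caller, 0) + int(fields[0])
    return totals

# Rows copied from the second sample profile above.
sample = """\
Adaptive mutex spin: 561050 events in 10.017 seconds (56011 events/sec)
Count indv cuml rcnt spin Lock Caller
549729 98% 98% 1.00 146 0x300085a2580 segspt_softunlock+0xc4
834 0% 98% 1.00 10685 0x300085a2578 spt_anon_getpages+0x54
715 0% 98% 1.00 12 0x300085a2580 spt_anon_getpages+0x64
"""
print(unlocked_sga_events(sample))
```

An empty result suggests that neither caller appeared in the profile; any non-empty result deserves a closer look.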
In statit, look for faults due to s/w lcking req. A non-zero result on a system with an active Oracle instance indicates that some or all SGA memory is not locked. The following example illustrates how to check for this behavior:
serv1% statit sleep 30 | grep lcking
31.62 faults due to s/w lcking req 0.00 kernel as as_flt()s
The example above shows softlocks, indicating a problem with SGA memory locking.
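The same check can be scripted. This is a minimal sketch that assumes the counter always appears as a number immediately before the text "faults due to s/w lcking req", as in the sample line above; the function name is hypothetical.

```python
import re

def softlock_faults(statit_output):
    """Return the software-locking fault figure from statit output, or None."""
    match = re.search(r"([\d.]+)\s+faults due to s/w lcking req", statit_output)
    return float(match.group(1)) if match else None

# Line copied from the statit example above.
sample = "31.62 faults due to s/w lcking req 0.00 kernel as as_flt()s"
value = softlock_faults(sample)
print(value, "non-zero" if value else "zero or missing")
```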
Permissions problems
Examine the alert log for each instance. The following entry indicates that Oracle was unable to start the oradism process with superuser privileges:
Wed Jul 16 14:03:39 2003
WARNING: -------------------------------
WARNING: oradism not set up correctly.
Dynamic ISM can not be locked. Please
setup oradism, or unset sga_max_size.
[diagnostic 0, 5, 0]
----------------------------------------
Performance will be very significantly degraded for this instance, since SGA memory is not locked.
To check whether a running oradism process has the appropriate permissions, use the pcred program (available on both Solaris 8 and 9). Consider the following example:
serv1% ps -aef | grep dism
root 747 1 0 13:57:26 19:42 ora_dism_custdb
oradba1 18456 18391 0 23:37:08 pts/6 0:00 grep dism
serv1% sudo pcred 747
747: e/r/suid=0 e/r/sgid=5432
In this example, the oradism program is running with superuser privileges (suid=0).
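For scripted checks, the effective uid can be extracted from the pcred output. The sketch below handles the combined "e/r/suid=" form shown above; it also accepts a separate "euid=" field, which is an assumption about how pcred reports credentials when the effective, real, and saved uids differ.

```python
import re

def effective_uid(pcred_output):
    """Return the effective uid reported by pcred, or None if not found."""
    match = re.search(r"(?:e/r/suid|euid)=(\d+)", pcred_output)
    return int(match.group(1)) if match else None

# Output line copied from the example above.
sample = "747: e/r/suid=0 e/r/sgid=5432"
print(effective_uid(sample) == 0)
```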
To check if RBAC is configured appropriately, look for the appropriate entries in the /etc/user_attr and /etc/security/exec_attr files.
serv1% grep -i dism /etc/security/exec_attr
Oracle DISM mgmt:suser:cmd:::/export/home/oracle/bin/oradism:euid=
serv1% grep -i dism /etc/user_attr
oracle::::type=normal;profiles=Oracle DISM mgmt
In this case both files have been set up. Instances using binaries located in /export/home/oracle/bin will be able to run oradism with root privileges.
Check, too, if the setuid bit has been set on the oradism program. For example:
serv1% ls -l $ORACLE_HOME/bin/oradism
-rwsr-sr-x 1 root dba 9280 Apr 9 2002 /export/home/oracle/bin/oradism*
In the above example the oradism program is owned by root, and the setuid bit is set (as indicated by the first s in rws). The program should therefore be able to start with appropriate permissions.
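The same two conditions (root ownership and the setuid bit) can be tested from a script. The following is a sketch using only the Python standard library; the function names are hypothetical, and the path passed to check_oradism is whatever $ORACLE_HOME/bin/oradism resolves to on the system in question.

```python
import os
import stat

def is_setuid_root(mode, owner_uid):
    """True when the setuid bit is set and the file is owned by uid 0."""
    return bool(mode & stat.S_ISUID) and owner_uid == 0

def check_oradism(path):
    """Apply the setuid-root test to a file on disk."""
    info = os.stat(path)
    return is_setuid_root(info.st_mode, info.st_uid)

# Mode bits equivalent to -rwsr-xr-x (4755) on a regular file, owned by root.
print(is_setuid_root(0o104755, 0))
```

Keeping the mode test separate from the os.stat call makes it possible to exercise the logic without needing root access.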
Oradism dies
Check for the presence of the oradism process with the ps program.
Unfortunately, in the unlikely event that this process dies, in Oracle9i it will not be noted in the alert log file until the next time alter system is used to grow or shrink memory.
Oracle10g has the ability to restart oradism if it dies.
SGA memory only partially locked
In Oracle10g, a new table (x$ksmge) can be examined to determine the state of locks for each granule of SGA memory (each granule is typically 16 Mbytes in size).
For Oracle9i, an interpose library is available from the Veritas, Oracle, Sun Joint Escalation Center (VOSJEC). Untar the file into a temporary directory, and install the library according to the instructions in the README file. When oradism is started, a /tmp/mlock.log file will be created, and all lock and unlock operations will be logged to the file along with their return status (a non-zero return status indicates failure). An example is given below of the information that will be logged if lock attempts are successful:
Mon Nov 10 13:57:26 2003: Detected mlock(380000000, 400000)
Mon Nov 10 13:57:26 2003: mlock return status=0
Mon Nov 10 13:57:26 2003: Detected mlock(1013000000, 1400000)
Mon Nov 10 13:57:26 2003: mlock return status=0
Mon Nov 10 13:57:26 2003: Detected mlock(381000000, 1000000)
Mon Nov 10 13:57:26 2003: mlock return status=0
An example of unsuccessful lock attempts is shown below:
Sat May 25 02:45:22 2002: Detected mlock(e61000000, 1000000)
Sat May 25 02:45:22 2002: mlock return status=-1, errno=11 <-------------
Sat May 25 02:45:22 2002: Detected mlock(e62000000, 1000000)
Sat May 25 02:45:23 2002: mlock return status=-1, errno=11 <-------------
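A log of this kind can be scanned for failed lock attempts with a short script. The sketch below assumes each "Detected mlock(...)" line is followed by its own "mlock return status" line, and that the address and length are hexadecimal (an inference from lengths such as 1000000, which is 16 Mbytes when read as hex). The function name is hypothetical.

```python
import re

DETECT = re.compile(r"Detected mlock\(([0-9a-f]+), ([0-9a-f]+)\)")
STATUS = re.compile(r"mlock return status=(-?\d+)(?:, errno=(\d+))?")

def failed_mlocks(log_text):
    """Return (address, length, errno) for every mlock with non-zero status."""
    failures = []
    pending = None
    for line in log_text.splitlines():
        found = DETECT.search(line)
        if found:
            # Assumption: both arguments are logged in hexadecimal.
            pending = (int(found.group(1), 16), int(found.group(2), 16))
            continue
        found = STATUS.search(line)
        if found and pending is not None:
            if int(found.group(1)) != 0:
                err = int(found.group(2)) if found.group(2) else None
                failures.append((pending[0], pending[1], err))
            pending = None
    return failures

# Lines copied from the unsuccessful example above.
sample = """\
Sat May 25 02:45:22 2002: Detected mlock(e61000000, 1000000)
Sat May 25 02:45:22 2002: mlock return status=-1, errno=11
Sat May 25 02:45:22 2002: Detected mlock(e62000000, 1000000)
Sat May 25 02:45:23 2002: mlock return status=-1, errno=11
"""
for address, length, err in failed_mlocks(sample):
    print(hex(address), length // (1024 * 1024), "MB", "errno", err)
```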
How to fix it
Solaris OS release issues
As previously noted, a known Solaris 8 OS bug (4966813) can cause this problem. The solution is to install Solaris Patch 117000-05.
DISM should only be used on Solaris 8 OS if patch 117000-05 or later is installed.
DISM should not be used on Solaris 9 OS prior to Update 2 (the 12/02 release).
Permissions problems
The simplest way of ensuring appropriate permissions is to make the oradism program setuid root, obviating the need for RBAC. This can be achieved as follows:
serv1# chown root $ORACLE_HOME/bin/oradism
serv1# chmod 4755 $ORACLE_HOME/bin/oradism
serv1% ls -l $ORACLE_HOME/bin/oradism
-rwsr-xr-x 1 root dba 9280 Apr 9 2002 /export/home/oracle/bin/oradism*
Oradism dies
If the oradism process dies in Oracle10g, no action is necessary; Oracle is able to restart it. In Oracle9i, though, the only practical solution is to shut down and restart the instance.
SGA memory only partially locked
The main solution to this problem is to ensure that it doesn't happen. By installing the interpose library, it is possible to monitor the success of locking operations. Attempts to lock more memory than the system has available should be avoided.
General Guidelines
The following guidelines are offered in conclusion:
1. Ensure you are using an appropriate release of Solaris OS (refer to the Solaris OS release issues section of this document).
2. If you don't need to resize the SGA dynamically, you probably don't need to set sga_max_size.
3. If you use sga_max_size, check that oradism is working correctly as described above.
4. On Solaris 8 OS, DISM should only be used if Solaris Patch 117000-05 is installed. Even once the patch is installed, DISM should be avoided where performance is critical. Use of DISM can cost up to 10% in performance compared to ISM for SGAs up to 8 Gbytes in size (although your mileage may vary depending on your circumstances!). Sun recommends avoiding DISM on Solaris 8 where the SGA is larger than 8 Gbytes, or where CPU utilization is typically greater than 70%.
5. On Solaris 9 as of Update 2, the performance of DISM is equivalent to ISM. DISM should not be used with releases of Solaris 9 prior to Update 2.
| Document Audience: | SPECTRUM |
| Document ID: | 214947 |
| Old Document ID: | (formerly 80799) |
| Title: | Solaris[TM] Operating System: DISM double allocation of memory |
| Copyright Notice: | Copyright © 2008 Sun Microsystems, Inc. All Rights Reserved |
| Update Date: | Tue Mar 22 00:00:00 MST 2005 |
Solution Type Technical Instruction
Description
Memory allocation by Intimate Shared Memory (ISM) is well understood. Dynamic
Intimate Shared Memory (DISM) works slightly differently, and this needs to be
accounted for in the system configuration. Otherwise, DISM may not function as
expected and its usefulness will be limited.
Steps to Follow
When shared memory is allocated via shmget(), its size is subtracted from the
available virtual swap space.
When a shared memory segment allocated via shmget() is attached to a process,
it is declared as ISM or DISM by the flags to the shmat() call:
#define SHM_SHARE_MMU 040000 /* share VM resources such as page table */
#define SHM_PAGEABLE 0100000 /* pageable ISM */
The SHM_SHARE_MMU flag makes the shared memory segment an ISM segment, while the
SHM_PAGEABLE flag makes it a DISM segment. It should be noted that a segment
cannot be both, and an attempt to set both flags will fail.
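The rule that a segment is either ISM or DISM, but never both, can be modelled with the two flag values shown above. This sketch only classifies a flag word; it does not call shmat(), and the function name is hypothetical.

```python
# Flag values copied from the #define lines above (octal).
SHM_SHARE_MMU = 0o40000
SHM_PAGEABLE = 0o100000

def segment_kind(shmflg):
    """Classify a shmat() flag word as 'ISM', 'DISM', or 'standard'."""
    ism = bool(shmflg & SHM_SHARE_MMU)
    dism = bool(shmflg & SHM_PAGEABLE)
    if ism and dism:
        # Mirrors the documented behavior: setting both flags fails.
        raise ValueError("SHM_SHARE_MMU and SHM_PAGEABLE are mutually exclusive")
    if ism:
        return "ISM"
    if dism:
        return "DISM"
    return "standard"

print(segment_kind(SHM_SHARE_MMU), segment_kind(SHM_PAGEABLE), segment_kind(0))
```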
A DISM segment can have portions of the segment selectively locked into memory
by a (root-privileged) process, using the mlock() call. This allows the program
to more selectively manage which portions of the shared memory need to be locked
in at a given time.
The downside is that a DISM segment subtracts its size from the available
memory a second time: once for its in-memory space, and once more for its
possible swap usage. This effectively doubles the virtual memory requirement
for any DISM segment. The second allocation occurs during the mlock() call.
Thus, a system using DISM needs to have at least as much disk swap available as
the total size of the DISM segments in use. If the system exhausts available
swap, DISM segments cannot be locked with mlock().
If the system is intended to have little or no disk swap, DISM is not
appropriate and should not be used.
Internal Comments
1) This whole problem is caused by:
BugID 1225025 - mlock:ed anonymous memory remains backed by swap
2) In Solaris 2.5.1 OS and before, normal ISM had this same double-allocation
counting problem. This is discussed in InfoDoc 15505.
3) To demonstrate the effect, a program was written using DISM and mlock()'ing
the segment. "swap -s" is used to monitor the space consumed in its
"available" field. The program allocates 0.5GB of DISM.
This is taken prior to starting the test program:
total: 84816k bytes allocated + 6592k reserved = 91408k used, 3053320k available
We start with approximately 3GB available.
shmget() done:
total: 84896k bytes allocated + 530872k reserved = 615768k used, 2528936k available
This shows that the available swap has dropped by 0.5GB as expected.
shmat() done (segment is returned with pages unlocked):
total: 84880k bytes allocated + 530888k reserved = 615768k used, 2528744k available
no change here...
mlock() done:
total: 609136k bytes allocated + 6632k reserved = 615768k used, 2004456k available
This shows that the mlock() call is responsible for the second decrement of
0.5GB in the available field.
munlock() done:
total: 609184k bytes allocated + 6584k reserved = 615768k used, 2528744k available
And finally, when the segment is munlock()'ed, the second allocation
disappears.
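The step-by-step changes in the "available" field can be tabulated directly from the swap -s lines quoted above. The sketch below is illustrative only; the step labels and function name are hypothetical, while the figures are copied from the test run.

```python
import re

def available_kb(swap_s_line):
    """Extract the 'available' figure (in KB) from one line of swap -s output."""
    match = re.search(r"(\d+)k\s+available", swap_s_line)
    if match is None:
        raise ValueError("no 'available' field found")
    return int(match.group(1))

# swap -s output at each step, copied from the test run above.
steps = [
    ("start", "total: 84816k bytes allocated + 6592k reserved = 91408k used, 3053320k available"),
    ("shmget", "total: 84896k bytes allocated + 530872k reserved = 615768k used, 2528936k available"),
    ("shmat", "total: 84880k bytes allocated + 530888k reserved = 615768k used, 2528744k available"),
    ("mlock", "total: 609136k bytes allocated + 6632k reserved = 615768k used, 2004456k available"),
    ("munlock", "total: 609184k bytes allocated + 6584k reserved = 615768k used, 2528744k available"),
]

changes = {}
previous = None
for name, line in steps:
    avail = available_kb(line)
    changes[name] = 0 if previous is None else avail - previous
    print(f"{name:8s} available={avail}k change={changes[name]:+d}k")
    previous = avail
```

The mlock() step lowers the available figure by 524288k, which is exactly 512 MB, and the munlock() step returns the same amount.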