Sanity Checks
Solaris CAT will run some sanity checks upon startup to look for hints of known problems. These will appear in a section just before the prompt appears, similar to:
sanity checks: settings...vmem...cpu...sysent...clock...misc...done
which is an example of a warning-free startup. An example of one with things to pay attention to is:
sanity checks: settings...vmem...cpu...sysent...clock...misc...
WARNING: console output stopped by ctrl-s (1027 bytes pending)
WARNING: 1 pending softints (softlevel1 queued on CPU1)
done
Two types of messages will appear:
- Issues which have been observed to cause problems in other systems or cores will begin with WARNING:.
- Interesting discoveries which are unlikely to cause problems will be preceded by NOTE:.
These are intended to point you in a good direction to begin an investigation. They are not necessarily related to the problem being investigated and may, in fact, be wholly irrelevant.
There are two scatenv settings which can be used to control the operation of the sanity checks:
- sanity_check: Turning this setting off will disable all of the sanity checks.
- sanity_note: Turning this setting off will disable only the NOTE information.
/etc/system settings
General checks performed on entries in this file are as follows:
set mod:var=val
- mod not loaded (NOTE)
- mod loaded or NULL, and var doesn't exist
- mod loaded or NULL, and var exists but is not STT_OBJECT
- mod loaded or NULL, and var exists but is not STB_GLOBAL or STB_LOCAL
- mod:var seen more than once with different val
- mod:var seen more than once with same val (NOTE)
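The last two checks in the list above (repeated settings) can be approximated outside of Solaris CAT. The following is a minimal Python sketch, not the tool's own implementation; the function name check_duplicates and the regular expression are assumptions made for illustration, and values are compared as plain strings, so 0x10 and 16 would count as different.

```python
import re
from collections import defaultdict

# Matches lines such as "set ufs:ufs_HW=0x1000000" or "set maxusers = 512".
# The "mod:" prefix is optional. Comment lines do not start with "set",
# so they never match and are skipped implicitly.
SET_LINE = re.compile(r"^\s*set\s+(?:(\w+):)?(\w+)\s*=\s*(\S+)")

def check_duplicates(lines):
    """Return (severity, message) tuples for settings that appear more than once."""
    seen = defaultdict(list)  # (mod, var) -> values in file order
    for line in lines:
        m = SET_LINE.match(line)
        if m:
            mod, var, val = m.groups()
            seen[(mod, var)].append(val)

    findings = []
    for (mod, var), vals in seen.items():
        if len(vals) < 2:
            continue
        name = f"{mod}:{var}" if mod else var
        if len(set(vals)) > 1:
            findings.append(("WARNING", f"{name} seen more than once with different values"))
        else:
            findings.append(("NOTE", f"{name} seen more than once with same value"))
    return findings
```

Calling check_duplicates(open("/etc/system")) on a saved copy of the file would return one tuple per repeated setting.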
Checks are also made for the following problems with specific settings (some are intentionally redundant):
- msginfo_msseg set to > 32767
- ngroups_max set to <>
- sq_max_size == 0 or > 100
- rlim_fd_cur > 1024 on Solaris <>
- rlim_fd_max > 1024 on Solaris <>
- lwp_default_stksize not a multiple of _pagesize
- sd_max_xfer_size set
- ssd_max_xfer_size set
- desfree != lotsfree/2
- minfree != desfree/2
- throttlefree != minfree
- cachefree != lotsfree or lotsfree * 2 (Solaris 7 and 8)
- dyncachefree != lotsfree or cachefree (Solaris 7 and 8)
- lotsfree < desfree
- desfree < minfree
- minfree < throttlefree
- cachefree < lotsfree
- ufs_LW > ufs_HW
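Several of the checks above are simple relationships between tunables. As an illustration only, they can be written as a table of rules in Python. The names RULES and check_tunables are hypothetical, integer division for the "/2" comparisons is an assumption, and only a subset of the list is covered.

```python
# Each rule: (label taken from the list above, tunables it needs, predicate).
# A rule is skipped when any tunable it needs is absent from the input.
RULES = [
    ("lwp_default_stksize not a multiple of _pagesize",
     ("lwp_default_stksize", "_pagesize"),
     lambda t: t["lwp_default_stksize"] % t["_pagesize"] != 0),
    ("desfree != lotsfree/2", ("desfree", "lotsfree"),
     lambda t: t["desfree"] != t["lotsfree"] // 2),  # integer division assumed
    ("minfree != desfree/2", ("minfree", "desfree"),
     lambda t: t["minfree"] != t["desfree"] // 2),
    ("throttlefree != minfree", ("throttlefree", "minfree"),
     lambda t: t["throttlefree"] != t["minfree"]),
    ("lotsfree < desfree", ("lotsfree", "desfree"),
     lambda t: t["lotsfree"] < t["desfree"]),
    ("desfree < minfree", ("desfree", "minfree"),
     lambda t: t["desfree"] < t["minfree"]),
    ("minfree < throttlefree", ("minfree", "throttlefree"),
     lambda t: t["minfree"] < t["throttlefree"]),
    ("ufs_LW > ufs_HW", ("ufs_LW", "ufs_HW"),
     lambda t: t["ufs_LW"] > t["ufs_HW"]),
]

def check_tunables(t):
    """Return the labels of all rules violated by the settings in dict t."""
    return [label for label, needs, broken in RULES
            if all(name in t for name in needs) and broken(t)]
```

For example, check_tunables({"lotsfree": 1000, "desfree": 1200}) returns ["desfree != lotsfree/2", "lotsfree < desfree"], which shows why some of the checks are intentionally redundant: one bad value tends to trip more than one rule.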
clock-related sanity checks
All of these are hints that the kernel clock may not be advancing:
- panic_hrtime, hres_last_tick, or hrtime_base behind by more than 10 minutes
- clock_pend or clock_reruns nonzero
- cyclics pending for more than 10 minutes (Solaris >= 8)
- callouts which expired more than 1 second ago
cpu structure sanity checks
The list of CPUs on the system is checked for various conditions which could indicate areas of interest:
- a count of CPUs which are offline
- CPUs which have cpu_intr_actv set
- CPUs whose cpu_base_spl is greater than 0
- CPUs whose last_swtch is more than 15 seconds ago
- CPUs which have a thread on the processor using more than 90% CPU
- CPUs which have a pinned thread using more than 90% CPU
- CPUs which have more than 5 threads in their dispatch queues
- CPUs which have threads on their dispatch queue whose t_disp_queue is set to a different CPU
- CPUs which have an implementation number different than that of the first CPU
- CPUs which have a clock speed different than that of the first CPU (NOTE)
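Three of the CPU checks above compare per-CPU values against a fixed limit or against the first CPU. A rough Python sketch of that comparison follows. The Cpu record and its field names are hypothetical stand-ins rather than kernel structure members, and treating items without a (NOTE) tag as WARNING is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Cpu:
    # Hypothetical record; these are not the kernel's cpu_t member names.
    cpu_id: int
    implementation: int
    clock_mhz: int
    dispq_len: int

def cpu_findings(cpus):
    """Compare every CPU against the first one and against the queue limit."""
    first = cpus[0]
    findings = []
    for c in cpus:
        if c.dispq_len > 5:
            findings.append(("WARNING",
                             f"CPU{c.cpu_id} has {c.dispq_len} threads in its dispatch queue"))
        if c.implementation != first.implementation:
            findings.append(("WARNING",
                             f"CPU{c.cpu_id} implementation differs from CPU{first.cpu_id}"))
        if c.clock_mhz != first.clock_mhz:
            findings.append(("NOTE",
                             f"CPU{c.cpu_id} clock speed differs from CPU{first.cpu_id}"))
    return findings
```

Using the first CPU as the reference keeps the check simple, but it also means a single odd CPU in slot 0 makes every other CPU look different.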
memory-related sanity checks
- availrmem <= tune.t_minarmem
- freemem < throttlefree (page_create() throttled)
- avefree < minfree (hard swapping)
- avefree < desfree && freemem <= desfree (soft swapping)
- avefree < lotsfree (paging)
- avefree < dyncachefree (paging fs pages)
- kernel cage checks on Solaris >= 7
  - kcage_freemem < kcage_lotsfree (NOTE)
  - kcage_freemem < kcage_desfree
  - kcage_freemem < kcage_minfree
  - kcage_freemem < kcage_throttlefree
  - kcage_needfree > 0
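The free-memory comparisons above map directly onto a handful of conditions. The following Python sketch evaluates the first five of them against values supplied in a dictionary. The function name memory_findings and the dictionary keys are assumptions for illustration, and the dyncachefree and kernel cage checks are left out.

```python
def memory_findings(v):
    """v maps variable names to page counts; returns the conditions that hold."""
    checks = [
        ("availrmem <= tune.t_minarmem", v["availrmem"] <= v["t_minarmem"]),
        ("page_create() throttled", v["freemem"] < v["throttlefree"]),
        ("hard swapping", v["avefree"] < v["minfree"]),
        ("soft swapping",
         v["avefree"] < v["desfree"] and v["freemem"] <= v["desfree"]),
        ("paging", v["avefree"] < v["lotsfree"]),
    ]
    return [label for label, hit in checks if hit]
```

Because the thresholds normally nest (throttlefree <= minfree <= desfree <= lotsfree), a system that is very low on memory will usually match several conditions at once rather than just one.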
miscellaneous sanity checks
- device in use for more than 1 filesystem or swap
- rootdir is NULL (intentional panic)
- coredump size doesn't match that calculated from the dumphdr (incomplete or corrupt coredump)
- DF_LIVE set in dumphdr dump_flags (live coredump)
- DF_COMPLETE not set in dumphdr dump_flags (incomplete coredump)
- kernelbase not expected value (corrupt coredump)
- max_nprocs - nproc <= 0 (ran out of processes)
- nproc > 90% of max_nprocs (running out of processes)
- symbol bunyip_vnodeops present (bunyip module loaded) (NOTE)
- sysent or sysent32 table entries which have sy_call or sy_callc with modules other than those in the following list (system call interceptor code loaded):
  - "genunix"
  - "unix"
  - "pipe"
  - "nfs"
  - "doorfs"
  - "msgsys"
  - "shmsys"
  - "semsys"
  - "kaio"
  - "pset"
  - "cpc"
  - "c2audit"
  - "sysacct"
  - "inst_sync"
  - "srmlimitsys"
  - "rpcmod"
  - "samsys"
  - "samsys64"
  - "autofs"
  - "portfs"
- processor sets created (NOTE)
- ndd parameter ip_icmp_err_interval set to 0
- init process is a zombie
- disk commands pending
- pending softints
- count of pages retired due to errors
- syncq service threads active (Solaris 8 only) (hung streams)
- console output stopped by ctrl-s
- system_taskq with active threads
- callouts with high bit set in c_runtime
- ptl1 panics where TL[N] tt is 0x68, TL[N] tpc is a stx (64b) or stw (32b), and the address being stored to is in the onproc thread's redzone (probable stack overflow) (Solaris 9 and above only)
- vmem checks on Solaris >= 8
  - vmem arenas with threads asleep on the vm_cv (threads waiting for allocations)
- rmap checks on Solaris <>
  - any of kernelmap, kernelmap32, or kobj_map with:
    - m_want == 1 (thread waiting for an entry)
    - m_free == m_size and m_want == 1 (out of entries in map)
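The system call table check in the list above amounts to a whitelist lookup: resolve each handler to its owning module and report anything not in the expected set. A minimal Python sketch, assuming the handler addresses have already been resolved to module names by some other means; the function name and the tuple layout are hypothetical.

```python
# Module names copied from the list in the miscellaneous sanity checks.
EXPECTED_MODULES = {
    "genunix", "unix", "pipe", "nfs", "doorfs", "msgsys", "shmsys",
    "semsys", "kaio", "pset", "cpc", "c2audit", "sysacct", "inst_sync",
    "srmlimitsys", "rpcmod", "samsys", "samsys64", "autofs", "portfs",
}

def unexpected_syscall_handlers(entries):
    """entries: iterable of (syscall_number, field_name, module_name) tuples.

    Returns the entries whose handler lives in a module outside the
    expected set, which may indicate system call interceptor code.
    """
    return [(num, field, mod) for num, field, mod in entries
            if mod not in EXPECTED_MODULES]
```

A hit is only a hint: a module missing from the list may be legitimate third-party software rather than an interceptor.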


