Recovered from the older tannerjc.net wiki snapshot dated January 23, 2016.

General info

Startup and shutdown

SQL> connect / as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.

Total System Global Area  167772160 bytes
Fixed Size		    1218316 bytes
Variable Size		   83888372 bytes
Database Buffers	   79691776 bytes
Redo Buffers		    2973696 bytes
Database mounted.
Database opened.

SQL> shutdown
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>

SGA and PGA sizes

SQL> select name || ' ' || value from v$parameter where lower(name) like ('sga%');

NAME||' '||VALUE
--------------------------------------------------------------------------------
sga_max_size 268435456
sga_target 268435456

SQL> select name || ' ' || value from v$parameter where lower(name) like ('pga%');

NAME||' '||VALUE
--------------------------------------------------------------------------------
pga_aggregate_target 134217728
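
These v$parameter values are in bytes; a quick shell conversion of the two sizes shown above to megabytes:

```shell
# sga_max_size/sga_target and pga_aggregate_target are reported in bytes;
# convert the values quoted above to megabytes.
for bytes in 268435456 134217728; do
  echo "$bytes bytes = $(( bytes / 1024 / 1024 )) MB"
done
```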

Connection errors

[oracle@database ~]$ sqlplus /nolog
SQL> connect / as sysdba
ERROR:
ORA-01031: insufficient privileges

The oracle user needs to be in three groups (oracle, oinstall, and dba):

[oracle@database ~]$ id oracle
uid=500(oracle) gid=500(oracle) groups=500(oracle),501(oinstall),502(dba)
  • usermod -a -G oinstall,dba oracle

Why not to use 32-bit with large amounts of memory

http://linux.derkeiler.com/Mailing-Lists/RedHat/2008-01/msg00149.html

The kernel uses low memory to track allocations of all memory thus a
system with 16GB of memory will use significantly more low memory than a
system with 4GB, perhaps as much as 4 times.  This extra pressure
happens from the moment you turn the system on before you do anything at
all because the kernel structures have to be sized for the potential of
tracking allocations in four times as much memory.

You can check the status of low / high memory a couple of ways:

# egrep 'High|Low' /proc/meminfo
HighTotal:     5111780 kB
HighFree:         1172 kB
LowTotal:       795688 kB
LowFree:         16788 kB

# free -lm
             total       used       free     shared    buffers     cached
Mem:          5769       5751         17          0          8      5267
Low:           777        760         16          0          0         0
High:         4991       4990          1          0          0         0
-/+ buffers/cache:        475       5293
Swap:         4773          0       4773

When low memory is exhausted, it doesn't matter how much high memory is
available, the oom-killer will begin whacking processes to keep the
server alive.

There are a couple of solutions to this problem:

If possible, upgrade to 64-bit Linux.  This is the best solution because
*all* memory becomes low memory.  If you run out of low memory in this
case, then you're *really* out of memory. ;-)

If limited to 32-bit Linux, the best solution is to run the hugemem
kernel.  This kernel splits low/high memory differently, and in most
cases should provide enough low memory to map high memory.  In most
cases this is an easy fix - simply install the hugemem kernel RPM and
reboot.

That 64-bit kernels have no highmem can be verified by the absence of a highmem.c in the x86_64 tree of the kernel source (only 32-bit architectures carry one):

[root@rhel5box linux]# pwd
/usr/src/linux
[root@rhel5box linux]# find . -name highmem.c -print
./mm/highmem.c
./arch/mips/mm/highmem.c
./arch/i386/mm/highmem.c
./arch/frv/mm/highmem.c
./arch/sparc/mm/highmem.c
  • The RHEL 3/4 smp kernel can be used on systems with up to 16 GB of RAM. The hugemem kernel is required in order to use all the memory on systems that have more than 16 GB of RAM, up to 64 GB. However, I recommend the hugemem kernel even on systems that have 8 GB of RAM or more, due to the potential issue of “low memory” starvation (see next section) that can happen on database systems with 8 GB of RAM. The stability you get with the hugemem kernel on larger systems outweighs the performance overhead of address space switching.

  • With the x86 architecture, the first 16MB-896MB of physical memory is known as “low memory” (ZONE_NORMAL), which is permanently mapped into kernel space. Many kernel resources must live in the low memory zone; in fact, many kernel operations can only take place in this zone, which makes the low memory area the most performance-critical zone. For example, if you run many resource-intensive applications/programs and/or use large amounts of physical memory, then “low memory” can run low, since more kernel structures must be allocated in this area. Low memory starvation happens when LowFree in /proc/meminfo becomes very low, accompanied by a sudden spike in paging activity. To free up memory in the low memory zone, the kernel bounces buffers aggressively between low memory and high memory, which becomes noticeable as paging (don’t confuse it with paging to the swap partition). If the kernel is unable to free up enough memory in the low memory zone, it can hang the system.

  • Paging activity can be monitored using the vmstat command or using the sar command (option ‘-B’) which comes with the sysstat RPM. Since Linux tries to utilize the whole low memory zone, a low LowFree in /proc/meminfo does not necessarily mean that the system is out of low memory. However, when the system shows increased paging activity when LowFree gets below 50MB, then the hugemem kernel should be installed. The stability you gain from using the hugemem kernel makes up for any performance impact resulting from the 4GB-4GB kernel/user memory split in this kernel (a classic 32-bit x86 system splits the available 4 GB address space into 3 GB virtual memory space for user processes and a 1 GB space for the kernel). To see some allocations in the low memory zone, refer to /proc/meminfo and slabtop(1) for more information. Note that Huge Pages would free up memory in the low memory zone since the system has less bookkeeping to do for that part of virtual memory, see Large Memory Optimization (Big Pages, Huge Pages).
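
A quick way to tell which kernel variant is booted is the `uname -r` string; a minimal sketch, with a hypothetical RHEL 4 version string standing in for a live system:

```shell
# Hypothetical version string; on a live system use: kernel=$(uname -r)
kernel='2.6.9-78.0.1.ELhugemem'

case "$kernel" in
  *hugemem*) echo "hugemem kernel" ;;
  *)         echo "standard kernel" ;;
esac
```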

Discussion of Oracle and 32-bit vs 64-bit

I can't speak to our support position, but from a technical standpoint, it only
works in the loosest of senses.  The kernel uses 32 bytes of lowmem to track
the status of each page in the mem_map array.  So at 16GB of ram, we burn 128MB
of memory just to track what it's doing.  Physically, a PAE-enabled kernel can
handle up to 64GB of physical address space.  To track that we need 512MB, or
half of our Lowmem.  That works in that the system can boot up and idle there
without much trouble, but once you start applying load, you'll quickly run out
of space and fall off a cliff.  We worked around that in RHEL3/4 with the
hugemem kernel to give us some more lowmem space, but the effort of maintaining
that solution just doesn't make sense when it's arguably less expensive to just
buy 64 bit hardware, where it isn't a problem at all.
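
The arithmetic in that quote can be checked directly from the 32-bytes-per-4-KiB-page figure it gives:

```shell
# 32 bytes of lowmem per 4 KiB page in the mem_map array
# (figures from the quoted post).
page_bytes=4096
struct_bytes=32
for ram_gb in 16 64; do
  pages=$(( ram_gb * 1024 * 1024 * 1024 / page_bytes ))
  echo "${ram_gb}GB RAM -> $(( pages * struct_bytes / 1024 / 1024 ))MB of mem_map"
done
```

This reproduces the post's 128MB (16GB RAM) and 512MB (64GB RAM) numbers.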

In short, please don't do this.  32 Bit x86 just wasn't designed for that much
memory, the proper solution is 64 bit systems.
Again, if Oracle is going to play this game, then ...

1.  Run RHEL 4.7 i386 with the latest errata i686 kernel (-78.0.x)
2.  Ensure the i686 kernel is hugemem and hugemem only
3.  Point the finger at Oracle when it comes to certification

  Even when the newer versions of these applications are certified on
  64-bit, customers tend to move *very* slowly in upgrading them.

Which is why the unified argument needs to be not only the above,
but ...

A.  For 2009+ processors (even late 2008), install on RHEL 5 x86-64
B.  Deal with the co-existence of RHEL 5 x86-64 with RHEL 4 i386
C.  Ensure BIOS updates (with Microcode errata) are up-to-date

Debugging

Oracle RAC

General

Lab oracle rac

RAC mailing list

Setup

CRS (Cluster Ready Services)

[oracle@vmora01rh4 ~]$ cd /u01/app/oracle/product/10.2.0/crs/bin
[oracle@vmora01rh4 bin]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.fokerac.db application    ONLINE    ONLINE    vmora02rh4

Fencing

General

  • Oracle clsomon failed with fatal status 13 – will cause a fence
  • Oracle CSSD failure 134. – will cause a fence

Checking status

  • srvctl status database -d orcl

Node fencing

  • carried out via sysctl/sysrq commands, and system_reboot calls by ocfs

  • prints a message to console: Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing

  • prior to OCFS2 1.2.5, ocfs fences were via sysrq. Later versions use system_reboot

  • ocfs patch with fence message: http://oss.oracle.com/pipermail/ocfs2-devel/2007-April/001212.html

  • sysrq is not enabled if sysctl.conf sets kernel.sysrq = 0

  • cssd is the cluster synchronization services daemon responsible for node membership.

  • Full description of how a fence occurs/looks http://www.freelists.org/post/oracle-l/Oracle-CRS-and-Split-Brin,1

  • master node loses its private network

  • surviving node becomes the master

  • old master is ejected from the cluster configuration - and rebooted.

crsd.log will show …

I AM THE NEW OCR MASTER at incar 6. Node Number = 1
Processing member leave for node1, incarnation: 7
Do failover for: node1
Reconfiguration started (old inc 24, new inc 26)

$ORA_CRS_HOME/log will show …

CSSD evicting node node11. Details in /ORACLE_HOME/log/node0/cssd/ocssd.log.
CRS-1601:CSSD Reconfiguration complete. Active nodes are node0 .

ocssd.log file for the node that has a problem with its private network (node1)…

clssnmPollingThread: node node1 (1) missed(590) checkin(s)
clssnmPollingThread: node node1 (1) missed(591) checkin(s)

Fence examples

Oct 13 13:43:07 prd2 xinetd[15320]: EXIT: vnetd status=0 pid=19580 duration=0(sec)
Oct 13 13:45:25 prd2 kernel: o2net: connection to node prd1 (num 1) at 192.168.242.21:7777 has been idle for 30.0 seconds, shutting it down.
Oct 13 13:45:25 prd2 kernel: (0,1):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1255401895.85339 now 1255401925.80869 dr 1255401895.85328 adv 1255401895.85341:1255401895.85341 func (058d859d:505) 1255401881.84581:1255401881.84584)
Oct 13 13:45:25 prd2 kernel: o2net: no longer connected to node prd1 (num 1) at 192.168.242.21:7777
Oct 13 13:45:55 prd2 kernel: (27552,3):o2net_connect_expired:1664 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.
.....

Oct 13 13:49:07 prd2 kernel: (7855,1):ocfs2_dlm_eviction_cb:98 device (253,38): dlm has evicted node 1
Oct 13 13:49:07 prd2 kernel: (7855,1):ocfs2_dlm_eviction_cb:98 device (253,81): dlm has evicted node 1
Oct 13 13:49:07 prd2 kernel: (24117,3):ocfs2_replay_journal:1183 Recovering node 1 from slot 3 on device (253,38)

Fence RCA

Questions to ask
  • Is the oracle install 32 bit or 64 bit
  • What kind of shared storage is being used
  • Is the shared storage multipathed
  • Is ASM in use
  • Are SQL/database and heartbeat traffic separated onto different networks
  • Is bonding in use and what mode
  • Do the fence events occur at specific times
  • Do fencing times correlate to scheduled batch jobs or other cronjobs / maintenance
initial checks
  • check messages for errors prior to reboot
  • check sar
  • load spikes
  • free mem
  • packet drops
  • proper distribution of packets on the slaves according to bonding mode
  • check for hugepage usage
  • unused hugepages reduce efficiency and possibly lead to cpu consumption + swap
  • RHEL5 can free and dynamically allocate hugepages, so not much of an issue
  • check heartbeat threshold: http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#HEARTBEAT
  • O2CB_HEARTBEAT_THRESHOLD = (((timeout in secs) / 2) + 1)
  • cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
  • check ocfs fencing method
  • cat /proc/fs/ocfs2_nodemanager/fence_method
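
The FAQ formula above can be evaluated directly; the 60-second timeout here is illustrative only, not a value from this note:

```shell
# O2CB_HEARTBEAT_THRESHOLD = ((timeout in secs) / 2) + 1  (OCFS2 FAQ formula)
# The 60 s dead timeout is an example value.
timeout=60
echo "O2CB_HEARTBEAT_THRESHOLD=$(( timeout / 2 + 1 ))"
```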
storage checks
  • check /etc/multipath.conf for proper settings
  • check if asm is correctly scanning for the multipathed virtual devices instead of the underlying paths: /etc/sysconfig/oracleasm | fgrep SCAN
  • ORACLEASM_SCANBOOT: ’true’ means scan for ASM disks on boot.
  • ORACLEASM_SCANORDER: Matching patterns to order disk scanning
  • ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
network checks
  • check connectivity constantly by using tcpdump
  • check bond mode settings
  • check that switch can handle the mode properly
  • check sar data for appropriate distribution of messages on each slave
  • check if db traffic and heartbeat traffic are on the same network
  • check the init.ora parameter cluster_interconnects, which specifies the interconnect IP address to use.
  • Oracle recommends increasing the interconnect bandwidth capacity before using another interconnect.
  • Oracle assumes there is no more than a couple of DB’s on a RAC cluster, so a single network won’t necessarily handle a huge number of instances
  • check network settings against metalink 811306.1
  • net.core.rmem_max = 2097152 (we have it set at 262144 which is the default and minimum)
  • net.core.wmem_max=1038576 (set at 262144 as above)
  • e1000 flow control defaults to none in the 2.6 kernel, and Oracle requests that RX flow control be set to on.
  • e1000: modprobe e1000 FlowControl=1
  • e1000e: ethtool -A eth0 autoneg off tx off rx on
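
The two net.core values above can go straight into /etc/sysctl.conf; the numbers are copied from this note's reading of metalink 811306.1, so confirm them against the current version of that document before applying:

```
# /etc/sysctl.conf fragment -- metalink 811306.1 values as quoted above
net.core.rmem_max = 2097152
net.core.wmem_max = 1038576
```

Load the new values with `sysctl -p` after editing.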
collect oracle/other logs
  • check RAC logs for fence messages
  • init.ora for every SGA/$ORACLE_HOME
  • oprocd
  • crsd
  • cssd
  • oswatcher or IPD/os logs
  • Oracle clsomon failed with fatal status 13 – will cause a fence
  • Oracle CSSD failure 134. – will cause a fence
  • install OSwatcher from oracle to collect more info about the system
  • if already installed, check system status right before the fencing
  • replacing OSwatcher with IPD/OS – better information and uses fewer system resources
  • check network switch logs for port down messages
  • compare times to make sure port went down before or after the node is fenced
preparing for another fence
  • check sysrq command in /etc/init.d/init.cssd
  • use systemtap script [crashme] to force a panic whenever emergency_shutdown() is called
  • setup, configure, test kdump
  • set kernel.panic to 30 seconds instead of the default recommendation of 10, so that kdump has time to kick in
  • turn on all the sysctl -a | fgrep panic settings besides the unknown_nmi
  • change OCFS2 fencing method to panic instead of restart
  • echo 1 > /proc/fs/ocfs2_nodemanager/fence_method
  • double the OCFS timeout settings
  • look at the hangcheck_timer settings in either rc.local or modprobe.conf
  • change the defaults to higher settings to slow down polling
  • hangcheck_reboot=0, so that it doesn’t reboot the machine
  • hangcheck_dump_tasks=1
  • capture and record process listing. http://www.m00t.net/wiki/index.php?title=Linuxnotes#Profiling_resource_usage_with_cron_and_ps
  • check for applications using 100% cpu for long periods of time before the fence
  • check for zombie pids
  • segfaults may be related to zombie pids not being able to contact parent
  • if plist increases, try distributing load better amongst the nodes
  • check the number of LMS (user_dlm?) processes http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#PROCESSES
  • too many can starve the cpu and prevent RAC daemons from communicating … fence.
  • By default, all LMS processes run with an elevated priority as these processes are responsible for the RAC messaging between the cluster nodes.
  • As a rule of thumb the number of LMS processes running with elevated priority should not exceed the number of CPUs (or CPUs * # of cores).
  • reduce the number of databases or lower the LMS process priority
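
A rough way to apply the rule of thumb above is to count ora_lms* processes and compare against the CPU count. The sample ps output and the SID "orcl" here are hypothetical:

```shell
# Hypothetical `ps -eo comm` sample; on a live node use: ps_comm=$(ps -eo comm)
ps_comm='ora_pmon_orcl
ora_lms0_orcl
ora_lms1_orcl
ora_dbw0_orcl'

# Count LMS background processes; compare the result against `nproc`
# (or CPUs * cores) on the host.
printf '%s\n' "$ps_comm" | grep -c '^ora_lms'
```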