Checks to be
performed at the machine level
run queue should
be ideally not more than the number of CPU’s on the machine. At the maximum it should never be more than twice the
number of CPU’s.This is denoted by the column ‘r’ in the vmstat output shown
below vmstat – 5
CPU idle% < 10 ( id column) could indicate a machine that is having CPU resource issues
Note:
How to find number of CPU’s on a LINUX machine?
cat
/proc/cpuinfo |grep -w “processor” |wc –l
Swap columns si and so should ideally be
0 to indicate no swapping activity
We should be looking at the free and used values in the row denoted by “-/+ buffers/cache”
The ‘top’ command will help us identify the load average on the
machine as well as any process that is consuming excessive CPU
$top
A load average greater than 5 or 10 could indicate a heavily utilized machine
CPU information for each CPU is also provided via the top command as well as information on the physical as well as virtual Memory that is available on the machine.
Information
on the top CPU as well as Memory consuming processes is also provided along
with the Process ID (PID). In a later section we will use this PID as a
parameter for a SQL query to identify the SQL being executed by the same CPU
consuming process.
Identify if any single PID is constantly
appearing in the top output
The iowait column can also help us identify if there’s any
resource contention from the IO perspective.
A value above 40-50% would indicate I/O
resource issues and require
further investigation into the process that is causing this high I/O – or it
could indicate a case of inefficient disk sub system or file layout in the
database.
We
can also view the state of the machine at a particular time of the day by
running the sar command which will provide the system utilisation report since
12:00 AM on that particular day.
We
can use the sar command to identify the machine state even for a particular day
of the month
For
example sar -f /var/log/sa/sa03 will
report for the 3rd of the month
Checks to be performed at the database level
Identify
with the user if the problem is a slow response or a hanging situation.
Establish
a connection via SQL*NET using a non SYSDBA account to confirm that the
listener is accepting client connections and the hanging is not due to the
archive area getting 100% full.
Examine
the alert log file for ‘Checkpoint Not Completed’ errors recorded at the time
the performance problem is reported – this could indicate an I/O contention
issue or inadequately sized redo log files which can also cause an application
hang while the checkpoint completes.
Ensure
that the mount point on which the Oracle software is not 100% full or the disks
holding the controlfiles are also not 100% full.
Check
the listener.log file if it exists and ensure that it is not > 2GB – on some
Operating Systems like LINUX, there is a file size upper limit for the
listener.log file after which client connections will not be accepted by the
listener.
Check
for locked sessions (see script check_lock.sql).
If a PID has been identified as a top CPU consuming process,
check the SQL being executed by that particular PID (see script check_pid_sql.sql)
If the user provides a particular SID where a possible
performance issue exists, check the SQL being executed by that SID ( see script
check_sid_sql.sql)
If the user provides a particular Oracle username where a
possible performance issue exists, check the SQL being executed by that Oracle
user ( see script check_username_sql.sql)
Very Important – check the major wait events in the database
(see script wait_events.sql)
check the SID along with the events that each SID is waiting
on (see script wait_events_sid.sql)
– based on the SID, the SQL being executed by the waiting sessions can be
obtained as well ( see script check_sid_sql.sql)
Check for any sessions continuously waiting for on a
particular latch (see script check_latch.sql)
What has changed?
Is
there a measurable baseline regarding the “problem” query – when did it last
perform well?
Has
the database been upgraded recently?
Has
any modifications been done to the database in terms of init.ora parameters?
Have
any new indexes been added to the table or has the table structure changed?
Has
the platform or database version changed?
Is
this a period of unusual business activity? – like a monthly data load or
one-off batch job
check_pid_sql.sql
SET
PAGESIZE 500
set
long 500000
set
head off
select
s.username su,
substr(sa.sql_text,1,540) txt
from
v$process p,
v$session s,
v$sqlarea sa
where p.addr=s.paddr
and s.username is not null
and s.sql_address=sa.address(+)
and s.sql_hash_value=sa.hash_value(+)
and
spid=;
check_sid_sql.sql
SET
PAGESIZE 500
PROMPT=============================================================
PROMPT
Current SQL statement this session executes
PROMPT=============================================================
col
sql_text for a70 hea "Current SQL"
select
q.sql_text
from
v$session s
, v$sql
q
WHERE
s.sql_address = q.address
and s.sql_hash_value + DECODE
(SIGN(s.sql_hash_value), -1,
POWER( 2, 32), 0) = q.hash_value
AND s.sid= ;
check_username_sql.sql
set
long 500000
SET
PAGESIZE 500
select
s.username su,
substr(sa.sql_text,1,540) txt
from
v$process p,
v$session s,
v$sqlarea sa
where p.addr=s.paddr
and s.username is not null
and s.sql_address=sa.address(+)
and s.sql_hash_value=sa.hash_value(+)
and
s.username=upper('&username');
check_lock.sql
set
linesize 500
SET
PAGESIZE 500
col
waiting_session format 99999 heading 'Waiting|Session'
col
holding_session format 99999 heading 'Holding|Session'
col
mode_held format a20 heading 'Mode|Held'
col
mode_requested format a20 heading 'Mode|Requested'
col
lock_type format a20 heading 'Lock|Type'
prompt
blocked objects from V$LOCK and SYS.OBJ$
set
lines 132
col
BLOCKED_OBJ format a35 trunc
select
/*+ ORDERED */
l.sid
, l.lmode
, TRUNC(l.ctime/60) min_blocked
, u.name||'.'||o.NAME blocked_obj
from
(select *
from v$lock
where type='TM'
and sid in (select sid
from v$lock
where block!=0)) l
, sys.obj$ o
, sys.user$ u
where
o.obj# = l.ID1
and o.OWNER# = u.user#
;
prompt
blocked sessions from V$LOCK
select
/*+ ORDERED */
blocker.sid blocker_sid
, blocked.sid blocked_sid
, TRUNC(blocked.ctime/60) min_blocked
, blocked.request
from
(select *
from v$lock
where block != 0
and type = 'TX') blocker
, v$lock blocked
where
blocked.type='TX'
and
blocked.block = 0
and
blocked.id1 = blocker.id1
;
prompt
blockers session details from V$SESSION
set
lines 132
col
username format a10 trunc
col
osuser format a12 trunc
col
machine format a15 trunc
col
process format a15 trunc
col
action format a50 trunc
SELECT
sid
, serial#
, username
, osuser
, machine
FROM
v$session
WHERE
sid IN (select sid
from v$lock
where block != 0
and type = 'TX')
;
wait_events.sql
SELECT
count(*), event FROM v$session_wait
WHERE
wait_time = 0
AND
event NOT IN
('smon
timer','pmon timer','rdbms ipc message',
'SQL*Net
message from client')
GROUP
BY event ORDER BY 1 DESC
;
wait_events_sid.sql
col
username format a12
col
sid format 9999
col
state format a15
col
event format a45
col
wait_time format 99999999
set
pagesize 800
set
linesize 800
select
s.sid, s.username, se.event
from
v$session s, v$session_wait se
where
s.sid=se.sid
and
se.event not like 'SQL*Net%'
and
se.event not like '%rdbms%'
and
s.username is not null
order
by 3;
check_latch.sql
select
count(*), name latchname from v$session_wait, v$latchname
where
event='latch free' and state='WAITING' and p2=latch#
group
by name order by 1 desc;
No comments:
Post a Comment