Oct 6th 1999.
Mainframe Performance Tuning Guidelines
88/12/05, By John M McIntosh
In today's environment you may have noticed
that your job takes only a few tenths of a CPU second to run,
but can take many tens of clock seconds to complete. Why is this?
After all, isn't our 6 million dollar CPU really fast? The real
reason why jobs can take a long time to complete is no longer
the CPU requirements; rather, it is I/O problems and
poor use of memory.
Poor performance may not be apparent
during the testing and implementation phase, but it can show
up later in the system's life cycle, e.g. the difference between
testing with hundreds of records and running against millions of
production records. Therefore you should strive to build better
systems during the coding phase rather than trying to fix things
after the fact.
In creating new programs, or fixing old
programs there are just a few guidelines to follow:
Guidelines:
1) Avoid starting up DC motors.
Our current CPU is capable of 38 million
instructions per second, but if you access physical disk then
you are causing electrical motors to rotate and physical disk
drive heads to move. This causes your program to wait, and the
wait is measured in tenths of a second: 3/10 of a second at
minimum. Things to watch for are:
a) Disk I/O
b) VGET/VPUT on Profile variables
c) Program fetching from load libraries
d) Execution of TSO CLISTs
e) Excessive paging. Hard to judge, but it is related to the
amount of memory used.
2) Do not misuse Memory.
Do you use too little memory, or too much
memory?
3) If you need to do I/O, at least tune
it!
Let's look at these items in more detail,
but first a few CPU tips from the PL/I Language Reference Manual,
Appendix B:
1) Data conversion: review your compiler
listings for IEL906I messages. Verify whether you really need to
convert the data from one data type to another; this action is
really expensive. Also consider using PL/I PIC variables instead
of CHAR variables when dealing with character numbers.
2) Consider using the PLIXOPT variable
to pass PL/I run time options (ISASIZE...). Not sure if this saves
any time, but it's good programming practice.
3) Use the ISASIZE, HEAP and ISAINC run time
options to correctly configure the amount of storage that your
program will use. The default values cause PL/I to use 1/2 of
the available memory in the address space, which makes the
program wait for MVS to get the storage. Also, under heavy
system loads your program may be targeted to be swapped out (and
wait a long time) because of its large memory size. You can use
the PL/I REPORT option to correctly calculate the ISASIZE. Read
on for more information on ISASIZE.
4) When creating array structures you should
declare an array of structures rather than a structure of arrays.
Instead of:
DCL 1 RECORD,
      3 NAME(3000)   CHAR(100),
      3 NUMBER(3000) BIN FIXED(15);
code:
DCL 1 RECORD(3000),
      3 NAME   CHAR(100),
      3 NUMBER BIN FIXED(15);
In the first case NAME(X) and NUMBER(X) will
be on different storage pages. If there is a memory problem then
you can expect a page-in to occur much more frequently, thus
your program will take longer to run. In fact in really bad
environments it could take minutes/hours for your program to run.
5) Global Optimization, IEL 0919I message.
PL/I variable optimization is only done on the first
255 variables. To ensure that optimization is done on the more
frequently used variables you should declare them in the last
declare block in the main line program. Only arithmetic variables,
not character variables, are optimized.
6) Use the REORDER option on the outermost
PROC statement. This option permits the optimizing compiler to
move invariant expressions out of loops, use machine registers
to hold values, and optimize subscript calculations. One word
of warning: you need to understand the effects of this option
if you attempt error recovery and change variables in an ON block,
since the variable could be in a register rather than in memory.
7) Setting variables in ON blocks inhibits
common expression elimination, optimization, and register usage.
Try not to do it. In fact the PL/I manual states that the use
of ON blocks should be avoided; try to check for errors before
performing the operation that would raise the condition.
8) Use of ON STRINGSIZE and STRINGRANGE
can inhibit optimization, and cause extra subroutine calls
when you use SUBSTR.
9) A DO WHILE block will inhibit optimization
of invariant expressions. A DO UNTIL is OK. (Minor issue in my
opinion unless you write crummy code).
10) When assigning data between identical
structures use a straight assignment versus assignment BY NAME.
11) If you are dealing with CHARACTER NUMBERS,
use PIC DECLARES versus CHAR. This will save an additional conversion
process when converting to DECIMAL or FIXED.
12) Use BIN FIXED for FLAGS instead of
BIT. A BIT variable requires more instructions to be generated
just to check its state. Using a BIN FIXED and setting it to
0 or 1 is much more efficient.
13) When working with Arrays consider the
use of ARRAY arithmetic versus a DO LOOP structure.
DCL A(10) BIN FIXED,
    B(10) BIN FIXED,
    I     BIN FIXED;
USE:
    A = B;
VERSUS:
    DO I = 1 TO 10;
      A(I) = B(I);
    END;
The PL/I compiler can generate much better
code for you.
14) When using the INDEX, TRANSLATE, and VERIFY
built-in functions, try to code the second argument as a constant.
15) When using the TRANSLATE and VERIFY built-in functions,
try to code the first argument as a fixed length variable versus
a variable length variable.
16) Consider using BUFNO (in JCL) for sequential
FILES. Read more about this later on.
17) Code one OPEN or CLOSE statement versus
multiple ones.
18) Use of PUT SKIP EDIT((A(I) DO I = 1
TO 100))(F(6)); is more efficient than coding a DO loop.
19) Consider using RECORD I/O versus STREAM
I/O.
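Several of the tips above (REORDER, PIC variables for character
numbers, BIN FIXED flags, array assignment) can be sketched in one
small program. This is only an illustration: the procedure name
TIPS and all variable names and values are made up for the example,
and nothing here has been timed.

```pli
 TIPS: PROC OPTIONS(MAIN) REORDER;
   /* All names and values below are hypothetical.                 */
   DCL A(10)  BIN FIXED(15),
       B(10)  BIN FIXED(15) INIT((10) 7);
   DCL AMOUNT PIC '99999'   INIT(1250);  /* tip 11: PIC, not CHAR  */
   DCL TOTAL  BIN FIXED(31) INIT(0);
   DCL FOUND  BIN FIXED(15) INIT(0);     /* tip 12: flag is 0 or 1 */

   A = B;                   /* tip 13: array assignment, no DO loop */
   TOTAL = TOTAL + AMOUNT;  /* PIC value used directly in arithmetic */
   IF A(10) = 7 THEN FOUND = 1;
   IF FOUND = 1 THEN
     PUT SKIP LIST('TOTAL =', TOTAL);
 END TIPS;
```

Compile it with the optimizer on and check the listing for the
conversion and optimization messages mentioned in tips 1 and 5.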
Unfortunately, none of the above can save
you from poor programming. For CPU bound jobs you may need to
determine where the problem is.
More information comes from the PL/I Optimizing
Compiler: Programmer's Guide.
A NOTE from PAGE 28.
"Generally, it is unwise to rely
on default settings (whether IBM-supplied or supplied by your
local systems programming staff). Inappropriate settings of these
options can adversely affect both the function and the performance
of your programs."
Told you so... It's your program....
A NOTE from PAGE 29.
Fairly heavy stuff, remember default values
are not there to make your program run better, rather they are
magic/old/stupid numbers.
Your first objective should be to configure
your PL/I program to correctly use the available memory. This
is done by using the ISASIZE option in conjunction with the ISAINC
and HEAP options!
To determine the correct ISASIZE value
you can pass "ISASIZE(###k),REPORT/" to your program
at run time, where ### is 128? or larger. Upon program termination
a report detailing the required ISASIZE will be printed to your
PLIDUMP DD statement. The objective of this exercise is to reduce
the number of MVS GETMAINs and FREEMAINs that are issued
to run your program. When you call MVS for storage, millions(!)
of instructions can be executed; you also give up control and
wait for MVS to restart you, and this can take time.
If you are coding ISASIZE(128k) for example
because it "feels good", or because it's always been
done that way, you may be having serious storage and performance
problems.
If you are using CONTROLLED variables,
or dynamically allocating BASED variables (allocate/free statements
in your code), you need to choose a correct HEAP size. The information
required to pick this value also comes from the REPORT option.
Beware of overspecifying ISASIZE; remember
you need to share a limited amount of memory with subroutines
too. Thus a few extra GETMAINs which are the result of reducing
ISASIZE by a few 100K of memory are worth the cost.
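As a sketch of how these options might be supplied from inside the
program instead of on the PARM field, the PLIXOPT variable can
carry them. The procedure name SIZEIT is made up and the 128K value
is a placeholder only; the right number has to come from your own
REPORT output.

```pli
 SIZEIT: PROC OPTIONS(MAIN) REORDER;
   /* Run-time options compiled into the program through PLIXOPT.  */
   /* 128K is a placeholder, not a recommendation: replace it with */
   /* the figure from your own REPORT output, then drop REPORT.    */
   DCL PLIXOPT CHAR(40) VARYING STATIC EXTERNAL
               INIT('ISASIZE(128K),REPORT');

   PUT SKIP LIST('DONE, NOW READ THE STORAGE REPORT');
 END SIZEIT;
```

Keeping the options in the source means the tuned values travel
with the program rather than with each piece of JCL that runs it.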
I/O performance:
Never turn DC motors on...
If you really need to do I/O then at least
do it as neatly as possible.
1) Use of BUFNO for sequential Datasets:
When dealing with sequential DASD datasets,
MVS does some performance tuning behind your back. Instead of
dealing in 1 block chunks it deals in 5 block chunks by default
(BUFNO=5). What this means is that for the following blocked
datasets you actually deal with larger chunks of data.
BLKSIZE    Bytes gotten per access (BUFNO=5)
22000      110,000
6160       30,800
Thus when reading a 22,000 byte blocked
dataset, MVS really works with 110,000 byte chunks by default.
The objective being met here is to minimize the number of requests
made to MVS's I/O subsystem, an expensive operation.
But is 5 buffers enough? In today's environment,
the answer is simple => NO. You should consider coding more.
Another factor that comes into play is that if you exceed
multiples of 31 buffers or 249,856 total bytes, MVS will
break your request into 2 or more parts and do parallel I/O, thus
reducing the amount of time required to run I/O bound jobs. For
example a job that read 100,000 240 byte records blocked at 24,000
took 8 seconds to run, but with 33 buffers it took only 6.5 seconds.
If this job did a lot of processing between requests for data
records, the run time would decrease even more!
You must also recognize that by increasing
the BUFNO your job now requires extra MEMORY, which can have an
impact on your run time. In fact coding too many buffers can slow
things down. Never code BUFNO < 5 unless you really understand
the implications, and finally do not overcode BUFNO: if your
dataset is only 100K in size, don't code 200K of buffer space.
Also don't bother with this parm for SORTIN,
SORTWK, or SORTOUT (sort datasets). Sort does its own special
I/O processing to reduce EXCP and coding BUFNO will only confuse
it.
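The buffer arithmetic above can be worked through in a few lines of
PL/I. This is a rough sketch only: it assumes each I/O request moves
exactly BUFNO blocks, ignores the parallel I/O effect, and the
procedure and variable names are made up for the example.

```pli
 BUFCALC: PROC OPTIONS(MAIN);
   /* Assumption: one I/O request moves BUFNO blocks at a time.    */
   DCL (RECORDS, LRECL, BLKSIZE, BUFNO) BIN FIXED(31);
   DCL (BLOCKS, BYTES, REQUESTS)        BIN FIXED(31);

   RECORDS = 100000;  LRECL = 240;  BLKSIZE = 24000;  BUFNO = 5;

   BYTES    = BLKSIZE * BUFNO;           /* bytes moved per access  */
   BLOCKS   = DIVIDE(RECORDS * LRECL, BLKSIZE, 31, 0);
   REQUESTS = DIVIDE(BLOCKS + BUFNO - 1, BUFNO, 31, 0); /* round up */

   PUT SKIP LIST('BYTES PER ACCESS:', BYTES);
   PUT SKIP LIST('BLOCKS IN DATASET:', BLOCKS);
   PUT SKIP LIST('I/O REQUESTS:', REQUESTS);
 END BUFCALC;
```

For the job described above the dataset holds 1000 blocks, so under
this assumption BUFNO=5 means 200 requests and BUFNO=33 means 31.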
2) Better Blocking of datasets.
Yes, 6160 is another magic number, recommended
10 odd YEARS ago. It no longer has any value other than as a magic
number. Try and get as close as possible to 22,340 for DASD datasets.
This rule does not apply to TAPE, where 32767 byte blocks are always
optimal. As a side point, watch blocksizes when you concatenate
datasets.
3) PL/I RECORD I/O versus STREAM I/O.
Use RECORD I/O in conjunction with READ
SET, which is much more efficient than READ INTO. Conversely,
on output LOCATE (the counterpart of WRITE FROM) doesn't seem
to make a difference in run time. You can save up to 26% of the
CPU which is dedicated to FILE I/O by converting to READ SET.
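A minimal sketch of the READ SET pattern follows. The file name
INFILE, the record layout and the procedure name are assumptions
for the example; the layout must match your own dataset. With READ
SET the record is addressed in the buffer through the pointer
instead of being moved into a work area.

```pli
 RDSET: PROC OPTIONS(MAIN);
   /* Hypothetical file and record layout, for illustration only.  */
   DCL INFILE FILE RECORD INPUT SEQUENTIAL BUFFERED;
   DCL P      POINTER;
   DCL 1 INREC BASED(P),
         3 NAME   CHAR(100),
         3 NUMBER BIN FIXED(15);
   DCL EOF    BIN FIXED(15) INIT(0);   /* BIN FIXED flag, not BIT   */
   DCL COUNT  BIN FIXED(31) INIT(0);

   ON ENDFILE(INFILE) EOF = 1;

   OPEN FILE(INFILE);
   READ FILE(INFILE) SET(P);           /* P points into the buffer  */
   DO WHILE (EOF = 0);
     COUNT = COUNT + 1;                /* use INREC fields here     */
     READ FILE(INFILE) SET(P);
   END;
   CLOSE FILE(INFILE);

   PUT SKIP LIST('RECORDS READ:', COUNT);
 END RDSET;
```

REORDER is left off this sketch because the end of file flag is set
in an ON block, which is the case the REORDER warning is about.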
4) Never call CLISTs from PL/I to allocate
datasets or invoke LOGIT.
Calling a CLIST to allocate a dataset will
cost you 1.5 clock seconds and 50 I/Os! Calling LOGIT
will take almost 4 clock seconds. Both of these tasks can be
performed in PL/I using dynamic allocation without the associated
costs.
5) VSAM datasets?
NEVER NEVER
use the default BUFNI and BUFND values for VSAM; these default
values give you the WORST possible case in terms of performance.
Check with the Software Group on the use
of VIOPLUS. Coding a special DD statement can improve a batch
job's CPU and run time performance. You can reduce the number
of EXCPs to a VSAM dataset by up to 50% with correct choices,
along with reducing run time or response time by 70%. In fact
the best performance can be gained by adding the following statement
to any JCL step that uses VSAM.
//VIOCTL DD DISP=SHR,AMP='AMORG',VOL=REF=SYS1.LINKLIB,
// DSNAME=ACTIVATE.NLSR512
Remember to add about 600K to the region
size for the job. For Online VSAM systems check BUFNI and BUFND
for correct numbers.
STROBE?
PMO?
VIO:
If you need a work dataset that's under
16 cyls in size then consider coding UNIT=VIO versus UNIT=SYSDA/WORK/DASD
(VIO stands for Virtual I/O). MVS will keep 1 track (47K) of
data in memory for you, making access much faster. For writing
you can save up to 50% of the DASD clock time. In the next release
of MVS the entire dataset will attempt to stay in memory, making
things MUCH faster. You avoid moving those DC motors....
DIV: Data in Virtual.
Keep this buzzword in mind, along with
HYPERSPACES (no kidding). These new file types allow you to keep
small files in memory versus on DASD, hopefully someday you can
keep DATA/ CLISTS/ PANELS/ LOAD Modules in memory for immediate
retrieval.
SORTING:
When invoking SORT from a program running under
TSO, avoid allocating the SORTWK datasets in a CLIST; instead
code the dynamic allocate request on the SORT command or in the
$ORTPARM.
Efficient sorting? Little did you know,
but SORT is only using 1024k at maximum to sort your data, even
if you code huge region sizes. To increase the amount of memory
that sort will use, you need to code VSCORE=####K in your $ORTPARM
dataset. Beware that during prime time your job could take longer
to run due to excessive storage requirements. Some testing will
tell you what the optimum value is. Testing has shown that asking
for too much memory can make your sort run slower, even under
ideal conditions; try values from 2048k to 5120k.