Oct 6th 1999.

Mainframe Performance Tuning Guidelines


88/12/05, By John M McIntosh

In today's environment you may have noticed that your job takes only a few tenths of a CPU second to run, but can take many tens of clock seconds to complete. Why is this? After all, isn't our 6 million dollar CPU really fast? The real reason jobs take a long time to complete is no longer the CPU requirements; rather, it is I/O problems and poor use of memory.

Your system may not appear to perform poorly during the testing and implementation phase, but problems may show up later in its life cycle, i.e. the difference between testing with hundreds of records and running with millions of production records. Therefore you should strive to build better systems during the code generation phase rather than trying to fix things after the fact.

In creating new programs, or fixing old programs, there are just a few guidelines to follow:

Guidelines:

1) Avoid starting up DC motors.

    Our current CPU is capable of 38 million instructions per second, but if you access a physical disk you cause electrical motors to rotate and disk drive heads to move. This makes your program wait, and the wait is at least 3/10 of a second. Things to watch for are:

a) Disk I/O

b) VGET/VPUT on Profile variables

c) Program Fetching from load libraries.

d) Execution of TSO CLISTS

e) Excessive paging. Hard to judge, but it is related to the amount of memory used.
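
To put that wait in perspective, the arithmetic can be sketched in a few lines of Python. Both figures are the ones quoted above; the function name is only illustrative.

```python
# Instructions the CPU could have executed while a job waits on a device.
# Both figures are the ones quoted in the text; the names are illustrative.

CPU_INSTRUCTIONS_PER_SECOND = 38_000_000  # 38 million instructions per second
MIN_DEVICE_WAIT_SECONDS = 0.3             # 3/10 of a second at minimum


def instructions_forgone(wait_seconds, ips=CPU_INSTRUCTIONS_PER_SECOND):
    """Return how many instructions fit into a wait of the given length."""
    return round(wait_seconds * ips)


if __name__ == "__main__":
    print(instructions_forgone(MIN_DEVICE_WAIT_SECONDS))
```

By this estimate a single minimum-length wait costs about as much time as executing some 11 million instructions.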

 

2) Do not misuse Memory.

Do you use too little memory, or too much memory?

3) If you need to do I/O, at least tune it!

 

    Let's look at these items in more detail, but first a few CPU tips from the PL/I Language Reference Manual, Appendix B:

    1) Data conversion: review your compiler listings for IEL906I messages. Verify whether you really need to convert the data from one data type to another; this action is really expensive. Also consider using PL/I PIC variables instead of CHAR variables when dealing with character numbers.

    2) Consider using the PLIXOPT variable to pass PL/I run time options (ISASIZE...). Not sure if this saves any time, but it's good programming practice.

    3) Use the ISASIZE, HEAP and ISAINC run time options to correctly configure the amount of storage that your program will use. The default values cause PL/I to use 1/2 of the available memory in the address space, which will cause the program to wait for MVS to get the storage. Also, under heavy system loads your program may be targeted to be swapped out (wait a long time) due to its heavy memory size. You can use the PL/I REPORT option to correctly calculate the ISASIZE. Read on for more information on ISASIZE.

    4) When creating array structures you should code them as:

DCL 1 RECORD(3000),
      3 NAME CHAR(100),
      3 NUMBER BIN FIXED(15);

    Here each element's NAME and NUMBER are stored side by side, so NAME(X) and NUMBER(X) will usually be on the same storage PAGE in memory, and no additional paging is required.

    Versus

DCL 1 RECORD,
      3 NAME(3000) CHAR(100),
      3 NUMBER(3000) BIN FIXED(15);

    In this case all 3000 NAMEs are stored ahead of the NUMBERs, so NAME(X) and NUMBER(X) will be on different storage pages. If there is a memory problem then you can expect page-ins to occur much more frequently, and your program will take longer to run. In fact, in really bad environments it could take minutes or hours for your program to run.
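
    The page arithmetic behind the two declarations can be sketched in Python. This is only an illustration under assumptions that are not in the text: 4096-byte pages, a structure that starts on a page boundary, 100 bytes for CHAR(100) and 2 bytes for BIN FIXED(15). The function names are hypothetical.

```python
# Which page does each field land on for the two declarations?
# Assumptions (not from the text): 4096-byte pages, the structure starts on a
# page boundary, CHAR(100) takes 100 bytes, BIN FIXED(15) takes 2 bytes.

PAGE_SIZE = 4096
ELEMENTS = 3000
NAME_BYTES = 100
NUMBER_BYTES = 2


def pages_array_of_structures(x):
    """DCL 1 RECORD(3000) ...: the fields of element X are adjacent."""
    base = (x - 1) * (NAME_BYTES + NUMBER_BYTES)
    return base // PAGE_SIZE, (base + NAME_BYTES) // PAGE_SIZE


def pages_structure_of_arrays(x):
    """DCL 1 RECORD, 3 NAME(3000) ...: every NAME precedes every NUMBER."""
    name_offset = (x - 1) * NAME_BYTES
    number_offset = ELEMENTS * NAME_BYTES + (x - 1) * NUMBER_BYTES
    return name_offset // PAGE_SIZE, number_offset // PAGE_SIZE


def same_page_count(layout):
    """Count the elements whose NAME and NUMBER start on the same page."""
    return sum(1 for x in range(1, ELEMENTS + 1) if len(set(layout(x))) == 1)


if __name__ == "__main__":
    print(same_page_count(pages_array_of_structures))
    print(same_page_count(pages_structure_of_arrays))
```

    Under these assumptions the DCL 1 RECORD(3000) form keeps NAME(X) and NUMBER(X) on one page for the vast majority of elements, while the form that declares NAME(3000) and NUMBER(3000) as separate arrays never does.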

    5) Global optimization, IEL 0919I message. PL/I variable optimization is only done on the first 255 variables. To ensure that optimization is done on the more frequently used variables you should declare them in the last declare block in the main line program. Only arithmetic variables, not character variables, are optimized.

    6) Use the REORDER option on the outermost PROC statement. This option permits the optimizing compiler to move invariant expressions out of loops, use machine registers to hold values, and optimize subscript calculations. One word of warning: you may need to understand the effects of this option if you attempt error recovery and change variables in an ON block, since the variable could be in a register rather than in memory.

    7) Setting of variables in ON blocks inhibits common expression elimination, optimization, and register usage. Try not to do it. In fact the PL/I manual states that the usage of ON blocks should be avoided; try to check for errors before performing the operation that will cause the error.

    8) Use of ON STRINGSIZE and STRINGRANGE can inhibit optimization, and cause extra subroutine calls when you use SUBSTR.

    9) A DO WHILE block will inhibit optimization of invariant expressions. A DO UNTIL is OK. (Minor issue in my opinion unless you write crummy code).

    10) When assigning data between identical structures use a straight assignment versus assignment BY NAME.

    11) If you are dealing with CHARACTER NUMBERS, use PIC DECLARES versus CHAR. This will save an additional conversion process when converting to DECIMAL or FIXED.

    12) Use BIN FIXED for FLAGS instead of BIT. Use of BIT variables requires more instructions to be generated just to check the flag's state. Using a BIN FIXED and setting it to 0 or 1 is much more efficient.

    13) When working with Arrays consider the use of ARRAY arithmetic versus a DO LOOP structure.

DCL A(10) BIN FIXED,
    B(10) BIN FIXED,
    I BIN FIXED;

USE
    A = B;

VERSUS
    DO I = 1 TO 10;
       A(I) = B(I);
    END;

    The PL/I compiler can generate much better code for you.

    14) When using the INDEX, TRANSLATE, and VERIFY built-in functions, try to code the second argument as a constant.

    15) When using the TRANSLATE and VERIFY built-in functions, try to code the first argument as a fixed length variable versus a variable length variable.

    16) Consider using BUFNO (in JCL) for sequential FILES. Read more about this later on.

    17) Code one OPEN or CLOSE statement versus multiple ones.

    18) Use of PUT SKIP EDIT((A(I) DO I = 1 TO 100))(F(6)); is more efficient than coding a DO loop.

    19) Consider using RECORD I/O versus STREAM I/O.

     

Unfortunately, none of the above can save you from poor programming. For CPU bound jobs you may need to determine where the problem is.

More information comes from the PL/I Optimizing Compiler: Programmer's Guide.

 

A NOTE from PAGE 28.

    "Generally, it is unwise to rely on default settings (whether IBM-supplied or supplied by your local systems programming staff). Inappropriate settings of these options can adversely affect both the function and the performance of your programs."

    Told you so... It's your program....

A NOTE from PAGE 29.

    "It is a waste of time to undertake serious performance measurement or performance-oriented modifications of a PL/I program until the execution-time options have been set appropriately"

Fairly heavy stuff. Remember, default values are not there to make your program run better; rather they are magic/old/stupid numbers.

Your first objective should be to configure your PL/I program to correctly use the available memory. This is done by using the ISASIZE option in conjunction with the ISAINC and HEAP options!

To determine the correct ISASIZE value you can pass "ISASIZE(###k),REPORT/" to your program at run time, where ### is 128? or larger. Upon program termination a report detailing the required ISASIZE will be printed to your PLIDUMP DD statement. The objective of this exercise is to reduce the number of MVS GETMAINS and FREEMAINS that are invoked to run your program. When you call MVS for storage, millions! of instructions can be executed. You also give up control and wait for MVS to restart you, and this can take time.

If you are coding ISASIZE(128k) for example because it "feels good", or because it's always been done that way, you may be having serious storage and performance problems.

If you are using CONTROLLED variables, or dynamically allocating BASED variables (allocate/free statements in your code), you need to choose a correct HEAP size. The information required to pick this value also comes from the REPORT option.

Beware of overspecifying ISASIZE; remember you need to share a limited amount of memory with subroutines too. Thus a few extra GETMAINS that result from reducing ISASIZE by a few 100K of memory are worth the cost.

 

I/O performance:

Never turn DC motors on...

If you really need to do I/O then at least do it as neatly as possible.

    1) Use of BUFNO for sequential Datasets:

    When dealing with sequential DASD datasets, MVS does some performance tuning behind your back. Instead of dealing in 1 block chunks it now deals in 5 block chunks. What this means is that for the following blocked datasets you actually deal with larger chunks of data.

    BLKSIZE     Bytes gotten per access (BUFNO=5)

    22000       110,000

    6160         30,800

    Thus when reading a dataset blocked at 22,000 bytes, MVS really works with 110,000-byte chunks by default. The objective being met here is to minimize the number of requests made to MVS's I/O subsystem, an expensive operation.

    But is 5 buffers enough? In today's environment, the answer is simple: NO. You should consider coding more. Another factor comes into play here: if you exceed multiples of 31 buffers or 249,856 total bytes, MVS will break your request into 2 or more parts and do parallel I/O, reducing the time required to run I/O bound jobs. For example, a job that read 100,000 240-byte records blocked at 24,000 took 8 seconds to run, but with 33 buffers it took only 6.5 seconds. If this job did a lot of processing between requests for data records, the run time would decrease even more!

    You must also recognize that by increasing BUFNO your job now requires extra MEMORY, which can have an impact on your run time. In fact, coding too many buffers can slow things down. Never code BUFNO < 5 unless you really understand the implications, and finally do not over code BUFNO: if your dataset is only 100K in size, don't code 200K of buffer space.

    Also don't bother with this parm for SORTIN, SORTWK, or SORTOUT (sort datasets). Sort does its own special I/O processing to reduce EXCPs, and coding BUFNO will only confuse it.
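
    The buffer arithmetic above can be sketched in Python. This is a simplified model, assuming fixed-length blocked records and that every access moves BUFNO full blocks; the function names are illustrative and not part of any MVS interface.

```python
import math

# Simplified model of the BUFNO arithmetic. Assumptions: fixed-length blocked
# records, and every access moves BUFNO full blocks. Names are illustrative.


def bytes_per_access(blksize, bufno=5):
    """Bytes moved per access when BUFNO blocks are transferred together."""
    return blksize * bufno


def accesses_needed(records, lrecl, blksize, bufno=5):
    """Accesses needed to read the whole dataset from start to end."""
    records_per_block = blksize // lrecl
    blocks = math.ceil(records / records_per_block)
    return math.ceil(blocks / bufno)


if __name__ == "__main__":
    print(bytes_per_access(22000), bytes_per_access(6160))
    print(accesses_needed(100_000, 240, 24_000, bufno=5))
    print(accesses_needed(100_000, 240, 24_000, bufno=33))
```

    For the 100,000-record example, this model gives 200 accesses with 5 buffers and 31 accesses with 33 buffers, at a cost of 792,000 bytes of buffer space instead of 120,000.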

    2) Better Blocking of datasets.

    Yes, 6160 is another magic number, recommended 10 odd YEARS ago. It no longer has any value other than as a magic number. Try to get as close as possible to 22,340 for DASD datasets. This rule does not apply to TAPE, where 32767-byte blocks are always optimal. As a side point, watch blocksizes when you concatenate datasets.
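
    Picking the block size for a given record length is simple arithmetic. A small Python sketch, assuming fixed-length blocked records (so the block size must be a whole multiple of the record length) and using the 22,340 target quoted above; the function name is hypothetical.

```python
# Largest block size that is a whole multiple of the record length and does
# not exceed the target. Assumes fixed-length blocked records; the 22,340
# default is the DASD figure quoted in the text.


def best_blksize(lrecl, target=22340):
    """Return the largest multiple of LRECL that fits within the target."""
    if lrecl <= 0 or lrecl > target:
        raise ValueError("LRECL must be between 1 and the target block size")
    return (target // lrecl) * lrecl


if __name__ == "__main__":
    for lrecl in (80, 133, 240):
        print(lrecl, best_blksize(lrecl))
```

    For 80-byte and 240-byte records this gives 22,320; for 133-byte records it gives 22,211.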

    3) PL/I RECORD I/O versus STREAM I/O.

    Use RECORD I/O in conjunction with READ SET, which is much more efficient than READ INTO. Conversely, for output, LOCATE (the counterpart of WRITE FROM) doesn't seem to make a difference in run time. You can save up to 26% of the CPU time that is dedicated to file I/O by converting to READ SET.

    4) Never call CLISTS from PL/I to allocate datasets or invoke LOGIT.

    Calling a CLIST to allocate a dataset will cost you 1.5 clock seconds! and 50 I/Os. Calling LOGIT will take almost 4 clock seconds. Both of these tasks can be performed in PL/I using dynamic allocation without the associated costs.

    5) VSAM datasets?

    NEVER NEVER use the default BUFI and BUFD values for VSAM. These default values give you the WORST possible case in terms of performance.

    Check with the Software Group on the use of VIOPLUS. Coding a special DD statement can improve a batch job's CPU and run time performance. You can reduce the number of EXCPs to a VSAM dataset by up to 50% with correct choices, along with reducing run time or response time by 70%. In fact the best performance can be gained by adding the following statement to any JCL step that uses VSAM.

    //VIOCTL DD DISP=SHR,AMP='AMORG',VOL=REF=SYS1.LINKLIB,

    // DSNAME=ACTIVATE.NLSR512

    Remember to add about 600K to the region size for the job. For Online VSAM systems check BUFI and BUFD for correct numbers.

    STROBE?

    PMO?

    VIO:

    If you need a work dataset that's under 16 cyls in size then consider coding UNIT=VIO versus UNIT=SYSDA/WORK/DASD. (VIO stands for Virtual I/O). MVS will keep 1 track (47K) of data in memory for you, making access much faster. For writing you can save up to 50% of the DASD clock time. In the next release of MVS the entire dataset will attempt to stay in memory, making things MUCH faster. You avoid moving those DC motors....

    DIV: Data in Virtual.

    Keep this buzzword in mind, along with HYPERSPACES (no kidding). These new file types allow you to keep small files in memory versus on DASD. Hopefully someday you can keep DATA/ CLISTS/ PANELS/ LOAD Modules in memory for immediate retrieval.

    SORTING:

    When invoking SORT from a program running under TSO, avoid allocating the SORTWK datasets in a CLIST. Instead, code the dynamic allocation request on the SORT command or in the $ORTPARM.

    Efficient sorting? Little did you know, but SORT uses only 1024K at maximum to sort your data, even if you code huge region sizes. To increase the amount of memory that SORT will use, you need to code VSCORE=####K in your $ORTPARM dataset. Beware that during prime time your job could take longer to run due to excessive storage requirements. Some testing will tell you what the optimum value is. Testing has shown that asking for too much memory can make your sort run slower, even under ideal conditions. Try values from 2048K to 5120K.