> ... and assembly language programming is largely about a culture and
> a mode of expression shared by a group of specialized people.
Well said. Human readability is more important than mere assemblability.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> That said, what do people think about things like:
> BCTR R6,R0
> vs.
> BCTR R6,0
> I prefer, nay insist on, the latter because you're not really using
> register 0 and it should not be counted by the assembler (or the
> editor's FIND command) as a "reference".
There we agree.
> Problems: you can't use them intelligently in instructions like BXLE
> or MR which use register pairs;
You can if the author has done things systematically instead of
haphazardly.
Like any tool, EQU is a good servant but a poor master. Used properly,
it can save you a lot of time and make the code easier to maintain and more
readable; used improperly, it can sink you in a morass.
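As a minimal sketch of what "systematically" could mean for register
pairs: define the odd register of a pair relative to the even one, so
the pairing is stated in the equates instead of being implied by bare
numbers. The names RPTR, RINCR, RLIMIT, TABLE, TABEND and ENTLEN are
hypothetical, and the fragment assumes addressability has already been
established.

```hlasm
RPTR     EQU   5                   BXLE index: current table entry
RINCR    EQU   6                   BXLE increment (must be even)
RLIMIT   EQU   RINCR+1             BXLE comparand (odd partner)
ENTLEN   EQU   8                   length of one table entry
*
         LA    RPTR,TABLE          -> first entry
         LA    RINCR,ENTLEN        step = entry length
         LA    RLIMIT,TABEND-ENTLEN -> last entry
LOOP     DS    0H
         XC    0(ENTLEN,RPTR),0(RPTR) clear the current entry
         BXLE  RPTR,RINCR,LOOP     step; loop while RPTR <= RLIMIT
*
TABLE    DS    10XL8               ten 8-byte entries
TABEND   EQU   *                   first byte past the table
```

If the pair ever has to move, only the RINCR equate changes, and a
search for RLIMIT still finds every use of the odd register.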
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> Maybe, but those numbers look awfully bare without the preceding "R"
> to me. What difference does one extra character make, especially when
> assembly source is typically 80 column "card" image?
If you don't like the constraint of 80-character "card" images, you can
use the input exit ASMAXINV shipped (as sample code) with High Level
Assembler Release 2. It allows you to create V-format input to the
assembler; the exit then takes care of converting this user-friendly
input to the traditional fixed format the assembler digests. (See p.324
of the HLASM Programmer's Guide for details.)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
How about clearing registers? Again, the performance difference
won't be dramatic in any case, but as obvious choices we have
         LA    R15,0(0,0)
         XR    R15,R15
         SR    R15,R15
         SLR   R15,R15
which can be used interchangeably if the value of the CC does not
matter. I dislike the first because I have a hard time doing in
4 bytes what can be done in 2. Of the remaining choices, I prefer
XR because it is like XC, which can be similarly used to initialize
a field to binary zeros. On the other hand IBM code I've read usually
chooses SR or occasionally SLR. I recall reading somewhere that XR
should be fastest, but I have never tested this and doubt there is
much of a difference.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> matter. I recall reading somewhere that XR should be fastest
Once upon a time, a long, long time ago, IBM used to publish a "Functional
Characteristics" book on each of its processors, and in this book they
published instruction timings. I recall that SLR was the fastest way.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Speaking of style, let me add my own element. I prefer NOT to put labels on
instructions. Labels always go on a statement by themselves along with a DS 0H
(e.g. RETURN DS 0H). This ensures that a label doesn't get deleted when
deleting the instruction it is attached to, and also ensures halfword alignment,
unlike "RETURN EQU *". It also makes it easier to comment out sections of
code because column 1 is always blank, so you can just put an asterisk there.
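A short fragment to illustrate, as a sketch only: RESULT, SAVE14, FLAG,
HERE1 and HERE2 are hypothetical names, and the usual register equates
and addressability are assumed. The last three lines show the alignment
difference between EQU * and DS 0H after a one-byte constant.

```hlasm
         LTR   R15,R15             did the call fail?
         BNZ   RETURN              yes, skip the store
         ST    R1,RESULT           no, save the answer
RETURN   DS    0H                  label stands alone
         L     R14,SAVE14          this line can be deleted or
         BR    R14                   commented out; RETURN survives
*
FLAG     DC    X'00'               one byte: next address may be odd
HERE1    EQU   *                   takes that address as it is
HERE2    DS    0H                  rounds up to a halfword boundary
```

If FLAG happens to sit on an even address, HERE1 is odd and cannot be
a branch target, while HERE2 is always safe to branch to.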
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> All-in-all style, like other things is in the eye of the beholder, IMHO the
> idea is to make the program work.
I can't agree with you on that point. The only time
that is true is when there is only a single programmer
working on the system and the program will be run once and then
thrown away.
> BAL is my first love also, but I take the other view. It's lots
> of relative addressing and registers. I guess it's my scientific
> background, lots of table searching does that to you.
> The code is the only true document.
> Back in the 360 days I would LPSW rather than a B.
>
Many of the things mentioned under "style" also have efficiency
implications. If you write an operating system exit that is executed
hundreds or thousands of times each day, you had better make sure that
you have given some thought to efficiency.
Code readability is also important. Many of the "one time programs"
that I have written have survived longer than those written to be
part of a major application system. When there is a problem in the
program and you need to look at the source, documentation and
consistency make the job much, much easier. While using LPSW instead
of a branch may seem obvious to us dinosaurs, some less experienced
person may have to pick up the code when there is a problem, and they
may not know much about the PSW other than that it's where to look
in a dump to see where the program blew up.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
I use not only DS 0H, but also (in large programs, when I'm up to the
additional effort) DS 0Y. The DS 0Y go on labels which are only
referenced "locally", i.e., within a few lines of them. For example,
the oh-so-common construct:
         TM    SOMEFLAG,SOMEBIT
         BO    SKIPONE
SKIPONE  DS    0Y
The label on the subroutine would be a DS 0H because it's referenced
elsewhere.
This isn't perfect, of course, but if followed, helps avoid extra
searching for "Who else might get us to this line?"
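To put the two kinds of label side by side, here is a sketch with
hypothetical names (NOOPT, DOOPT) and the usual register equates
assumed. Since Y-type constants are halfword aligned, DS 0Y gives the
same alignment as DS 0H; the difference is purely a signal to the
reader.

```hlasm
         TM    SOMEFLAG,SOMEBIT    option requested?
         BNO   NOOPT               no, skip the call
         BAL   R14,DOOPT           yes, handle it
NOOPT    DS    0Y                  local: only the BNO comes here
*        ... mainline continues and branches away ...
DOOPT    DS    0H                  subroutine: called from elsewhere
*        ... subroutine body ...
         BR    R14                 return to caller
```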
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>if the instruction stream is terminated by an unconditional branch, as
>it is almost certain to be. Sure, there may be a level at which bytes
>are blindly fetched ahead of the instruction counter, but it won't last
>long - the I-unit will soon realize that it's a bad place to be fetching
>from.
While the I-unit should be able to detect an unconditional branch,
that is only part of the problem. Cache lines are a good deal longer
than instructions are, so you end up with cache lines containing
both instructions and data. If the data changes, the entire
line is modified. This isn't really much of a problem for a set
of instructions that are only executed a few times, but within
loops this could be noticeable, especially if the data is modified.
An annoying aspect of this problem is that it can get better or worse
depending on how the code is aligned relative to cache line boundaries,
so making an unrelated change in the program or running on a different
processor may unexpectedly degrade performance.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>From: "Shmuel (Seymour J.) Metz"
> > Once upon a time, a long long time ago IBM used to publish a
> > "Functional Characteristics" book on each of its processors and
> > in this book they published instruction timings. I recall that SLR
> > was the fastest way.
>
> > Ken (kgunther@delphi.com)
>
>Only on specific models; on some XR was faster and on some LA was
>faster. BTW, they still publish the "funky specs", but they no longer
>contain timings. In the last one (370/168) that had timings, you had
>timing formulae rather than simple numbers, and I'm sure that if they
>published timing information on current models you wouldn't want to
>drop one of the manuals on your foot.
>
Style can't be answered definitively since it is subjective, but
times can be measured. To see how LA, XR, SR, and SLR compare for
clearing a register, I wrote a program to generate and then time the
execution of N successive copies of each instruction. Varying N from
50000 to 1000000, I got the following results:
              9021-580, CMS version    9672, MVS version
        N        LA      others           LA      others
     50 000      27        15             16        14
    100 000      33        15             21        13
    200 000      32        16             22        16
    500 000      32        16             22        17
  1 000 000      32        17             22        17
All times are in ns, and hopefully no errors...
That is, XR, SR, and SLR are practically the same (the run-to-run
variation was more than the time differences), and take about 1/2 to
3/4 of the time of LA. The increased time with longer series may
be due to cache delays. (Note that LA is twice as long, 4 bytes
against 2, so it is affected first.)
I was planning to append the program here, but it grew to almost
500 lines, so I thought it might be better not to. However, if anyone
would like a copy, send me e-mail, and I'll send one back. The
program is conditionally assembled based on &SYSTEM_ID, and should
work in CMS or MVS without modification.
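For readers who only want the idea, the technique can be sketched in a
few lines. This is not the program described above, just a minimal
illustration under some assumptions: the names COPIES, TIMEIT, START
and STOP are made up, the macro only handles instructions that take
R15,R15 as operands (so LA would need its own variant), N is kept small
enough for one base register, and printing the result is left out.

```hlasm
         MACRO
         COPIES &OP,&N
         LCLA  &I
         ACTR  2*&N+8              raise the branch-count limit
.NEXT    AIF   (&I GE &N).DONE
         &OP   R15,R15             one copy of the instruction
&I       SETA  &I+1
         AGO   .NEXT
.DONE    MEND
*
TIMEIT   CSECT
         STM   R14,R12,12(R13)     save caller's registers
         LR    R12,R15             entry address becomes the base
         USING TIMEIT,R12
         STCK  START               TOD clock before
         COPIES XR,1000            1000 successive XR R15,R15
         STCK  STOP                TOD clock after
         LM    R14,R12,12(R13)     restore caller's registers
         SR    R15,R15             return code 0
         BR    R14
START    DS    D                   STOP-START = elapsed time, in
STOP     DS    D                     units of 2**-12 microsecond
R12      EQU   12
R13      EQU   13
R14      EQU   14
R15      EQU   15
         END   TIMEIT
```

Dividing STOP minus START by N gives an approximate time per
instruction; the overhead of the two STCKs matters less as N grows.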