Microway MPI Link-Checker Supplemental Instructions (mpilc)
Copyright 2023 Microway, Inc.

The MPI program "mpilc" can be run as a standalone program for at
least three purposes:

1. To generate performance data to send to Microway for analysis.
   Anyone may do this (no license key required), and Microway will be
   pleased to analyze your cluster's performance.

2. To generate performance data that can be viewed later. The output
   is not human-readable, but with a valid MPI Link-Checker license it
   can be viewed using the MPI Link-Checker GUI program ("lc").

3. To generate ASCII data that you can read or parse using some other
   program. To generate ASCII output you must have a valid MPI
   Link-Checker license.

BUILDING MPILC

"mpilc" is distributed as an obfuscated source file, mpilc.c. You
should compile it as you usually compile MPI programs. For example:

mpicc -o mpilc mpilc.c

Then, as with any MPI program, you must distribute it to the nodes of
your cluster.

Note that this is exactly the same mpilc program used in normal
operation of MPI Link-Checker with its GUI.

The instructions in the comments at the beginning of the source file
refer only to generating data to send to Microway for analysis.

RUNNING MPILC

You run mpilc as you would run any MPI program. Generating data for
analysis by Microway requires no command line options beyond the usual
MPI options. In all cases, output is sent to standard output, so you
will almost certainly want to redirect it to a file. For example:

mpirun -np 16 /usr/local/microway/mpilc > /tmp/mpilc-output

You can limit the wall clock running time to a specified number of
seconds:

mpirun -np 16 /usr/local/microway/mpilc time 60 > /tmp/mpilc-output

You can specify a file containing the sequence of measurements to be
performed:

mpirun -np 16 /usr/local/microway/mpilc file my-sequence > /tmp/mpilc-output

You can request ASCII output:

mpirun -np 16 /usr/local/microway/mpilc ascii > /tmp/mpilc-output

These options can be combined in any order:

mpirun -np 16 /usr/local/microway/mpilc file my-sequence ascii time 120 > /tmp/mpilc-output
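
If you launch mpilc from a script, the option combinations above can be
assembled programmatically. The following Python sketch is only an
illustration: the helper name build_mpilc_command and its parameter
names are hypothetical, and the install path is the one used in the
examples above. It uses only the "file", "ascii", and "time" options
described in this document.

```python
import shlex


def build_mpilc_command(np, mpilc="/usr/local/microway/mpilc",
                        time_limit=None, sequence_file=None,
                        ascii_output=False):
    """Return the mpirun argument list for one mpilc run (hypothetical helper)."""
    cmd = ["mpirun", "-np", str(np), mpilc]
    if sequence_file is not None:
        cmd += ["file", sequence_file]      # measurement sequence input file
    if ascii_output:
        cmd.append("ascii")                 # ASCII output needs a valid license
    if time_limit is not None:
        cmd += ["time", str(time_limit)]    # wall clock limit in seconds
    return cmd


cmd = build_mpilc_command(16, sequence_file="my-sequence",
                          ascii_output=True, time_limit=120)

# Show the equivalent shell command, including the redirection.
print(shlex.join(cmd) + " > /tmp/mpilc-output")
```

To actually run the command, the list can be passed to Python's
subprocess.run with stdout set to an open output file, which plays the
role of the shell redirection.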

MEASUREMENT SEQUENCE INPUT FILE

The file that specifies the measurement sequence consists of one line
for each measurement. The measurements are "bandwidth", "latency", and
"accuracy", which may be abbreviated "bw", "lat", and "acc",
respectively. Each measurement must specify a message size in bytes.

A group of measurements may be repeated by enclosing the group in a
"repeat"/"end" block. The repeat statement, which may be abbreviated
"rep", must be followed by a count.

Blank lines and extra spaces within a line are ignored.

Example:

--------------------
repeat 10
 latency 0
 bandwidth 4194304
end

accuracy 65536

rep 5
 lat 256
 bw 65536
 lat 128
end
--------------------
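
To see which measurements a sequence file asks for, it can help to
expand the repeat blocks into a flat list. The Python sketch below is
not part of mpilc; it only follows the rules described above. It
assumes lowercase keywords and does not handle a repeat block nested
inside another, since this document does not say whether mpilc accepts
that. The function name expand_sequence is hypothetical.

```python
# Accepted spellings, per the description above, mapped to the full names.
MEASUREMENTS = {
    "bandwidth": "bandwidth", "bw": "bandwidth",
    "latency": "latency", "lat": "latency",
    "accuracy": "accuracy", "acc": "accuracy",
}

EXAMPLE = """
repeat 10
 latency 0
 bandwidth 4194304
end

accuracy 65536

rep 5
 lat 256
 bw 65536
 lat 128
end
"""


def expand_sequence(text):
    """Expand repeat/end blocks into a flat list of (measurement, size) pairs."""
    flat = []      # fully expanded sequence, in order
    block = None   # measurements collected inside the current repeat block
    count = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        words = line.split()              # extra spaces are ignored
        if not words:
            continue                      # blank lines are ignored
        key = words[0]
        if key in ("repeat", "rep"):
            if block is not None:
                raise ValueError(f"line {lineno}: nested repeat not handled here")
            if len(words) != 2:
                raise ValueError(f"line {lineno}: repeat needs a count")
            count = int(words[1])
            block = []
        elif key == "end":
            if block is None:
                raise ValueError(f"line {lineno}: end without repeat")
            flat.extend(block * count)
            block = None
        elif key in MEASUREMENTS:
            if len(words) != 2:
                raise ValueError(f"line {lineno}: message size in bytes required")
            item = (MEASUREMENTS[key], int(words[1]))
            (flat if block is None else block).append(item)
        else:
            raise ValueError(f"line {lineno}: unknown statement {key!r}")
    if block is not None:
        raise ValueError("repeat block without a matching end")
    return flat


sequence = expand_sequence(EXAMPLE)
print(len(sequence), "measurements; first:", sequence[0], "last:", sequence[-1])
```

For the example file above this gives 10 x 2 + 1 + 5 x 3 = 36
measurements.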

ASCII OUTPUT FILE

The ASCII output consists of the following, in order: a 3-line header;
one line listing the node names corresponding to the MPI processes; a
"mac 1" line; the MAC addresses of the nodes in your cluster, one per
line, in the same order as the node names; and the performance data.

The performance data consists of one line describing the measurement
(type of measurement and message size), followed by one line for each
MPI process. Each of these lines gives the results for that process as
the receiver, with one column for each sending process. ("Senders
across, receivers down.") Data along the main diagonal (northwest to
southeast) is for a process sending data to itself, which is more a
function of your MPI implementation than of the health of your
cluster.
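
The layout just described can be read with a short parser. The Python
sketch below is an illustration, not a Microway tool. It assumes that
blank lines are not significant, that there is one MAC address line
per node name, and that every measurement line is followed by a full
square matrix; it makes no attempt to handle whatever an accuracy
measurement prints when errors occur. The function name
parse_mpilc_ascii is hypothetical, and the sample is the start of the
example output shown later in this document.

```python
SAMPLE = """\
Microway MPI Link-Checker
Test with 4 processes
2010-03-30 15:53:45
master node2 node25 node26
mac 1
00.30.48.C9.A0.F6
00.30.48.C9.A0.F4
00.17.31.3F.01.F1
00.17.31.3F.03.7F
latency 0
   0.125    1.406    1.656    1.703
   1.438    0.125    1.656    1.609
   1.656    2.938    0.406    1.828
   1.578    1.547    1.797    0.406
"""


def parse_mpilc_ascii(text):
    """Parse mpilc ASCII output into a dictionary (sketch, see assumptions)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[:3]                    # 3-line header
    nodes = lines[3].split()              # one name per MPI process
    n = len(nodes)
    if not lines[4].startswith("mac"):
        raise ValueError("expected a 'mac' line after the node names")
    macs = lines[5:5 + n]                 # assumed: one MAC line per node name
    measurements = []
    i = 5 + n
    while i < len(lines):
        kind, size = lines[i].split()     # e.g. "latency 0"
        rows = [[float(v) for v in ln.split()]
                for ln in lines[i + 1:i + 1 + n]]
        if len(rows) != n or any(len(r) != n for r in rows):
            raise ValueError(f"incomplete matrix after {lines[i]!r}")
        # rows[receiver][sender]: senders across, receivers down
        measurements.append({"kind": kind, "size": int(size), "matrix": rows})
        i += 1 + n
    return {"header": header, "nodes": nodes, "macs": macs,
            "measurements": measurements}


result = parse_mpilc_ascii(SAMPLE)
print(result["nodes"], len(result["measurements"]), "measurement(s)")
```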

Latency (transfer time) is expressed in microseconds, and bandwidth is
expressed in megabytes per second. Within MPI Link-Checker, 1 megabyte
is exactly 1,000,000 bytes.
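
Because the megabyte here is decimal, values may need converting
before they are compared with tools that report binary units. The
Python sketch below shows the arithmetic. The function names are
hypothetical, and the transfer time estimate assumes the ordinary
relationship time = size / bandwidth, which this document does not
confirm as the way mpilc computes its figures.

```python
BYTES_PER_MB = 1_000_000     # MPI Link-Checker's megabyte, per the text above
BYTES_PER_MIB = 1_048_576    # binary mebibyte (2**20 bytes), for comparison


def mb_to_mib_per_s(bandwidth_mb_per_s):
    """Convert decimal megabytes per second to mebibytes per second."""
    return bandwidth_mb_per_s * BYTES_PER_MB / BYTES_PER_MIB


def transfer_time_us(message_bytes, bandwidth_mb_per_s):
    """Estimate transfer time in microseconds, assuming time = size / bandwidth."""
    return message_bytes / (bandwidth_mb_per_s * BYTES_PER_MB) * 1e6


# A 4194304-byte message at 1000 MB/s.
print(round(mb_to_mib_per_s(1000.0), 3), "MiB/s")
print(round(transfer_time_us(4194304, 1000.0), 3), "microseconds")
```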

Accuracy measurements check that the received data is the same as the
transmitted data. An accuracy measurement will produce no output
unless there were unrecoverable data transmission errors. Such errors
almost never occur, so accuracy measurements can probably be omitted.

As an example, in the following output, the first measurement (0-byte
latency) shows that the latency sending from node26 to master was
1.703 microseconds (first row, fourth column), and the latency from
master to node26 was 1.578 microseconds (fourth row, first column).

-------------------------

Microway MPI Link-Checker
Test with 4 processes
2010-03-30 15:53:45
master node2 node25 node26
mac 1
00.30.48.C9.A0.F6
00.30.48.C9.A0.F4
00.17.31.3F.01.F1
00.17.31.3F.03.7F
latency 0
   0.125    1.406    1.656    1.703
   1.438    0.125    1.656    1.609
   1.656    2.938    0.406    1.828
   1.578    1.547    1.797    0.406
bandwidth 4191304
6749.282 2981.013 1465.491 1458.352
2976.778 7176.891 1465.491 1465.491
1715.638 1712.134  749.652 1450.780
1714.936 1714.235 1445.277 1959.469
latency 0
   0.125    1.375    1.656    1.797
   1.406    0.094    1.641    1.797
   1.625    1.766    0.438    2.188
   1.578    1.594    1.844    0.344
bandwidth 4191304
7733.033 2981.013 1193.764 1454.304
2962.052 7606.722 1076.626 1452.288
1713.534 1714.235  962.191 1389.229
 188.891 1714.936 1432.924 2268.022
latency 0
   0.125    1.422    2.172    2.188
   1.406    0.094    2.219    1.781
   2.141    1.594    0.406    2.188
   1.547    1.547    1.797    0.344
bandwidth 4191304
7471.130 3050.440 1448.274 1454.304
2993.789 7878.391 1434.396 1429.991
1713.534 1713.534 1078.288  851.026
1714.235  216.863 1063.243 2327.209
-------------------------
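
The "senders across, receivers down" reading of the first latency
block above can be checked with a short lookup. This Python sketch
copies the node names and the first "latency 0" matrix from the
example output; the function name lookup is hypothetical.

```python
nodes = ["master", "node2", "node25", "node26"]

# First "latency 0" block from the example output, in microseconds.
latency_0 = [
    [0.125, 1.406, 1.656, 1.703],
    [1.438, 0.125, 1.656, 1.609],
    [1.656, 2.938, 0.406, 1.828],
    [1.578, 1.547, 1.797, 0.406],
]


def lookup(matrix, nodes, sender, receiver):
    """Return the value for sender -> receiver (row = receiver, column = sender)."""
    return matrix[nodes.index(receiver)][nodes.index(sender)]


print(lookup(latency_0, nodes, "node26", "master"))   # prints 1.703
print(lookup(latency_0, nodes, "master", "node26"))   # prints 1.578
```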
