InfiniScope™

InfiniBand connectivity and performance analysis tool

Microway, Inc.

Version 4.0.1

Copyright 2006-2025

Prerequisites

You must have a Linux-based cluster. You must have InfiniBand HCAs on your nodes, and one or more properly functioning InfiniBand switches connecting the HCAs.

You must have the OFED (OpenFabrics Enterprise Distribution) software installed on the cluster. InfiniScope has been tested with both the open source OFED provided with Linux distributions and the Mellanox OFED that NVIDIA provides directly. Three things from the OFED distribution are required:

  1. The OpenSM subnet manager.
  2. The include files, typically found in /usr/include/infiniband.
  3. The libraries libibumad and libibmad, typically found in /usr/lib64 or /usr/lib/x86_64-linux-gnu.
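
A quick way to confirm that the include files and libraries listed above are present is sketched below. It is only an illustration: the function name check_ofed_files is hypothetical, the default paths are the typical locations named above and may differ on your distribution, and the sketch does not check for the OpenSM subnet manager.

```python
import glob
import os

# Typical locations from the list above; adjust for your distribution.
INCLUDE_DIR = "/usr/include/infiniband"
LIB_DIRS = ["/usr/lib64", "/usr/lib/x86_64-linux-gnu"]
LIB_NAMES = ["libibumad", "libibmad"]


def check_ofed_files(include_dir=INCLUDE_DIR, lib_dirs=LIB_DIRS, lib_names=LIB_NAMES):
    """Return a dict mapping each required item to True (found) or False."""
    found = {"include files": os.path.isdir(include_dir)}
    for name in lib_names:
        # Match libibumad.so, libibumad.so.3, libibumad.so.3.2.0, and so on.
        found[name] = any(
            glob.glob(os.path.join(directory, name + ".so*"))
            for directory in lib_dirs
        )
    return found


if __name__ == "__main__":
    for item, ok in check_ofed_files().items():
        print(f"{item}: {'found' if ok else 'MISSING'}")
```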

Previous versions of InfiniScope required that Python and wxPython be installed, but this is no longer the case.

To run flop, the Fabric Loading Program, you must have MPI installed. Any MPI implementation should work, but it must be an InfiniBand-capable one, such as MVAPICH, if you want to exercise the InfiniBand fabric.

Installation

InfiniScope is distributed as a compiled Python application. After installation, it consists of a Python/wxPython-based interface program.

The InfiniScope collector program (named iscollect) must be installed on your cluster. It is distributed as obfuscated C source code, which is compiled on your cluster with a C compiler. You do not have to modify the source.

A separate “Fabric LOading Program” (named flop) is included in the InfiniScope distribution. It is a C/MPI program distributed as obfuscated source code, which is compiled on your cluster with an mpicc wrapper around a C compiler.

Installation procedure

MPI Link-Checker is distributed as a compiled Python application. The packages contain a README file with complete installation and configuration instructions. In summary:

  1. Install the Link Checker and/or InfiniScope rpm or deb packages for your distribution from Microway's repo at https://repo.microway.com. The packages "microway-mmds-infiniscope" and "microway-mmds-link-checker" provide the respective applications. They both depend on the "microway-mmds-common" package, which provides the libraries required by both applications.
  2. Copy your purchased license key file to /usr/local/microway/microway.key, owned by root:root with permissions 644.
  3. The rpm/deb packages will automatically build the mpilc binary used by Link Checker, and the iscollect and flop binaries used by InfiniScope. If mpicc is not found during the package installation, you can build them manually in /usr/local/microway by running "make -f Makefile.mpilc" or "make -f Makefile.iscope".
  4. The mpirun command Link Checker uses to start mpilc is specified in /usr/local/microway/link-checker.conf. On most clusters this parameter must be set so that mpirun places a single process on each system. See the comments in the file for guidance.
  5. The default settings for InfiniScope do not usually require modification. If required, edit /usr/local/microway/infiniscope.conf.

Installation notes

If you use the multicollector mode for InfiniScope (not the default), make sure that /usr/local/microway/iscollect is available on all nodes of the cluster. It is not an MPI program, but it must run on all nodes of the cluster to collect data from the InfiniBand ports. It must be owned by root, with group "infiniscope" and permissions 4750. Setuid root is required to read and write the port registers on the InfiniBand HCAs and switches.
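
The permission requirement for iscollect can be checked with a short script along the following lines. This is a sketch only: mode_problems is a hypothetical helper, not part of the InfiniScope distribution, and it looks only at the mode bits. Ownership by root and the "infiniscope" group would need to be checked separately (for example from the st_uid and st_gid fields returned by os.stat).

```python
import stat

EXPECTED_MODE = 0o4750  # setuid, rwx for owner, r-x for group, nothing for others


def mode_problems(st_mode, expected=EXPECTED_MODE):
    """Return a list of human-readable problems with a file's st_mode."""
    problems = []
    if not st_mode & stat.S_ISUID:
        problems.append("setuid bit is not set")
    actual = stat.S_IMODE(st_mode)  # strip the file-type bits, keep permissions
    if actual != expected:
        problems.append(f"mode is {actual:04o}, expected {expected:04o}")
    return problems
```

On a node, you would pass it os.stat("/usr/local/microway/iscollect").st_mode; an empty list means the mode bits match 4750.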

If you did not do so while running the installation script, add users to the infiniscope group by editing /etc/group or by creating the group in a cluster-wide user database such as LDAP.

Put /usr/local/microway in the path for all users who are going to be using InfiniScope.

Uninstalling

Link Checker and InfiniScope can be uninstalled by removing the packages with your standard package management tool. Red Hat style distributions use yum or dnf, and Debian/Ubuntu style distributions use apt.

Starting InfiniScope

Start InfiniScope by typing "/usr/local/microway/is".

Several command line options may be used. They may be combined and used in any order, although some combinations do not make sense. You may, but need not, put a space between a single letter abbreviation and a following value. You may put a space or an equal sign between an extended option name and a following value.

Note that most of the command line options may be specified in the configuration file. For options that change only rarely, using the configuration file will probably be more convenient.

Option summary

The command syntax is as follows:

is -h | --help 
or:
is -v | --version 
or:
is [-i | --init-file <configuration file name>]
    [-d | --data-file <input data file name>]
    [-o | --output <output data file name>]
    [-A | --ascii]
    [-m | --map <name of file containing displayed name map>]
    [-g | --gui-hostname <GUI host name>]
    [-C | --collector-ssh-command "<command for starting InfiniScope data collector>"]
    [-F | --feeder-ssh-command "<command for starting InfiniScope data feeders>"]
    [-p | --collector-program-name <collector program name>]
    [-M | --multiple-collectors]
    [-x | --excluded-nodes "<node node ... (nodes not running collector)>"]
    [-a | --allow-multiples]
    [-e | --expansion <fabric display box expansion factor>]
    [-n | --no-logging]
    [-L | --looping]

Here is more detail on each of the options:

--help (or -h)
    Displays a program usage message.
--version (or -v)
    Displays the InfiniScope version number.
--init-file (or -i)
    Specifies the name of an alternate configuration file. On startup, the program will look for this file if specified. Otherwise, it will look for /usr/local/microway/infiniscope.conf. The options --init-file, --data-file, --output, --help, and --version may be specified only on the command line. All other options may be specified either in the configuration file or on the command line. If an option appears both on the command line and in the configuration file, the value on the command line will be used.
--data-file (or -d)
    Specifies the name of the data file if you are going to display existing data.
--output (or -o)
    Specifies the name of the output file for all data for this run.
--ascii (or -A)
    Produces data file output in ASCII instead of binary.
--map (or -m)
    Specifies the name of the file that maps GUIDs, Microway HCA names, or node names to abbreviated host names in the fabric display.
--gui-hostname (or -g)
    Specifies the name or IP address of the host on which the GUI is running.
--collector-ssh-command (or -C)
    Command to access the node on which the collector will run.
--feeder-ssh-command (or -F)
    Command to remotely start data feeders on fabric nodes.
--collector-program-name (or -p)
    Specifies the name of the collector program that reports the performance measurements.
--multiple-collectors (or -M)
    Turns off the default single collector mode, in which all performance queries are made on the GUI node.
--excluded-nodes (or -x)
    Specifies the names of nodes that should not be queried or displayed.
--allow-multiples (or -a)
    Does not terminate other instances of InfiniScope that are currently running.
--expansion (or -e)
    Factor to expand or contract the size of boxes in the fabric display.
--no-logging (or -n)
    Turns off logging of console messages to a file.
--looping (or -L)
    Causes display of recorded data to loop.
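
As an illustration of how the options combine, the sketch below assembles a command line from a few of the documented extended option names. The helper build_is_command, the output file name, and the node names are all hypothetical; only the option names and the program path come from this manual.

```python
import shlex

IS_PROGRAM = "/usr/local/microway/is"


def build_is_command(data_file=None, output=None, ascii_output=False,
                     excluded_nodes=(), looping=False):
    """Assemble an InfiniScope argument list from a few documented options."""
    argv = [IS_PROGRAM]
    if data_file:
        argv += ["--data-file", data_file]
    if output:
        argv += ["--output", output]
    if ascii_output:
        argv.append("--ascii")
    if excluded_nodes:
        # --excluded-nodes takes one quoted, space-separated list of node names.
        argv += ["--excluded-nodes", " ".join(excluded_nodes)]
    if looping:
        argv.append("--looping")
    return argv


command = build_is_command(output="/tmp/run1.data", ascii_output=True,
                           excluded_nodes=["node07", "node12"])
print(shlex.join(command))
# prints: /usr/local/microway/is --output /tmp/run1.data --ascii --excluded-nodes 'node07 node12'
```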

The display

Top part of display

The top row of boxes shows the color and shape scale for the port bandwidth display. The next row of boxes (or more than one row if there are more than 24 HCA ports) shows the measured short term bandwidth for each port of each host. The remaining rows of boxes show the bandwidth for each port of each switch, one switch per line.

The boxes at both ends of a connection are outlined with a color marking the connection speed per lane.

Rate      Color
SDR       Red
DDR       Blue
QDR       Green
FDR10     Green
FDR       Purple
EDR       Orange
HDR       Teal
HDR100    Brown

The boxes at both ends of a 1X connection and the line between them will flash yellow and black. With the node selected, the status bar at the bottom of the window will also show the 1X status.

If a port has seen any hardware errors, there will be a red "E" over its box. Errors already present in the hardware port registers when you start InfiniScope are suppressed, as well as many errors that occur when the fabric topology changes. Infrequent errors are not problematic. Only frequent errors are cause for concern.

By moving the mouse over a box, you select the port corresponding to that box for the graph at the bottom. Instead of selecting a single port, you may select a switch by moving the mouse over its label on the left, or you may select "all hosts" by moving the mouse over the "Hosts" label on the left. The selected box (port, switch, or all hosts) will be outlined with a colored square or rectangle. If a selected port has seen hardware errors, details of the errors will be shown in the graph in the lower part of the display (see below).

If you click a box, that box will be locked as the selection. Clicking again anywhere unlocks the selection.

If a switch is selected, there will be colored lines showing all connections to that switch. Unless the fabric is too big, the colors of the lines change randomly so that they can be distinguished from each other more easily.

Similarly, if "All hosts" is selected, there will be colored lines showing all connections to HCAs. Again, the colors of the lines change randomly unless the fabric is too big.

If a single port is selected, there will be a colored line showing the connection to that port, and the port at the other end of the connection will be outlined with a gray square.

If an HCA port is selected, all switch ports on paths to the selected port will be outlined with a gray square. (Paths from the port are not shown, because almost every port beyond the first switch can occur on a path from a given port.) This is just some extra information that might be helpful in determining the cause of unexpectedly poor behavior.

If a host HCA port is locked as the selection, you can move the mouse over a different host HCA port. Doing so will cause the path through the fabric from the selected port to the port under the mouse to be displayed as a set of colored lines.

Bottom part of display

The bottom part of the display shows the average transmit bandwidth from the selected port or group of ports. Measurements are made as often as specified, at intervals from 1 millisecond to 1 second. The bandwidth can be shown as measured, or averaged in groups of 2, 4, 8, …, 512, 1024 measurements. This means that the full time scale of the graph can range from about one second to nearly two weeks. The default measurement interval is 100 ms, displayed as measured with no averaging, leading to a full time scale of one or two minutes. If the full time scale is less than an hour, the time axis will be labeled with minutes and/or seconds into the past. If the time scale is an hour or more, the time axis will be labeled with time-of-day (and day-of-week for previous days).
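
The relationship between measurement interval, averaging, and full time scale can be worked through as simple arithmetic. The sketch below assumes that the graph holds about 1,000 displayed points; that figure is not stated in this manual and was chosen only because it reproduces the ranges quoted above.

```python
ASSUMED_POINTS_PER_GRAPH = 1000  # assumption, not a documented value


def full_time_scale_seconds(interval_ms, averaging=1, points=ASSUMED_POINTS_PER_GRAPH):
    """Time span covered by the whole graph, in seconds.

    interval_ms: measurement interval in milliseconds (1 to 1000)
    averaging:   number of measurements averaged into each displayed point
    """
    return interval_ms * averaging * points / 1000.0


print(full_time_scale_seconds(1))            # 1.0 second: shortest interval, no averaging
print(full_time_scale_seconds(100))          # 100.0 seconds: the default, between 1 and 2 minutes
print(full_time_scale_seconds(1000, 1024))   # 1024000.0 seconds, a little under 12 days
```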

If you click on the graph, the data for the selected port will be cleared and the graph will be restarted. This can be useful to rescale the graph. If you double-click on the graph, the data for all ports will be cleared.

If you right-click on the graph, it will be frozen. (Data collection will continue.) Right-clicking again will unfreeze the graph.

If a port is selected and the port has seen hardware errors, the type and number of errors will be shown at the bottom of the graph, using different colored text for different error types. The indication will be something like “Sym:12/561”, indicating that a total of 561 symbol errors were observed during 12 different queries.

The times of the errors will be indicated on the graph with colored marks, the color of a mark corresponding to the color of the text for the corresponding error count. Full details of errors will also be written to the console and the log file.
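
If you want to process error indications of the “Sym:12/561” kind in a script, they can be split into their three parts as sketched below. The exact text InfiniScope prints may differ from this example form, so treat the pattern as an assumption; parse_error_indication is a hypothetical helper.

```python
import re

# Assumed form: <error type>:<queries with errors>/<total errors>
_ERROR_RE = re.compile(r"^(?P<kind>[A-Za-z]+):(?P<queries>\d+)/(?P<total>\d+)$")


def parse_error_indication(text):
    """Split an indication such as 'Sym:12/561' into (type, queries, total)."""
    match = _ERROR_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized error indication: {text!r}")
    return match.group("kind"), int(match.group("queries")), int(match.group("total"))


print(parse_error_indication("Sym:12/561"))  # ('Sym', 12, 561)
```

Here 561 symbol errors were seen in total, spread over 12 different queries.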

Menus

Control menu

Reset error indicators

You can clear the "E" error markings over port boxes by clicking on the "Reset error indicators" item in the Control menu. The errors are reported to the console and also appended to the log file, so you don't have to worry about losing the error reports. You can also clear the error indicators by typing Control-R.

Annotate log file

You can add an annotation to the log file by clicking on the "Annotate log file" item in the Control menu. A dialog window will pop up; just enter your annotation and click "OK". The annotation will be timestamped and placed in the log file. You can also bring up the annotation dialog by typing Control-A.

Stopping InfiniScope

You can stop InfiniScope by clicking on the "Exit" item in the Control menu, by clicking on the X at the top right of the window frame, or by typing Control-C over the display or in the console window where you started the program.

Save data menu

Record and End recording

You can begin recording all measurements by clicking the "Record" item of the Save data menu. Data will be saved for all ports, both bandwidth and error data. The data will be saved in the specified output file (or /tmp/infiniscope.data if no file was specified).

To stop recording, click the "End recording" item of the Save data menu. You can resume recording later if you wish. Note that the recorded data will not show a gap for the period when recording was stopped, so the times on the graph scale may be wrong.

Control buttons

Reset

Clicking the Reset button resets the measurement interval, graph scale, and scan time to their default initial values, 100 ms, between 1 and 2 minutes, and 0.5 sec respectively.

Freeze/Gather

Clicking the Freeze button stops the gathering of data. You can look at the graph for any port or switch or for all hosts, you can change the graph time scale, and you can start or stop scanning or change the scan rate. When you click it, the Freeze button becomes a Gather button.

Clicking the Gather button unfreezes the display and resumes gathering data. All the data history will be reset. When you click it, the Gather button returns to being a Freeze button.

Changing the measurement interval or clicking the Reset button also unfreezes the display, resumes data gathering, and changes the Gather button to a Freeze button.

Measurement time

You can specify how often you want InfiniScope to query the ports in the fabric by clicking the measurement time '+' or '−' buttons. The measurement interval can be any of a large set of predetermined values between 1 millisecond and 1 second. The '−' button shortens the time between measurements, and the '+' button lengthens it. Changing the measurement interval discards all measurements made with a different measurement interval. When you change the measurement interval, the graph scale will change accordingly.

If you set a very short interval (a few milliseconds), it may be impossible to query the fabric often enough. In this case, some measurements will be skipped to try to keep up with the measurement clock. The bandwidth graphs may appear jumpy in this case.

Graph scale

You can specify how much averaging you want the bandwidth graphs to represent by clicking the graph scale '+' or '−' buttons. The graph scale is shown as the approximate time represented by the full graph scale, but internally it is just the number of data points (some power of 2) that are averaged to produce each point displayed on the graph. The '−' button halves the number of points being averaged and shortens the time represented by the full graph scale; the '+' button doubles the number of points being averaged and lengthens the time represented by the full graph scale. Changing the graph scale by itself will not reset the data.
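
The averaging described above amounts to replacing each run of consecutive measurements with its mean. The following sketch shows the idea; it is not InfiniScope's actual code, and the choice to drop a trailing incomplete group is an assumption made for the example.

```python
def average_in_groups(samples, group_size):
    """Average consecutive measurements in groups of group_size.

    group_size must be a power of 2 from 1 (as measured) to 1024.
    A trailing group with fewer than group_size samples is dropped (assumption).
    """
    if group_size < 1 or group_size > 1024 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of 2 between 1 and 1024")
    full = len(samples) - len(samples) % group_size
    return [
        sum(samples[i:i + group_size]) / group_size
        for i in range(0, full, group_size)
    ]


print(average_in_groups([1, 3, 5, 7, 9], 2))  # [2.0, 6.0]
```

Doubling the group size halves the number of displayed points produced from the same measurements, which is why the '+' button lengthens the time represented by the full graph scale.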

If you change the measurement interval, the graph scale label will change accordingly, maintaining the same degree of averaging. The data will be reset.

Scan ports/Reverse/Stop scanning

If you click on the "Scan ports" button, the port selector will cycle through all the ports, spending a short time displaying the bandwidth graph for each one. The "Scan ports" button becomes a "Reverse" button. Scanning starts at the most recently locked port.

If you are scanning through the ports, clicking on the "Reverse" button will cause the scan to change direction and move backward through the ports. The "Reverse" button becomes a "Stop scanning" button.

If you are scanning the ports in reverse order, clicking on the Stop scanning button stops the scanning. The "Stop scanning" button becomes a "Scan ports" button again.
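
The three-way button cycle described above can be summarized as a small lookup table. This is only a model of the documented behavior for reference, with hypothetical names; it is not taken from the InfiniScope source.

```python
# Label currently shown on the button -> label shown after the next click.
NEXT_LABEL = {
    "Scan ports": "Reverse",
    "Reverse": "Stop scanning",
    "Stop scanning": "Scan ports",
}

# Label currently shown -> what the port selector is doing while it is shown.
SCAN_ACTIVITY = {
    "Scan ports": "not scanning",
    "Reverse": "scanning forward",
    "Stop scanning": "scanning in reverse",
}


def click(label):
    """Return the button label after one click."""
    return NEXT_LABEL[label]


label = "Scan ports"
for _ in range(3):
    label = click(label)
    print(f"{label}: {SCAN_ACTIVITY[label]}")
```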

If you are scanning or reverse-scanning, all the ports' boxes will have small black check marks in them. If you right-click on any port box, the check mark will become a small red 'X', and the corresponding port will be excluded from the scan. You may exclude as many ports as you wish, except that there will always be at least one port being scanned. Right-clicking again will change the 'X' back to a check mark and will reengage scanning.

If you are scanning or reverse-scanning, right-clicking the "Hosts" box will toggle scanning for all HCA ports. Similarly, if you are scanning or reverse-scanning, right-clicking a switch's box will toggle scanning for all ports on the switch.

If you are scanning or reverse-scanning, two buttons will appear next to the scan time buttons. Clicking the upper "SCAN ALL" button will cause all ports to be included in the scan. Clicking the lower "TOGGLE" button will toggle scanning (off-to-on and on-to-off) for each port, leaving at least one port being scanned.

Scan time

You can change the amount of time that each port stays selected during forward scanning by clicking the Scan time '+' or '−' buttons. The '−' button shortens the time between port selection changes, down to 0.1 second, and the '+' button lengthens it, up to 15 seconds.

The time for reverse-scanning is fixed at 0.75 second.

Messages

Messages are printed on the terminal when InfiniScope is started or stopped and when fabric errors occur. Whenever the fabric is redrawn because the subnet manager detected a change in the fabric, the connections from all switch ports are shown. In addition, a notice is printed once every hour, so that if the fabric dies, you will know approximately when it died. The same messages are appended to the log file /tmp/infiniscope.log.

Known restrictions

Sometimes when you start InfiniScope right after rebooting the cluster, the host names are not correct: some of them may appear as "25204", or InfiniScope may say that the subnet manager could not determine the correct names of all the hosts. If this happens, simply restart the subnet manager. You may also be able to get around the problem by using the name map (-m) command line option.

With a very short measurement interval (1 ms), the label on the Scan ports/Reverse/Stop scanning button may not be displayed, although it still works properly.

Troubleshooting

Occasionally InfiniScope gets into an incorrect state. To get it working again, stop it. Stop iscollect if any instances of it are still running on the cluster nodes. Restart the subnet manager, then restart InfiniScope.

If you have to restore a node (for example, because of a hard disk failure), be sure that the current versions of iscollect and flop are copied onto the new disk.

Flop - the Fabric LOading Program

Flop is an MPI application that generates traffic to load the fabric in various ways. It must be compiled using mpicc and run using mpirun in the usual way. The mode of operation is specified using a command-line option.

The usage for flop is:

mpirun -np <n> /usr/local/microway/flop [-f|-h|-r|-a|-c|-s|-l]
   [ranks for -s option] [message_size]

If you normally start MPI programs in a different way, start flop the same way.

The options are:

-f    Full-duplex round robin
-h    Half-duplex round robin
-r    Ring
-a    All-to-all
-c    Cycle
-s    Select send/receive rank pairs
-l    List ranks and hosts

In all tests, you can specify the message size in bytes as the last argument. The default message size for all tests except the all-to-all test is 5,000,000 bytes. Each message is repeated 1,000 times in each round.

In the full-duplex round robin test, each node is both sending and receiving at the same time. (One node is inactive in each round if there are an odd number of nodes.)

In the half-duplex round robin test, half the nodes are sending and half are receiving in each round. (One node is inactive in each round if there are an odd number of nodes.)

In the ring test, each node is receiving from one neighbor and sending to another, in a ring covering the entire cluster. The bandwidth of this test is limited to that of the slowest connection.
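
The ring pattern can be pictured with the sketch below. It assumes the ring follows ascending MPI rank order, which this manual does not state; the function names and the bandwidth figures in the example are hypothetical.

```python
def ring_neighbors(rank, size):
    """Return (receive_from, send_to) for one rank in a ring of `size` ranks.

    Assumes the ring runs in ascending rank order and wraps around at the end.
    """
    if not 0 <= rank < size:
        raise ValueError("rank out of range")
    return (rank - 1) % size, (rank + 1) % size


def ring_limit(link_bandwidths):
    """The ring as a whole can go no faster than its slowest connection."""
    return min(link_bandwidths)


print(ring_neighbors(0, 4))        # (3, 1): rank 0 receives from 3 and sends to 1
print(ring_limit([100, 56, 100]))  # 56
```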

In the all-to-all test, there is just a single MPI_Alltoall() covering the entire cluster. This test requires more memory than the others, so the default message size is 1,000,000 bytes. You may have to make it even smaller for a large cluster.
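
A rough estimate shows why the all-to-all test needs a smaller message size. MPI_Alltoall requires each rank to hold one block per rank in both its send buffer and its receive buffer. The sketch below assumes that flop's message size is the size of one such block; that is an assumption about flop, not something this manual states.

```python
def alltoall_buffer_bytes(ranks, message_size=1_000_000):
    """Approximate bytes of buffer space each rank needs for one MPI_Alltoall.

    One block of message_size bytes per rank, in both the send buffer and the
    receive buffer. Assumes message_size is the per-destination block size.
    """
    return 2 * ranks * message_size


print(alltoall_buffer_bytes(64))              # 128000000 bytes with the default size
print(alltoall_buffer_bytes(64, 5_000_000))   # 640000000 bytes at the other tests' default
```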

In the cycle test, only one node is sending to only one other node in each round, but eventually every node will send to and receive from every other node.

In the selected ranks test, you can specify precise send/receive pairs, according to their MPI ranks. The ranks are specified in pairs, alternating between send rank and receive rank. Of course, there must be an even number of ranks listed. A node can be sending and receiving at the same time, but not safely more than one of each. To find the MPI rank of each node, run flop with the "-l" option. If you want to specify the message size, put it after the list of ranks.
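
The rules for the selected ranks test can be checked before launching a run. The sketch below builds the mpirun command from a list of (send rank, receive rank) pairs and rejects lists in which a rank sends or receives more than once. The helper name is hypothetical, and the plain "mpirun -np" form is the one shown in the usage line above; substitute your own launch method if it differs.

```python
FLOP_PROGRAM = "/usr/local/microway/flop"


def build_flop_select_command(num_ranks, pairs, message_size=None):
    """Build the mpirun argument list for flop -s from (send, receive) pairs."""
    senders = [send for send, _ in pairs]
    receivers = [recv for _, recv in pairs]
    for rank in senders + receivers:
        if not 0 <= rank < num_ranks:
            raise ValueError(f"rank {rank} is outside 0..{num_ranks - 1}")
    # A rank may send and receive at the same time, but not more than one of each.
    if len(set(senders)) != len(senders):
        raise ValueError("a rank is listed as sender more than once")
    if len(set(receivers)) != len(receivers):
        raise ValueError("a rank is listed as receiver more than once")
    argv = ["mpirun", "-np", str(num_ranks), FLOP_PROGRAM, "-s"]
    for send, recv in pairs:
        argv += [str(send), str(recv)]
    if message_size is not None:
        argv.append(str(message_size))  # message size goes after the rank list
    return argv


print(" ".join(build_flop_select_command(4, [(0, 1), (2, 3)], 1000000)))
# prints: mpirun -np 4 /usr/local/microway/flop -s 0 1 2 3 1000000
```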

flop doesn't do anything useful; it just loads the fabric by sending random data around in the specified pattern. It doesn't report anything either, except for the name of the test it is performing.

About Microway

Microway designs, builds, and services high quality clusters, workstations, and networks for High Performance Computing. Our website is available at www.microway.com. Our telephone number is 1-508-746-7341.

Microway provides our customers with leading edge technologies for high performance computing solutions. We establish and maintain industry recognized products and expertise for cluster interconnect, cluster management and HPC storage solutions.

Microway's reputation as the world leader in innovative solutions for High Performance Technical Computing has been unchallenged since 1982, when our software made it possible to use the 8087 math coprocessor in the IBM PC. Our products consistently receive excellent reviews, our prices are competitive, and our service and technical support are outstanding. Microway's top notch Research and Development Staff keeps you on the leading edge of technology with timely, powerful new products. At Microway our customers are treated as our most valuable resource, which is why our customer base remains strong and continues to grow.

Microway's products include clusters, silent workstations, InfiniBand-based switches, multi-function HCAs and storage solutions. Our clusters incorporate AMD EPYC CPUs; Intel Xeon CPUs; NVIDIA Tesla, Quadro, and Geforce GPU products; and Mellanox, Netgear, and Cisco networking.

Designed and developed in-house, Microway software includes MCMS cluster management tools; InfiniScope™ InfiniBand diagnostic software; and MPI Link-Checker™ MPI diagnostic tool. Microway's Linux-based clusters and data solutions are used by customers in life sciences, academia, enterprise and government research laboratories.

Our industry recognized trademarks include Microway, FasTree, InfiniScope, MPI Link-Checker, Navion, TriCom, WhisperStation, NumberSmasher, NodeWatch, ServaStor, and Quadputer.

The technical staff at Microway is qualified to assist you in benchmarking and speeding up your existing code and enhancing your present software and hardware investment. Our staff has over 50 years of combined experience in designing Linux cluster configurations. We offer white papers on our website, as well as technical documentation of the hardware and software we design and integrate. To design your next custom system or cluster, please call our Sales Department at 1-508-746-7341. Our Technical Support Department can be reached at the same number or via email at tech@microway.com.

For more than twenty-six years, the employees at Microway have earned our reputation for excellence. We are proud of this reputation and totally committed to designing innovative products that provide state of the art solutions required to keep our customers on the leading edge of technology.

Microway … technology you can count on, since 1982.
