Cray Performance Analysis Tool
Cray Performance Analysis Tool (CrayPat) is a performance analysis tool used to evaluate program behaviour on HPE Cray supercomputer systems like LUMI.
perftools-lite is a simplified, easy-to-use version of CrayPat that
provides basic performance analysis information automatically, with minimal
user interaction. To use perftools-lite, you must first load the
perftools-base module followed by the perftools-lite module.
After these modules have been loaded, subsequent invocations of the compiler
wrappers (cc, CC, ftn) will automatically insert all the hooks necessary for
profiling.
You can then run your application as you would normally. The profiling information will be written to the standard output.
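As a rough sketch, a perftools-lite session could look like the following. The file names app.f90 and app.x and the srun launcher are assumptions made for illustration. The `run` helper is hypothetical: it only prints each command unless EXECUTE=1 is set, so the snippet can be pasted safely on a machine without the Cray environment.

```shell
# Hypothetical helper: print each command, execute it only when EXECUTE=1.
run() { echo "+ $*"; [ "${EXECUTE:-0}" != "1" ] || "$@"; }

# Load the base module first, then the lite profiling module.
run module load perftools-base
run module load perftools-lite

# Build with the compiler wrapper; app.f90 and app.x are assumed names.
run ftn -o app.x app.f90

# Run as usual (srun is an assumption); the profile is written to standard output.
run srun ./app.x
```

Set EXECUTE=1 on a system with the HPE Cray programming environment to actually run the commands.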
Additional perftools-lite modules are available for users seeking information
beyond that provided by the default perftools-lite module:
- perftools-lite-events: event profile (tracing)
- perftools-lite-gpu: GPU kernel and data movement events profiling
- perftools-lite-loops: loop work estimates
- perftools-lite-hbm: memory profiling
Once loaded, these modules can be used in the same way as perftools-lite.
CrayPat is the full-featured program analysis tool set. The typical workflow is:
- use pat_build to instrument a program
- run the instrumented executable
- use either pat_report or Cray Apprentice2 to view the resulting report.
Sampling is a form of statistical profiling. By taking regular snapshots of the application's call stack, we can build a statistical profile of where the application spends most of its time.
One of the main advantages of a sampling experiment is its low overhead, which is determined by the chosen sampling rate. On the other hand, sampling is non-deterministic and can only provide a statistical picture of the application's behaviour.
The pat_build tool is used to instrument your application. The first step is
to load the perftools modules and build your application as normal.
The second step is to use pat_build to instrument the resulting executable.
This command will create a new executable named <exec>+pat. In our example,
with an executable named app.x, this produces app.x+pat. The name can be
chosen by the user with the -o <output_exe> option. The default experiment is
a sampling experiment.
The next step is to run the application. As a result, a directory whose name
begins with the name of your application will be created. This directory
contains the profiling information gathered during the run. You can change the
name of this output directory with an environment variable.
You can use this directory to generate a more detailed report with the pat_report command.
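The sampling steps above can be sketched as follows. Several details are assumptions rather than facts taken from this page: the executable name app.x, the srun launcher, loading perftools-base before perftools, and the PAT_RT_EXPDIR_NAME environment variable for naming the output directory (check the CrayPat man pages on your system). The hypothetical `run` helper only prints the commands unless EXECUTE=1 is set.

```shell
# Hypothetical helper: print each command, execute it only when EXECUTE=1.
run() { echo "+ $*"; [ "${EXECUTE:-0}" != "1" ] || "$@"; }

# Load the profiling modules, then build the application as normal (not shown).
run module load perftools-base
run module load perftools

# Instrument the existing executable; this should produce app.x+pat.
run pat_build app.x

# Assumption: PAT_RT_EXPDIR_NAME sets the name of the experiment directory.
export PAT_RT_EXPDIR_NAME=app_sampling

# Run the instrumented executable (sampling is the default experiment).
run srun ./app.x+pat

# Generate a report from the experiment directory.
run pat_report "$PAT_RT_EXPDIR_NAME"
```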
Tracing revolves around specific program events, such as entering or exiting a function. Information is collected every time the event occurs, so the data are more accurate and more detailed than with sampling: they come from every traced function call rather than from a statistical average. Tracing may require the program to be instrumented.
The main downside is that the inserted instrumentation code runs every time an instrumented function is called in order to record the information. This may introduce significant profiling overhead.
Automatic program analysis (APA)
You can do a focused tracing experiment based on the results of a sampling
experiment. This is achieved by providing pat_build with a
build-options.apa file generated by pat_report from a previous sampling
experiment. This will build a new executable whose name ends with +apa. You
can then run this executable to collect tracing data and generate a report
with pat_report.
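A minimal sketch of the APA workflow is shown below. It assumes that the sampling data lives in a directory called app_sampling (a made-up name), that pat_report writes build-options.apa into that directory, and that pat_build reads the file through its -O option. Treat these as assumptions to verify against the pat_build man page. The hypothetical `run` helper only prints the commands unless EXECUTE=1 is set.

```shell
# Hypothetical helper: print each command, execute it only when EXECUTE=1.
run() { echo "+ $*"; [ "${EXECUTE:-0}" != "1" ] || "$@"; }

# Assumed name of the directory produced by the earlier sampling run.
expdir=app_sampling

# Process the sampling data; this is expected to write build-options.apa.
run pat_report "$expdir"

# Build the tracing executable from the suggested options (name ends with +apa).
run pat_build -O "$expdir/build-options.apa"

# Run the new executable to collect tracing data (srun is an assumption).
run srun ./app.x+apa
```

The tracing run creates its own experiment directory, which can then be passed to pat_report in the same way as the sampling one.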
If the automatic program analysis is not sufficient, you can choose your
profiling setup manually. Tracing of the entire program is made possible by
using the -w option when building your application with pat_build.
Another possibility is to select the functions belonging to a particular trace function group, for example the MPI function group.
The -g option is used to select a trace group. There is support for a
wide variety of predefined function groups. A full list can be obtained from
the pat_build documentation.
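For illustration, the two options could be used as sketched below. The executable name app.x and the group name mpi are assumptions, as is the use of `man pat_build` to look up the available groups. The hypothetical `run` helper only prints the commands unless EXECUTE=1 is set.

```shell
# Hypothetical helper: print each command, execute it only when EXECUTE=1.
run() { echo "+ $*"; [ "${EXECUTE:-0}" != "1" ] || "$@"; }

# Enable tracing for the program with -w, as described in the text.
run pat_build -w app.x

# Alternatively, trace only the functions of the MPI group.
run pat_build -g mpi app.x

# Assumption: the man page lists the predefined trace groups.
run man pat_build
```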
User-defined functions can be traced with the -T option, which takes a list
of function names, or with the -t option, which takes a file listing the
functions to trace.
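A small sketch of both variants follows. The function names compute_forces and update_positions, the file name trace_funcs.txt, the comma-separated form of the -T list and the one-name-per-line file layout are all assumptions for illustration. The hypothetical `run` helper only prints the pat_build commands unless EXECUTE=1 is set.

```shell
# Hypothetical helper: print each command, execute it only when EXECUTE=1.
run() { echo "+ $*"; [ "${EXECUTE:-0}" != "1" ] || "$@"; }

# Write the functions to trace to a file, one name per line (assumed layout).
cat > trace_funcs.txt <<'EOF'
compute_forces
update_positions
EOF

# Variant 1: give the function names directly on the command line.
run pat_build -T compute_forces,update_positions app.x

# Variant 2: give the file that lists the function names.
run pat_build -t trace_funcs.txt app.x
```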
Be careful when you specify the name of a function, as the compiler may have
altered it. For example, an underscore character may have been added to the
name of a Fortran routine. You can use nm <app> or readelf -s <app> to read
the symbol table of your application. In addition, you can choose to trace all
the user-defined functions.
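One way to check how a routine is actually named is to filter the nm output, as in the sketch below. The `match_symbols` helper, the executable name app.x and the routine name compute_forces are hypothetical. The helper assumes the usual three-column nm layout (address, type, name) and keeps only symbols defined in the text section.

```shell
# Hypothetical helper: read nm output on stdin and keep text-section symbols
# (type T or t) whose name contains the given pattern, ignoring case.
match_symbols() {
    awk -v pat="$1" '$2 ~ /^[Tt]$/ && index(tolower($3), tolower(pat))'
}

# Look up the mangled name of a routine if the executable is present.
if [ -f app.x ]; then
    nm app.x | match_symbols compute_forces
fi
```

If the matching symbol turns out to be, say, compute_forces_, that is the name to pass to pat_build.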
Of course, you can combine the options presented above to match your needs. For example, you can choose to trace the MPI and OpenMP groups and all the user-defined functions.
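Such a combination might look like the sketch below. The group names mpi and omp, the comma-separated group list and the -u option for tracing all user-defined functions are assumptions that should be checked against the pat_build man page; app.x is a made-up executable name. The hypothetical `run` helper only prints the command unless EXECUTE=1 is set.

```shell
# Hypothetical helper: print each command, execute it only when EXECUTE=1.
run() { echo "+ $*"; [ "${EXECUTE:-0}" != "1" ] || "$@"; }

# Trace the MPI and OpenMP groups plus all user-defined functions.
# Assumption: -u selects the user-defined functions.
run pat_build -g mpi,omp -u app.x
```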