
Cray Performance Analysis Tool

CrayPat is a performance analysis tool used to evaluate program behaviour on HPE Cray supercomputer systems.


perftools-lite is a simplified, easy-to-use version of CrayPat that provides basic performance analysis information automatically, with minimal user interaction. To use perftools-lite, first load the perftools-base module, then perftools-lite.

module load perftools-base
module load perftools-lite

After these modules have been loaded, subsequent compiler invocations (cc, CC, ftn) will automatically insert necessary hooks for profiling.

$ cc -o app.x source.c 
WARNING: PerfTools is saving object files from a temporary directory into
directory '/home/olouant/.craypat/app.x/846040'
INFO: creating the PerfTools-instrumented executable 'app.x' 
(lite-samples) ...OK

You can then run your application as normal. The profiling information is written to standard output.

Other perftools-lite modules are available for users seeking information other than that provided by the default perftools-lite module.

  • perftools-lite-events: event profile (tracing)
  • perftools-lite-gpu: GPU kernel and data movement events profiling
  • perftools-lite-loops: loop work estimates
  • perftools-lite-hbm: memory profiling

Once you have them loaded, these modules can be used in the same way as perftools-lite.
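The choice between these modules can be wrapped in a small helper. This is only a sketch: the function name `lite_module_for` and the goal keywords are made up for illustration, and only the module names come from the list above.

```shell
# Hypothetical helper (not part of perftools): map a profiling goal
# to one of the perftools-lite modules listed above.
lite_module_for() {
  case "$1" in
    sampling) echo "perftools-lite" ;;
    events)   echo "perftools-lite-events" ;;
    gpu)      echo "perftools-lite-gpu" ;;
    loops)    echo "perftools-lite-loops" ;;
    hbm)      echo "perftools-lite-hbm" ;;
    *)        echo "unknown profiling goal: $1" >&2; return 1 ;;
  esac
}

# Example use on a system that provides the modules:
#   module load perftools-base
#   module load "$(lite_module_for loops)"
#   cc -o app.x source.c
```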


CrayPat is the full-featured program analysis tool set. The typical workflow is:

  • use pat_build to instrument a program
  • run the instrumented executable
  • use either pat_report or Cray Apprentice2 to view the resulting report.


Sampling is a form of statistical profiling. By taking regular snapshots of the application's call stack, we can build a statistical profile of where the application spends most of its time.

Sampling of an application. Snapshots of the application's call stack are captured at regular intervals to create a statistical profile.

One of the main advantages of a sampling experiment is the low overhead that is fixed by the choice of sampling rate. On the other hand, sampling is non-deterministic and can only provide a statistical picture of the application behaviour.
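To make the idea concrete, the sketch below turns a list of call-stack samples into a percentage profile. It is a toy illustration, not how CrayPat processes its data: the function name `sample_profile` and the input format (one sampled function name per line) are assumptions.

```shell
# Toy illustration: each input line is the function found on top of the
# call stack at one sampling instant (hypothetical data, not CrayPat output).
sample_profile() {
  LC_ALL=C awk '
    { count[$1]++; total++ }
    END {
      for (f in count)
        printf "%s %.1f%%\n", f, 100 * count[f] / total
    }' | LC_ALL=C sort -k2,2 -rn
}

# Four samples: three landed in solve, one in write_output.
printf 'solve\nsolve\nwrite_output\nsolve\n' | sample_profile
```

With these four samples the profile attributes 75% of the time to solve and 25% to write_output. More samples give a more reliable estimate, at the cost of a higher sampling overhead.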

The pat_build tool is used to instrument your application. The first step is to load the perftools-base and perftools modules and build your application as normal.

module load perftools-base
module load perftools

cc -o app.x source.c

The second step is to use pat_build.

pat_build app.x

This command creates a new executable named <exec>+pat; in our example, app.x+pat. You can choose a different name with the -o <output_exe> option. By default, pat_build instruments the program for a sampling experiment.

The next step is to run the instrumented application. The run creates a directory whose name begins with the name of your application and which contains the profiling information gathered during the run. You can change the name of this output directory with the PAT_RT_EXPDIR_NAME environment variable. For example:

export PAT_RT_EXPDIR_NAME=apa_sample_exp.${SLURM_JOBID}
srun ./app.x+pat

You can use this directory to generate a more detailed report with the pat_report command.

pat_report <perftool-output-dir>
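The steps above can be collected into one script. The following is a sketch rather than a reference job script: the `run` wrapper, the `DRY_RUN` variable and the function name `sampling_workflow` are illustrative additions, while the commands themselves are the ones shown above. With `DRY_RUN=1` (the default here) the commands are only printed, so the script can be read through on a machine without perftools.

```shell
#!/bin/bash
# Sketch of the sampling workflow described above.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1: executable name, $2: experiment directory name (both user-chosen)
sampling_workflow() {
  local exe="$1" expdir="$2"
  run module load perftools-base
  run module load perftools
  run cc -o "$exe" source.c
  run pat_build "$exe"                       # creates ${exe}+pat
  run export PAT_RT_EXPDIR_NAME="$expdir"    # name of the output directory
  run srun "./${exe}+pat"
  run pat_report "$expdir"
}

sampling_workflow app.x apa_sample_exp
```

Setting `DRY_RUN=0` inside a batch job would execute the same sequence for real, assuming the source file and the Slurm allocation are in place.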


Tracing revolves around specific program events, such as entering or exiting a function. Information is collected every time the event occurs, so the results are more accurate and more detailed than with sampling: the data come from every traced function call rather than from a statistical average. Tracing may require the program to be instrumented.

Instrumentation of an application for tracing. The instrumentation code is inserted so that all events of interest are captured, allowing for much more detailed information.

The main downside is that the inserted instrumentation code runs every time an instrumented function is called in order to record the information. This may introduce significant profiling overhead.

Automatic program analysis (APA)

You can do a focused tracing experiment based on the results from the sampling experiment. This is achieved by providing pat_build with a build-options.apa file generated with pat_report from a previous sampling run.

pat_build -O <perftools-out-dir>/build-options.apa

This will build a new executable whose name ends with +apa. You can then run this executable in order to get tracing data and generate a report with pat_report.
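When scripting this step, it can help to check that the build-options.apa file is actually present before calling pat_build. The helper below is a sketch: the function name `apa_file_in` is hypothetical, and it assumes pat_report has already been run on the experiment directory.

```shell
# Hypothetical helper: print the path of build-options.apa inside a
# perftools experiment directory, or fail with a hint if it is missing.
apa_file_in() {
  local expdir="$1"
  local apa="${expdir}/build-options.apa"
  if [ -f "$apa" ]; then
    echo "$apa"
  else
    echo "no build-options.apa in '${expdir}'; run pat_report on it first" >&2
    return 1
  fi
}

# Example use after a sampling run and pat_report:
#   pat_build -O "$(apa_file_in apa_sample_exp)"
#   then run the resulting executable, whose name ends with +apa
```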

Manual analysis

If the automatic program analysis is not sufficient, you can choose your profiling setup manually. Tracing of the entire program is enabled with the -w option when instrumenting your application with pat_build:

pat_build -w app.x

Another possibility is to trace the functions belonging to a particular trace function group. For example, for the MPI group:

pat_build -g mpi app.x

where the -g option selects a trace group. A wide variety of predefined function groups is supported. The full list is given in the pat_build manpage.

User-defined functions can be traced with the -T option, which takes a comma-separated list of function names, or with the -t option, which takes a file listing the functions to trace.

pat_build -w -T func1,func2 app.x
pat_build -w -t tracefile app.x

Be careful when specifying function names, as the compiler may have altered them. For example, an underscore character may have been added to the name of a Fortran routine. You can use nm <app> or readelf -s <app> to read the symbol table of your application. In addition, you can choose to trace all user-defined functions with the -u option.

pat_build -u app.x
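The symbol check mentioned above can be sketched as a filter over nm output. The function name `match_symbols` is hypothetical, and the sketch assumes the common case where the compiler appends a single trailing underscore to the routine name; other name-mangling schemes would need a different pattern.

```shell
# Hypothetical helper: read `nm` output on stdin and print the defined
# text (code) symbols that equal NAME or NAME with a trailing underscore.
match_symbols() {
  local name="$1"
  awk -v n="$name" '
    ($2 == "T" || $2 == "t") && ($3 == n || $3 == (n "_")) { print $3 }'
}

# Example use on a real binary:
#   nm app.x | match_symbols compute
```

The name printed by the filter is the one to pass to pat_build with -T.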

You can combine the options presented above to match your needs. For example, you can trace the MPI and OpenMP groups together with all user-defined functions:

pat_build -g mpi,omp -u app.x