Cray compilers¶

The Cray Compiling Environment (CCE) provides the Cray Fortran and Cray C/C++ compilers. The Cray Fortran compiler supports the Fortran 2018 standard, and newer versions also support most of Fortran 2023 (support is almost complete in CCE 20.0.0). The C/C++ compilers are C17 and C++17 compliant. They are based on regular clang, so they support the same language standards as the corresponding regular clang/LLVM compiler. An overview of language support in the regular clang/LLVM compiler is available on this page for C and on this page for C++. These compilers are invoked through the ftn, cc and CC compiler wrappers.

CCE has support for the full OpenMP 5.0 specification as well as partial support for OpenMP 5.1, 5.2 and 6.0. PGAS languages (UPC and Fortran coarrays) are also integrated.

Overview¶

Currently, the CCE compilers use an HPE Cray-developed Fortran frontend and the regular clang C/C++ frontend. As a result, compiler options for C/C++ and Fortran do not completely align, as the following table shows:

Feature	Fortran	C/C++
HIP compilation	Not available	-xhip
Listing	-hlist=m	-fsave-loopmark
Free format	-ffree	N/A
Vectorization	-O1 and above	-O2 and above
Link Time Optimization	-hwp	-flto
Floating-point optimizations	-hfpN, N=0...4	-ffp=N, N=0...4
Suggested Optimization	default	-O3
Aggressive Optimization	-O3 -hfp3	-Ofast -ffp=3
OpenMP recognition	-fopenmp	-fopenmp
OpenACC recognition	-hacc	Not available
Variable sizes	-s real64 -s integer64	N/A
Debug	-g	-g

Choose the CCE version¶

The Cray Compiling Environment is available from the PrgEnv-cray module, which is loaded by default. This module loads the default version of the compilers. If you wish to use an older or newer version, you can list the available versions with

$ module avail cce

and then switch to the desired version using

$ module swap cce cce/<version>

or simply

$ module load cce/<version>

OpenMP Support¶

man intro_openmp or corresponding web page

OpenMP is turned off by default, which is the opposite of how earlier versions of the CCE compilers worked. It is turned on using the -homp (Fortran only) or -fopenmp flag.

The CCE Fortran compiler allows controlling the level of optimization of OpenMP directives with the -hthreadN option (N = 0...3). N = 0 turns this optimization off and N = 3 specifies the most aggressive optimization. The default value is N = 2.

OpenACC Support¶

man intro_openacc or corresponding web page

OpenACC is supported only by the Cray Fortran compiler. The C and C++ compilers have no support for OpenACC. To enable OpenACC, use the -hacc flag.

Debugging¶

To make debugging easier, it is useful to generate an executable containing debugging information. You can use the -g option for this purpose.

Most of the time, debug information works best at low levels of code optimization, so consider using the -O0 level. The -g option comes at the cost of a larger binary and slower execution, so it is recommended to use it only for debugging purposes.

Compiler feedback¶

The compilers can generate loopmarks, which indicate the type of optimization performed. This feature is enabled by the -hlist=m option for the Fortran compiler and by the -fsave-loopmark option for the C/C++ compilers. For example

Fortran:

$ ftn -fopenmp -hlist=m -o saxpy saxpy.f08

C:

$ cc -fopenmp -fsave-loopmark -Ofast -o saxpy saxpy.c

C++:

$ CC -fopenmp -fsave-loopmark -Ofast -o saxpy saxpy.cpp

will produce a file called saxpy.lst where you can find a listing of your code with annotations indicating which optimizations were performed by the compiler.

Fortran listing, followed further below by the C/C++ listing:

    1.                   subroutine saxpy(n, a, x, y) 
    2.                     real :: x(n), y(n), a
    3.                     integer :: n, i
    4.                   
    5.    M----------<     !$omp parallel do
    6.    M mVr2-----<     do i=1,n
    7.    M mVr2             y(i) = a*x(i)+y(i)
    8.    M mVr2----->     enddo
    9.    M---------->     !$omp end parallel do
  10.                   end subroutine saxpy

The meaning of the annotations is given at the beginning of the listing file. In the Fortran example, we can see that the compiler vectorized (V) and unrolled (r) the loop.

3.            void saxpy(int n, float a, 
4.                float * restrict x, 
5.                float * restrict y) {
6. + I Vu--<>   #pragma omp parallel for
7. +   M----<   for(int i = 0; i < n; i++) {
8. +   M          y[i] = a*x[i] + y[i];
9.     M---->   }
10.            }

The meaning of the annotations is given at the beginning of the listing file. In the C example, we can see that the compiler vectorized (V) and unrolled (u) the loop.

Compiler Messages¶

man explain or the corresponding web page

Use the explain command to display an explanation of any message issued by the Fortran compiler. Each message is identified by a code of the form ftn-<number>. You can pass this identifier as an argument to the explain command to find out more about the error. The command also works for errors generated by the LibSci libraries.

$ ftn -fopenmp -o saxpy saxpy.f08
    call saxpy(2**20, 2.0, x, y)
    ^                            
ftn-954 crayftn: ERROR MAIN, File = saxpy.f08, Line = 18, Column = 5 
  Procedure "SAXPY", defined at line 1 (saxpy.f08) must have an explicit
  interface because one or more arguments have the assumed-shape 
  DIMENSION attribute.

$ explain ftn-954
<explain output>

CCE Fortran Compiler¶

man crayftn or the corresponding web page and "Cray Compiler Fortran Reference" web page

Once the PrgEnv-cray module is loaded (by default) you can invoke the Cray Fortran compiler with the ftn command.

Optimization options¶

The default optimization level of the CCE Fortran compiler is -O2. Aggressive optimization can be enabled with the -O3 option.

Vectorization¶

The level of automatic vectorizing is controlled with the -hvectorN option (N = 0...3).

the default value is N = 2, which enables moderate vectorization and loop nest restructuring
setting N = 0 or N = 1 enables minimal or conservative automatic vectorization, respectively
aggressive vectorization is enabled by setting N = 3

Loop unrolling¶

Loop unrolling can be controlled with the -hunrollN flag with N = 0...2.

the default value is N = 2, for which the compiler will attempt to unroll all loops except those marked with the NOUNROLL directive.
setting N = 0 requests that no loop unrolling is performed (UNROLL directives are also ignored).
if you only want to unroll loops that are marked with the UNROLL directive, use N = 1.

Floating point optimizations¶

The Cray compiler applies aggressive floating-point optimizations by default. If your application is sensitive to floating-point optimization, use the -hfpN flag with N = 0...4 to set the level of optimization.

the default value is N = 2, which performs various generally safe optimizations that do not conform to the IEEE standard
most applications can benefit from more aggressive optimization with N = 3
use the value of N = 0 or N = 1 if the application you are compiling requires strong IEEE standard conformance

CCE C and C++ compilers¶

man craycc, man crayCC and the "HPE Cray Clang C and C++ Quick Reference" web page

Once the PrgEnv-cray module is loaded (by default) you can invoke the Cray C compiler with the cc command. The C++ compiler may be invoked with the CC command. These compilers are based on Clang/LLVM with Cray improvements. The Cray improvements can be turned off with the -fno-cray flag.

Clang does not apply optimizations unless they are requested. Most optimizations are enabled using the -O2 level. Recommended flags are

-Ofast to enable all the optimizations including aggressive optimizations that may violate strict compliance with language standards
-flto to enable aggressive link time optimizations

For applications that are sensitive to floating-point optimizations, consider using -O3 instead of -Ofast. These floating-point optimizations can also be controlled with the -ffp=N flag with N = 0...4.

using -ffp=0 generates code with the highest precision and grants the compiler minimal freedom to optimize floating-point operations; it also prevents the use of Cray math libraries
requesting the highest level (-ffp=4) grants the compiler maximal freedom to optimize aggressively, but will likely result in lower precision