Lustre is a parallel distributed high-performance file system for clusters ranging from small to large-scale as well as multi-site systems. The role of Lustre is to chunk up files into data blocks and spread file data across multiple storage servers, which can be written to and read from in parallel.
Lustre Building Blocks¶
A Lustre file system is composed of three major functional units as shown in the simplified diagram below.
- One or more metadata servers (MDS) that has one or more metadata targets (MDT) per Lustre file system that stores namespace metadata, such as filenames, directories, access permissions, and file layout
- One or more object storage servers (OSS) that store file data on one or more object storage targets (OST)
- Clients are compute-, visualization- or login nodes that access and use the data
One of the main factors leading to the high performance of Lustre file systems is the ability to stripe data across multiple storage targets (OSTs). This means that files are split into chunks which are then distributed among the OSTs. Read and write operations on striped files will access multiple OSTs concurrently. As a result, I/O performance is improved since writing or reading from multiple OSTs simultaneously increases the available I/O bandwidth.
File striping will predominantly improve performance for applications doing serial I/O from a single node or parallel I/O to a single shared file from multiple nodes. This behavior is usually found in application using MPI-I/O, parallel HDF5 and parallel NetCDF.
Set the Striping Pattern¶
The striping for a file or directory can be set using the command
<count> sets the number of stripes, i.e. the number of OSTs and
<size> the size of the stripes. The argument of the
lfs setstripe command
is a path to a directory or a file:
- if a directory is used, it sets the default layout for new files created in
the directory. This default layout can then be changed for individual
files inside the directory by creating them with the
- if a file is used, a new file with the specified layout will be created.
Maximal striping can be achieved by setting the stripe count to
Get the Striping Pattern¶
Information about the striping of a directory or a file can be retrieved using
lfs getstripe command.
$ lfs setstripe --stripe-count=4 --stripe-size=2m /scratch/user/test
$ touch /scratch/user/test/file.txt
$ lfs getstripe /scratch/user/test/
stripe_count: 4 stripe_size: 2097152 pattern: raid0 stripe_offset: -1
$ lfs getstripe /scratch/e1000/olouant/test/file.txt
obdidx objid objid group
14 46050141 0x2beab5d 0
1 45873824 0x2bbfaa0 0
3 45885822 0x2bc297e 0
5 46052767 0x2beb59f 0
In the example above, we see that
file.txt inherited the layout of its parent
directory and that the file is striped on 4 OSTs (14, 1, 3 and 5).
Striping should be adapted to your application I/O pattern and the size of your files. The following section describes general considerations.
In theory, a larger number of stripes increases the I/O bandwidth and thus performance. In particular, applications which write to a single file from hundreds of nodes, may benefit from striping over as many OSTs. Moreover, striping is needed if you write a huge amount of data as a single OST may not have enough free space to store all the data. For applications creating a large number of small files as with a file per process scheme, large stripe counts can cause overhead and impede the performance.
- when multiple processes access the same large file in parallel set a stripe count >1 and an integral factor of the number of processes.
- with a file-per-process I/O pattern, avoid striping (stripe count of 1) to limit OST contention.
Stripe size has less of an impact on performance than the stripe count and no impact at all if the stripe count is 1. However, when dealing with large files, the stripe size may influence the performance:
- the smallest recommended stripe size is 512 KB.
- a good stripe size is between 1 MB and 4 MB in most situations.
- the maximum stripe size is 4 GB, but you should only use this value for huge files.
If your application writes to the file in a consistent and aligned way, make
the stripe size a multiple of the
write() size. The goal is to perform write
operations that go entirely to one server instead of crossing object