Full machine runs
No full machine runs in July, August and December
There will be no large-scale runs during July, August and December, because such runs are difficult to schedule with reduced personnel during the holiday seasons.
We provide the opportunity to perform runs on the entirety of LUMI on the last Sunday of every month (subject to change over time). A run can use all CPU nodes, all GPU nodes, or a combination of CPU and GPU nodes if needed. For these runs, we can change queue policies as needed. The slot we provide is at most 6 hours, but can be shorter if you do not need the full 6 hours or if many other users have requested time on the same day.
Provide a description of at most one page of what you intend to do with the full-machine access (see template below). Do not request access before your code and cases are well prepared, as the time slot is fixed and cannot be moved.
The application deadline is the Wednesday 11 days before the last Sunday of the month, and acceptance notifications are sent out on the Monday of the run week. Resource usage for any runs during this window is billed as usual, so make sure your project has enough resources to complete the planned runs. Submit applications via the LUMI helpdesk through the general contact form, under the "large-scale runs" category.
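The date rules above can be worked out with a few lines of Python. This is a minimal sketch, not an official tool: the function names `last_sunday` and `run_schedule` are made up for illustration, and it assumes that "the Monday of the run week" means the Monday six days before the run Sunday (weeks starting on Monday). Since the schedule is subject to change, treat the result as a planning aid only.

```python
import calendar
from datetime import date, timedelta

# Months without full-machine runs, as stated in the notice above.
EXCLUDED_MONTHS = {7, 8, 12}


def last_sunday(year: int, month: int) -> date:
    """Return the last Sunday of the given month."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # date.weekday(): Monday == 0 ... Sunday == 6 (calendar.SUNDAY == 6)
    offset = (last_day.weekday() - calendar.SUNDAY) % 7
    return last_day - timedelta(days=offset)


def run_schedule(year: int, month: int):
    """Return the key dates for a month, or None if no run is offered."""
    if month in EXCLUDED_MONTHS:
        return None
    run_day = last_sunday(year, month)
    return {
        # Wednesday 11 days before the run Sunday
        "deadline": run_day - timedelta(days=11),
        # Assumption: Monday of the run week is 6 days before the Sunday
        "notification": run_day - timedelta(days=6),
        "run_day": run_day,
    }


# Example: key dates for September 2025
schedule = run_schedule(2025, 9)
for label, day in schedule.items():
    print(f"{label}: {day.isoformat()} ({day.strftime('%A')})")
```

Printing the weekday next to each date gives a quick sanity check that the deadline falls on a Wednesday and the notification on a Monday.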
How to structure your proposal
- Project information: project number and who will be performing the runs. Provide direct contact details for everyone involved.
- Describe use case: the code(s) that will be used and what you are trying to achieve.
- Requested resources: describe the resources needed in enough detail that we can make any required changes to the queue system, etc.: which nodes, how many nodes, how many jobs running/queuing at the same time, size and length of jobs, and any other special considerations.
- Your preparations: describe what preparations you have done for the runs. The idea here is to convince us that you are actually ready to utilize the time slot. For scaling runs, we require at least one run with the code at the partition limits, that is, the largest run you can get through in normal production use. For workflows, we require at least a description of how they will fill the requested resources, how they will use them efficiently, and how they will achieve something that cannot be done in the normal production queues, as well as evidence that you already have the workflow running on LUMI.
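When preparing the "Requested resources" item, it can help to tally the planned jobs before writing them up. The sketch below is only an illustration: `PlannedJob`, `summarize`, the job names and the node count of 1000 are all hypothetical placeholders (not LUMI's actual partition sizes), and node-hours are used as a neutral unit rather than LUMI's billing units. The capacity check is a necessary condition only; it does not prove that the jobs can actually be scheduled within the slot.

```python
from dataclasses import dataclass

SLOT_HOURS = 6.0  # maximum slot length stated on this page


@dataclass
class PlannedJob:
    """One kind of job in the plan (hypothetical helper type)."""
    name: str
    nodes: int       # nodes per job
    hours: float     # expected wall time per job
    count: int = 1   # how many such jobs are planned

    @property
    def node_hours(self) -> float:
        return self.nodes * self.hours * self.count


def summarize(jobs, available_nodes, slot_hours=SLOT_HOURS):
    """Return (total node-hours, slot capacity in node-hours, problems)."""
    total = sum(job.node_hours for job in jobs)
    capacity = available_nodes * slot_hours
    problems = []
    for job in jobs:
        if job.nodes > available_nodes:
            problems.append(f"{job.name}: needs more nodes than available")
        if job.hours > slot_hours:
            problems.append(f"{job.name}: longer than the time slot")
    if total > capacity:
        problems.append("plan exceeds the node-hours available in the slot")
    return total, capacity, problems


# Placeholder plan; replace the node count with the real partition size.
plan = [
    PlannedJob("scaling-full", nodes=1000, hours=1.5),
    PlannedJob("scaling-half", nodes=500, hours=1.0, count=2),
]
total, capacity, problems = summarize(plan, available_nodes=1000)
print(f"planned: {total:.0f} node-hours of {capacity:.0f} available")
for problem in problems:
    print("warning:", problem)
```

The total is also a starting point for checking that the project has enough allocation left, since the runs are billed as usual.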
Post runs
After you have completed your runs, we also require a short, simple summary of the runs you did, along with any issues you encountered.