Full machine runs
No full machine runs in July and August
There will be no large-scale runs during the summer months of July and August, because such runs are difficult to schedule with reduced personnel.
We provide the opportunity to perform runs on the entirety of LUMI on the last Sunday of every month (subject to change over time). A run can use all the CPU nodes, all the GPU nodes, or a combination of CPU and GPU nodes if needed. For these runs we can change queue policies as needed. The slot we provide is at most 6 hours, but it can be shorter if you do not need the full 6 hours or if many other users have requested time on the same day.
Provide a description of at most one page of what you intend to do with the full-machine access (see the template below). Do not request access before your code and cases are well prepared: the time slot is fixed and cannot be moved. The application deadline is the Wednesday 11 days before the last Sunday of the month, and acceptance notifications are sent out on the Monday of the week of the runs. Resource usage for any runs during this window is billed as usual, so make sure your project has enough resources to complete the planned runs. Submit applications via the LUMI helpdesk using the general contact form, under the "large-scale runs" category.
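The date rules above can be worked out with a few lines of Python. This is only a sketch for planning purposes, not an official tool: the function names `last_sunday` and `run_schedule` are hypothetical, and it assumes that "the Monday of the week of the runs" means the Monday six days before the run day (a Monday-to-Sunday week). The actual schedule is subject to change, so confirm the dates with the LUMI helpdesk.

```python
import calendar
from datetime import date, timedelta


def last_sunday(year: int, month: int) -> date:
    """Return the last Sunday of the given month."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # date.weekday() counts Monday as 0 and Sunday as 6.
    days_back = (last_day.weekday() - calendar.SUNDAY) % 7
    return last_day - timedelta(days=days_back)


def run_schedule(year: int, month: int):
    """Return the key dates for a month, or None if no runs are held."""
    if month in (7, 8):
        # No full machine runs in July and August.
        return None
    run_day = last_sunday(year, month)
    return {
        "run_day": run_day,
        # Wednesday 11 days before the run day.
        "application_deadline": run_day - timedelta(days=11),
        # Assumption: Monday of a Monday-to-Sunday week containing the run day.
        "notification_day": run_day - timedelta(days=6),
    }


if __name__ == "__main__":
    for name, day in run_schedule(2024, 9).items():
        print(f"{name}: {day.isoformat()} ({day.strftime('%A')})")
```

For September 2024, for example, the sketch gives a run day of 29 September, an application deadline of 18 September, and a notification day of 23 September.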
How to structure your proposal
- Project information: project number and who will be performing the runs. Provide direct contact details for everyone involved.
- Describe use case: the code(s) that will be used and what you are trying to achieve.
- Requested resources: describe the resources needed in enough detail that we can make any necessary changes to the queue system: which nodes, how many nodes, how many jobs will be running or queuing at the same time, the size and length of the jobs, and any other special considerations.
- Your preparations: describe what preparations you have done for the runs. The idea here is to convince us that you are actually ready to utilize the time slot. For scaling runs we require at least one run with the code at the partition limits, that is, the largest run you can get through in normal production use. For workflows we require at least a description of how they will fill the requested resources, use them efficiently, and achieve something that cannot be done in the normal production queues, as well as evidence that you already have the workflow running on LUMI.
We also require that you provide a short and simple summary of the runs after you have completed them, along with any issues you encountered.