Slurm Job Accounting Gather Plugin API
Overview
This document describes Slurm job accounting gather plugins and the API that defines them. It is intended as a resource to programmers wishing to write their own Slurm job accounting gather plugins.
Slurm job accounting gather plugins must conform to the Slurm Plugin API with the following specifications:
const char plugin_name[]="full text name"
A free-formatted ASCII text string that identifies the plugin.
const char plugin_type[]="major/minor"
The major type must be "jobacct_gather". The minor type can be any suitable name for the type of accounting package. The following minor types are currently in use:
- aix Gathers information from AIX /proc table and adds this information to the standard rusage information also gathered for each job.
- cgroup Gathers information from the Linux cgroup infrastructure and adds this information to the standard rusage information also gathered for each job. (Experimental, not to be used in production.)
- linux Gathers information from the Linux /proc table and adds this information to the standard rusage information also gathered for each job.
- none No information gathered.
The programmer is urged to study src/plugins/jobacct_gather/linux and src/common/slurm_jobacct_gather.[c|h] for a sample implementation of a Slurm job accounting gather plugin.
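As a minimal sketch of the two identification symbols described above, a plugin source file might begin as follows. The minor type "example", the name text, and the helper has_jobacct_gather_major() are illustrative assumptions and do not exist in Slurm; the helper only demonstrates the "major/minor" convention.

```c
#include <string.h>

/* Identification symbols described above. The name text and the minor
 * type "example" are placeholders, not a real Slurm plugin. */
const char plugin_name[] = "Example job accounting gather plugin";
const char plugin_type[] = "jobacct_gather/example";

/* Hypothetical helper (not part of Slurm): returns 1 if the given type
 * string starts with the required "jobacct_gather" major type. */
int has_jobacct_gather_major(const char *type)
{
	static const char major[] = "jobacct_gather/";

	return strncmp(type, major, sizeof(major) - 1) == 0;
}
```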
API Functions
All of the following functions are required. Functions which are not implemented must be stubbed.
int init (void)
Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void fini (void)
Description:
Called when the plugin is removed. Clear any allocated storage here.
Returns: None.
Note: These init and fini functions are not the same as those described in the dlopen(3) system library documentation. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
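The following is a minimal sketch of an init/fini pair, assuming the plugin keeps a single piece of global state. The scratch buffer is purely illustrative, and the two macros are local stand-ins for the values a real plugin would take from Slurm's headers.

```c
#include <stdlib.h>

/* Local stand-ins for this sketch; a real plugin gets these from Slurm's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

/* Hypothetical plugin-wide state, used only for illustration. */
static char *scratch_buf = NULL;

/* Called when the plugin is loaded: set up global state. */
int init(void)
{
	scratch_buf = malloc(4096);
	if (!scratch_buf)
		return SLURM_ERROR;
	return SLURM_SUCCESS;
}

/* Called when the plugin is removed: release what init() allocated. */
void fini(void)
{
	free(scratch_buf);
	scratch_buf = NULL;
}
```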
int jobacct_gather_p_poll_data(List task_list, bool pgid_plugin, uint64_t cont_id)
Description:
Build a table of all current processes.
Arguments:
task_list (in/out) List containing current processes.
pgid_plugin (input) true if running with the pgid plugin.
cont_id (input) container id of the processes if not running with the pgid plugin.
int jobacct_gather_p_endpoll(void)
Description:
Called when the process is finished to stop the polling thread.
Arguments:
none
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int jobacct_gather_p_add_task(pid_t pid, uint16_t tid)
Description:
Used to add a task to the poller.
Arguments:
pid (input) Process id
tid (input) Slurm global task id
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
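Because unimplemented functions must still be stubbed, a plugin that gathers nothing could provide the three polling functions roughly as sketched below. The signatures are copied from this document. The List typedef and the SLURM_SUCCESS macro are local stand-ins so the sketch compiles on its own, and the return value of jobacct_gather_p_poll_data is an assumption, since the text above does not document one.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Local stand-ins for this sketch; a real plugin uses Slurm's own headers. */
#define SLURM_SUCCESS 0
typedef struct list_stand_in *List;

/* Stub: gathers nothing. Returning SLURM_SUCCESS here is an assumption. */
int jobacct_gather_p_poll_data(List task_list, bool pgid_plugin, uint64_t cont_id)
{
	(void) task_list;
	(void) pgid_plugin;
	(void) cont_id;
	return SLURM_SUCCESS;
}

/* Stub: there is no polling thread to stop. */
int jobacct_gather_p_endpoll(void)
{
	return SLURM_SUCCESS;
}

/* Stub: accepts the task but records nothing about it. */
int jobacct_gather_p_add_task(pid_t pid, uint16_t tid)
{
	(void) pid;
	(void) tid;
	return SLURM_SUCCESS;
}
```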
Job Account Gathering
None of the following functions are required, but they may be used.
int jobacct_gather_init(void)
Description:
Loads the job account gather plugin.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int jobacct_gather_fini(void)
Description:
Unloads the job account gathering plugin.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int jobacct_gather_startpoll(uint16_t frequency)
Description:
Creates and starts the polling thread.
Arguments:
frequency (input) frequency of the polling.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void jobacct_gather_change_poll(uint16_t frequency)
Description:
Changes the polling thread to a new frequency.
Arguments:
frequency (input) frequency of the polling
Returns: None.
void jobacct_gather_suspend_poll(void)
Description:
Temporarily stops the polling thread.
Returns: None.
void jobacct_gather_resume_poll(void)
Description:
Resumes the polling thread that was stopped.
Returns: None.
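The relationship between starting, changing, suspending and resuming the poll can be pictured with the small state model below. It is not Slurm's implementation: no thread is created, every model_* name and the poll_state structure are hypothetical, and treating a second start as an error is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for this sketch. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

/* Hypothetical model of the polling state; no real thread is involved. */
static struct {
	bool running;        /* polling has been started */
	bool suspended;      /* polling is temporarily stopped */
	uint16_t frequency;  /* current polling frequency */
} poll_state;

/* Models jobacct_gather_startpoll(): begin polling at the given frequency. */
int model_startpoll(uint16_t frequency)
{
	if (poll_state.running)
		return SLURM_ERROR; /* assumption: starting twice is an error */
	poll_state.running = true;
	poll_state.suspended = false;
	poll_state.frequency = frequency;
	return SLURM_SUCCESS;
}

/* Models jobacct_gather_change_poll(): switch to a new frequency. */
void model_change_poll(uint16_t frequency)
{
	poll_state.frequency = frequency;
}

/* Models jobacct_gather_suspend_poll() and jobacct_gather_resume_poll(). */
void model_suspend_poll(void) { poll_state.suspended = true; }
void model_resume_poll(void)  { poll_state.suspended = false; }

/* True when a poll would actually be taken. */
bool model_is_polling(void)
{
	return poll_state.running && !poll_state.suspended;
}
```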
jobacctinfo_t *jobacct_gather_stat_task(pid_t pid)
Description:
Gets the basic accounting information for the task.
Arguments:
pid (input) process id.
Returns:
A pointer to the task's jobacctinfo_t, or
NULL on failure.
jobacctinfo_t *jobacct_gather_remove_task(pid_t pid)
Description:
Removes the task.
Arguments:
pid (input) process id.
Returns:
A pointer to the removed task's jobacctinfo_t, or
NULL on failure.
int jobacct_gather_set_proctrack_container_id(uint64_t id)
Description:
Sets the proctrack container to a given id.
Arguments:
id (input) id to set.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int jobacct_gather_set_mem_limit(uint32_t job_id, uint32_t step_id, uint32_t mem_limit)
Description:
Sets the memory limit of the job account.
Arguments:
job_id (input) id of the job.
step_id (input) id of the step.
mem_limit (input) memory limit in megabytes.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void jobacct_gather_handle_mem_limit(uint32_t total_job_mem, uint32_t total_job_vsize)
Description:
Called to find out how much memory is used.
Arguments:
total_job_mem (input) total amount of memory for jobs.
total_job_vsize (input) total virtual memory size of the job.
Returns: None.
Job Account Info
None of the following functions are required, but they may be used.
jobacctinfo_t *jobacctinfo_create(jobacct_id_t *jobacct_id)
Description:
Creates the job account info.
Arguments:
jobacct_id (input) the job account id.
Returns:
A pointer to the newly created jobacctinfo_t.
void jobacctinfo_destroy(void *object)
Description:
Destroys the job account info.
Arguments:
object (input) the job account info to be destroyed.
Returns: None.
int jobacctinfo_setinfo(jobacctinfo_t *jobacct, enum jobacct_data_type type, void *data)
Description:
Set the information for the job.
Arguments:
jobacct (input) job account
type (input) enum telling the plugin how to transform the data.
data (input/output) a void * whose actual data type depends upon the type argument.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int jobacctinfo_getinfo(jobacctinfo_t *jobacct, enum jobacct_data_type type, void *data)
Description:
Gets the information about the job.
Arguments:
jobacct (input) job account.
type (input) the data type of the job account.
data (output) a void * whose actual data type depends upon the type argument.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
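The way a single void * argument is interpreted according to the type argument can be sketched as follows. This is an illustration of the pattern only: demo_acct_t, enum demo_data_type and the demo_* functions are hypothetical stand-ins, and the real jobacctinfo_t and enum jobacct_data_type in Slurm's headers have different members.

```c
#include <stdint.h>

/* Local stand-ins for this sketch. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

/* Hypothetical types; not the real Slurm definitions. */
enum demo_data_type { DEMO_DATA_MAX_RSS, DEMO_DATA_TASK_COUNT };

typedef struct {
	uint64_t max_rss;
	uint32_t task_count;
} demo_acct_t;

/* Store *data into the record; the type argument selects how data is read. */
int demo_setinfo(demo_acct_t *acct, enum demo_data_type type, void *data)
{
	if (!acct || !data)
		return SLURM_ERROR;
	switch (type) {
	case DEMO_DATA_MAX_RSS:
		acct->max_rss = *(uint64_t *) data;
		break;
	case DEMO_DATA_TASK_COUNT:
		acct->task_count = *(uint32_t *) data;
		break;
	default:
		return SLURM_ERROR;
	}
	return SLURM_SUCCESS;
}

/* Copy a field out of the record into *data, again selected by type. */
int demo_getinfo(demo_acct_t *acct, enum demo_data_type type, void *data)
{
	if (!acct || !data)
		return SLURM_ERROR;
	switch (type) {
	case DEMO_DATA_MAX_RSS:
		*(uint64_t *) data = acct->max_rss;
		break;
	case DEMO_DATA_TASK_COUNT:
		*(uint32_t *) data = acct->task_count;
		break;
	default:
		return SLURM_ERROR;
	}
	return SLURM_SUCCESS;
}
```

The caller is responsible for passing a pointer whose type matches the enum value; nothing in this pattern lets the callee verify it.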
void jobacctinfo_pack(jobacctinfo_t *jobacct, uint16_t rpc_version, Buf buffer)
Description:
Packs the job account information.
Arguments:
jobacct (input) the job account.
rpc_version (input) the RPC version.
buffer (input) the buffer.
Returns: None.
int jobacctinfo_unpack(jobacctinfo_t **jobacct, uint16_t rpc_version, Buf buffer)
Description:
Unpacks the job account information.
Arguments:
jobacct (output) the unpacked job account.
rpc_version (input) the RPC version.
buffer (input) the buffer.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void jobacctinfo_aggregate(jobacctinfo_t *dest, jobacctinfo_t *from)
Description:
Aggregates the accounting information of one job account into another.
Arguments:
dest (input/output) job account that receives the aggregated information.
from (input) job account whose information is added to dest.
Returns: None.
void jobacctinfo_2_stats(slurmdb_stats_t *stats, jobacctinfo_t *jobacct)
Description:
Gets the stats of the job in accounting.
Arguments:
stats (output) Slurm database stats structure.
jobacct (input) the job account.
Returns: None.
Parameters
These parameters can be used in slurm.conf to select the plugin and to set the frequency at which information about running jobs is gathered.
- JobAcctGatherType
- Specifies which plugin should be used.
- JobAcctGatherFrequency
- Time interval between polls, in seconds.
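As an illustration, a slurm.conf fragment using both parameters might look like the following. The plugin choice and the 30-second interval are example values only, not recommendations.

```
# Illustrative values only
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
```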
Versioning
This document describes version 2 of the Slurm Job Accounting Gather API. Future releases of Slurm may revise this API. A job accounting gather plugin conveys its ability to implement a particular API version using the mechanism outlined for Slurm plugins.
Last modified 8 May 2014