Maui Cluster Scheduler, the precursor to Moab Cluster Suite®, is an open source job scheduler for clusters and supercomputers. It is an optimized, configurable tool capable of supporting an array of scheduling policies, dynamic priorities, extensive reservations, and fairshare capabilities. It is currently in use at hundreds of government, academic, and commercial sites throughout the world. All of the capabilities found in Maui are also found in Moab, while Moab has added features including virtual private clusters, basic trigger support, graphical administration tools, and a Web-based user portal.
I had two choices for a batch queuing system: Condor and PBS (Torque). Having deployed Condor on a previous cluster, I went with Torque on the next one just for kicks. Maui happened to be one of the popular schedulers used with PBS.
Maui was the default choice because it integrates easily. After running it in production for over 3 months, I am quite happy with it. For the moment, Torque+Maui is my default combination for putting a cluster together.
The cluster grid should already have Torque configured with the default scheduler, pbs_sched. pbs_sched is a no-frills scheduler and requires minimal configuration changes. Have this working before switching to the Maui scheduler.
Install from source into /usr/local. The default ./configure works well here. Create a start/stop script for maui in /etc/init.d/.
# maui.cfg 3.2.6p13

#SERVERMODE TEST
SERVERHOST master.XXX
# primary admin must be first in list
ADMIN1 maui cluster

# Resource Manager Definition
RMCFG[XXX] TYPE=PBS TIMEOUT=90

# Allocation Manager Definition
AMCFG[bank] TYPE=NONE

# full parameter docs at http://clusterresources.com/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE NORMAL

# Admin: http://clusterresources.com/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 1
LOGFILEROLLDEPTH 7

# Job Priority: http://clusterresources.com/mauidocs/5.1jobprioritization.html
QUEUETIMEWEIGHT 1

# FairShare: http://clusterresources.com/mauidocs/6.3fairshare.html
#FSPOLICY PSDEDICATED
#FSDEPTH 7
#FSINTERVAL 86400
#FSDECAY 0.80

# Throttling Policies: http://clusterresources.com/mauidocs/6.2throttlingpolicies.html
# NONE SPECIFIED

# Backfill: http://clusterresources.com/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST

# Node Allocation: http://clusterresources.com/mauidocs/5.2nodeallocation.html
NODEALLOCATIONPOLICY MINRESOURCE

# QOS: http://clusterresources.com/mauidocs/7.3qos.html
# QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE

# Standing Reservations: http://clusterresources.com/mauidocs/7.1.3standingreservations.html
# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test] 17:00:00
# SRDAYS[test] MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test] 0:30:00

# Creds: http://clusterresources.com/mauidocs/6.1fairnessoverview.html
# USERCFG[DEFAULT] FSTARGET=25.0
# USERCFG[john] PRIORITY=100 FSTARGET=10.0-
# GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch] FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR

#NODEMAXLOAD 10.00
USERWEIGHT 1
USERCFG[cluster] PRIORITY=300

NODECFG[node1] MAXLOAD=10 MAXJOB=4
NODECFG[node2] MAXLOAD=10 MAXJOB=4
NODECFG[node3] MAXLOAD=10 MAXJOB=4
NODECFG[node4] MAXLOAD=10 MAXJOB=4
NODECFG[node5] MAXLOAD=10 MAXJOB=4
NODECFG[node6] MAXLOAD=10 MAXJOB=4
NODECFG[node7] MAXLOAD=10 MAXJOB=4
NODECFG[node8] MAXLOAD=10 MAXJOB=4
NODECFG[node9] MAXLOAD=10 MAXJOB=4
NODECFG[node10] MAXLOAD=10 MAXJOB=4

#NODEAVAILABILITYPOLICY UTILIZED
#NODEACCESSPOLICY SHARED

MAXJOBPERUSERPOLICY 40
MAXJOBPERGROUPPOLICY 40
MAXJOBPERACCOUNTPOLICY 40
MAXJOBPERUSERCOUNT 20
MAXJOBPERGROUPCOUNT 20
MAXJOBPERACCOUNTCOUNT 20
SMAXJOBPERUSERCOUNT 40
SMAXJOBPERGROUPCOUNT 40
SMAXJOBPERACCOUNTCOUNT 40

DEFERTIME 0
By default, maui starts N processes on each node where N equals CPU count. Sometimes it is desirable to have more processes per node to better utilize node resources like memory and CPU%.
Increase this limit by carefully studying average utilizations of CPU, Disk, RAM and Swap. Use SNMP based graphing/monitoring tools like Cacti for keeping track of system parameters.
The cfg options below permit each node (node1 to node10) to run a maximum of four jobs (2 jobs per CPU), subject to a maximum system load of 10. This makes the best use of node resources in my setup.
NODECFG[node1] MAXLOAD=10 MAXJOB=4
NODECFG[node2] MAXLOAD=10 MAXJOB=4
NODECFG[node3] MAXLOAD=10 MAXJOB=4
NODECFG[node4] MAXLOAD=10 MAXJOB=4
NODECFG[node5] MAXLOAD=10 MAXJOB=4
NODECFG[node6] MAXLOAD=10 MAXJOB=4
NODECFG[node7] MAXLOAD=10 MAXJOB=4
NODECFG[node8] MAXLOAD=10 MAXJOB=4
NODECFG[node9] MAXLOAD=10 MAXJOB=4
NODECFG[node10] MAXLOAD=10 MAXJOB=4
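Typing one NODECFG line per node gets tedious on larger clusters. The following is a minimal sketch, not part of Maui, that prints the same lines for nodes named node1 to node10. The function name nodecfg_lines and its parameters are made up for illustration, and it assumes the nodes are numbered consecutively with a common prefix.

```python
def nodecfg_lines(prefix="node", count=10, max_load=10, max_job=4):
    """Return one NODECFG line per node, named <prefix>1 .. <prefix><count>.

    Hypothetical helper: the naming scheme and defaults mirror the
    example configuration above, not anything Maui itself provides.
    """
    return [
        f"NODECFG[{prefix}{i}] MAXLOAD={max_load} MAXJOB={max_job}"
        for i in range(1, count + 1)
    ]


# Example: build the block of lines to paste into maui.cfg.
cfg_block = "\n".join(nodecfg_lines())
```

Printing cfg_block and pasting the result into maui.cfg keeps the per-node values consistent when the limits change.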
After a scheduler restart, maui puts all previously submitted jobs into a DEFER state, which lasts for the DEFERTIME period (one hour by default, according to the Maui parameter documentation). Newly submitted jobs get batched, but the old ones don't. To execute queued jobs immediately after a restart, add the line below:
DEFERTIME 0
Maui takes a first come, first served approach to job execution. This is undesirable, because a user who has submitted thousands of jobs will block a user with 10 jobs: the 10 jobs have to wait until the first 1000 jobs are completed.
The scheduler allows rather elaborate control over fair sharing of system resources. The lines below assign the user cluster a priority of 300 and set per-user, per-group, and per-account job-count limits.
This allows other users to squeeze their jobs into the run queue.
USERWEIGHT 1
USERCFG[cluster] PRIORITY=300
MAXJOBPERUSERPOLICY 40
MAXJOBPERGROUPPOLICY 40
MAXJOBPERACCOUNTPOLICY 40
MAXJOBPERUSERCOUNT 20
MAXJOBPERGROUPCOUNT 20
MAXJOBPERACCOUNTCOUNT 20
SMAXJOBPERUSERCOUNT 40
SMAXJOBPERGROUPCOUNT 40
SMAXJOBPERACCOUNTCOUNT 40
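To see why a per-user job cap helps, here is a toy model of the idea. It is not Maui's actual scheduling algorithm, only a sketch of strict submission-order dispatch with and without a cap on running jobs per user. The function pick_jobs, the user names alice and bob, and the slot count of 40 are all assumptions for illustration.

```python
from collections import Counter


def pick_jobs(queue, slots, per_user_cap=None):
    """Walk the queue in submission order and start jobs until slots run out.

    queue is a list of (user, job_id) tuples. When per_user_cap is set,
    a job is skipped if its owner already has that many running jobs.
    Toy model only; Maui's real policies are far more elaborate.
    """
    running = []
    per_user = Counter()
    for user, job_id in queue:
        if len(running) >= slots:
            break
        if per_user_cap is not None and per_user[user] >= per_user_cap:
            continue
        running.append((user, job_id))
        per_user[user] += 1
    return running


# alice submits 1000 jobs first, bob submits 10 afterwards (hypothetical users).
queue = [("alice", i) for i in range(1000)] + [("bob", i) for i in range(10)]

fcfs = Counter(user for user, _ in pick_jobs(queue, slots=40))
capped = Counter(user for user, _ in pick_jobs(queue, slots=40, per_user_cap=20))
```

Without a cap, all 40 slots go to alice and bob waits. With a cap of 20, alice holds 20 slots and all 10 of bob's jobs start right away.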
define command{
command_name check_maui
command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 42559
}
define service{
use generic-service
host_name master
service_description MAUI
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups cluster-admins
notification_interval 120
notification_period 24x7
notification_options c,r
check_command check_maui
}
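The Nagios check above only verifies that something accepts TCP connections on the Maui server port (42559, the SERVERPORT value in maui.cfg). For a quick manual test without Nagios, the same probe can be sketched in a few lines. The function name maui_port_open is made up for illustration, and a successful connection says nothing about whether the scheduler is actually healthy.

```python
import socket


def maui_port_open(host, port=42559, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    Hypothetical helper that mirrors what check_tcp tests: it only
    confirms that the port accepts connections.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

Calling maui_port_open("master") from any machine that can reach the head node gives roughly the same answer as the Nagios service definition, assuming the host name master resolves there.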