After installing the update torque-4.2.10-10.el6.x86_64, the pbs_mom service starts, but the node is shown as down.
pbs_server log:
PBS_Server.6045;Svr;PBS_Server;LOG_ERROR::get_numa_from_str, Node isn't declared to be NUMA, but mom is reporting
That's because the nodes file on the server isn't set up correctly. Try adding 'num_node_boards=1' to each node line in the nodes file on the system where pbs_sched and pbs_server run.
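A minimal sketch of that change as a shell function. It assumes GNU sed (as shipped with RHEL 6) and a nodes file in which each non-comment line is a hostname followed by attributes such as np=2. The function name add_node_boards is hypothetical, and so is the path in the usage note: /var/lib/torque/server_priv/nodes is only inferred from the accounting path quoted later in this thread. The function keeps a .bak copy and leaves comments, blank lines, and lines that already set num_node_boards untouched.

```shell
#!/bin/sh
# Hypothetical helper: append num_node_boards=1 to every node line of a
# Torque nodes file. Assumes GNU sed (in-place editing with -i.bak).
add_node_boards() {
    nodes_file="$1"
    # -i.bak saves the original as "$nodes_file.bak".
    # Comment lines, blank lines and lines already containing
    # num_node_boards= hit a bare "b" (branch to end) and print unchanged.
    # Every other line gets its trailing whitespace replaced by
    # " num_node_boards=1".
    sed -i.bak \
        -e '/^[[:space:]]*#/b' \
        -e '/^[[:space:]]*$/b' \
        -e '/num_node_boards=/b' \
        -e 's/[[:space:]]*$/ num_node_boards=1/' \
        "$nodes_file"
}
```

For example, run `add_node_boards /var/lib/torque/server_priv/nodes` (adjust the path to your installation), then restart pbs_server so it rereads the file, e.g. with `service pbs_server restart`.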
This needs to be fixed: there is currently no way to get Torque working on Red Hat 6 until this regression is solved, and since the 4.2.10-5 RPM is no longer available for download, a downgrade doesn't seem possible.
This update has been submitted for testing by dmlb2000.
This update has been pushed to testing.
This update has been obsoleted.
The NUMA-enabled version does not work correctly on a system with multiple nodes. All ranks run on the primary node:
[franek@nova ~]$ qsub -I -q staff -l nodes=6:ppn=2
qsub: waiting for job 264345.nova.XXX to start
qsub: job 264345.nova.XXX ready

[franek@wn001 ~]$ ./pbsdsh -u uname -a
Linux wn001 2.6.32-573.22.1.el6.x86_64 #1 SMP Wed Mar 23 03:35:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Linux wn001 2.6.32-573.22.1.el6.x86_64 #1 SMP Wed Mar 23 03:35:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Linux wn001 2.6.32-573.22.1.el6.x86_64 #1 SMP Wed Mar 23 03:35:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Linux wn001 2.6.32-573.22.1.el6.x86_64 #1 SMP Wed Mar 23 03:35:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Linux wn001 2.6.32-573.22.1.el6.x86_64 #1 SMP Wed Mar 23 03:35:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Linux wn001 2.6.32-573.22.1.el6.x86_64 #1 SMP Wed Mar 23 03:35:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[franek@wn001 ~]$ ./pbsdsh -h wn002 uname -a
Linux wn001 2.6.32-573.22.1.el6.x86_64 #1 SMP Wed Mar 23 03:35:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@nova ~]# awk '/264345.nova/&&/;S;/ {print $11}' /var/lib/torque/server_priv/accounting/20160414
exec_host=wn001-0/0+wn001-0/1+wn006-0/0+wn006-0/1+wn005-0/0+wn005-0/1+wn004-0/0+wn004-0/1+wn003-0/0+wn003-0/1+wn002-0/0+wn002-0/1
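To make the contrast with the pbsdsh output explicit, the exec_host value can be tallied per physical host. This is a minimal sketch that assumes, in entries such as wn001-0/1, the -0 suffix is the NUMA node-board index and /1 the slot number; count_exec_hosts is a hypothetical helper name. Applied to the exec_host line from the accounting log, it reports two slots on each of wn001 through wn006. That suggests the server did allocate all six nodes and the failure lies in how tasks are launched on them, not in scheduling.

```shell
#!/bin/sh
# Hypothetical helper: count allocated slots per physical host in a
# Torque exec_host string such as "exec_host=wn001-0/0+wn001-0/1+...".
count_exec_hosts() {
    # 1. Drop the "exec_host=" prefix and split entries on "+".
    # 2. Strip the "/slot" part, then the assumed "-board" suffix.
    # 3. Count entries per host and print "host count", sorted by host.
    printf '%s\n' "${1#exec_host=}" |
        tr '+' '\n' |
        sed -e 's|/.*||' -e 's|-[0-9][0-9]*$||' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `count_exec_hosts "$(awk '/264345.nova/&&/;S;/ {print $11}' /var/lib/torque/server_priv/accounting/20160414)"` feeds it the same value that the awk command above extracts.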
This problem does not exist in the 4.2.10-5 release. Is it possible to prepare a version without NUMA support?
Anonymous, please see bug #1321154; it has a lot of information about how to get the older version of Torque.