PacketFence - BTS - PacketFence
View Issue Details
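ID: 0001053
Project: PacketFence
Category: error-handling
View Status: public
Date Submitted: 2010-08-27 12:14
Last Update: 2012-06-14 12:16
Reporter: obilodeau
Priority: high
Severity: major
Reproducibility: random
Status: closed
Resolution: duplicate
Target Version: 3.4.0
Fixed in Version: 3.4.0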
0001053: restart code needs to be more aggressive if a daemon doesn't kill quickly enough
Here's a situation that happened at a client:

- a restart is issued in a high-load situation
- for some reason, one daemon gets stuck and does not finish stopping (in our case one of the pfdhcplistener processes)
- the rest of the system restarts, but because that daemon is hung, the restart of the daemons of the same type (pfdhcplistener) is delayed indefinitely.

Another `pfcmd service pfdhcplistener restart` fixed it, but it took days to realize that (IP to MAC stopped working).

Our restart code should handle that situation and become more aggressive if it has waited longer than 1 minute (for example).
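
To make the proposal concrete, here is a minimal sketch of a stop routine that escalates after a grace period. It is written in Python purely for illustration; PacketFence itself is Perl, and this is not the code that eventually fixed the issue. The function names (`is_alive`, `wait_for_exit`, `stop_with_escalation`) and the timeout values are assumptions made up for this example; only the idea of escalating after roughly one minute comes from the report above.

```python
import os
import signal
import time


def is_alive(pid):
    """Return True if a process with this pid still exists."""
    try:
        # If the process is our own child, reap it so that a zombie
        # is not mistaken for a running daemon.
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child; fall through to the signal-0 probe
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


def wait_for_exit(pid, timeout, poll):
    """Poll until the process is gone or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if not is_alive(pid):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def stop_with_escalation(pid, grace=60.0, kill_wait=5.0, poll=0.5):
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL.

    Returns 'not-running', 'terminated', 'killed' or 'stuck'.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "not-running"
    if wait_for_exit(pid, grace, poll):
        return "terminated"
    # The polite request was not honoured in time: be more aggressive.
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"
    if wait_for_exit(pid, kill_wait, poll):
        return "killed"
    # Even SIGKILL can be delayed, e.g. for a process in uninterruptible sleep.
    return "stuck"
```

The point of the bounded wait is that the caller always gets an answer back: a result of 'stuck' can be logged as an error and the restart can move on to the other daemons, instead of looping on "Waiting for pfdhcplistener to stop" indefinitely.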

Logs provided below.
Aug 26 16:45:12 pfcmd(0) INFO: packetfence restart ... executing stop followed by start (main::service)

...
Aug 26 16:45:12 pfcmd(0) INFO: Stopping pfdhcplistener with 'pkill pfdhcplistener' (pf::services::service_ctl)

Aug 26 16:45:12 pfdhcplistener(0) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)

Aug 26 16:45:12 pfdhcplistener(0) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)

Aug 26 16:45:12 pfcmd(0) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

Aug 26 16:45:12 pfdhcplistener(0) INFO: stopping pfdhcplistener for interface eth0 (main::END)
Aug 26 16:45:12 pfdhcplistener(0) INFO: stopping pfdhcplistener for interface eth1 (main::END)


but then one didn't stop:

Aug 26 16:45:14 pfcmd(0) INFO: pidof -x pfdhcplistener returned 21523 (pf::services::service_ctl)
Aug 26 16:45:14 pfcmd(0) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
...
Aug 26 17:58:38 pfcmd(0) INFO: pidof -x pfdhcplistener returned 21523 (pf::services::service_ctl)
Aug 26 17:58:38 pfcmd(0) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
...
Aug 27 11:25:43 pfcmd(0) INFO: pidof -x pfdhcplistener returned 21523 (pf::services::service_ctl)
Aug 27 11:25:43 pfcmd(0) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
No tags attached.
duplicate of 0001453 (closed, obilodeau): pfdhcplistener resists to being killed (hangs) on packetfence restart
Issue History
2010-08-27 12:14  obilodeau    New Issue
2010-09-15 10:54  obilodeau    Target Version: 1.9.1 => 1.9.2
2010-09-22 16:02  obilodeau    Target Version: 1.9.2 => 1.9.3
2012-04-30 09:32  fgaudreault  Note Added: 0002691
2012-06-11 15:27  obilodeau    Relationship added: duplicate of 0001453
2012-06-11 15:27  obilodeau    Note Added: 0002755
2012-06-11 15:28  obilodeau    Status: new => closed
2012-06-11 15:28  obilodeau    Resolution: open => duplicate
2012-06-11 15:28  obilodeau    Fixed in Version: => +1
2012-06-14 12:15  obilodeau    Target Version: 1.9.3 => 3.4.0
2012-06-14 12:15  obilodeau    Fixed in Version: +1 => 3.4.0
2012-06-14 12:16  obilodeau    Note Added: 0002780

Notes
(0002691)
fgaudreault   
2012-04-30 09:32   
We have seen almost the same thing in 3.3.2:

Apr 29 03:06:33 pfdhcplistener(18353) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)
Apr 29 03:06:33 pfdhcplistener(18352) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)
Apr 29 03:06:33 pfdhcplistener(18353) INFO: stopping pfdhcplistener for interface eth1.851 (main::END)
Apr 29 03:06:33 pfdhcplistener(18352) INFO: stopping pfdhcplistener for interface eth1.852 (main::END)
Apr 29 03:06:33 pfdhcplistener(18354) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)
Apr 29 03:06:33 pfdhcplistener(18354) INFO: stopping pfdhcplistener for interface eth0 (main::END)
Apr 29 03:06:54 pfmon(0) FATAL: pfmon: caught SIGTERM - terminating (main::normal_sighandler)
Apr 29 03:06:54 pfmon(0) INFO: stopping pfmon (main::END)

But the service is not restarting; an old process is stuck:
Apr 30 08:08:25 pfcmd(15332) INFO: pidof -x pfsetvlan returned 32597 (pf::services::service_ctl)
Apr 30 08:08:25 pfcmd(15332) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)
Apr 30 08:08:25 pfcmd(15332) INFO: pidof -x pfdhcplistener returned 20956 (pf::services::service_ctl)
(0002755)
obilodeau   
2012-06-11 15:27   
fixed by 0001453, closing
(0002780)
obilodeau   
2012-06-14 12:16   
fix released in 3.4.0 yesterday