PacketFence - BTS - PacketFence
View Issue Details
0001453PacketFencedhcppublic2012-05-24 10:202012-06-14 12:16
obilodeau 
obilodeau 
normalminorsometimes
closedfixed 
LinuxRHEL / CentOS
 
3.4.03.4.0 
5959573725c347405c6a5e11cd1641289ff595f9
0001453: pfdhcplistener resists to being killed (hangs) on packetfence restart
Causes problems, especially on log rotations, because then the registration VLAN or the IDS isolation features stop working because we no longer have an accurate iplog.

Further investigation required.
No tags attached.
has duplicate 0001053closed  restart code needs to be more aggressive if a daemon doesn't kill quickly enough 
related to 0001520closed  pfdhcplistener hung in __lll_lock_wait_private 
patch pfdhcplistener-dont-hang-fix.patch (2,330) 2012-05-25 10:45
https://www.packetfence.org/bugs/file_download.php?file_id=145&type=bug
Issue History
2012-05-24 10:20obilodeauNew Issue
2012-05-24 10:20obilodeauStatusnew => assigned
2012-05-24 10:20obilodeauAssigned To => obilodeau
2012-05-24 12:06obilodeauNote Added: 0002729
2012-05-24 12:11obilodeauNote Added: 0002730
2012-05-24 13:18obilodeauNote Added: 0002731
2012-05-24 14:05obilodeauNote Added: 0002732
2012-05-24 14:13obilodeauNote Added: 0002733
2012-05-25 10:40obilodeauNote Added: 0002734
2012-05-25 10:45obilodeauFile Added: pfdhcplistener-dont-hang-fix.patch
2012-05-29 10:22obilodeaugit revision => 5959573725c347405c6a5e11cd1641289ff595f9
2012-05-29 10:22obilodeauNote Added: 0002736
2012-05-29 10:22obilodeauStatusassigned => resolved
2012-05-29 10:22obilodeauFixed in Version => +1
2012-05-29 10:22obilodeauResolutionopen => fixed
2012-06-11 15:27obilodeauRelationship addedhas duplicate 0001053
2012-06-14 12:15obilodeauTarget Version+1 => 3.4.0
2012-06-14 12:15obilodeauFixed in Version+1 => 3.4.0
2012-06-14 12:16obilodeauNote Added: 0002775
2012-06-14 12:16obilodeauStatusresolved => closed
2012-08-17 09:59obilodeauRelationship addedrelated to 0001520

Notes
(0002729)
obilodeau   
2012-05-24 12:06   
May 17 11:12:28 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener stop (pf::services::service_ctl)

May 17 11:12:28 pfcmd(17974) INFO: Stopping pfdhcplistener with 'pkill pfdhcplistener' (pf::services::service_ctl)

May 17 11:12:28 pfdhcplistener(15616) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)

May 17 11:12:28 pfdhcplistener(15615) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)

May 17 11:12:28 pfdhcplistener(15617) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)

May 17 11:12:28 pfdhcplistener(15616) INFO: stopping pfdhcplistener for interface eth0.1370 (main::END)

May 17 11:12:28 pfdhcplistener(15617) INFO: stopping pfdhcplistener for interface eth2 (main::END)
May 17 11:12:28 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Layout::PatternLayo

ut::curly_action)
May 17 11:12:28 pfdhcplistener(15615) INFO: stopping pfdhcplistener for interface eth0.1380 (main::END)

May 17 11:12:28 pfdhcplistener(15614) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)

May 17 11:12:28 pfdhcplistener(15614) INFO: stopping pfdhcplistener for interface eth0 (main::END)
May 17 11:12:29 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

May 17 11:12:29 pfcmd(17974) INFO: pidof -x pfdhcplistener returned 18839 15617 15615 15614 (pf::services::service_ctl)

May 17 11:12:29 pfcmd(17974) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
May 17 11:12:31 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

May 17 11:12:31 pfcmd(17974) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)

May 17 11:12:31 pfcmd(17974) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
May 17 11:12:33 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

May 17 11:12:33 pfcmd(17974) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)

May 17 11:12:33 pfcmd(17974) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
May 17 11:12:35 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

May 17 11:12:35 pfcmd(17974) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)

May 17 11:12:35 pfcmd(17974) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
May 17 11:12:37 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

May 17 11:12:37 pfcmd(17974) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)

May 17 11:12:37 pfcmd(17974) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)


The fact that the signal is caught in Log::Log4perl::Layout::PatternLayout::curly_action concerns me. Looking at log4perl's changelog, did found some mentions of improper call stack depth reporting fixed. Hopefully it is only that.

Important to note that the listener not killed: 18839 wasn't active since quite a while but the others were. It is the eth0 listener and receives the traffic of all production DHCP.

Its creation:
May 16 13:29:21 pfdhcplistener(18821) INFO: pfdhcplistener_eth0 starting and writing 18839 to /usr/local/pf/var/run/pfdhcplis

tener_eth0.pid (pf::util::createpid)
May 16 13:29:21 pfdhcplistener(18839) WARN: Unable to open VLAN proc description for eth0: No such file 
or directory (pf::uti
l::get_vlan_from_int)
May 16 13:29:21 pfdhcplistener(18839) INFO: DHCP detector on eth0 enabled (main::)


First attempt at killing it:
May 16 14:18:57 pfdhcplistener(18839) INFO: 00:16:35:6e:a5:0d requested an IP. DHCP Fingerprint: OS::100 
(Microsoft Windows X
P). Modified node with last_dhcp = 2012-05-16 14:18:57,computername = 109-C112-003,dhcp_fingerprint = 
1,15,3,6,44,46,47,31,33
,249,43 (main::listen_dhcp)
May 16 14:18:57 pfcmd(20947) INFO: pidof -x pfdhcplistener returned 18842 18841 18839 18838 (pf::services::service_ctl)

May 16 14:18:59 pfcmd(20947) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)

May 16 14:19:01 pfcmd(20947) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)



Apparently it didn't even receive the signal..
(0002730)
obilodeau   
2012-05-24 12:11   
then several times it reports a signal:
May 16 14:54:09 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Layout::PatternLayout::render)

May 17 11:10:12 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Config::Watch::change_detected)

May 17 11:12:28 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Layout::PatternLayout::curly_action)

May 17 11:13:58 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Appender::String::log)

May 17 11:21:47 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Logger::__ANON__)



Every time in log4perl.. I'll check if there's something trapping signals in the log4perl code.
(0002731)
obilodeau   
2012-05-24 13:18   
"normal" backtrace:
(gdb) bt
#0  0x00baa402 in __kernel_vsyscall ()
0000001  0x00474f01 in recvfrom () from /lib/libpthread.so.0
0000002  0x0052d313 in ?? () from /usr/lib/libpcap.so.0.9.4
0000003  0x0052e3db in pcap_loop () from /usr/lib/libpcap.so.0.9.4
0000004  0x00d4727d in XS_Net__Pcap_loop (my_perl=0x9912008, cv=0x9b45140) at Pcap.xs:415
0000005  0x0446159d in Perl_pp_entersub () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000006  0x0445aa0f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000007  0x0440024e in perl_run () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
0000008  0x0804921e in main ()


corrupted state backtrace:

(gdb) bt
#0  0x00640402 in __kernel_vsyscall ()
0000001  0x003b8a23 in __lll_lock_wait_private () from /lib/libc.so.6
0000002  0x00347862 in _L_lock_13973 () from /lib/libc.so.6
0000003  0x00345ca7 in realloc () from /lib/libc.so.6
0000004  0x0444bd97 in Perl_safesysrealloc () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000005  0x04467193 in Perl_sv_grow () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000006  0x0446f948 in Perl_sv_catpvn_flags () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000007  0x0445d417 in Perl_pp_concat () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000008  0x0445aa0f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000009  0x043fb07e in ?? () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
0000010 0x043ff946 in Perl_call_sv () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000011 0x0444dcb8 in Perl_sighandler () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000012 <signal handler called>
#13 0x00640400 in __kernel_vsyscall ()
0000014 0x0036b8b4 in fork () from /lib/libc.so.6
0000015 0x004771a4 in fork () from /lib/libpthread.so.0
0000016 0x04449537 in Perl_my_fork () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000017 0x0444c497 in Perl_my_popen () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000018 0x044a61d6 in Perl_pp_backtick () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000019 0x0445aa0f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000020 0x043fb07e in ?? () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
0000021 0x043ff7e0 in Perl_call_sv () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000022 0x00a95714 in callback_wrapper (user=0x9141a34 "\250B\340\b\001", h=0xbfc012e0, 
    pkt=0x9147fc2 "\214{\235\005", <incomplete sequence \367\227>) at Pcap.xs:76
0000023 0x0052d46a in ?? () from /usr/lib/libpcap.so.0.9.4
0000024 0x0052e3db in pcap_loop () from /usr/lib/libpcap.so.0.9.4
0000025 0x00a9027d in XS_Net__Pcap_loop (my_perl=0x84b2008, cv=0x86e5cec) at Pcap.xs:415
0000026 0x0446159d in Perl_pp_entersub () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000027 0x0445aa0f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000028 0x0440024e in perl_run () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
0000029 0x0804921e in main ()
(0002732)
obilodeau   
2012-05-24 14:05   
First attempt at a fix in the fix/pfdhcplistener-reliable-death branch: https://github.com/inverse-inc/packetfence/tree/fix/pfdhcplistener-reliable-death [^]
(0002733)
obilodeau   
2012-05-24 14:13   
Also noticed in gdb that the process is forking like crazy.

Reproduce with:
# gdb sbin/pfdhcplistener <pid>
> c
(0002734)
obilodeau   
2012-05-25 10:40   
Made additional fixes yesterday to the forking situation. This was unrelated and only a minor improvement.

Attaching patch for the hang fix.
(0002736)
obilodeau   
2012-05-29 10:22   
Fix was merged in stable. Will be part of the next release.
(0002775)
obilodeau   
2012-06-14 12:16   
fix released in 3.4.0 yesterday