PacketFence
Bug Tracking System

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0001453PacketFencedhcppublic2012-05-24 10:202012-06-14 12:16
Reporterobilodeau 
Assigned Toobilodeau 
PrioritynormalSeverityminorReproducibilitysometimes
StatusclosedResolutionfixed 
PlatformLinuxOSRHEL / CentOSOS Version
Product Version 
Target Version3.4.0Fixed in Version3.4.0 
Summary0001453: pfdhcplistener resists to being killed (hangs) on packetfence restart
DescriptionCauses problems, especially on log rotations, because then the registration VLAN or the IDS isolation features stop working because we no longer have an accurate iplog.

Further investigation required.
TagsNo tags attached.
fixed in git revision5959573725c347405c6a5e11cd1641289ff595f9
fixed in mtn revision
Attached Filespatch file icon pfdhcplistener-dont-hang-fix.patch [^] (2,330 bytes) 2012-05-25 10:45 [Show Content]

- Relationships
has duplicate 0001053closed restart code needs to be more aggressive if a daemon doesn't kill quickly enough 
related to 0001520closed pfdhcplistener hung in __lll_lock_wait_private 

-  Notes
(0002729)
obilodeau (reporter)
2012-05-24 12:06

May 17 11:12:28 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener stop (pf::services::service_ctl)

May 17 11:12:28 pfcmd(17974) INFO: Stopping pfdhcplistener with 'pkill pfdhcplistener' (pf::services::service_ctl)

May 17 11:12:28 pfdhcplistener(15616) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)

May 17 11:12:28 pfdhcplistener(15615) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)

May 17 11:12:28 pfdhcplistener(15617) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)

May 17 11:12:28 pfdhcplistener(15616) INFO: stopping pfdhcplistener for interface eth0.1370 (main::END)

May 17 11:12:28 pfdhcplistener(15617) INFO: stopping pfdhcplistener for interface eth2 (main::END)
May 17 11:12:28 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Layout::PatternLayo

ut::curly_action)
May 17 11:12:28 pfdhcplistener(15615) INFO: stopping pfdhcplistener for interface eth0.1380 (main::END)

May 17 11:12:28 pfdhcplistener(15614) FATAL: pfdhcplistener: caught SIGTERM - terminating (main::normal_sighandler)

May 17 11:12:28 pfdhcplistener(15614) INFO: stopping pfdhcplistener for interface eth0 (main::END)
May 17 11:12:29 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

May 17 11:12:29 pfcmd(17974) INFO: pidof -x pfdhcplistener returned 18839 15617 15615 15614 (pf::services::service_ctl)

May 17 11:12:29 pfcmd(17974) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
May 17 11:12:31 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

May 17 11:12:31 pfcmd(17974) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)

May 17 11:12:31 pfcmd(17974) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
May 17 11:12:33 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

May 17 11:12:33 pfcmd(17974) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)

May 17 11:12:33 pfcmd(17974) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
May 17 11:12:35 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

May 17 11:12:35 pfcmd(17974) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)

May 17 11:12:35 pfcmd(17974) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)
May 17 11:12:37 pfcmd(17974) INFO: /usr/local/pf/sbin/pfdhcplistener status (pf::services::service_ctl)

May 17 11:12:37 pfcmd(17974) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)

May 17 11:12:37 pfcmd(17974) INFO: Waiting for pfdhcplistener to stop (pf::services::service_ctl)


The fact that the signal is caught in Log::Log4perl::Layout::PatternLayout::curly_action concerns me. Looking at log4perl's changelog, did found some mentions of improper call stack depth reporting fixed. Hopefully it is only that.

Important to note that the listener not killed: 18839 wasn't active since quite a while but the others were. It is the eth0 listener and receives the traffic of all production DHCP.

Its creation:
May 16 13:29:21 pfdhcplistener(18821) INFO: pfdhcplistener_eth0 starting and writing 18839 to /usr/local/pf/var/run/pfdhcplis

tener_eth0.pid (pf::util::createpid)
May 16 13:29:21 pfdhcplistener(18839) WARN: Unable to open VLAN proc description for eth0: No such file 
or directory (pf::uti
l::get_vlan_from_int)
May 16 13:29:21 pfdhcplistener(18839) INFO: DHCP detector on eth0 enabled (main::)


First attempt at killing it:
May 16 14:18:57 pfdhcplistener(18839) INFO: 00:16:35:6e:a5:0d requested an IP. DHCP Fingerprint: OS::100 
(Microsoft Windows X
P). Modified node with last_dhcp = 2012-05-16 14:18:57,computername = 109-C112-003,dhcp_fingerprint = 
1,15,3,6,44,46,47,31,33
,249,43 (main::listen_dhcp)
May 16 14:18:57 pfcmd(20947) INFO: pidof -x pfdhcplistener returned 18842 18841 18839 18838 (pf::services::service_ctl)

May 16 14:18:59 pfcmd(20947) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)

May 16 14:19:01 pfcmd(20947) INFO: pidof -x pfdhcplistener returned 18839 (pf::services::service_ctl)



Apparently it didn't even receive the signal..
(0002730)
obilodeau (reporter)
2012-05-24 12:11

then several times it reports a signal:
May 16 14:54:09 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Layout::PatternLayout::render)

May 17 11:10:12 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Config::Watch::change_detected)

May 17 11:12:28 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Layout::PatternLayout::curly_action)

May 17 11:13:58 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Appender::String::log)

May 17 11:21:47 pfdhcplistener(18839) FATAL: pfdhcplistener: caught SIGTERM - terminating (Log::Log4perl::Logger::__ANON__)



Every time in log4perl.. I'll check if there's something trapping signals in the log4perl code.
(0002731)
obilodeau (reporter)
2012-05-24 13:18

"normal" backtrace:
(gdb) bt
#0  0x00baa402 in __kernel_vsyscall ()
0000001  0x00474f01 in recvfrom () from /lib/libpthread.so.0
0000002  0x0052d313 in ?? () from /usr/lib/libpcap.so.0.9.4
0000003  0x0052e3db in pcap_loop () from /usr/lib/libpcap.so.0.9.4
0000004  0x00d4727d in XS_Net__Pcap_loop (my_perl=0x9912008, cv=0x9b45140) at Pcap.xs:415
0000005  0x0446159d in Perl_pp_entersub () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000006  0x0445aa0f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000007  0x0440024e in perl_run () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
0000008  0x0804921e in main ()


corrupted state backtrace:

(gdb) bt
#0  0x00640402 in __kernel_vsyscall ()
0000001  0x003b8a23 in __lll_lock_wait_private () from /lib/libc.so.6
0000002  0x00347862 in _L_lock_13973 () from /lib/libc.so.6
0000003  0x00345ca7 in realloc () from /lib/libc.so.6
0000004  0x0444bd97 in Perl_safesysrealloc () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000005  0x04467193 in Perl_sv_grow () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000006  0x0446f948 in Perl_sv_catpvn_flags () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000007  0x0445d417 in Perl_pp_concat () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000008  0x0445aa0f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000009  0x043fb07e in ?? () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
0000010 0x043ff946 in Perl_call_sv () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000011 0x0444dcb8 in Perl_sighandler () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000012 <signal handler called>
#13 0x00640400 in __kernel_vsyscall ()
0000014 0x0036b8b4 in fork () from /lib/libc.so.6
0000015 0x004771a4 in fork () from /lib/libpthread.so.0
0000016 0x04449537 in Perl_my_fork () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000017 0x0444c497 in Perl_my_popen () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000018 0x044a61d6 in Perl_pp_backtick () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000019 0x0445aa0f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000020 0x043fb07e in ?? () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
0000021 0x043ff7e0 in Perl_call_sv () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000022 0x00a95714 in callback_wrapper (user=0x9141a34 "\250B\340\b\001", h=0xbfc012e0, 
    pkt=0x9147fc2 "\214{\235\005", <incomplete sequence \367\227>) at Pcap.xs:76
0000023 0x0052d46a in ?? () from /usr/lib/libpcap.so.0.9.4
0000024 0x0052e3db in pcap_loop () from /usr/lib/libpcap.so.0.9.4
0000025 0x00a9027d in XS_Net__Pcap_loop (my_perl=0x84b2008, cv=0x86e5cec) at Pcap.xs:415
0000026 0x0446159d in Perl_pp_entersub () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000027 0x0445aa0f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

0000028 0x0440024e in perl_run () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
0000029 0x0804921e in main ()
(0002732)
obilodeau (reporter)
2012-05-24 14:05

First attempt at a fix in the fix/pfdhcplistener-reliable-death branch: https://github.com/inverse-inc/packetfence/tree/fix/pfdhcplistener-reliable-death [^]
(0002733)
obilodeau (reporter)
2012-05-24 14:13

Also noticed in gdb that the process is forking like crazy.

Reproduce with:
# gdb sbin/pfdhcplistener <pid>
> c
(0002734)
obilodeau (reporter)
2012-05-25 10:40

Made additional fixes yesterday to the forking situation. This was unrelated and only a minor improvement.

Attaching patch for the hang fix.
(0002736)
obilodeau (reporter)
2012-05-29 10:22

Fix was merged in stable. Will be part of the next release.
(0002775)
obilodeau (reporter)
2012-06-14 12:16

fix released in 3.4.0 yesterday

- Issue History
Date Modified Username Field Change
2012-05-24 10:20 obilodeau New Issue
2012-05-24 10:20 obilodeau Status new => assigned
2012-05-24 10:20 obilodeau Assigned To => obilodeau
2012-05-24 12:06 obilodeau Note Added: 0002729
2012-05-24 12:11 obilodeau Note Added: 0002730
2012-05-24 13:18 obilodeau Note Added: 0002731
2012-05-24 14:05 obilodeau Note Added: 0002732
2012-05-24 14:13 obilodeau Note Added: 0002733
2012-05-25 10:40 obilodeau Note Added: 0002734
2012-05-25 10:45 obilodeau File Added: pfdhcplistener-dont-hang-fix.patch
2012-05-29 10:22 obilodeau git revision => 5959573725c347405c6a5e11cd1641289ff595f9
2012-05-29 10:22 obilodeau Note Added: 0002736
2012-05-29 10:22 obilodeau Status assigned => resolved
2012-05-29 10:22 obilodeau Fixed in Version => +1
2012-05-29 10:22 obilodeau Resolution open => fixed
2012-06-11 15:27 obilodeau Relationship added has duplicate 0001053
2012-06-14 12:15 obilodeau Target Version +1 => 3.4.0
2012-06-14 12:15 obilodeau Fixed in Version +1 => 3.4.0
2012-06-14 12:16 obilodeau Note Added: 0002775
2012-06-14 12:16 obilodeau Status resolved => closed
2012-08-17 09:59 obilodeau Relationship added related to 0001520


Copyright © 2000 - 2012 MantisBT Group
Powered by Mantis Bugtracker