multilog rotating issue

Started by LogicX, March 01, 2007, 07:08:41 AM

Previous topic - Next topic

LogicX

For months I've had the multilog rotating issue where /var/log/mail/send/current and /var/log/mail/smtp/current do not rotate correctly, hanging qmail, smtpd, rblsmtpd, etc.
Doing some further debugging of this issue, its possible to always cause it to hang during rotating when triggered with a kill -ALRM of the multilog process.

Doing a ktrace of this shows the following during the actual rotating:

Any thoughts/guidance?

14084 multilog CALL  sigprocmask(0x2,0xbfbfeaa8,0)
14084 multilog RET   sigprocmask 0
14084 multilog CALL  sigprocmask(0x2,0xbfbfeaa4,0)
14084 multilog RET   sigprocmask 0
14084 multilog CALL  read(0,0x804ca80,0x400)
14084 multilog RET   read -1 errno 4 Interrupted system call
14084 multilog PSIG  SIGALRM caught handler=0x80499da mask=0x0 code=0x0
14084 multilog CALL  sigreturn(0xbfbfe7b0)
14084 multilog RET   sigreturn JUSTRETURN
14084 multilog CALL  sigprocmask(0x1,0xbfbfea94,0)
14084 multilog RET   sigprocmask 0
14084 multilog CALL  sigprocmask(0x1,0xbfbfea90,0)
14084 multilog RET   sigprocmask 0
14084 multilog CALL  fchdir(0x4)
14084 multilog RET   fchdir 0
14084 multilog CALL  fsync(0x6)
14084 multilog RET   fsync 0
14084 multilog CALL  close(0x6)
14084 multilog RET   close 0
14084 multilog CALL  rename(0x804b3b6,0x804b32b)
14084 multilog NAMI  "current"
14084 multilog NAMI  "previous"
14084 multilog RET   rename 0
14084 multilog CALL  open(0x804b3b6,0x20d,0x180)
14084 multilog NAMI  "current"
14084 multilog RET   open 6
14084 multilog CALL  fcntl(0x6,0x2,0x1)
14084 multilog RET   fcntl 0
14084 multilog CALL  fchmod(0x6,0x1a4)
14084 multilog RET   fchmod 0
14084 multilog CALL  chmod(0x804b32b,0x1e4)
14084 multilog NAMI  "previous"
14084 multilog RET   chmod 0
14084 multilog CALL  fork
14084 multilog RET   fork 14151/0x3747
14084 multilog CALL  wait4(0x3747,0xbfbfeab0,0,0)
14084 multilog RET   wait4 14151/0x3747
14084 multilog CALL  write(0x2,0x804d2e0,0x43)
14084 multilog GIO   fd 2 wrote 67 bytes
       "multilog: warning: processor failed in /var/log/mail/smtp, pausing
       "
14084 multilog RET   write 67/0x43
14084 multilog CALL  gettimeofday(0xbfbfea40,0)
14084 multilog RET   gettimeofday 0
14084 multilog CALL  gettimeofday(0xbfbfea40,0)
14084 multilog RET   gettimeofday 0
14084 multilog CALL  poll(0xbfbfea58,0,0x139b)
14084 multilog RET   poll 0



Starting with a fresh /var/log/mail/smtp -- here are the files generated by this run:

root:envy|8:52:35am|/var/log/mail/smtp: ls -la
total 20
drwxr-xr-x   2 qmaill  qnofiles   512 Mar  1 08:52 .
drwxr-xr-x  10 qmaill  qnofiles   512 Mar  1 08:48 ..
-rw-r--r--   1 qmaill  qnofiles     0 Mar  1 08:52 current
-rw-------   1 root    qnofiles  5042 Mar  1 08:52 ktrace.out
-rw-------   1 qmaill  qnofiles     0 Mar  1 08:50 lock
-rw-r--r--   1 qmaill  qnofiles     0 Mar  1 08:52 newstate
-rwxr--r--   1 qmaill  qnofiles   161 Mar  1 08:51 previous
-rw-r--r--   1 qmaill  qnofiles     0 Mar  1 08:52 processed
-rwxr-xr-x   1 qmaill  qnofiles  7946 Mar  1 08:48 smtplog
-rw-r--r--   1 qmaill  qnofiles     0 Mar  1 08:50 state


As can be seen -- the file sits in previous, and doesn't get to create a new timestamped file

when I kill -9 multilog then start services again, it looks like this:

root:envy|8:53:56am|/var/log/mail/smtp: ls -la
total 32
drwxr-xr-x   2 qmaill  qnofiles    512 Mar  1 08:53 .
drwxr-xr-x  10 qmaill  qnofiles    512 Mar  1 08:48 ..
-rwxr--r--   1 qmaill  qnofiles    161 Mar  1 08:51 @4000000045e6daf4184a1de4.u
-rw-r--r--   1 qmaill  qnofiles      0 Mar  1 08:52 @4000000045e6daf418db3284.u
-rw-r--r--   1 qmaill  qnofiles    411 Mar  1 08:53 current
-rw-------   1 root    qnofiles  16102 Mar  1 08:53 ktrace.out
-rw-------   1 qmaill  qnofiles      0 Mar  1 08:50 lock
-rwxr-xr-x   1 qmaill  qnofiles   7946 Mar  1 08:48 smtplog
-rw-r--r--   1 qmaill  qnofiles      0 Mar  1 08:53 state


for reference:
root:envy|8:54:35am|/var/log/mail/smtp: uname -a
FreeBSD envy.logicx.net 6.2-STABLE FreeBSD 6.2-STABLE #6: Sun Feb 25 14:39:36 EST 2007     root@envy.logicx.net:/usr/obj/usr/src/sys/LNS-i386  i386
--- May this post be indexed by spiders, and archived for all to see as my internet epitaph.
http://fpux.com" target="_blank">http://fpux.com

tomierna

LogicX,

Did you ever resolve this problem?

This appears to be happening to me on a 5.4->6.2 upgrade which I've just upgraded all of the toaster components for.

I've re-installed daemontools, but that didn't seem to fix it.

I've removed the svscan log/run files and regenerated them with toaster_setup.pl -s qmailconf

When multilog gets confused, it stops smtp, rblsmptd, etc. until I kill -9 the multilog process.

Terribly aggravating since it seems to die every few minutes.

tomierna

The shower is the ideal place for troubleshooting epiphany.

I remembered this morning that during the toaster upgrade process, I'd gotten a warning to change my toaster-watcher.conf permissions to 600, because the new file had something more open.

At the time, it seemed rational to also change the toaster.conf permissions. That was the problem.

When smtplog or sendlog or pop3log were being called, they would fail silently because they couldn't get to their config.

I changed the toaster.conf to 644 perms, and now it's merrily rotating through those programs.

markus

I am experiencing this issue too.

However I change the mods to 644 and its still occuring.

ls -l /usr/local/etc/toaster*
-rw-r--r--  1 root  wheel  27826 Nov 14 11:39 /usr/local/etc/toaster-watcher.conf
-rw-r--r--  1 root  wheel   6766 Oct 11 15:36 /usr/local/etc/toaster.conf

matt


m

Well opening up old wounds...

We recently had an outage on our toaster, and fsck had to fix things on /usr/local and /var

We know that  /usr/local/var/couriersslcache was broken, but not what else was damaged.

We started getting the issue mentioned here, where multilog hangs hanging the server for port 25 logins, they just time out. This as people said before happens every 3-4 hours.

After reading the thread I had a look at perms and ownership on toaster-watcher.conf and it was

-rw-r-----  1 root  501  27401 Oct 29 11:27 /usr/local/etc/toaster-watcher.conf

so I changed it to 644 so everything could read it, but haven't fixed the group yet as I'm not sure what group this file should belong to, obviously 501 is a bit suspicious.

After reading the mail at http://mail.tnpi.net/toaster/?0::11137

I did a

diff /usr/local/sbin/maillogs /var/log/mail/smtp/smtplog

which returned no output. But just to be double sure I copied the /usr/local/sbin/maillogs over /var/log/mail/smtp/smtplog

This didn't seem to fix the problem.

I need a quick and dirty solution so I've cronned a script to kill the logging and restart the server every few hours. I'd obviously like to really fix it.

I've noticed a couple of processes...
qmaill    6635  0.0  0.2  5312  1640  ??  S     3:31PM   0:00.02 sh -c ./smtplog
qmaill    6636  0.0  0.3  9152  3264  ??  R     3:31PM   0:00.13 /usr/local/bin/perl ./smtplog

running from time to time, what are these meant to be doing and how do I test that it's ok?

Oh yeah and when I run ./smtplog

[root@vega /var/log/mail/smtp]# ./smtplog
^C

It sits there doing nowt until I kill it.

interestingly

[root@vega /var/log/mail/smtp]# diff /usr/local/bin/maillogs /usr/local/src/Mail-Toaster-5.09/bin/maillogs
1,4c1
< #!/usr/local/bin/perl
<
< eval 'exec /usr/local/bin/perl  -S $0 ${1+"$@"}'
<     if 0; # not running under some shell
---
> #!/usr/bin/perl

But that seem to suggest that the one in use a) uses the right path to perl and b) checks to see if it's running in some shell or no, which wouldn't seem to be an issue.

So does anyone have any ideas as to what I might be able to do to fix this issue.

thanks in advance

M


matt

Obviously something is not quite right. The best way to determine what is manually running smtplog. If you just run ./smtplog, then it'll just sit there waiting to read something in STDIN. So, give it something and see what happens:

cat @NNNN* | ./smtplog

and then see what happens.