signals and ulimit

I think a user got slapped by a ulimit I put in for the head node…

In /etc/security/limits.conf (which is a PAM facility, btw, read by pam_limits; doing it here avoids having to throw this stuff into the various shell profiles) I had:

    *       hard  cpu  5
    root    hard  cpu  unlimited
    named   hard  cpu  unlimited
    nobody  hard  cpu  unlimited

This caps every process at a total of 5 minutes of CPU time (the cpu item in limits.conf is in minutes), which is a significant amount of work given the speed of a modern CPU and (I thought) would be fine for most interactive work. I figured most daemons on this particular machine would be covered by the root, named and nobody exemptions…
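A quick way to see what limit a session actually ended up with is to ask from inside it. Here is a minimal Python sketch; the helper name describe_cpu_limit is made up for illustration. It assumes the limits.conf value is in minutes, while the value the kernel reports through getrlimit is in seconds.

```python
import resource


def describe_cpu_limit(value):
    """Render an RLIMIT_CPU value (in seconds) in a readable form."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    minutes, seconds = divmod(value, 60)
    return f"{minutes} min {seconds} s"


# getrlimit returns the (soft, hard) pair for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
print("soft cpu limit:", describe_cpu_limit(soft))
print("hard cpu limit:", describe_cpu_limit(hard))
```

Run from a login shell on the head node, a 5-minute cap should show up as 300 seconds, assuming PAM applied the file to that session.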

However, it breaks anything that racks up a lot of CPU time over a long run (any daemon, or in this case, an opportunistic job scheduling script).

Apparently exceeding the limit sends a SIGXCPU (signal 24 according to /usr/include/asm-x86_64/signal.h - ‘stress’ doesn’t display the signal name, just the number, so this is useful to reference) to the process.
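Rather than digging through the header file, the number-to-name lookup can also be done from Python's standard signal module. A small sketch follows; signal_name is a hypothetical helper, and the numbering is platform-specific, so the answer for 24 depends on the machine it runs on.

```python
import signal


def signal_name(number):
    """Translate a signal number into its symbolic name on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # Not a signal number this platform defines.
        return f"unknown signal {number}"


# The number that 'stress' reported.
print(24, "->", signal_name(24))
```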

Not sure what exceeding the as or rss limits sends, but I’d hazard a guess they’re not covered by a signal of their own (I only see one other “exceed” signal, SIGXFSZ, in the list)…

I need to try trapping this signal to see if it circumvents the termination; if it does… wow.
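Here is a rough sketch of that experiment in Python. According to the setrlimit(2) man page, SIGXCPU is delivered at the soft limit and can be caught, while reaching the hard limit gets the process a SIGKILL, which cannot be. The function name and the 1 s / 2 s limits are picked for illustration only; the child is forked so the limits never touch the parent.

```python
import os
import resource
import signal


def run_child_with_cpu_limit(soft_seconds, hard_seconds):
    """Fork a child that traps SIGXCPU and spins; return (times_caught, wait_status)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(read_fd)
            # Each delivery of SIGXCPU writes one byte so the parent can count them.
            signal.signal(signal.SIGXCPU,
                          lambda signum, frame: os.write(write_fd, b"x"))
            resource.setrlimit(resource.RLIMIT_CPU, (soft_seconds, hard_seconds))
            while True:  # burn CPU until the kernel steps in
                pass
        finally:
            os._exit(1)  # never fall through into the parent's code
    os.close(write_fd)
    _, status = os.waitpid(pid, 0)
    caught = len(os.read(read_fd, 1024))  # empty read means nothing was caught
    os.close(read_fd)
    return caught, status


caught, status = run_child_with_cpu_limit(1, 2)
print("SIGXCPU caught", caught, "time(s)")
if os.WIFSIGNALED(status):
    print("child was killed by", signal.Signals(os.WTERMSIG(status)).name)
```

If the man page holds, trapping only buys time between the soft and hard limits. With a hard-only entry in limits.conf there may be no gap between the two at all.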

Lessons learned:

1. ulimits are for suckers (a Cardinal Rule re-learned: the only winning move is not to play)

2. 

Now, is there a nice ‘traffic cop’ daemon that will renice or kill things that exceed a certain 1/5/15-minute load average?
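For the sake of argument, the decision part of such a daemon might look like the sketch below. Everything here is hypothetical: pick_action and both thresholds are invented, and this is not an existing tool. It only decides what to do from the 1-minute load average; actually choosing a victim and calling os.setpriority or os.kill is left out.

```python
import os


def pick_action(load_1min, renice_above, kill_above):
    """Decide what a hypothetical traffic cop should do at this load."""
    if load_1min > kill_above:
        return "kill"
    if load_1min > renice_above:
        return "renice"
    return "none"


# os.getloadavg returns the 1-, 5- and 15-minute load averages.
load_1, load_5, load_15 = os.getloadavg()
# Thresholds are placeholders; sensible values depend on the core count.
print(load_1, "->", pick_action(load_1, renice_above=4.0, kill_above=16.0))
```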
