Dave has been working like a maniac to switch our database
code over from SQLite to PostgreSQL. PostgreSQL has two
main advantages: it is much faster, and we can open up
ODBC connections to the database for other uses that don’t
require a web interface. The change is now complete, but
it hasn’t been without difficulties. One problem that bit
us was running out of file handles. If you ever run into
something similar, here is how to debug it.
On Linux the /proc filesystem exposes a great many kernel
resources as files. The entries of particular interest for
our purposes are:
- The files file-nr and file-max in /proc/sys/fs.
- The per-process directories in /proc, each named after a process ID.
The first thing to check is the value of /proc/sys/fs/file-max,
which is the maximum number of file handles allowed on the
whole system. This is unlikely to be the problem, but make
sure it isn’t something ridiculously small. On our system we
get:
$ cat /proc/sys/fs/file-max
89367
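file-max on its own says nothing about current usage. A small helper can read both system-wide figures at once. This is a minimal sketch; system_fd_usage is a hypothetical name, not a standard tool. It reads the three fields of /proc/sys/fs/file-nr (allocated handles, allocated but unused handles, and the system-wide maximum, which matches file-max) and prints what share of the maximum is currently allocated.

```shell
#!/bin/sh
# Hypothetical helper: summarise system-wide file handle usage.
# /proc/sys/fs/file-nr holds three whitespace-separated fields:
# allocated handles, allocated-but-unused handles, and the maximum.
system_fd_usage() {
    read -r allocated unused max < /proc/sys/fs/file-nr
    # Integer percentage of the system-wide maximum currently allocated.
    echo "allocated=$allocated unused=$unused max=$max used_pct=$(( allocated * 100 / max ))"
}

system_fd_usage
```

A used_pct close to 100 would point at the system-wide limit; a tiny value, as on our machine, means the problem lies elsewhere.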
That should be plenty under any reasonable usage. To confirm,
we can check how many file handles are currently allocated by
reading /proc/sys/fs/file-nr. On our system this is:
$ cat /proc/sys/fs/file-nr
920 0 89367
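The first number is the count of allocated file handles, the
second is the allocated but unused handles, and the third is
the system-wide maximum (the same value as file-max). With 920
of 89367 in use, the system-wide limit is clearly not the
problem. That suggests a single process is exceeding the
per-process limit on open files (RLIMIT_NOFILE, which ulimit -n
reports for the current shell). In our setup this could be
either PostgreSQL or MzScheme, so we need their process IDs to
find out how many handles each is using.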
$ ps -A | grep postmaster
12936 ? 00:00:00 postmaster
12937 ? 00:00:00 postmaster
12939 ? 00:00:00 postmaster
12940 ? 00:00:00 postmaster
12941 ? 00:00:00 postmaster
$ ps -A | grep mzscheme
20382 ? 00:00:26 mzscheme
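Rather than reading process IDs off the ps listing and checking each one by hand, the numbered directories under /proc can be scanned directly. The sketch below is a hypothetical helper (fd_report is our own name, not a standard command). For every process whose name matches its argument, it counts the entries in that process's fd directory (one per open descriptor) and prints the "Max open files" soft limit from its limits file, so each count sits next to the per-process ceiling it is measured against. It assumes a kernel recent enough to provide /proc/PID/comm and /proc/PID/limits, and, like the sudo commands here, it needs root to see another user's descriptors.

```shell
#!/bin/sh
# Hypothetical helper: for each process named $1, print its PID, the number
# of open descriptors, and its "Max open files" soft limit.
fd_report() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # /proc/PID/comm holds the process name; skip processes that don't match.
        [ "$(cat "$dir/comm" 2>/dev/null)" = "$1" ] || continue
        # Plain ls (no -l) prints one name per descriptor and no "total" line.
        count=$(ls "$dir/fd" 2>/dev/null | wc -l)
        # Field 4 of the "Max open files" row is the soft limit.
        limit=$(awk '/^Max open files/ { print $4 }' "$dir/limits" 2>/dev/null)
        echo "$pid open=$count limit=${limit:-unknown}"
    done
}

fd_report postmaster
```

A process whose open= value is approaching its limit= value is the one to investigate.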
Each process’s /proc directory contains an fd subdirectory
with one entry per open file descriptor, so counting those
entries tells us how many handles it is using. For example,
for the first PostgreSQL process:
$ sudo ls -l /proc/12936/fd/ | wc -l
4
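That PostgreSQL process has only a handful of handles open
(ls -l also prints a total line, so the 4 lines are really 3
descriptors), and the other postmaster processes show similar
numbers. That leaves our MzScheme process as the likely
culprit. We check it the same way, this time counting only
the descriptors that are sockets: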
$ sudo ls -l /proc/20382/fd/ | grep socket | wc -l
193
Looks like we’ve found our culprit.
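Grepping for socket counts one kind of descriptor at a time. To see the full mix a process is holding, each entry in its fd directory can be resolved with readlink: sockets appear as socket:[inode], pipes as pipe:[inode], and files or devices as absolute paths. The sketch below tallies those kinds for one PID; fd_types is a hypothetical name of our own. As with the commands above, inspecting another user's process, such as our MzScheme PID 20382, requires root.

```shell
#!/bin/sh
# Hypothetical helper: tally the kinds of object that process $1's
# descriptors point to. readlink on /proc/PID/fd/N yields e.g.
# "socket:[12345]", "pipe:[6789]" or an absolute path like "/dev/null".
fd_types() {
    for fd in /proc/"$1"/fd/*; do
        # Skip descriptors we cannot read or that closed mid-scan.
        target=$(readlink "$fd") || continue
        case "$target" in
            /*)  echo path ;;                # regular files, devices, etc.
            *:*) echo "${target%%:*}" ;;     # socket, pipe, anon_inode, ...
            *)   echo other ;;
        esac
    done | sort | uniq -c | sort -rn         # most common kind first
}

fd_types "$$"   # example: inspect this shell itself
```

A process leaking sockets would show the socket line at the top with a count that keeps growing between runs.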