Dave has been working like a maniac to switch our database code over from SQLite to PostgreSQL. PostgreSQL has two main advantages: it is much faster, and we can open ODBC connections to the database for other uses that don’t require a web interface. The change is now complete, but it hasn’t been without difficulties. One problem that bit us was running out of file handles. If you ever have a similar problem, here is how to debug it.
On Linux the /proc filesystem reflects a great many kernel resources. The locations of particular interest for our purposes are:
- The files file-nr and file-max in /proc/sys/fs.
- The per-process directories /proc/<pid>, one for each process ID.
The first thing to check is /proc/sys/fs/file-max, the maximum number of file handles allowed on the whole system. This is unlikely to be the problem, but make sure it isn’t something ridiculously small. On our system we get:
$ cat /proc/sys/fs/file-max
89367
That should be plenty under any reasonable usage, but we can check how many file handles are actually open by reading /proc/sys/fs/file-nr. On our system this is:
$ cat /proc/sys/fs/file-nr
920 0 89367
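The first number is the count of allocated file handles, the second is the count of allocated but unused handles (always 0 on 2.6 and later kernels), and the third is the same maximum as file-max. With 920 of 89367 in use, the system-wide limit is nowhere near exhausted, so it must be that a single process is exceeding the per-process limit on open files. In our setup this could be either PostgreSQL or MzScheme, so we need their process IDs to find out how many handles each is using.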
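Both checks can be scripted. The sketch below assumes bash on Linux: it reads the three file-nr fields into named variables and prints the "Max open files" line from /proc/<pid>/limits (available since kernel 2.6.24), which shows the soft and hard per-process limits. The function names system_fd_usage and process_fd_limit are hypothetical helpers for illustration, not part of our code.

```shell
#!/usr/bin/env bash
# System-wide handle usage from /proc/sys/fs/file-nr. Its three fields are:
#   allocated handles, allocated-but-unused handles (0 on 2.6+ kernels), maximum.
system_fd_usage() {
  local allocated unused max
  read -r allocated unused max < /proc/sys/fs/file-nr
  # Integer percentage of the system-wide maximum currently allocated.
  echo "allocated=$allocated unused=$unused max=$max used_pct=$(( allocated * 100 / max ))"
}

# Soft and hard per-process limits on open files for one PID (kernel 2.6.24+).
process_fd_limit() {
  local pid=$1
  grep 'Max open files' /proc/"$pid"/limits
}

system_fd_usage
# $$ is this shell's own PID; substitute the PID of the process under suspicion.
process_fd_limit $$
```

With the figures above, system_fd_usage would print allocated=920 unused=0 max=89367 used_pct=1. Reading another user's /proc/<pid>/limits is normally allowed, but the per-process fd directories used later usually need root.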
$ ps -A | grep postmaster
12936 ? 00:00:00 postmaster
12937 ? 00:00:00 postmaster
12939 ? 00:00:00 postmaster
12940 ? 00:00:00 postmaster
12941 ? 00:00:00 postmaster
$ ps -A | grep mzscheme
20382 ? 00:00:26 mzscheme
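The same PIDs can be collected straight from /proc instead of scanning ps output: each /proc/<pid>/comm file (kernel 2.6.33 and later) holds the process name, truncated to 15 characters. This is a sketch assuming bash; pids_of is a hypothetical helper, and on systems with procps installed, pgrep -x postmaster does the same job.

```shell
#!/usr/bin/env bash
# Print the PID of every process whose name (as in /proc/<pid>/comm) is exactly $1.
pids_of() {
  local name=$1 comm_file comm pid
  for comm_file in /proc/[0-9]*/comm; do
    # A process may exit between the glob and the read; skip it if so.
    { read -r comm < "$comm_file"; } 2>/dev/null || continue
    if [[ $comm == "$name" ]]; then
      pid=${comm_file#/proc/}   # strip the leading "/proc/"
      echo "${pid%/comm}"       # strip the trailing "/comm"
    fi
  done
}

pids_of postmaster
pids_of mzscheme
```

Newer PostgreSQL releases may show their processes under the name postgres rather than postmaster, so adjust the name to whatever ps reports on your system.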
We can see how many handles a process has open by listing its /proc/<pid>/fd directory, which contains one entry per open file descriptor. For example, for the first PostgreSQL process:
$ sudo ls -l /proc/12936/fd/ | wc -l
4
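So that PostgreSQL process has only a handful of handles open (wc -l also counts the "total" line that ls -l prints first, so the real figure is 3). The other postmaster processes show similar numbers, so it must be our MzScheme process that is using up the handles. We check it the same way, this time counting only the descriptors that are sockets, and the result is: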
$ sudo ls -l /proc/20382/fd/ | grep socket | wc -l
193
Looks like we’ve found our culprit.
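Counting sockets with grep answers the immediate question, but an exact total and a breakdown by kind can be more revealing when you don’t yet know what is leaking. This sketch assumes bash and the Linux convention that the symlinks in /proc/<pid>/fd point to names such as socket:[inode], pipe:[inode], or an ordinary path. count_fds and fd_types are hypothetical helpers; run the script as root (for example via sudo, as above) when inspecting another user’s process, otherwise it will report 0 descriptors.

```shell
#!/usr/bin/env bash
# Exact count of a process's open descriptors. Plain ls (no -l) writes one
# name per line and no "total" header, so wc -l counts only descriptors.
count_fds() {
  ls /proc/"$1"/fd 2>/dev/null | wc -l
}

# Break the descriptors down by kind, using the symlink targets in
# /proc/<pid>/fd (e.g. "socket:[12345]", "pipe:[678]", "/dev/null").
fd_types() {
  local pid=$1 fd target
  for fd in /proc/"$pid"/fd/*; do
    # readlink fails if the glob matched nothing or the fd was closed meanwhile.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
      socket:*)     echo socket ;;
      pipe:*)       echo pipe ;;
      anon_inode:*) echo anon_inode ;;
      /*)           echo file ;;
      *)            echo other ;;
    esac
  done | sort | uniq -c | sort -rn
}

# PID to inspect: the first argument, defaulting to this shell for demonstration.
pid=${1:-$$}
echo "open descriptors: $(count_fds "$pid")"
fd_types "$pid"
```

Passing the MzScheme PID (20382 in our case) as the first argument would show whether sockets dominate its descriptor table.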