[SGVLUG] Couldn't send email. Race Conditions?

David Lawyer dave at lafn.org
Wed Oct 25 01:52:37 PDT 2006


The night before last and yesterday, the email I tried to send never
went out.  It stayed in the exim queue even though several queue
runner processes tried to deliver it to the mail server at my ISP,
LA Freenet.  This problem has happened before, but the delay in
sending messages has never been this bad.  So I ran exim with the
debug option to see what was happening (using -d+tls, since TLS
seemed to be giving trouble).

The results showed that I connected OK to the server at LA Freenet and
sent it STARTTLS, and it responded that it was "Ready to start TLS".
Then my PC initialized GnuTLS as a client, generated an RSA key, and
every minute thereafter issued the debug message "selecting on
subprocess pipes", but it didn't send anything more to the LA Freenet
server.  I checked on the Internet and found that someone else had
reported this same problem, but no fix was given.  "selecting" likely
refers to the select() function, which a C program uses to watch for
changes in the state of a pipe (bytes received, etc.).  The man page
for select mentions that pselect has race conditions, so perhaps
that's what was happening.
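
To illustrate what waiting on a pipe with select() looks like, here is
a minimal sketch in Python (whose select.select() wraps the same system
call).  It is not exim's code; the helper name wait_for_pipe and the
0.1-second timeout are made up for the example.  When nothing arrives
before the timeout, select() simply returns with no ready descriptors,
and a program that loops on it would produce a periodic message much
like the one I saw every minute.

```python
import os
import select

def wait_for_pipe(read_fd, timeout_s):
    """Return True if read_fd becomes readable within timeout_s seconds.

    Hypothetical helper for illustration; not taken from exim.
    """
    readable, _, _ = select.select([read_fd], [], [], timeout_s)
    return read_fd in readable

r, w = os.pipe()

# Nothing has been written yet, so select() times out.
idle = wait_for_pipe(r, 0.1)

# After a byte is written, the read end is reported as readable.
os.write(w, b"x")
ready = wait_for_pipe(r, 0.1)

os.close(r)
os.close(w)
print(idle, ready)
```

This only runs on Unix-like systems, since select() on pipes is not
supported on Windows.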

Eventually, this evening, the whole backlog of messages got sent, even
though I didn't do anything to fix the problem.  Most of the messages
I send are just spam reports to my ISP, which help it determine which
subnets to block.

So what I did then was to download and install the latest version of
exim (a mail transfer agent) and reconfigure it.  I found that the
interactive configuration script is inadequate, and I needed to modify
not only the configuration file but also the script that starts exim
when I dial up to the Internet.

The result is that exim doesn't check the queue at boot-time anymore.
This queue may contain many outgoing messages which exim formerly tried
in vain to send out at boot-time, but couldn't since I hadn't yet
dialed out to connect to the network.  When exim had a failure like
this, it set a retry time, which meant that I would often see log
reports about messages not being sent out because the retry time
hadn't been reached.  Exim has a simple algorithm that exponentially
increases retry times for repeated failures, but I don't consider
these failures, since they just mean that I was working on my computer
off-line and hadn't dialed out yet to the net.  So now I should not get
such "failures", since exim will hopefully run only when I am connected
to the network.
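
To show how quickly exponentially growing retry times pile up, here is
a rough sketch in Python.  The starting delay, growth factor, and cap
are numbers I picked for illustration, not exim's actual settings (the
real rules live in the retry section of the exim configuration file),
and retry_delays is a made-up function name.

```python
def retry_delays(first_delay_min, factor, max_delay_min, attempts):
    """Return the wait (in minutes) before each successive retry.

    Each delay is the previous one multiplied by `factor`, capped at
    `max_delay_min`.  All parameters are illustrative assumptions.
    """
    delays = []
    delay = first_delay_min
    for _ in range(attempts):
        delays.append(min(delay, max_delay_min))
        delay *= factor
    return delays

# Start at 15 minutes, double after each failure, cap at 2 hours.
delays = retry_delays(15, 2, 120, 5)
print(delays)
```

With these made-up numbers, a few failed attempts while off-line are
enough to push the next retry out to the 2-hour cap, which is longer
than one of my dial-up sessions.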

My ISP disconnects you from the Internet after an hour's use and I got
a warning letter for being connected too long (30+ hours with
automatic redial) when I was updating all my software via my 28.8K
modem.

			David Lawyer

