Date:  Tue, 14 Jul 2009 11:38:22 +1000
From:  Greg Kuhnert <greg.kuhnert (at mark) theanchoragesylvania.com>
Subject:  [coba-e:15813] Re: DFIX information
To:  BQ List <coba-e (at mark) bluequartz.org>
Message-Id:  <4A5BE18E.5040400 (at mark) theanchoragesylvania.com>
In-Reply-To:  <200907131958.n6DJwX9I019572 (at mark) ana.xnet.com.mx>
References:  <200907131958.n6DJwX9I019572 (at mark) ana.xnet.com.mx>
X-Mail-Count: 15813

Rodrigo Ordonez Licona wrote:
> We have a particular BQ with a lot of database usage. Suddenly a user
> repeated a query more than 10 times, which tied up 8 processors
> running queries.
> This caused high CPU usage and all services slowed down...
>
> As this happened, mail users started to get queued when sending emails and
> timing out... When the dovecot count reached 100 (I have this setting at
> 100; yours might be 50 or lower)
>
> Dfix did what it is programmed to do: it stopped dovecot, killing all
> dovecot processes, and ran dbrecover.
> Immediately after this (as there are no dovecot processes anymore, it
> will start dovecot again)
>
> However, in our case the problem was not mail-related (mysql), and the
> server kept running dbrecover in an endless loop, causing even more CPU
> overhead (and more customer disappointment).
>
> So, obviously, what we did was kill the runaway mysql queries. But it
> was not enough anymore, because dfix kept running dbrecover over and
> over.
>
> Luckily, this command, run as root, helped:
> mv /usr/local/sbin/dfix.sh /usr/local/sbin/dfix.sh.tmp
>
> Restarted sendmail and dovecot once more.
>
> Waited until users were able to send mail (5-10 minutes in my case).
>
> Gave dfix a new threshold of 200 and renamed the script back to dfix.sh.
>   
Hi Rodrigo.

I am a little confused, however, by your comments about MySQL. DFix
does not look at MySQL at all. You may have had some MySQL queries that
were failing, but that won't directly cause DFix to fire up. Also, I
would not expect high CPU load or stuck MySQL queries to cause the
number of Dovecot processes to go so high.
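
To see how many Dovecot processes are actually running before blaming the
threshold, something like the following could be used. This is only a rough
sketch: the helper name count_matching is made up for illustration, and it
assumes that matching on the command-name prefix "dovecot" is close enough.
How DFix itself counts processes is not shown here and may well differ.

```shell
#!/bin/sh
# count_matching (hypothetical helper): reads command names on stdin,
# one per line, and prints how many start with the given prefix.
count_matching() {
    prefix="$1"
    awk -v p="$prefix" 'index($1, p) == 1 { n++ } END { print n + 0 }'
}

# Live usage (assumption: "ps -e -o comm=" lists bare command names):
#   ps -e -o comm= | count_matching dovecot
```

Comparing that number with the configured threshold (100 in the quoted
message) should show whether the process count really is the trigger.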

Anyway... to try and find the problem, you can run dfix with some 
command line options:

dfix.sh list
and
dfix.sh trace

These will produce some diagnostic information about log entries that 
are causing dfix to trigger.
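
If it helps to keep that output for later comparison, a small wrapper along
these lines could save both runs to files. It is a sketch only: the function
name save_dfix_diag and the output file names are invented for illustration,
and the default script path is simply the one mentioned in the quoted message.

```shell
#!/bin/sh
# save_dfix_diag (hypothetical helper): runs "dfix.sh list" and
# "dfix.sh trace" and stores each run's output in its own file.
#   $1 = path to dfix.sh (default taken from the quoted message)
#   $2 = directory for the output files (default /tmp, an assumption)
save_dfix_diag() {
    dfix="${1:-/usr/local/sbin/dfix.sh}"
    outdir="${2:-/tmp}"
    for mode in list trace; do
        # Capture stderr too, in case the script reports problems there.
        "$dfix" "$mode" > "$outdir/dfix-$mode.txt" 2>&1
    done
}

# Example call, as root:
#   save_dfix_diag
```

The resulting dfix-list.txt and dfix-trace.txt can then be attached to a
reply on the list.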

Or, have a look in your admin mailbox. Unless you have disabled logging, 
DFix will send email notices (via cron) to your admin mailbox, telling 
you exactly what was going on.

Anyway, let me know what you find, and/or let me know if you need 
further help.

Regards,
Greg.

-- 
+---------------------------------------------------------------------+
|   / \   Greg Kuhnert, gkuhnert (at mark) compassnetworks.com.au               |
| <  o  > Compass Networks - Pointing you in the right direction      |
|   \ /   Come see us for BlueQuartz / BlueOnyx modules & Support.    |
+---------------------------------------------------------------------+