Date:  Mon, 31 Aug 2009 09:20:08 +1000
From:  Greg Kuhnert <greg.kuhnert (at mark) theanchoragesylvania.com>
Subject:  [coba-e:15966] Re: DFix update
To:  coba-e (at mark) bluequartz.org
Message-Id:  <4A9B0928.6080801 (at mark) theanchoragesylvania.com>
In-Reply-To:  <561910.34542.qm (at mark) web65614.mail.ac4.yahoo.com>
References:  <561910.34542.qm (at mark) web65614.mail.ac4.yahoo.com>
X-Mail-Count: 15966

Dan Kriwitsky wrote:
>>  * DFIX now looks up source IPs for any hosts that are
>> generating too many "file not found" errors in your web
>> server. These are very often caused by systems that are
>> looking for vulnerabilities in your websites.
>>     
>
> You might want to look at your error log and whitelist certain search engine bots. Unless you're very careful about things, having a site that changed e.g. from using .html to .php will cause the Googlebot to generate lots of 404 errors and you may not want to block the Googlebot from crawling your site.
>
>   
Bots and other crawlers do have a higher chance of getting 
blocked... but what's the answer? The problem is that this data comes 
from the Apache error log, and there is no easy way to tell whether an 
IP belongs to a search engine.
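
One approach that does not depend on the error log at all is a 
forward-confirmed reverse DNS check: look up the PTR record for the IP, 
see whether the hostname ends in a crawler domain, then resolve that 
hostname again and confirm it maps back to the same IP. The Python below 
is only a rough sketch of that idea and is not what DFix does. The 
function name and the suffix list are assumptions for illustration, the 
forward lookup is IPv4 only, and it costs two DNS queries per address.

```python
import socket

# Assumed suffix list, for illustration only; check each search
# engine's own documentation for the domains its crawlers use.
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex,
                        suffixes=CRAWLER_SUFFIXES):
    """Return True if ip passes a forward-confirmed reverse DNS check.

    reverse and forward are injectable so the logic can be exercised
    without touching the network.
    """
    try:
        # PTR lookup: first element of the result is the primary hostname.
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False

    # The hostname must belong to one of the assumed crawler domains.
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False

    try:
        # Forward lookup: third element is the list of IPv4 addresses.
        addresses = forward(hostname)[2]
    except OSError:
        return False

    # Only trust the PTR record if the hostname resolves back to the IP.
    return ip in addresses
```

An IP that passes this check could be skipped before it is counted 
towards the threshold. Anything that fails is treated like any other 
client.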

For those who are concerned about blocking Google, it is possible to 
tune the feature, or turn it off entirely, by adjusting the threshold 
values in the config file.
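
To give a feel for what a threshold like this means, here is a small 
Python sketch that counts "File does not exist" entries per client IP 
and reports the ones at or above a limit. It is not DFix's code. The 
function name, the whitelist parameter and the "zero or less disables 
the check" convention are assumptions made for this example, and the 
pattern assumes Apache 2.2-style error log lines.

```python
import re
from collections import Counter

# Assumed Apache 2.2-style error log line, for example:
# [Mon Aug 31 09:20:08 2009] [error] [client 192.0.2.7] File does not exist: /path/old.html
NOT_FOUND = re.compile(r"\[client (?P<ip>[0-9a-fA-F.:]+)\] File does not exist:")


def offenders(lines, threshold, whitelist=()):
    """Return {ip: count} for clients with at least `threshold` not-found errors.

    A threshold of zero or less disables the check (a convention of this
    sketch, not necessarily of DFix). IPs in `whitelist` are never counted.
    """
    if threshold <= 0:
        return {}

    counts = Counter()
    for line in lines:
        match = NOT_FOUND.search(line)
        if match and match.group("ip") not in whitelist:
            counts[match.group("ip")] += 1

    return {ip: n for ip, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    sample = [
        "[Mon Aug 31 09:20:08 2009] [error] [client 192.0.2.7] File does not exist: /a.html",
        "[Mon Aug 31 09:20:09 2009] [error] [client 192.0.2.7] File does not exist: /b.html",
        "[Mon Aug 31 09:20:10 2009] [error] [client 198.51.100.3] File does not exist: /c.html",
    ]
    print(offenders(sample, threshold=2))
```

Raising the threshold makes it less likely that a crawler working 
through stale .html links gets caught, at the cost of letting a scanner 
make more requests before it is noticed.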

Regards,
Greg.

--

+---------------------------------------------------------------------+
|   / \   Greg Kuhnert, gkuhnert (at mark) compassnetworks.com.au               |
| <  o  > Compass Networks - Pointing you in the right direction      |
|   \ /   Come see us for BlueQuartz / BlueOnyx modules & Support.    |
+---------------------------------------------------------------------+