<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Debugging Nagios performance problems</title>
	<atom:link href="http://www.barkingseal.com/2009/04/debugging-nagios-performance-problems/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.barkingseal.com/2009/04/debugging-nagios-performance-problems/</link>
	<description>Applied Trust off-leash: IT infrastructure, security, and performance</description>
	<lastBuildDate>Thu, 29 Jul 2010 20:30:05 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Mike Kniaziewicz</title>
		<link>http://www.barkingseal.com/2009/04/debugging-nagios-performance-problems/comment-page-1/#comment-67</link>
		<dc:creator>Mike Kniaziewicz</dc:creator>
		<pubDate>Fri, 19 Jun 2009 00:14:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.barkingseal.com/?p=725#comment-67</guid>
		<description>Great article, Ben! I like the way you draw the picture and then present solutions.

We noticed our load averages increasing over time. My enterprise monitors over 1,700 hosts and 3,100 services, and that number is growing. We developed the same kinds of solutions you propose in your article.

Using &quot;lsof&quot;, we were able to identify the directories being hit hardest, so we placed those directories on SAN disks, which reduced the load dramatically. These directories included /usr/lib/mysql and /var/opt/nagios.

To reduce the load even further, we defined time periods for monitoring certain aspects of the enterprise. We monitor development and test servers only during business hours, and at night we monitor only services that impact production. We found that receiving a notification at 2 AM about high CPU usage was not the answer, so we track these problems and work on solutions the following morning.

You may also want to step up monitoring on systems where end users are reporting problems. You can create custom templates, even calling one problem_child.cfg, and add hosts to it only when you need to step up the monitoring.

Thanks, Ben, for the article; I found it quite insightful. Keep up the fine work.</description>
		<content:encoded><![CDATA[<p>Great article, Ben! I like the way you draw the picture and then present solutions.</p>
<p>We noticed our load averages increasing over time. My enterprise monitors over 1,700 hosts and 3,100 services, and that number is growing. We developed the same kinds of solutions you propose in your article.</p>
<p>Using &#8220;lsof&#8221;, we were able to identify the directories being hit hardest, so we placed those directories on SAN disks, which reduced the load dramatically. These directories included /usr/lib/mysql and /var/opt/nagios.</p>
<p>To reduce the load even further, we defined time periods for monitoring certain aspects of the enterprise. We monitor development and test servers only during business hours, and at night we monitor only services that impact production. We found that receiving a notification at 2 AM about high CPU usage was not the answer, so we track these problems and work on solutions the following morning.</p>
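<p>For illustration, a minimal sketch of how such time periods might look in Nagios object configuration. The period name <code>workhours</code>, the host group <code>dev-test-servers</code>, the <code>check_nrpe!check_load</code> command, and the 09:00-17:00 window are all hypothetical; <code>generic-service</code> is assumed to exist, as in the Nagios sample templates.</p>
<pre><code># Hypothetical business-hours window (name and times are assumptions)
define timeperiod {
    timeperiod_name  workhours
    alias            Business Hours
    monday           09:00-17:00
    tuesday          09:00-17:00
    wednesday        09:00-17:00
    thursday         09:00-17:00
    friday           09:00-17:00
}

# Dev/test checks are scheduled, and notify, only inside that window
define service {
    use                  generic-service
    hostgroup_name       dev-test-servers
    service_description  CPU Load
    check_command        check_nrpe!check_load
    check_period         workhours
    notification_period  workhours
}</code></pre>
<p>Because <code>check_period</code> limits when checks are scheduled, not just when notifications go out, this also removes the overnight check load for those hosts.</p>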
<p>You may also want to step up monitoring on systems where end users are reporting problems. You can create custom templates, even calling one problem_child.cfg, and add hosts to it only when you need to step up the monitoring.</p>
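<p>A minimal sketch of such a host template, assuming Nagios 3 object syntax. The template name <code>problem_child</code>, the host <code>web01</code>, its address, and the interval values are hypothetical; <code>linux-server</code> is assumed to exist, as in the sample <code>templates.cfg</code>.</p>
<pre><code># problem_child.cfg: a non-registered host template with tighter check settings
define host {
    name                problem_child
    use                 linux-server
    check_interval      1
    retry_interval      1
    max_check_attempts  3
    register            0
}

# Point a troublesome host at the template while users are reporting problems
define host {
    host_name  web01
    alias      Web server 01
    address    192.0.2.10
    use        problem_child
}</code></pre>
<p>Switching the host back to its usual template once the problem is resolved returns it to normal check frequency. Note that a host template affects host checks only; services on that host would need a matching service template to be checked more often.</p>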
<p>Thanks, Ben, for the article; I found it quite insightful. Keep up the fine work.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ben</title>
		<link>http://www.barkingseal.com/2009/04/debugging-nagios-performance-problems/comment-page-1/#comment-63</link>
		<dc:creator>ben</dc:creator>
		<pubDate>Mon, 01 Jun 2009 13:59:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.barkingseal.com/?p=725#comment-63</guid>
		<description>@Kumaran - Yes, I absolutely agree. Some service checks need to be more granular (2-5 minutes) and others can be monitored on the 10-15 minute scale. It really depends on the service&#039;s purpose and the amount of downtime that can be tolerated. Ultimately, I&#039;d say the interval should be up to the service owner.</description>
		<content:encoded><![CDATA[<p>@Kumaran &#8211; Yes, I absolutely agree. Some service checks need to be more granular (2-5 minutes) and others can be monitored on the 10-15 minute scale. It really depends on the service&#8217;s purpose and the amount of downtime that can be tolerated. Ultimately, I&#8217;d say the interval should be up to the service owner.</p>
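<p>As a rough sketch, two Nagios service templates could encode these tiers. The template names and exact values are hypothetical; intervals are in minutes assuming the default <code>interval_length=60</code>, and <code>generic-service</code> is assumed to exist, as in the Nagios sample templates.</p>
<pre><code># Fine-grained tier for services with little tolerance for downtime
define service {
    name            critical-service
    use             generic-service
    check_interval  2
    retry_interval  1
    register        0
}

# Coarser tier for services that can tolerate longer gaps
define service {
    name            relaxed-service
    use             generic-service
    check_interval  15
    retry_interval  5
    register        0
}</code></pre>
<p>Each service owner then picks a tier with <code>use critical-service</code> or <code>use relaxed-service</code>, so the interval decision stays with them rather than being hard-coded per service.</p>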
]]></content:encoded>
	</item>
	<item>
		<title>By: Kumaran</title>
		<link>http://www.barkingseal.com/2009/04/debugging-nagios-performance-problems/comment-page-1/#comment-62</link>
		<dc:creator>Kumaran</dc:creator>
		<pubDate>Mon, 01 Jun 2009 10:34:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.barkingseal.com/?p=725#comment-62</guid>
		<description>I think moving the service check intervals to 10-15 minutes would have contributed to the improved performance. A 10-15 minute interval can be ideal for some devices, but it is not something I would recommend in a more dynamic environment. For example, for the WAN infrastructure we configure, even 5-minute Nagios checks are too long an interval: a network/interface flap lasting a couple of minutes can cause serious issues because many HA systems would kick in, so the sooner the engineers know, the better. With a 10-15 minute interval, you may also miss a flap completely.

Then again, 30-50% CPU usage isn&#039;t much of an impact.</description>
		<content:encoded><![CDATA[<p>I think moving the service check intervals to 10-15 minutes would have contributed to the improved performance. A 10-15 minute interval can be ideal for some devices, but it is not something I would recommend in a more dynamic environment. For example, for the WAN infrastructure we configure, even 5-minute Nagios checks are too long an interval: a network/interface flap lasting a couple of minutes can cause serious issues because many HA systems would kick in, so the sooner the engineers know, the better. With a 10-15 minute interval, you may also miss a flap completely.</p>
<p>Then again, 30-50% CPU usage isn&#8217;t much of an impact.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hendrik</title>
		<link>http://www.barkingseal.com/2009/04/debugging-nagios-performance-problems/comment-page-1/#comment-57</link>
		<dc:creator>Hendrik</dc:creator>
		<pubDate>Thu, 07 May 2009 08:34:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.barkingseal.com/?p=725#comment-57</guid>
		<description>I learned something new! Thanks!</description>
		<content:encoded><![CDATA[<p>I learned something new! Thanks!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
