Topic: 4.0 Windows server agents become gray after a while
Christer
« on: October 17, 2011, 08:42:02 AM »

I am testing 4.0 agents on 6 Windows Server 2008 machines.
I set them up to monitor the basics: disk space, CPU usage and memory, i.e. the defaults in the conf.
Then I also added monitoring of some services, Exchange Server and others.
They work fine and report to the server, which is 4.0 as well.
But after a day or so they become gray. Restarting the agent service on the servers seems to help, though. One server, however, had monitoring on all the Exchange services (with 2010 that's quite a lot, maybe close to 20 or so), and those modules did not correct themselves after just restarting the agent service.
So I was forced to remove monitoring of those services.

I can't understand why anyone would pay for the enterprise version when the software doesn't even work?

tpalacios
Administrator
« Reply #1 on: October 17, 2011, 09:18:14 PM »

Greetings.

Maybe it would help if you gave us some additional info, like how you are trying to monitor these services (via plugin, or via modules in pandora_agent.conf and, in that case, the syntax used to perform the check), the configuration and load of those servers, some screenshots of the problem...

Besides, I have no problem telling you how and what to monitor when it comes to Exchange 2010 in Pandora. Simply tell me which services you want to monitor and how you are trying to do it, and I'll tell you whether that's the way to go. ;)

In the meantime, when these agents become gray, log into one of those servers and make sure the Pandora FMS Agent service is up and running and that the CPU load is at a normal level. If so, make sure the XML data files are reaching the server: stop Pandora Server, then check your /var/spool/pandora/data_in directory for any incoming XML data from those gray agents.

If you don't receive any XML data and the agent is still working, you've got a communication problem between agent and server, due to the Tentacle client or server failing.
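
The data_in check can also be scripted. What follows is only a minimal sketch in Python, not part of Pandora FMS: the function name recent_agent_files and the sample file name are made up for illustration, and it assumes that incoming data file names start with the agent name followed by a dot, which you should verify against your own spool directory.

```python
import tempfile
import time
from pathlib import Path


def recent_agent_files(spool_dir, agent_name, max_age_seconds, now=None):
    """Return names of files for `agent_name` modified within `max_age_seconds`."""
    now = time.time() if now is None else now
    hits = []
    for entry in Path(spool_dir).iterdir():
        # Assumption: data file names begin with the agent name plus a dot.
        if not entry.is_file() or not entry.name.startswith(agent_name + "."):
            continue
        if now - entry.stat().st_mtime <= max_age_seconds:
            hits.append(entry.name)
    return sorted(hits)


if __name__ == "__main__":
    # Demo against a throwaway directory with a hypothetical file name.
    # On a real server, point this at /var/spool/pandora/data_in while
    # Pandora Server is stopped.
    with tempfile.TemporaryDirectory() as spool:
        Path(spool, "exchange01.1318926674.data").write_text("<agent_data/>")
        print(recent_agent_files(spool, "exchange01", 300))
        print(recent_agent_files(spool, "sql01", 300))
```

An empty list for a gray agent while the agent service is still running would point to the communication problem described above.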

I hope this helps for now.

Regards.

Christer
« Reply #2 on: October 18, 2011, 09:48:50 AM »

Thank you for the reply.

OK, when I go into Pandora I see that all 6 of the servers I am testing on are gray on every module.
I picked one of those to troubleshoot.
I go into the server, the service is still running.

The pandora agent log on that server lists some funny stuff:

Code:
2011-10-18 08:30:44 getOSVersion error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:30:44 getSystemName error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 getSystemName error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 Pandora agent started
2011-10-18 08:31:14 getDiskFreeSpace error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 getDiskFreeSpace error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 getCpuUsagePercentage error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 getFreememory error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 isServiceRunning error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 isServiceRunning error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 isServiceRunning error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 isServiceRunning error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 getSystemName error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 getOSName error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 getOSVersion error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:14 getSystemName error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application
2011-10-18 08:31:44 getSystemName error. Member:
Function: GetObject
Error In: GetObjectEx
Error:    Invalid syntax
Code:     800401e4
Source:   Application

I monitor services like this in the .conf:
Code:
module_begin
module_name SQL Server Browser
module_type generic_proc
module_service SQLBrowser
module_description SQL Server Browser
module_end

Transfer mode is tentacle on the default port 41121.
Interval is 30.
The other stuff monitored besides services is just the default standard stuff that's in there:
CPU usage, memory, etc., but with slight modifications to the free disk space check depending on which and how many drives are on the server.

Christer
« Reply #3 on: October 18, 2011, 10:02:36 AM »

I stopped the pandora_server service, and no data came into
/var/spool/pandora/data_in

Btw, I also have xml_buffer set to 1
in the agent conf on the server. That is not the default, but I turned it on the other day to see if it would help.
The temp folder in the pandora agent folder had a lot of files in it. I'm guessing it's the data for the XML buffer.

I started pandora_server on the Pandora server again.
Then I restarted the pandora agent service on the server I am troubleshooting, and data started to appear in
/var/spool/pandora/data_in
again. Appearing and disappearing, doing its magic.
Also, after a while the pandora_agent\temp folder on the monitored server was cleaned out.

That server is green once again in Pandora, but I am very certain it will become gray after some time. Perhaps by tomorrow.

Christer
« Reply #4 on: October 19, 2011, 10:41:09 AM »

The server in question is gray today again.
The pandora agent service is running on the server, and the process pandoraagent.exe is running and looks alright: not too much memory usage or anything special.
The server itself is under no strange load.

tpalacios
« Reply #5 on: October 20, 2011, 08:18:16 PM »

OK, let's try something.

In pandora_agent.conf, set debug to 1. Then restart your pandora_agent.

Now your agent should start storing the XML data files in its temp folder without sending them to the Pandora Server. In addition, a special debug error log will be generated in the pandora_agent folder. Make sure you check it from time to time so we can find out if one of your modules is responsible for "crashing" your agent.

Leave it running in debug mode for a while. If things are happening the way you've described, there will be a point (the point where, with debug 0, the agent would stop sending data) at which the agent stops generating XML data, so make sure you take a look at the creation time of each XML file.
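
Spotting that point by eye gets tedious when the temp folder holds many files, so here is a small Python sketch of the idea. It is only an illustration under assumptions: the helper names find_gaps and file_mtimes are made up, the file modification time is used as a stand-in for the creation time, and the factor of 2 is an arbitrary tolerance.

```python
from pathlib import Path


def find_gaps(timestamps, interval_seconds, tolerance=2.0, now=None):
    """Return (previous, next) pairs spaced more than tolerance * interval apart.

    If `now` is given, the silence between the newest file and `now`
    is checked as well.
    """
    ordered = sorted(timestamps)
    if now is not None and ordered:
        ordered.append(now)
    limit = interval_seconds * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


def file_mtimes(folder):
    """Modification times (epoch seconds) of the regular files in `folder`."""
    return [p.stat().st_mtime for p in Path(folder).iterdir() if p.is_file()]


if __name__ == "__main__":
    # Made-up timestamps: one file every 30 s, then a 300 s silence.
    sample = [0, 30, 60, 90, 390, 420]
    print(find_gaps(sample, interval_seconds=30))
```

On the monitored server you would feed it file_mtimes() of the agent's temp folder together with the interval from pandora_agent.conf; the start of the first reported gap is roughly when the agent stopped producing data.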

If your servers are Windows Server 2008 running Exchange 2010, I assume they run a 64-bit OS, right? Then take a look at this:

Problems running agent for Windows 2008 in 64 bits

And please be aware that calling any 64-bit system application from an application compiled for 32 bits will result in the system redirecting the call to the 32-bit version of that application, which may not exist, or with which the check you want to perform may not work.

This means that trying to check anything using a system32 file may not work as expected. Instead, make sure you can do the check using the WOW64 application. Otherwise, a workaround is to copy your system32 application to another location and execute it from there. It works with Powershell.exe, for instance, and it's pretty useful when it comes to monitoring Exchange Servers via the Exchange Management Shell.

Regards.

Christer
« Reply #6 on: October 21, 2011, 10:14:59 AM »

As a start:
I went into a server to see if it was 64 bit, and happened to catch a prompt saying the pandora agent had crashed.
Screenshot attached.


* pandoraserver2003-64.jpg (136.3 KB, 809x640)

tpalacios
« Reply #7 on: October 21, 2011, 11:40:24 AM »

Interesting.

Did you get the 4.0 agent from Sourceforge or from somewhere else?

In the meantime, schedule a task that restarts those agents every hour or so, to work around those crashes and keep the agents reporting to Pandora Server.

Regards.

PS: That screenshot is from a 2003 Server... you are not running Exchange 2010 there, right?

Christer
« Reply #8 on: October 21, 2011, 12:38:42 PM »

Yup, downloaded from Sourceforge.

That's the only 2003 one in the group, and it's not the one with Exchange 2010.
The others are 4 x Server 2008 R2 Enterprise x64.

I at least changed the pandora agent service on the server to "Restart the Service" on
first failure, second failure and subsequent failures in the "Recovery" tab of the service.
Hmm, restart via scheduled task, I'll have to figure that out...


* pandoraservice.jpg (50.61 KB, 401x429)

Christer
« Reply #9 on: October 24, 2011, 01:34:36 PM »

That restart-on-failure stunt didn't seem to help at all.
I created a scheduled task running a .cmd batch file:
Code:
@echo off
net stop PandoraFMSAgent
net start PandoraFMSAgent

And I set it to run every 30 min, on all 6 servers.
On one server 30 min didn't cut it, because it would fail earlier than that.
So I first tried 5 min, and that wasn't often enough either.
So then I set that one to restart the service every 1 min.
That server is a Server 2008 R2 Enterprise, same as 4 others.

Now it seems stable.

Christer
« Reply #10 on: November 07, 2011, 09:03:08 AM »

OK, I've figured out what the problem is.
When you do a WMI query like I do to check if services are running, this process called
WmiPrvSE.exe lingers around for about 1 min 40 s.
So, if you keep doing queries before this process has quieted down, eventually the queries will start to fail.
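
To put rough numbers on that overlap, here is a back-of-the-envelope Python sketch. It is a simplified model of my own, not a description of how WMI schedules work: it assumes every round of checks keeps the WMI provider process busy for a fixed linger time (about 100 s, per the observation above), and the function name concurrent_rounds is made up.

```python
import math


def concurrent_rounds(linger_seconds, interval_seconds):
    """Rounds of checks still lingering right after a new round starts.

    A round started at time t is assumed to keep the provider process busy
    until t + linger_seconds. Rounds start every interval_seconds, so the
    rounds started k intervals ago still count while k * interval < linger.
    """
    return math.ceil(linger_seconds / interval_seconds)


if __name__ == "__main__":
    for interval in (30, 180, 300):
        print(interval, concurrent_rounds(100, interval))
```

Under this model a 30 s interval always has several rounds piled on top of each other, while any interval longer than the linger time lets each round finish before the next one starts.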

I made my own VBScript to try to do the same thing, monitor a service.
But I got the same result. When I tried running the script manually while WmiPrvSE.exe was
overloaded, I figured it out: I got some error messages about not enough memory. I guess that means memory for WmiPrvSE.exe, because the machine still had loads of memory free.

So, I now run the check every 180 s, or 3 minutes. It is stable now, apart from some random brief grayness when the agent data hasn't entered the database after 182 s (it's very sensitive).
Here's the script I made and how I call it, if anyone is interested.
It's not needed though, because the built-in check works just as well.

Code:
' Usage: cscript servicemon.vbs "<service name>" //Nologo
' Prints 1 if the named service is running, 0 otherwise.
If WScript.Arguments.Count = 0 Then
    WScript.Quit
End If

Dim objWMIService, strWMIQuery, strComputer, strServiceName
strComputer = "."
strServiceName = WScript.Arguments(0)
' Match the service by name, and only if its state is Running.
strWMIQuery = "Select * from Win32_Service Where Name = '" & strServiceName & "' and state='Running'"
Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")

If objWMIService.ExecQuery(strWMIQuery).Count > 0 Then
    WScript.Echo "1"
Else
    WScript.Echo "0"
End If

I call it like this:
Code:
module_begin
module_name uberservice
module_type generic_proc
module_exec cscript servicemon.vbs "uberservice" //Nologo
module_description blablabla
module_end

I guess I should have done what the readme for the agent says :)

Quote
This is the time interval in seconds in which the agent will collect data from the host system and send the data packages to the server. The recommended value ranges from 300 (5 minutes) to 600 (10 minutes).

5 minutes feels like an eternity though.