Topic: strange results for GENERIC_DATA_INC due to inexact sleep
daggett
« on: January 12, 2007, 12:38:57 PM »

hi,
After some weeks of testing, I regularly get incoherent values for GENERIC_DATA_INC data:
the cpu_user and cpu_sys modules send the number of ticks parsed from /proc/stat.
This number is an ever-incrementing value (I don't know what will happen if the counter resets, though: how is that handled in Pandora?).
When I send only actual_val/300 as the INC value and let Pandora Server do the subtraction, it often gives data like 121%.
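For readers who want to see where the tick counters come from, here is a minimal sketch of parsing the aggregate `cpu` line of /proc/stat and taking the difference between two readings. The sample line, the function names and the "return None on reset" convention are illustrative assumptions; this is not how Pandora itself handles a counter reset, which the thread leaves open.

```python
# Illustrative sample of the first line of /proc/stat (values are made up).
SAMPLE = "cpu  4705 150 1120 16250 520 30 45 0 0 0"

def parse_cpu_line(line):
    """Return the first four tick counters of the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("not the aggregate cpu line")
    user, nice, system, idle = (int(x) for x in fields[1:5])
    return {"user": user, "nice": nice, "system": system, "idle": idle}

def counter_delta(previous, current):
    """Difference between two readings of an ever-incrementing counter.

    Assumption: a reading lower than the previous one means the counter
    was reset (e.g. a reboot), so the sample is discarded (None).
    """
    if current < previous:
        return None
    return current - previous

print(parse_cpu_line(SAMPLE))
```

Discarding the sample after a reset is only one possible convention; another is to treat the new reading itself as the delta.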

This is due to the use of the sleep command in the Pandora Agent script to time execution. When a program is consuming all the CPU on the monitored computer:
- the sleep command (e.g. sleep 300) lasts longer than 300 seconds,
- the script itself becomes really slow to execute, so Agent_execution_time can be above 300 seconds (sometimes near 360 seconds = 6 minutes, instead of a few seconds).

So the program still assumes a 300-second (5-minute) interval and divides by 300. In fact the whole cycle (execute and wait) can take more than 600 seconds (10 minutes), which is twice the expected time, so I could have had some near-200% values for cpu_sys!!
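The arithmetic can be sketched as follows. The example assumes 100 ticks per second and a single CPU (both are assumptions, not stated in the thread), and the 363-second cycle is a made-up figure chosen to land on the 121% value mentioned earlier.

```python
TICKS_PER_SECOND = 100  # assumed tick rate; not given in the thread

def cpu_percent(delta_ticks, interval_seconds):
    """Percentage of one CPU used over the given interval."""
    return 100.0 * delta_ticks / (TICKS_PER_SECOND * interval_seconds)

# A fully busy CPU during a cycle that really lasted 363 s, not 300 s.
actual_interval = 363
delta_ticks = TICKS_PER_SECOND * actual_interval  # 36300 ticks

assumed = cpu_percent(delta_ticks, 300)               # nominal interval
measured = cpu_percent(delta_ticks, actual_interval)  # real elapsed time

print(assumed)   # 121.0
print(measured)  # 100.0
```

Dividing by the real elapsed time instead of the nominal 300 seconds keeps the value at or below 100% no matter how long the cycle took.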

So every time the script is executed, the schedule slides a bit, sometimes making the server think the data is missing.

So maybe we could use something other than sleep, like cron? It would be far more precise and would optimize the code.
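Besides cron, one common way to stop the schedule from sliding is to sleep until an absolute deadline rather than for a fixed duration. The sketch below is written in Python purely for illustration (the real agent is a script driven by the sleep command), and `run_periodically` with its parameters is a hypothetical name, not part of Pandora.

```python
import time

def run_periodically(task, interval, cycles,
                     clock=time.monotonic, sleep=time.sleep):
    """Run `task` every `interval` seconds against absolute deadlines.

    The time spent inside `task` is subtracted from the wait, so a slow
    run or an overlong sleep does not accumulate from cycle to cycle.
    Returns the start time of each cycle as read from `clock`.
    """
    starts = []
    next_deadline = clock()
    for _ in range(cycles):
        starts.append(clock())
        task()
        next_deadline += interval
        remaining = next_deadline - clock()
        if remaining > 0:   # if the task overran, start the next cycle now
            sleep(remaining)
    return starts
```

A single sleep can still run long on a loaded machine, but the next deadline is computed from the original schedule, so the error is absorbed instead of added up. It does not replace measuring the real elapsed time between two samples.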

As the great majority of GENERIC_DATA_INC incremental data is strongly time-dependent, this situation makes it useless and meaningless.

bye for now!

Sancho Lerena
« Reply #1 on: January 12, 2007, 08:55:57 PM »

Quote from: "daggett"
hi,

- the sleep command (e.g. sleep 300) lasts longer than 300 seconds,
- the script itself becomes really slow to execute, so Agent_execution_time can be above 300 seconds (sometimes near 360 seconds = 6 minutes, instead of a few seconds).

So the program still assumes a 300-second (5-minute) interval and divides by 300. In fact the whole cycle (execute and wait) can take more than 600 seconds (10 minutes), which is twice the expected time, so I could have had some near-200% values for cpu_sys!!

So every time the script is executed, the schedule slides a bit, sometimes making the server think the data is missing.

So maybe we could use something other than sleep, like cron? It would be far more precise and would optimize the code.

As the great majority of GENERIC_DATA_INC incremental data is strongly time-dependent, this situation makes it useless and meaningless.

bye for now!


You're right: on slow systems or systems under high load, and when using this with agents (not with the network server), you can get some imprecision.
For the next version we will use the agent contact time (local), which is much more precise.

Thanks for your observation, as usual you're testing Pandora in depth :-)

-- See you in the other screen.

daggett
« Reply #2 on: January 15, 2007, 10:43:45 AM »

ok thanks,
but the problem will remain until we can record the exact time each data_inc value was collected, either by putting this time in the CDATA of the module or by including the exact execution time of each module in the XML data files.
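As a rough sketch of that idea, each sample could carry its own collection time, and the rate could be computed from the two timestamps instead of a fixed 300 seconds. The element names used here (`module`, `name`, `data`, `timestamp`) and both function names are hypothetical and are not claimed to match the real Pandora XML data file format.

```python
import xml.etree.ElementTree as ET

def module_xml(name, value, collected_at):
    """Serialize one sample with its collection time (hypothetical schema)."""
    module = ET.Element("module")
    ET.SubElement(module, "name").text = name
    ET.SubElement(module, "data").text = str(value)
    ET.SubElement(module, "timestamp").text = str(collected_at)  # epoch seconds
    return ET.tostring(module, encoding="unicode")

def rate_from_samples(prev_xml, curr_xml):
    """Counter increase per second, using the timestamps of both samples."""
    prev, curr = ET.fromstring(prev_xml), ET.fromstring(curr_xml)
    elapsed = int(curr.findtext("timestamp")) - int(prev.findtext("timestamp"))
    if elapsed <= 0:
        return None  # duplicate or out-of-order sample
    delta = int(curr.findtext("data")) - int(prev.findtext("data"))
    return delta / elapsed

first = module_xml("cpu_sys", 1000, 1168600000)
second = module_xml("cpu_sys", 37300, 1168600363)
print(rate_from_samples(first, second))  # 100.0 ticks per second
```

With the timestamp stored next to the value, a file processed days late still yields the correct rate, because nothing depends on when the server reads it.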

I also saw that alarms can be triggered when old data is processed, but the displayed date/time is the date/time when the server processed the XML data:
in my case, the XML data files can be processed several days late after a connection failure, so I get alarms triggered while processing that old data, and the date/time displayed for these alarms is that of the processing, not that of the data collection.

bye for now!
