Otter - Time run cannot be longer than 23:59
-
When an Otter-monitored server loses contact with the server hosting the Otter service (because of network problems) during a job run, the job runs indefinitely and never cancels without manual intervention.
The Otter web page shows the server as remediating the drift, but the status text never updates. Logging on to the server shows an Otter service running, along with the PowerShell script processes it spawned.
On one server, Otter told me the job had started on 19/01/2024 but had only run for 19:23 (hours:minutes). The job run time does not appear to handle durations longer than one day.
There appears to be a problem in Otter: once it has sent a command to the server that needs remediating, it fails to check on the status of the running command, when it should instead force a cancel if it loses contact. A "call home" method like this would ensure that if the connection from the remediating server to the host server has been lost for a long time (1 hour?), any scripts are cancelled. When connectivity is re-established, the run should be updated to show a failure, and the server should return to an "Error" state rather than "Remediating drift".
I have asked about this before, but was told to wrap my commands in a "maximum time" loop (which didn't work), and also that it was a result of user interaction being requested. However, across more than 50 servers, I've recently seen them fail on 7 different tasks at different times. None of the scripts involve any user interaction, and when there are no infrastructure problems, they all run perfectly!
I fix this issue by restarting the Otter service via the web app, which clears the outstanding jobs. However, this isn't ideal! I have thought about writing a scheduled task to do this daily, but it seems like a bit of a fudge!
Can we look at this issue again? It's probably the only thing about Otter that makes it unattractive to other users when I tell them about it.
-
Hi @Jon ,
Unfortunately it's "not that simple", and this will require a bit of troubleshooting to figure out what the issue is, precisely. There are a lot of areas where this could occur, and we can't add a generalized "job killer" until we understand what the issue is.
You'll have to dig in behind the scenes (Admin > Executions) and identify exactly where things are freezing. The last log message will indicate that. Try to find as many examples as possible.
Keep in mind that Otter is not really "connected" to a server, and a server does not (and cannot) "call home". Instead, Operations (i.e. OtterScript) open a connection to a server, send a command, then disconnect. All network errors we've ever seen in this process will yield a crash.
However, unless explicitly specified, a command will not time out. This means that if you run a PowerShell script that basically just says "sleep indefinitely", the Execution will never complete. Obviously no one would write that script, but some PowerShell scripts can end up behaving that way. No built-in Operation should ever cause that to happen, which is why we need to know precisely where this is happening.
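To illustrate the general idea of an explicitly specified timeout, here is a minimal sketch in Python. It is not how Otter or OtterScript handles timeouts internally; the function name `run_with_timeout` and the stand-in "hanging" command are assumptions made purely for illustration.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command, stopping it if it exceeds timeout_seconds.

    Returns (exit_code, timed_out); exit_code is None if the command was killed.
    run_with_timeout is a hypothetical helper, not part of Otter.
    """
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process and waits for it
        # before re-raising TimeoutExpired.
        return None, True
    return completed.returncode, False


if __name__ == "__main__":
    # Stand-in for a script that would otherwise never return.
    hanging = [sys.executable, "-c", "import time; time.sleep(3600)"]
    print(run_with_timeout(hanging, timeout_seconds=2))
```

Without the `timeout` argument, the call above would block for the full hour, which is the same shape of problem as an Execution that never completes. Note that only the direct child process is killed here; anything that child spawned itself may keep running and would need separate handling.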
Best,
Alana