Prevent jobs from never ending because of a problem in an algorithm

Sometimes, because of a bug, an algorithm enters an infinite loop or a deadlock
occurs. In these cases the application will never terminate.

To prevent such cases, in particular for batch jobs that could waste a lot of
resources before the problem is detected, it is possible to use the special algorithm
`Gaudi::EventWatchdogAlg`.

`Gaudi::EventWatchdogAlg` starts a secondary thread that sleeps until a timeout is
reached. At that point it prints a warning message and, optionally, a stack trace of
the process on stderr. It then sleeps for another timeout period, unless it is
configured to abort the process when the timeout occurs.
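The watchdog pattern described above can be illustrated outside Gaudi with a minimal pure-Python sketch. This is not the `Gaudi::EventWatchdogAlg` implementation: the class `WatchdogSketch` and its parameters (`timeout`, `stack_trace`, `on_timeout`) are hypothetical names chosen for illustration, and the stack trace here covers only Python frames.

```python
import faulthandler
import sys
import threading
import time


class WatchdogSketch:
    """Hypothetical stand-in illustrating the watchdog pattern (not Gaudi code)."""

    def __init__(self, timeout, stack_trace=False, on_timeout=None):
        self.timeout = timeout          # seconds to sleep before each warning
        self.stack_trace = stack_trace  # also dump the Python stacks of all threads
        self.on_timeout = on_timeout    # optional callback, e.g. os.abort
        self.timeouts = 0               # number of timeouts seen so far
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False if the timeout elapsed, True if stop() was called.
        while not self._stop.wait(self.timeout):
            self.timeouts += 1
            print(f"WARNING: timeout of {self.timeout}s reached", file=sys.stderr)
            if self.stack_trace:
                faulthandler.dump_traceback(file=sys.stderr, all_threads=True)
            if self.on_timeout is not None:
                self.on_timeout()
                return  # only reached if the callback did not end the process
            # otherwise loop and sleep for another timeout period

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    watchdog = WatchdogSketch(timeout=0.1)
    watchdog.start()
    time.sleep(0.35)  # pretend the main thread is stuck
    watchdog.stop()
    print(f"timeouts seen: {watchdog.timeouts}")
```

Passing `on_timeout=os.abort` would mimic the "abort the process" configuration, while leaving it unset gives the "warn and keep waiting" behaviour.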

from GaudiConfig2 import Configurable
from GaudiConfig2 import Configurables as C
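

# NOTE: the `def` line is missing from the listing; the name `base_config` is a placeholder.
def base_config() -> list[Configurable]:
    """
    Example configuration of a job with no input and an algorithm that looks stuck.
    """
    # the long SleepTime makes this algorithm look stuck
    algorithms = [C.GaudiTesting.SleepyAlg("StuckAlg", SleepTime=3600)]
    app = C.ApplicationMgr(
        EvtSel="NONE", TopAlg=[C.Gaudi.Sequencer("MainSequence", Members=algorithms)]
    )
    return [app] + list(app.TopAlg) + algorithms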


# NOTE: the `def` line is missing from the listing; the name `add_event_timeout` is a placeholder.
def add_event_timeout(
    conf: list[Configurable], timeout_seconds: int
) -> list[Configurable]:
    """
    Take a configuration and add a check on events reaching a timeout.
    """
    # find the application manager among the configurables
    app = next(c for c in conf if c.name == "ApplicationMgr")
    watchdog = C.Gaudi.EventWatchdogAlg(
        EventTimeout=timeout_seconds,
    )
    # run the watchdog first, followed by the original top-level algorithms
    wrapping_seq = C.Gaudi.Sequencer(
        "SequenceWithTimeout", Sequential=True, Members=[watchdog] + list(app.TopAlg)
    )
    app.TopAlg = [wrapping_seq]
    return conf + [watchdog, wrapping_seq]
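
The effect of the wrapping step on the algorithm tree can be checked without a Gaudi installation. The sketch below uses hypothetical dataclasses `Alg` and `Seq` as stand-ins for configurables, and the helpers `wrap_with_watchdog` and `flatten` are illustrative names, not part of GaudiConfig2. It shows that the watchdog ends up first in a single sequential top-level sequence, presumably so that it is started before the other algorithms run.

```python
from dataclasses import dataclass, field


@dataclass
class Alg:
    """Hypothetical stand-in for an algorithm configurable."""
    name: str


@dataclass
class Seq:
    """Hypothetical stand-in for a sequencer configurable."""
    name: str
    members: list = field(default_factory=list)
    sequential: bool = False


def wrap_with_watchdog(top_alg: list, watchdog: Alg) -> list:
    """Return a new top-level list whose single entry runs the watchdog first."""
    return [Seq("SequenceWithTimeout", members=[watchdog, *top_alg], sequential=True)]


def flatten(nodes):
    """Yield algorithm names in the order a sequential traversal would visit them."""
    for node in nodes:
        if isinstance(node, Seq):
            yield from flatten(node.members)
        else:
            yield node.name


if __name__ == "__main__":
    top = [Seq("MainSequence", members=[Alg("StuckAlg")])]
    wrapped = wrap_with_watchdog(top, Alg("Watchdog"))
    print(list(flatten(wrapped)))
```

The original top-level entries are kept intact as members of the new sequence, so the rest of the configuration does not need to change.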