Failed Tasks

Sometimes tasks can fail. Let’s see how to deal with failed tasks in nornir.

Let’s start as usual with the needed boilerplate:

[1]:
import logging

from nornir import InitNornir
from nornir.core.task import Task, Result
from nornir_utils.plugins.functions import print_result

# instantiate the nr object
nr = InitNornir(config_file="config.yaml")
# let's filter it down to simplify the output
cmh = nr.filter(site="cmh", type="host")

def count(task: Task, number: int) -> Result:
    return Result(
        host=task.host,
        result=f"{list(range(number))}"
    )

def say(task: Task, text: str) -> Result:
    if task.host.name == "host2.cmh":
        raise Exception("I can't say anything right now")
    return Result(
        host=task.host,
        result=f"{task.host.name} says {text}"
    )

Now, as an example, we are going to use a task group similar to the one we used in the previous tutorial:

[2]:
def greet_and_count(task: Task, number: int):
    task.run(
        name="Greeting is the polite thing to do",
        severity_level=logging.DEBUG,
        task=say,
        text="hi!",
    )

    task.run(
        name="Counting beans",
        task=count,
        number=number,
    )
    task.run(
        name="We should say bye too",
        severity_level=logging.DEBUG,
        task=say,
        text="bye!",
    )

    # let's inform if we counted even or odd times
    even_or_odds = "even" if number % 2 == 1 else "odd"
    return Result(
        host=task.host,
        result=f"{task.host} counted {even_or_odds} times!",
    )

Remember there is a hardcoded error on host2.cmh. Let’s see what happens when we run the task:

[3]:
result = cmh.run(
    task=greet_and_count,
    number=5,
)

Let’s inspect the object:

[4]:
result.failed
[4]:
True
[5]:
result.failed_hosts
[5]:
{'host2.cmh': MultiResult: [Result: "greet_and_count", Result: "Greeting is the polite thing to do"]}
[6]:
result['host2.cmh'].exception
[6]:
nornir.core.exceptions.NornirSubTaskError()
[7]:
result['host2.cmh'][1].exception
[7]:
Exception("I can't say anything right now")

As you can see, the result object is aware something went wrong and you can inspect the errors if you so desire.
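
If you inspect failures often, the manual steps above can be wrapped in a small helper. The following is only a sketch: root_causes is a hypothetical name, not part of nornir, and it assumes that the last entry carrying an exception in each host's MultiResult is the subtask that actually failed, as is the case in the output above.

```python
from typing import Dict, Tuple


def root_causes(result) -> Dict[str, Tuple[str, str]]:
    """Map each failed host to (task name, exception text)."""
    causes = {}
    for host, multi_result in result.failed_hosts.items():
        # Keep only the entries that actually carry an exception.
        failed = [r for r in multi_result if r.exception is not None]
        if failed:
            # Assumption: the last failed entry is the innermost subtask.
            last = failed[-1]
            text = f"{type(last.exception).__name__}: {last.exception}"
            causes[host] = (last.name, text)
    return causes
```

Calling root_causes(result) on the result object from this tutorial should point at the "Greeting is the polite thing to do" subtask on host2.cmh.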

You can also use the print_result function on the result object:

[8]:
print_result(result)
greet_and_count*****************************************************************
* host1.cmh ** changed : False *************************************************
vvvv greet_and_count ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
host1.cmh counted even times!
---- Counting beans ** changed : False ----------------------------------------- INFO
[0, 1, 2, 3, 4]
^^^^ END greet_and_count ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* host2.cmh ** changed : False *************************************************
vvvv greet_and_count ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv ERROR
Subtask: Greeting is the polite thing to do (failed)

---- Greeting is the polite thing to do ** changed : False --------------------- ERROR
Traceback (most recent call last):
  File "/home/dbarroso/workspace/dbarrosop/nornir/nornir/core/task.py", line 98, in start
    r = self.task(self, **self.params)
  File "<ipython-input-1-3ab8433d31a3>", line 20, in say
    raise Exception("I can't say anything right now")
Exception: I can't say anything right now

^^^^ END greet_and_count ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There is also a method that will raise an exception if the task had an error:

[9]:
from nornir.core.exceptions import NornirExecutionError
try:
    result.raise_on_error()
except NornirExecutionError:
    print("ERROR!!!")
ERROR!!!
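
When nornir runs from a scheduled job or a CI pipeline, you may prefer to turn the failure state into a process exit status instead of raising. A minimal sketch, assuming the object passed in exposes failed and failed_hosts like the result used above; exit_status is a hypothetical helper, not a nornir function:

```python
import sys


def exit_status(result) -> int:
    """Return 0 when every host succeeded, 1 otherwise."""
    if not result.failed:
        return 0
    # failed_hosts is keyed by host name; sorting keeps the report stable.
    for host in sorted(result.failed_hosts):
        print(f"FAILED: {host}", file=sys.stderr)
    return 1
```

At the end of a script this could be used as sys.exit(exit_status(result)).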

Skipped hosts

Nornir will keep track of hosts that failed and won’t run future tasks on them:

[10]:
from nornir.core.task import Result

def hi(task: Task) -> Result:
    return Result(host=task.host, result=f"{task.host.name}: Hi, I am still here!")

result = cmh.run(task=hi)
[11]:
print_result(result)
hi******************************************************************************
* host1.cmh ** changed : False *************************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
host1.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can force the execution of tasks on failed hosts by passing the argument on_failed=True:

[12]:
result = cmh.run(task=hi, on_failed=True)
print_result(result)
hi******************************************************************************
* host1.cmh ** changed : False *************************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
host1.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* host2.cmh ** changed : False *************************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
host2.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also exclude the “good” hosts by setting the on_good flag to False:

[13]:
result = cmh.run(task=hi, on_failed=True, on_good=False)
print_result(result)
hi******************************************************************************
* host2.cmh ** changed : False *************************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
host2.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To achieve this, nornir keeps a set of failed hosts in its shared data object:

[14]:
nr.data.failed_hosts
[14]:
{'host2.cmh'}

If you want to mark hosts as recovered and make them eligible for future tasks again, you can do so per host with the function recover_host, or reset the list completely with reset_failed_hosts:

[15]:
nr.data.reset_failed_hosts()
nr.data.failed_hosts
[15]:
set()
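
The on_failed and on_good flags combine naturally with recover_host into a retry loop. The sketch below is one possible approach, not something nornir provides: retry_failed and its attempts parameter are hypothetical, and it assumes the object returned by run maps host names to results that expose a failed attribute.

```python
def retry_failed(nr, task, attempts: int = 2, **kwargs):
    """Re-run `task` on failed hosts only, recovering those that succeed."""
    last_result = None
    for _ in range(attempts):
        if not nr.data.failed_hosts:
            break  # nothing left to retry
        # on_good=False skips healthy hosts; on_failed=True includes failed ones.
        last_result = nr.run(task=task, on_failed=True, on_good=False, **kwargs)
        for host, multi_result in last_result.items():
            if not multi_result.failed:
                # Make the host eligible for normal runs again.
                nr.data.recover_host(host)
    return last_result
```

With the objects from this tutorial, a call could look like retry_failed(cmh, task=hi). Hosts that keep failing stay in the failed set, so later runs still skip them by default.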

Raise on error automatically

Alternatively, you can configure nornir to raise the exception automatically in case of error with the raise_on_error configuration option:

[16]:
nr = InitNornir(config_file="config.yaml", core={"raise_on_error": True})
cmh = nr.filter(site="cmh", type="host")
try:
    result = cmh.run(
        task=greet_and_count,
        number=5,
    )
except NornirExecutionError:
    print("ERROR!!!")
ERROR!!!

Workflows

The default workflow should work for most use cases, as hosts with errors are skipped and print_result should give enough information to understand what’s going on. For more complex workflows, the building blocks shown above (the result object, the on_failed and on_good flags, and the set of failed hosts) should give you enough room to implement them.