Failed Tasks

Sometimes tasks can fail. Let’s see how to deal with failed tasks in nornir.

Let’s start as usual with the needed boilerplate:

[1]:
from nornir import InitNornir
from nornir.plugins.tasks import networking, text
from nornir.plugins.functions.text import print_result

nr = InitNornir(config_file="config.yaml")
cmh = nr.filter(site="cmh", type="network_device")

Now, as an example, we are going to use a task group similar to the one we used in the previous tutorial:

[2]:
def basic_configuration(task):
    # Transform inventory data to configuration via a template file
    r = task.run(task=text.template_file,
                 name="Base Configuration",
                 template="base.j2",
                 path="templates/junos")

    # Save the compiled configuration into a host variable
    task.host["config"] = r.result

    # Deploy that configuration to the device using NAPALM
    task.run(task=networking.napalm_configure,
             name="Loading Configuration on the device",
             replace=False,
             configuration=task.host["config"])

Note that the path is hardcoded to templates/junos. This should cause an error when we try to apply the configuration to the EOS devices. Let’s see what happens:

[3]:
result = cmh.run(task=basic_configuration)

Let’s inspect the object:

[4]:
result.failed
[4]:
True
[5]:
result.failed_hosts
[5]:
{'spine00.cmh': MultiResult: [Result: "basic_configuration", Result: "Base Configuration", Result: "Loading Configuration on the device"],
 'leaf00.cmh': MultiResult: [Result: "basic_configuration", Result: "Base Configuration", Result: "Loading Configuration on the device"]}
[6]:
# Index 2 is the failed "Loading Configuration on the device" subtask
result['spine00.cmh'][2].exception

As you can see, the result object is aware something went wrong and you can inspect the errors if you so desire.
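When many hosts are involved, it can help to flatten the failures into a short report. The sketch below assumes only what the output above suggests: the aggregated result behaves like a mapping from host name to a list of per-task results, and each of those exposes name, failed and exception attributes. The helper collect_failures and the SimpleNamespace stand-ins are hypothetical; they are there so the snippet runs without a lab. With nornir you would pass the object returned by cmh.run instead.

```python
from types import SimpleNamespace


def collect_failures(aggregated):
    """Return (host, task name, exception) for every failed sub-result."""
    failures = []
    for host, multi_result in aggregated.items():
        for r in multi_result:
            if r.failed:
                failures.append((host, r.name, r.exception))
    return failures


# Stand-in data shaped like the output above (not real nornir objects)
ok = SimpleNamespace(name="Base Configuration", failed=False, exception=None)
bad = SimpleNamespace(
    name="Loading Configuration on the device",
    failed=True,
    exception=RuntimeError("invalid command"),
)
fake_result = {"spine00.cmh": [ok, bad], "spine01.cmh": [ok]}

for host, task_name, exc in collect_failures(fake_result):
    print(f"{host}: {task_name!r} failed with {exc!r}")
```

Note that with a real result the parent task of a failed subtask is typically reported as failed too, so you may want to filter on the task name as well.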

You can also use the print_result function on it:

[7]:
print_result(result)
basic_configuration*************************************************************
* leaf00.cmh ** changed : False ************************************************
vvvv basic_configuration ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv ERROR
Subtask: Loading Configuration on the device (failed)

---- Base Configuration ** changed : False ------------------------------------- INFO
system {
  host-name leaf00.cmh;
  domain-name cmh.acme.local;
}
---- Loading Configuration on the device ** changed : False -------------------- ERROR
Traceback (most recent call last):
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/napalm/eos/eos.py", line 231, in _load_config
    self.device.run_commands(commands)
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/pyeapi/client.py", line 730, in run_commands
    response = self._connection.execute(commands, encoding, **kwargs)
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/pyeapi/eapilib.py", line 499, in execute
    response = self.send(request)
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/pyeapi/eapilib.py", line 418, in send
    raise CommandError(code, msg, command_error=err, output=out)
pyeapi.eapilib.CommandError: Error [1002]: CLI command 3 of 6 'system {' failed: invalid command [Invalid input (at token 1: '{')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dbarroso/workspace/nornir/nornir/core/task.py", line 62, in start
    r = self.task(self, **self.params)
  File "/Users/dbarroso/workspace/nornir/nornir/plugins/tasks/networking/napalm_configure.py", line 32, in napalm_configure
    device.load_merge_candidate(filename=filename, config=configuration)
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/napalm/eos/eos.py", line 246, in load_merge_candidate
    self._load_config(filename, config, False)
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/napalm/eos/eos.py", line 238, in _load_config
    raise MergeConfigException(msg)
napalm.base.exceptions.MergeConfigException: Error [1002]: CLI command 3 of 6 'system {' failed: invalid command [Invalid input (at token 1: '{')]

^^^^ END basic_configuration ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* leaf01.cmh ** changed : True *************************************************
vvvv basic_configuration ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- Base Configuration ** changed : False ------------------------------------- INFO
system {
  host-name leaf01.cmh;
  domain-name cmh.acme.local;
}
---- Loading Configuration on the device ** changed : True --------------------- INFO
[edit system]
-  host-name vsrx;
+  host-name leaf01.cmh;
+  domain-name cmh.acme.local;
^^^^ END basic_configuration ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* spine00.cmh ** changed : False ***********************************************
vvvv basic_configuration ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv ERROR
Subtask: Loading Configuration on the device (failed)

---- Base Configuration ** changed : False ------------------------------------- INFO
system {
  host-name spine00.cmh;
  domain-name cmh.acme.local;
}
---- Loading Configuration on the device ** changed : False -------------------- ERROR
Traceback (most recent call last):
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/napalm/eos/eos.py", line 231, in _load_config
    self.device.run_commands(commands)
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/pyeapi/client.py", line 730, in run_commands
    response = self._connection.execute(commands, encoding, **kwargs)
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/pyeapi/eapilib.py", line 499, in execute
    response = self.send(request)
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/pyeapi/eapilib.py", line 418, in send
    raise CommandError(code, msg, command_error=err, output=out)
pyeapi.eapilib.CommandError: Error [1002]: CLI command 3 of 6 'system {' failed: invalid command [Invalid input (at token 1: '{')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dbarroso/workspace/nornir/nornir/core/task.py", line 62, in start
    r = self.task(self, **self.params)
  File "/Users/dbarroso/workspace/nornir/nornir/plugins/tasks/networking/napalm_configure.py", line 32, in napalm_configure
    device.load_merge_candidate(filename=filename, config=configuration)
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/napalm/eos/eos.py", line 246, in load_merge_candidate
    self._load_config(filename, config, False)
  File "/Users/dbarroso/.virtualenvs/nornir/lib/python3.7/site-packages/napalm/eos/eos.py", line 238, in _load_config
    raise MergeConfigException(msg)
napalm.base.exceptions.MergeConfigException: Error [1002]: CLI command 3 of 6 'system {' failed: invalid command [Invalid input (at token 1: '{')]

^^^^ END basic_configuration ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* spine01.cmh ** changed : True ************************************************
vvvv basic_configuration ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- Base Configuration ** changed : False ------------------------------------- INFO
system {
  host-name spine01.cmh;
  domain-name cmh.acme.local;
}
---- Loading Configuration on the device ** changed : True --------------------- INFO
[edit system]
-  host-name vsrx;
+  host-name spine01.cmh;
+  domain-name cmh.acme.local;
^^^^ END basic_configuration ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There is also a method that will raise an exception if the task had an error:

[8]:
from nornir.core.exceptions import NornirExecutionError
try:
    result.raise_on_error()
except NornirExecutionError:
    print("ERROR!!!")
ERROR!!!

Skipped hosts

Nornir will keep track of hosts that failed and won’t run future tasks on them:

[9]:
from nornir.core.task import Result

def hi(task):
    return Result(host=task.host, result=f"{task.host.name}: Hi, I am still here!")

result = cmh.run(task=hi)
[10]:
print_result(result)
hi******************************************************************************
* leaf01.cmh ** changed : False ************************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
leaf01.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* spine01.cmh ** changed : False ***********************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
spine01.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can force the execution of tasks on failed hosts by passing the argument on_failed=True:

[11]:
result = cmh.run(task=hi, on_failed=True)
print_result(result)
hi******************************************************************************
* leaf00.cmh ** changed : False ************************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
leaf00.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* leaf01.cmh ** changed : False ************************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
leaf01.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* spine00.cmh ** changed : False ***********************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
spine00.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* spine01.cmh ** changed : False ***********************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
spine01.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also exclude the “good” hosts by passing on_good=False:

[12]:
result = cmh.run(task=hi, on_failed=True, on_good=False)
print_result(result)
hi******************************************************************************
* leaf00.cmh ** changed : False ************************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
leaf00.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* spine00.cmh ** changed : False ***********************************************
vvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
spine00.cmh: Hi, I am still here!
^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To achieve this, nornir keeps a set of failed hosts in its shared data object:

[13]:
nr.data.failed_hosts
[13]:
{'leaf00.cmh', 'spine00.cmh'}

If you want to mark some hosts as succeeded and make them eligible again for future tasks, you can do so per host with the function recover_host, or reset the set completely with reset_failed_hosts:

[14]:
nr.data.reset_failed_hosts()
nr.data.failed_hosts
[14]:
set()
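To make the interaction between the failed-hosts set and the on_good and on_failed flags easier to follow, here is a toy model of the bookkeeping. It is a simplified sketch and not nornir’s actual implementation: the FailedHostTracker class and its select method are hypothetical, and only the names failed_hosts, recover_host and reset_failed_hosts are taken from the text above. It also assumes that recover_host takes a host name.

```python
class FailedHostTracker:
    """Toy model of the bookkeeping described above (not nornir's code)."""

    def __init__(self):
        self.failed_hosts = set()

    def select(self, hosts, on_good=True, on_failed=False):
        """Pick the hosts a run would target given the two flags."""
        selected = []
        for host in hosts:
            is_failed = host in self.failed_hosts
            if (is_failed and on_failed) or (not is_failed and on_good):
                selected.append(host)
        return selected

    def recover_host(self, host):
        """Make a single host eligible again."""
        self.failed_hosts.discard(host)

    def reset_failed_hosts(self):
        """Make every host eligible again."""
        self.failed_hosts = set()


hosts = ["leaf00.cmh", "leaf01.cmh", "spine00.cmh", "spine01.cmh"]
tracker = FailedHostTracker()
tracker.failed_hosts.update({"leaf00.cmh", "spine00.cmh"})

default_run = tracker.select(hosts)                    # good hosts only
forced_run = tracker.select(hosts, on_failed=True)     # every host
failed_only = tracker.select(hosts, on_failed=True, on_good=False)

# Recover one host; it becomes eligible for default runs again
tracker.recover_host("leaf00.cmh")
after_recover = tracker.select(hosts)
```

The three selections mirror the three runs of the hi task shown earlier: the default run skips the failed hosts, on_failed=True includes them, and adding on_good=False leaves only the failed ones.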

Raise on error automatically

Alternatively, you can configure nornir to raise the exception automatically in case of error with the raise_on_error configuration option:

[16]:
nr = InitNornir(config_file="config.yaml", core={"raise_on_error": True})
cmh = nr.filter(site="cmh", type="network_device")
try:
    cmh.run(task=basic_configuration)
except NornirExecutionError:
    print("ERROR!!!")
ERROR!!!

Workflows

The default workflow should work for most use cases, as hosts with errors are skipped and print_result should give enough information to understand what’s going on. For more complex workflows, the framework should give you enough room to implement them.
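As one example of a more complex workflow, the sketch below retries a task only on the hosts that failed, using the on_failed and on_good flags and recover_host from the previous sections. The function run_with_retries, the FakeNornir stand-in and the flaky task are all hypothetical and exist only so the example runs without devices. The sketch assumes that each per-host result exposes a failed attribute and that recover_host takes a host name; note also that the stand-in calls the task with a host name, whereas a real nornir task receives a task object.

```python
from types import SimpleNamespace


def run_with_retries(nr, task, retries=2):
    """Run task, then re-run it only on failed hosts up to `retries` times."""
    result = nr.run(task=task)
    for _ in range(retries):
        if not nr.data.failed_hosts:
            break
        retry = nr.run(task=task, on_failed=True, on_good=False)
        for host, host_result in retry.items():
            result[host] = host_result  # keep the latest attempt
            if not host_result.failed:
                nr.data.recover_host(host)
    return result


class FakeNornir:
    """Minimal stand-in so the sketch runs without devices (not nornir)."""

    def __init__(self, hosts):
        self.hosts = hosts
        self.data = SimpleNamespace(failed_hosts=set())
        self.data.recover_host = self.data.failed_hosts.discard

    def run(self, task, on_good=True, on_failed=False):
        results = {}
        for host in self.hosts:
            is_failed = host in self.data.failed_hosts
            if not ((is_failed and on_failed) or (not is_failed and on_good)):
                continue
            try:
                task(host)
                results[host] = SimpleNamespace(failed=False)
            except Exception:
                self.data.failed_hosts.add(host)
                results[host] = SimpleNamespace(failed=True)
        return results


attempts = {}


def flaky(host):
    """Hypothetical task: leaf00.cmh fails on its first attempt only."""
    attempts[host] = attempts.get(host, 0) + 1
    if host == "leaf00.cmh" and attempts[host] == 1:
        raise RuntimeError("transient error")


nr = FakeNornir(["leaf00.cmh", "leaf01.cmh"])
result = run_with_retries(nr, flaky)
print({host: r.failed for host, r in result.items()})
```

One design caveat: because the failed-hosts set is shared, hosts that failed in an earlier, unrelated task would also be retried here, so you may want to reset the set before starting such a workflow.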