Network interfaces attached to Network Interface Cards (NICs) from different vendors can experience transmission timeouts for multiple reasons, and each vendor's driver has its own way of handling the situation. In this article, we will see how the Mellanox, Broadcom and Intel device drivers handle an interface timeout.

A transmission queue is a collection of data packets waiting to be transmitted by the corresponding networking device. A network interface transmission queue is said to have timed out if it is in the stopped state and the time since the last transmission from this queue has exceeded the threshold that the driver gives to the watchdog timer.

During the initialization of the network device, the transmission queues and the scheduler of the network device are activated via dev_activate(). At the same time, dev_watchdog_up() starts a timer to detect transmission problems. This timer calls dev_watchdog() every dev->watchdog_timeo ticks to check that the network adapter is working properly. If dev_watchdog() finds that a transmission queue is disabled (netif_queue_stopped() returns TRUE) and the time since the last frame transmission has exceeded the wait time (dev->watchdog_timeo), the timer's handler invokes a routine registered by the device driver to perform recovery. These handlers are discussed in the later sections of this article.

Reasons that can cause a working network interface transmission queue to time out include, but are not limited to, the following:

- When the CPUs to which the network interface interrupt request (IRQ) lines are mapped are busy, interrupts raised for packet transactions go unhandled, which stalls the queue and eventually causes it to time out.
- The network card to which the interfaces are attached either stops transmitting packets or fails to receive the acknowledgements for the packets it has sent.
- The remote network card is busy and unable to handle the incoming packets, causing congestion at the source.
- Congestion at the switch or fabric issues can also stall the transmission queues.

The sections below concentrate on how the individual NIC device drivers recover the interfaces they handle whenever a timeout is seen. The Oracle Linux 7 source code was used for this study.

In the Mellanox driver, the watchdog timer is configured to wait for 15 seconds before deciding that any of the transmission queues belonging to the Mellanox interfaces is stopped and needs to be recovered. The data flow is as follows and shows how a stalled Mellanox interface transmission queue is dealt with: whenever the watchdog timer determines that a queue is stopped, the handler mlx5e_tx_timeout is called, as set up in the mlx5e initialization routine mlx5e_build_nic_netdev.
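The watchdog check described in this article can be illustrated with a small user-space model. This is only a sketch, not kernel code: the types model_txq and model_netdev and the functions model_dev_watchdog and model_tx_timeout are hypothetical names invented for the illustration, and time is counted in plain ticks. It shows the two conditions being combined (queue stopped, and time since the last transmission greater than the driver's wait time) before the handler registered by the driver is invoked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space model. These are NOT the kernel's definitions;
 * every name prefixed with model_ is hypothetical. */
struct model_txq {
    bool stopped;               /* stands in for netif_queue_stopped() being true */
    unsigned long trans_start;  /* tick at which the last frame was transmitted */
};

struct model_netdev {
    unsigned long watchdog_timeo;   /* wait time in ticks, like dev->watchdog_timeo */
    struct model_txq *queues;
    size_t num_queues;
    /* recovery routine registered by the driver (may be NULL) */
    void (*tx_timeout)(struct model_netdev *dev);
    unsigned int timeouts_seen;     /* bookkeeping for this example only */
};

/* Hypothetical driver handler: a real driver would attempt recovery here;
 * this one only counts how often it was called. */
static void model_tx_timeout(struct model_netdev *dev)
{
    dev->timeouts_seen++;
}

/* One watchdog pass at tick `now`.
 * Returns the index of the first timed-out queue, or -1 if there is none. */
static int model_dev_watchdog(struct model_netdev *dev, unsigned long now)
{
    assert(dev != NULL);

    for (size_t i = 0; i < dev->num_queues; i++) {
        const struct model_txq *q = &dev->queues[i];

        /* Both conditions must hold: the queue is stopped AND the time
         * since the last transmission exceeds the driver's wait time. */
        if (q->stopped && now - q->trans_start > dev->watchdog_timeo) {
            if (dev->tx_timeout)
                dev->tx_timeout(dev);
            return (int)i;
        }
    }
    return -1;
}
```

With watchdog_timeo set to 15 ticks (standing in for the 15-second Mellanox setting), a stopped queue whose last transmission was at tick 100 is not reported at tick 110, but is reported at tick 116, at which point the registered handler runs once. A queue that is not stopped is never reported, however old its last transmission is.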