RoCEv2 Protocol
rogue.protocols.rocev2 is the C++/ibverbs-backed receive server for RDMA
over Converged Ethernet v2 (RoCEv2). Core opens an ibverbs device and
allocates a shared Protection Domain (PD); Server creates the Completion
Queue, Queue Pair (type IBV_QPT_RC), and Memory Region, then polls for
incoming RDMA WRITEs and forwards them as Rogue stream frames.
The user-facing entry point is pyrogue.protocols.RoCEv2Server — a
pyrogue.Device wrapper that owns the C++ server and the surf-side
RoceEngine register block. It manages the QP handshake with the FPGA,
exposes 15 LocalVariable status/configuration fields, and integrates
cleanly into the standard PyRogue device tree via Root.addInterface().
Where RoCEv2 Fits In The Stack
Typical protocol stack:
FPGA RoceEngine (surf) <-> RoCEv2 NIC <-> RoCEv2Server (C++ ibverbs + Python wrapper)
|
Rogue stream graph (channels via getChannel())
The FPGA side runs a surf.ethernet.roce._RoceEngine.RoceEngine block that
drives a metadata bus. RoCEv2Server reads QP parameters from that metadata
bus during _start() to complete the RC connection setup. Once the QP
reaches IBV_QPS_RTS, the FPGA issues RDMA WRITEs directly into the host
Memory Region and the C++ completion-queue polling thread reassembles frames
for the Rogue stream graph.
Key Classes
rogue.protocols.rocev2.Core: opens the ibverbs device and allocates the shared Protection Domain (PD).rogue.protocols.rocev2.Server: inheritsCoreplus streamMasterandSlave. Owns the Completion Queue (CQ), Queue Pair (QP, typeIBV_QPT_RC), and Memory Region (MR), runs the RX completion-queue polling thread, and re-posts receive work requests as frames are returned by downstream consumers (zero-copy slab).pyrogue.protocols.RoCEv2Server: the PyRogueDevicewrapper. Adds theRoceEnginechild device (fromsurf.ethernet.roce._RoceEngine) and 15LocalVariablefields for status and configuration.
When To Use
You are talking to a RoCEv2-capable FPGA (e.g. the Simple-10GbE-RUDP-KCU105-Example
RoceEngineblock).You need higher throughput than UDP+RSSI provides on a 10/25/100 GbE link.
You have a host NIC that supports RoCEv2 (Mellanox/NVIDIA
mlx5_*) or you are developing against SoftRoCE (rxe0).
Operational Requirements
Jumbo-frame MTU (9000 bytes)
The RoCEv2 server defaults to a 9000-byte maximum payload per RDMA WRITE
(DefaultMaxPayload in include/rogue/protocols/rocev2/Core.h). The
host NIC and every switch in the path MUST be configured for a 9000-byte
(or larger) MTU — typically:
# Host NIC (replace with your interface)
sudo ip link set dev enp1s0f0 mtu 9000
If any hop along the path uses a 1500-byte MTU, the FPGA will see RDMA WRITEs silently dropped or NACK’d. There is no graceful auto-fallback.
SoftRoCE (rxe) development workflow
For development without a RoCEv2 NIC, use the in-kernel rdma_rxe
SoftRoCE driver. On Debian/Ubuntu install rdma-core and load the
module:
sudo apt install rdma-core ibverbs-utils
sudo modprobe rdma_rxe
sudo rdma link add rxe0 type rxe netdev <ifname>
ibv_devices # confirm rxe0 appears
Then pass deviceName='rxe0' to RoCEv2Server. SoftRoCE works for
functional bring-up and CI but does not deliver the throughput of a
hardware RoCEv2 NIC.
surf RoceEngine dependency
RoCEv2Server constructs a child device of type
surf.ethernet.roce._RoceEngine.RoceEngine to drive the FPGA-side
metadata-bus handshake. The surf Python package MUST be on the
Python path. Pass roceEngine=<existing instance> if your tree
already instantiates the engine elsewhere; otherwise pass
roceEngineOffset and roceMemBase so RoCEv2Server can build
its own.
Python Example
import pyrogue as pr
import pyrogue.protocols
import rogue.protocols.srp
class MyRoot(pr.Root):
def __init__(self):
super().__init__(name='MyRoot')
# SRP memBase for the RoceEngine register block
srp = rogue.protocols.srp.SrpV3()
self.add(pyrogue.protocols.RoCEv2Server(
name = 'RoCEv2',
ip = '10.0.0.1', # FPGA IP
deviceName = 'rxe0', # SoftRoCE or 'mlx5_0'
ibPort = 1,
gidIndex = 0,
maxPayload = 9000, # jumbo-frame MTU
rxQueueDepth = 256,
roceEngineOffset = 0x00000000,
roceMemBase = srp,
))
self.addInterface(self.RoCEv2)
# Channel-filtered view (channel id = immediate value bits [7:0])
data_ch = self.RoCEv2.getChannel(0)
Logging
RoCEv2 protocol-state logging is emitted by the C++ server and the PyRogue wrapper via Rogue logging.
Logger names:
C++ core/server loggers (created via
rogue::Logging::create):rocev2.Corerocev2.Server
PyRogue wrapper loggers are created via
pyrogue.logInitand follow thepyrogue.<family>.<ClassName>.<path>pattern.RoCEv2Serveris apyrogue.Device, so its logger is:pyrogue.Device.RoCEv2Server.<path>
For the example above (
name='RoCEv2'insideMyRoot), the wrapper logger ispyrogue.Device.RoCEv2Server.MyRoot.RoCEv2.Unified Logging API examples:
pyrogue.setLogLevel('rocev2.Server', 'DEBUG')pyrogue.setLogLevel('pyrogue.Device.RoCEv2Server.MyRoot.RoCEv2', 'DEBUG')
Legacy Logging API example:
rogue.Logging.setFilter('rocev2.Server', rogue.Logging.Debug)
These loggers are most useful when debugging QP bring-up, metadata-bus handshake failures, and RC connection state transitions.
Troubleshooting
Stale FPGA resources after a crash
If a previous session crashed or was killed without a clean Root.stop(),
the FPGA RoCEv2 engine may still hold a PD, MR, or QP from that session.
The next _start() call will fail with a rogue.GeneralError whose
message includes:
FPGA PD allocation failed — the FPGA RoCEv2 engine likely has stale
resources from a previous session that crashed without a clean teardown.
Reprogram the FPGA bitfile to reset its state.
The same guidance appears for MR and QP allocation failures. The firmware provides no way to enumerate existing resources without knowing the QPN in advance, so a reprogram is the only reliable recovery path.
MR length exceeds 32-bit metadata-bus field
RoCEv2Server._start() computes the FPGA-side Memory Region length as
maxPayload * rxQueueDepth. If this product is >= 232 (4 GiB),
the value cannot be represented in the 32-bit length field of the
metadata bus and _start() raises a rogue.GeneralError:
FPGA MR length ... bytes (... GiB) exceeds the 32-bit field of the
metadata bus. Reduce --roceMaxPay (...) or --roceQDepth (...).
The default configuration (maxPayload=9000, rxQueueDepth=256,
product ~2.3 MB) is four orders of magnitude below this limit. The guard
protects against accidentally passing very large queue depths.
QP state-machine enforcement
The Python metadata-bus encoders (_encode_modify_qp,
_encode_err_qp, _encode_destroy_qp) are stateless — they will
encode and send any QP transition without checking whether it is valid.
The FPGA state machine is the single source of truth: illegal
transitions return success=False, which is raised as a
rogue.GeneralError on the setup path and logged as a warning on the
teardown path. No Python-side state mirror is maintained because it would
become stale after a crash and block legitimate teardown attempts.
API Reference
Python (PyRogue wrapper): RoCEv2Server