PCIe DMA Model

This page explains how axi-pcie-core moves data between host memory and the FPGA application over PCIe. The central entity is AxiPcieDma, which wraps surf’s AxiStreamDmaV2 engine and presents per-lane AXI-Stream interfaces to the application.

Inbound and Outbound FIFOs

Inbound (IB) refers to data flowing from the host into the FPGA application. Outbound (OB) refers to data flowing from the FPGA application back to the host.

AxiPcieDma instantiates one AxiStreamDmaV2 lane for each DMA channel. Each lane contains a dedicated IB FIFO (host → FPGA application) and a dedicated OB FIFO (FPGA application → host). The number of lanes is set by the DMA_SIZE_G generic (maximum 8). That limit comes from the AxiPcieCrossbar slave-port budget: the crossbar provides 10 AXI4 slave ports, allocated as one descriptor port, up to eight DMA lane ports, and one user general-purpose port.
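The port arithmetic above can be checked with a small sketch. The function name and the lower bound of one lane are assumptions made for illustration; only the maximum of 8 lanes and the 1 + DMA_SIZE_G + 1 port split come from the text.

```python
MAX_DMA_LANES = 8  # upper bound on DMA_SIZE_G stated in the text


def crossbar_slave_ports(dma_size: int) -> int:
    """Return the number of AxiPcieCrossbar slave ports used by dma_size lanes.

    Hypothetical helper: the lower bound of 1 lane is an assumption.
    """
    if not 1 <= dma_size <= MAX_DMA_LANES:
        raise ValueError(f"DMA_SIZE_G must be in 1..{MAX_DMA_LANES}, got {dma_size}")
    descriptor_ports = 1  # shared descriptor port
    user_ports = 1        # user general-purpose port
    return descriptor_ports + dma_size + user_ports


if __name__ == "__main__":
    for lanes in (1, 4, 8):
        print(lanes, "lanes ->", crossbar_slave_ports(lanes), "slave ports")
```

With the maximum of eight lanes the function returns 10, which matches the crossbar's full slave-port budget.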

Descriptor Rings

Each AxiStreamDmaV2 lane uses a descriptor ring: a host-resident circular buffer whose entries each point to a host DMA buffer (physical address + byte count). The DMA engine fetches descriptors from the ring over PCIe using the descriptor AXI4 slave port on AxiPcieCrossbar. After the transfer completes, the engine writes a completion status word back to the descriptor entry, signalling the driver that the buffer is ready.
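The ring behaviour described above can be illustrated with a toy software model. This is a conceptual sketch only: the field names, the boolean `done` flag standing in for the completion status word, and the two index names are hypothetical and do not reflect the actual AxiStreamDmaV2 descriptor layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Descriptor:
    addr: int           # physical address of the host DMA buffer
    size: int           # byte count of the buffer
    done: bool = False  # stand-in for the completion status word


class DescriptorRing:
    """Toy circular descriptor ring shared by an engine and a driver."""

    def __init__(self, entries: List[Descriptor]) -> None:
        if not entries:
            raise ValueError("ring needs at least one entry")
        self.entries = entries
        self.hw_index = 0  # next entry the engine will use
        self.sw_index = 0  # next entry the driver will reap

    def engine_complete_next(self) -> Optional[Descriptor]:
        """Engine side: use the next descriptor and mark it complete."""
        desc = self.entries[self.hw_index]
        if desc.done:
            return None  # entry not yet recycled by the driver; engine waits
        desc.done = True
        self.hw_index = (self.hw_index + 1) % len(self.entries)
        return desc

    def driver_reap(self) -> List[int]:
        """Driver side: collect completed buffer addresses in order, then recycle."""
        reaped = []
        while self.entries[self.sw_index].done:
            desc = self.entries[self.sw_index]
            reaped.append(desc.addr)
            desc.done = False  # recycle the entry for a future transfer
            self.sw_index = (self.sw_index + 1) % len(self.entries)
        return reaped


if __name__ == "__main__":
    ring = DescriptorRing([Descriptor(0x1000 * i, 4096) for i in range(4)])
    ring.engine_complete_next()
    ring.engine_complete_next()
    print([hex(a) for a in ring.driver_reap()])
```

The model shows why the driver must recycle entries promptly: once every entry is marked complete and none has been reaped, the engine side has no descriptor left to use.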

The descriptor address space is at most 40 bits wide, even though the internal AXI4 bus uses 64-bit addresses (ADDR_WIDTH_C = 64 in AXI_PCIE_CONFIG_C). This 40-bit limit is an architectural constraint of AxiStreamDmaV2; software must ensure descriptor ring memory is allocated within the low 1 TiB (2^40 bytes) of host physical address space.
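A driver-side sanity check for this constraint could look like the sketch below. The function and constant names are hypothetical; the only fact taken from the text is the 40-bit limit on descriptor addresses.

```python
DESC_ADDR_BITS = 40                    # descriptor address width from the text
DESC_ADDR_LIMIT = 1 << DESC_ADDR_BITS  # 2**40 bytes, first address out of range


def ring_fits_descriptor_space(base_addr: int, size_bytes: int) -> bool:
    """Return True if [base_addr, base_addr + size_bytes) lies below 2**40."""
    if base_addr < 0 or size_bytes <= 0:
        raise ValueError("base_addr must be >= 0 and size_bytes must be > 0")
    # The last byte of the ring sits at base_addr + size_bytes - 1.
    return base_addr + size_bytes <= DESC_ADDR_LIMIT


if __name__ == "__main__":
    print(ring_fits_descriptor_space(0x0000_1000, 4096))        # low memory
    print(ring_fits_descriptor_space(DESC_ADDR_LIMIT, 4096))    # above the limit
```

Checking the end of the region, not just the base address, catches a ring that starts below the limit but straddles it.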

Back-Pressure via tReady

tReady is the AXI-Stream handshake signal asserted by a slave to accept a data beat from its upstream master. When a downstream application sink de-asserts tReady, it stalls AxiStreamDmaV2 IB delivery on that lane. The stall propagates back through AxiPcieDma and ultimately stalls PCIe completion scheduling for that DMA channel. Conversely, when an application source has no data to send, it simply de-asserts tValid; this produces no OB beats and does not affect other lanes.

This is the canonical SLAC AXI-Stream back-pressure model: the FPGA application must keep its IB sink draining fast enough to absorb the expected PCIe throughput, or it must accept that the IB FIFO will fill and PCIe transfers will stall on that lane. Per-lane IB/OB traffic monitors (accessible at BAR0 offsets 0x0006_0000 and 0x0006_8000) can be used to observe stall conditions.
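The effect of a slow sink can be seen in a cycle-based toy simulation of one IB lane. This is a behavioural sketch, not a model of the actual firmware: the FIFO depth, the one-beat-per-cycle upstream rate, and the function name are all assumptions chosen for illustration.

```python
from collections import deque
from typing import Callable, Tuple


def simulate_ib_lane(cycles: int, fifo_depth: int,
                     sink_ready: Callable[[int], bool]) -> Tuple[int, int]:
    """Simulate one IB lane and return (beats delivered, upstream stall cycles).

    sink_ready(cycle) models the application's tReady on that cycle.
    """
    fifo = deque()
    delivered = 0
    stalled = 0
    for cycle in range(cycles):
        # Downstream: a beat leaves the FIFO only when data is present
        # and the application sink asserts tReady.
        if fifo and sink_ready(cycle):
            fifo.popleft()
            delivered += 1
        # Upstream: one beat arrives per cycle unless the FIFO is full,
        # in which case the upstream side stalls for this cycle.
        if len(fifo) < fifo_depth:
            fifo.append(cycle)
        else:
            stalled += 1
    return delivered, stalled


if __name__ == "__main__":
    print("always ready:", simulate_ib_lane(100, 4, lambda c: True))
    print("ready 1 in 4:", simulate_ib_lane(100, 4, lambda c: c % 4 == 0))
```

In this model, a sink that is always ready never causes an upstream stall, while a sink that is ready only one cycle in four fills the FIFO and then stalls the upstream side on most cycles.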

DMA IRQ Flow

When AxiStreamDmaV2 completes one or more transfers it asserts the dmaIrq output of AxiPcieDma. AxiPcieUltrascalePlusIrqFsm receives this level-sensitive signal and converts it to a rising-edge MSI request for the PCIe PHY IP (usrIrqReq). The FSM waits for the PHY’s usrIrqAck handshake before de-asserting the request and returning to idle. This ensures that a new interrupt cannot collide with one that is still being serviced.
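The request/acknowledge handshake can be sketched as a two-state machine. This is a simplified behavioural model under stated assumptions, not the AxiPcieUltrascalePlusIrqFsm source: the state names, the one-call-per-clock `step` interface, and the choice to raise a new request if dmaIrq is still high after an acknowledge are all hypothetical.

```python
from enum import Enum


class IrqState(Enum):
    IDLE = 0
    WAIT_ACK = 1


class IrqFsmModel:
    """Toy model: level-sensitive dmaIrq in, request held until acknowledged."""

    def __init__(self) -> None:
        self.state = IrqState.IDLE
        self.irq_req = False  # models usrIrqReq

    def step(self, dma_irq: bool, irq_ack: bool) -> bool:
        """Advance one clock cycle and return the current request level."""
        if self.state is IrqState.IDLE:
            if dma_irq:
                # Raise the request and hold it until the PHY acknowledges.
                self.irq_req = True
                self.state = IrqState.WAIT_ACK
        elif irq_ack:
            # Acknowledge received: drop the request and return to idle.
            self.irq_req = False
            self.state = IrqState.IDLE
        return self.irq_req


if __name__ == "__main__":
    fsm = IrqFsmModel()
    for dma_irq, ack in [(False, False), (True, False), (True, False), (True, True)]:
        print(dma_irq, ack, "->", fsm.step(dma_irq, ack))
```

Because the model ignores dmaIrq while it is in WAIT_ACK, a second request cannot be issued until the first has been acknowledged, which is the collision-avoidance property described above.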

The PCIe PHY delivers the MSI to the host. The host driver (PyRogue / rogue.hardware.axi.AxiStreamDma) handles the interrupt, reads the completion status words from the descriptor ring, recycles completed descriptors, and schedules new transfers.

End-to-End Data-Flow Diagram

The diagram below traces both the IB path (host → FPGA application) and the OB path (FPGA application → host), including the descriptor and IRQ paths. Lanes are numbered 0 through N, where N = DMA_SIZE_G - 1:

Host (DMA buffers + descriptor ring in host physical memory)
  |
  |  PCIe lanes (Gen3 x16 or Gen4 x8, board-dependent)
  |
  v
PCIe PHY (per-board .dcp wrapper, 250 MHz recovered clock)
  |
  |  AXI4 (256-bit data, 64-bit addr, ID_BITS=4)
  |
  v
AxiPcieCrossbar  (DMA_SIZE_G + 2 slaves -> 1 master -> PCIe PHY)
  |
  +---> AxiPcieReg  (register path -> BAR0 AXI-Lite slaves)
  |
  +---> AxiPcieDma  (AxiStreamDmaV2 engine, DMA_SIZE_G lanes)
           |
           |  IB path (host -> FPGA application)
           +--- IB FIFO 0 --> appAxisMasters[0]  -> application lane 0
           +--- IB FIFO 1 --> appAxisMasters[1]  -> application lane 1
           +--- ...
           +--- IB FIFO N --> appAxisMasters[N]  -> application lane N
           |
           |  OB path (FPGA application -> host)
           +--- OB FIFO 0 <-- appAxisSlaves[0]   <- application lane 0
           +--- OB FIFO 1 <-- appAxisSlaves[1]   <- application lane 1
           +--- ...
           +--- OB FIFO N <-- appAxisSlaves[N]   <- application lane N
           |
           |  Descriptor path (one AXI4 slave port on AxiPcieCrossbar)
           +--- descriptor read/write <-> host descriptor ring (<=40-bit addr)
           |
           |  IRQ path
           +--- dmaIrq -> AxiPcieUltrascalePlusIrqFsm -> usrIrqReq (MSI)
                                                              |
                                                              v
                                                        PCIe PHY -> host MSI