RTL Entity Reference
This page documents the four top-level RTL entities in axi-pcie-core:
AxiPcieCore (the abstract board-core role), AxiPcieDma (the DMA
engine), AxiPipCore (PCIe Intercommunication Protocol), and
AxiGpuAsyncCore (GPU-Direct async data path). Per-board concrete
realisations of AxiPcieCore (e.g., XilinxKcu1500Core,
XilinxAlveoU200Core) are listed in the supported-boards reference;
internal surf primitives referenced below are documented in the surf library
and are not expanded here.
AxiPcieCore
Purpose
AxiPcieCore is the abstract name for the per-board top-level integration
entity — the single instantiation point for a given PCIe carrier board.
Concrete realisations are XilinxKcu1500Core
(hardware/XilinxKcu1500/rtl/XilinxKcu1500Core.vhd) and
XilinxAlveoU200Core
(hardware/XilinxAlveoU200/core/XilinxAlveoU200Core.vhd), among others.
All <Board>Core entities share the same logical port surface — DMA
AXI-Stream arrays, PIP AXI4 write masters, application AXI-Lite master pair,
and board-specific I/O; only the board I/O section differs between boards.
See Architecture for the structural overview and
Board Support for the board layering rationale.
Generics
| Name | Type | Default | Description |
|---|---|---|---|
| | | | Propagation delay for simulation. |
| | | | When … |
| | | | Base TCP port number used by the rogue simulation stubs. |
| | | | Number of virtual DMA channels exposed in simulation. |
| | | (required) | Firmware build information record (version, timestamp, git hash) inserted into … |
| | | (required) | AXI-Stream bus configuration for the application-facing DMA ports. |
| | | | Number of DMA lanes. The range 1–8 is bounded by the 10-port AXI crossbar in … |
| | | | Maximum DMA burst size in bytes. |
| | | | Driver-type identifier exposed in … |
| | | | Enables the GPU-Direct async data path (…). |
DMA_SIZE_G sets the width of the DMA stream arrays dmaObMasters,
dmaObSlaves, dmaIbMasters and dmaIbSlaves at the top level. The
maximum value of 8 is a hard constraint: AxiPcieCrossbar wraps a
pre-built Vivado AXI Interconnect DCP from
shared/ip/AxiPcie{16,32,64}BCrossbarIpCore/ that supports at most 10
slave ports (1 descriptor + up to 8 DMA lanes + 1 user GP), leaving no
room for additional lanes beyond 8.
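The slot arithmetic behind this limit can be checked with a short Python sketch. It only encodes the counts stated above (10 slave ports, one descriptor port, one user GP port). Placing the descriptor port at slot 0 and the lanes at slots 1 to DMA_SIZE_G is an assumption, chosen to agree with the user GP port sitting at slot DMA_SIZE_G+1 as listed in the AxiPcieDma port table; the function names are hypothetical and not part of axi-pcie-core.

```python
# Slot budget of the pre-built crossbar DCP, as stated in the note above.
CROSSBAR_SLAVE_PORTS = 10
DESCRIPTOR_PORTS = 1
USER_GP_PORTS = 1


def max_dma_lanes(slave_ports: int = CROSSBAR_SLAVE_PORTS) -> int:
    """Largest DMA_SIZE_G that still fits in the crossbar."""
    return slave_ports - DESCRIPTOR_PORTS - USER_GP_PORTS


def crossbar_slots(dma_size: int) -> dict:
    """Hypothetical slot map for a given DMA_SIZE_G.

    Assumption: descriptor at slot 0, lanes at 1..dma_size, which puts the
    user GP port at dma_size + 1 as the AxiPcieDma port table states.
    """
    if not 1 <= dma_size <= max_dma_lanes():
        raise ValueError(
            f"DMA_SIZE_G={dma_size} outside 1..{max_dma_lanes()}; "
            "more lanes would need a new crossbar DCP")
    return {
        "descriptor": 0,
        "dma_lanes": list(range(1, dma_size + 1)),
        "user_gp": dma_size + 1,
    }


print(max_dma_lanes())    # 8
print(crossbar_slots(2))  # {'descriptor': 0, 'dma_lanes': [1, 2], 'user_gp': 3}
```

Requesting nine lanes raises ValueError here, mirroring the "new DCP required" constraint rather than silently wrapping slot numbers.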
The per-board AXI_PCIE_CONFIG_C constant, defined in
hardware/<Board>/rtl/AxiPciePkg.vhd, sets the AXI bus geometry used by the
shared RTL (typically 256-bit data, 64-bit address, 4-bit ID and 8-bit burst length).
The roadmap term “transceiverClass” maps to this constant paired with the
per-board PCIe IP variant (Gen3x8 / Gen3x16 / Gen4x8).
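As a rough illustration of what these typical values imply, the Python sketch below derives bytes per beat, the longest burst an 8-bit length field can encode, and peak datapath throughput at the 250 MHz DMA clock. AxiBusConfig and its field names are hypothetical stand-ins, not the VHDL record. The 4 KB cap comes from the AXI4 rule that a burst must not cross a 4 KB address boundary, and the throughput figure assumes one beat per cycle with no back-pressure.

```python
from dataclasses import dataclass

# AXI4 forbids bursts that cross a 4 KB address boundary.
AXI4_BOUNDARY_BYTES = 4096


@dataclass(frozen=True)
class AxiBusConfig:
    """Hypothetical stand-in for the values carried by AXI_PCIE_CONFIG_C."""
    data_bits: int = 256
    addr_bits: int = 64
    id_bits: int = 4
    len_bits: int = 8

    @property
    def bytes_per_beat(self) -> int:
        return self.data_bits // 8

    @property
    def max_beats(self) -> int:
        # AxLEN encodes (beats - 1), so an 8-bit field allows up to 256 beats.
        return 2 ** self.len_bits

    @property
    def max_burst_bytes(self) -> int:
        # Limited by both the length field and the 4 KB boundary rule.
        return min(self.bytes_per_beat * self.max_beats, AXI4_BOUNDARY_BYTES)

    def throughput_bytes_per_s(self, clk_hz: float) -> float:
        # Peak rate: one full-width beat transferred on every clock edge.
        return self.bytes_per_beat * clk_hz


cfg = AxiBusConfig()
print(cfg.bytes_per_beat)                       # 32
print(cfg.max_beats)                            # 256
print(cfg.max_burst_bytes)                      # 4096
print(cfg.throughput_bytes_per_s(250e6) / 1e9)  # 8.0
```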
Ports
| Name | Dir | Type | Description |
|---|---|---|---|
| Clock / Reset (outputs) | | | |
| | out | | 250 MHz DMA system clock (sourced from PCIe PHY recovered reference). |
| | out | | Synchronous reset on … |
| | out | | Per-group buffer-full pause signals from the DMA engine. |
| DMA Streams (Outbound, host to FPGA) | | | |
| | out | | Outbound DMA stream masters (host to FPGA application). |
| | in | | Outbound DMA stream flow-control slaves. |
| DMA Streams (Inbound, FPGA to host) | | | |
| | in | | Inbound DMA stream masters (FPGA application to host). |
| | out | | Inbound DMA stream flow-control slaves. |
| PIP AXI4 Interface (dmaClk domain) | | | |
| | out | | PIP inbound write master: peer FPGA writes arriving over PCIe. |
| | in | | PIP inbound write slave (application accepts writes). |
| | in | | PIP outbound write master: local FPGA initiates a write to a peer. |
| | out | | PIP outbound write flow-control slave. |
| User General Purpose AXI4 (dmaClk domain) | | | |
| | in | | User GP AXI4 read master: optional application PCIe read path. |
| | out | | User GP AXI4 read slave. |
| | in | | User GP AXI4 write master. |
| | out | | User GP AXI4 write slave. |
| Application AXI-Lite (appClk domain) | | | |
| | in | | Application clock. May differ from … |
| | in | | Synchronous reset on … |
| | out | | Application-region AXI-Lite read master (BAR0 offset …). |
| | in | | Application-region AXI-Lite read slave. |
| | out | | Application-region AXI-Lite write master. |
| | in | | Application-region AXI-Lite write slave. |
| Board I/O (board-specific) | | | |
| | in | | PCIe fundamental reset (active-low). |
| | in | | PCIe 100 MHz differential reference clock. |
| | in | | PCIe serial receive lanes (N = 8 or 16, depending on the board). |
| | out | | PCIe serial transmit lanes. |
AxiPcieDma
Purpose
AxiPcieDma is the board-agnostic AXI-Stream DMA engine at
shared/rtl/AxiPcieDma.vhd. It wraps surf.AxiStreamDmaV2 (scatter-
gather descriptor engine), an AXI4 crossbar (AxiPcieCrossbar), per-lane
inbound and outbound AXI-Stream FIFOs, and surf.AxiStreamMonAxiL traffic
monitors for both directions. See PCIe DMA Model for the
full data-flow narrative.
Generics
| Name | Type | Default | Description |
|---|---|---|---|
| | | | Propagation delay for simulation. |
| | | | When … |
| | | | Base TCP port for rogue simulation; per-lane offset is … |
| | | | Virtual channel count per DMA lane in simulation. |
| | | | Enables simulation-mode timing relaxations in sub-components. |
| | | | Maximum AXI4 burst size in bytes issued by the descriptor engine. |
| | | | Number of DMA lanes. See the constraint note below. |
| | | | AXI-Stream configuration for the application-facing DMA ports. |
| | | | Internal pipeline stages in the IB/OB FIFOs. |
| | | | Output pipeline stages in the IB/OB FIFOs. |
| | | | Descriptor RAM synthesis mode passed to … |
| | | | Descriptor RAM memory type (…). |
| | | | Descriptor arbitration policy; … |
DMA_SIZE_G sets the width of the DMA stream arrays exposed at the
entity boundary. The maximum of 8 is imposed by
AxiPcieCrossbar, which wraps a pre-built Vivado AXI Interconnect DCP
(shared/ip/AxiPcie{16,32,64}BCrossbarIpCore/) with a fixed
10-slave-port topology: 1 descriptor + up to 8 DMA lanes + 1 user GP =
10. Instantiating more than 8 DMA lanes is not possible without a new
DCP. The descriptor engine (surf.AxiStreamDmaV2) supports host
addresses up to 40 bits wide; the crossbar uses 64-bit addresses
internally (required for the GPU-Direct and PIP paths) with resizing at
the crossbar boundary.
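A minimal Python sketch of that address resizing, assuming the descriptor engine's 40-bit addresses are simply zero-extended onto the 64-bit crossbar. The helper names and the example target address are hypothetical; the point is only that a 64-bit target above 1 TiB (2^40 bytes), such as a peer-FPGA or GPU BAR, cannot be expressed in 40 bits.

```python
HOST_ADDR_BITS = 40   # descriptor-engine host address width (from the note)
XBAR_ADDR_BITS = 64   # crossbar internal address width


def fits(addr: int, bits: int) -> bool:
    """True if addr is representable as an unsigned value of `bits` bits."""
    return 0 <= addr < (1 << bits)


def descriptor_to_crossbar(addr: int) -> int:
    """Resize a descriptor-engine address onto the 64-bit crossbar.

    Assumption: the resize is a plain zero extension, so the upper
    XBAR_ADDR_BITS - HOST_ADDR_BITS (24) bits are zero and the numeric
    value is unchanged.
    """
    if not fits(addr, HOST_ADDR_BITS):
        raise ValueError(f"{addr:#x} does not fit in {HOST_ADDR_BITS} bits")
    return addr  # zero extension leaves an unsigned Python int unchanged


# Hypothetical 64-bit target above 1 TiB, e.g. a peer or GPU BAR.
far_bar = 0x3800_0000_0000
print(fits(far_bar, HOST_ADDR_BITS), fits(far_bar, XBAR_ADDR_BITS))  # False True
print(hex(descriptor_to_crossbar(0xFF_FFFF_FFFF)))                   # 0xffffffffff
```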
Ports
| Name | Dir | Type | Description |
|---|---|---|---|
| Clock / Reset | | | |
| | in | | 250 MHz DMA system clock. |
| | in | | Synchronous active-high reset. |
| PCIe AXI4 (axiClk domain) | | | |
| | out | | AXI4 read master to PCIe PHY (DMA host read). |
| | in | | AXI4 read slave from PCIe PHY. |
| | out | | AXI4 write master to PCIe PHY (DMA host write). |
| | in | | AXI4 write slave from PCIe PHY. |
| PIP AXI4 (axiClk domain) | | | |
| | in | | PIP outbound write master (write to a peer FPGA, routed through crossbar slot 0). |
| | out | | PIP outbound write flow-control slave. |
| User GP AXI4 (axiClk domain) | | | |
| | in | | User general-purpose AXI4 read master (crossbar slot DMA_SIZE_G+1). |
| | out | | User GP read slave. |
| | in | | User GP AXI4 write master. |
| | out | | User GP write slave. |
| AXI4-Lite Control (axiClk domain) | | | |
| | in | | AXI-Lite read masters: [0] DMA descriptor engine, [1] IB monitor, [2] OB monitor. |
| | out | | AXI-Lite read slaves (same indexing). |
| | in | | AXI-Lite write masters. |
| | out | | AXI-Lite write slaves. |
| DMA Streams (axiClk domain) | | | |
| | out | | Level-triggered DMA interrupt; the PCIe PHY sends an MSI on its rising edge. |
| | out | | Per-group buffer-full pause from … |
| | out | | Outbound DMA stream masters (host to FPGA). |
| | in | | Outbound DMA stream flow-control slaves. |
| | in | | Inbound DMA stream masters (FPGA to host). |
| | out | | Inbound DMA stream flow-control slaves. |
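The interaction between the level-triggered DMA interrupt and the edge-triggered MSI can be modelled cycle by cycle. The Python sketch below is an illustration of that description only; it assumes the level is low before the first sample, and msi_events is a hypothetical helper, not part of the RTL or the driver.

```python
from typing import Iterable, List


def msi_events(irq_levels: Iterable[int]) -> List[int]:
    """Cycle indices at which an MSI would be sent.

    The DMA interrupt is a level; an MSI is generated only on its 0 -> 1
    transition. Assumes the level was low before cycle 0.
    """
    events = []
    prev = 0
    for cycle, level in enumerate(irq_levels):
        if level and not prev:  # rising edge detected
            events.append(cycle)
        prev = level
    return events


# Interrupt held high for three cycles, cleared, then raised again.
print(msi_events([0, 1, 1, 1, 0, 0, 1, 1]))  # [1, 6]
```

In this model a level that stays high yields a single MSI; no further MSI is produced until the interrupt is deasserted and raised again.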
AxiPipCore
Purpose
VHDL entity name: AxiPciePipCore (in file
protocol/pip/rtl/AxiPciePipCore.vhd); the PyRogue class drops the
Pcie infix to AxiPipCore.
AxiPciePipCore implements the PCIe Intercommunication Protocol (PIP),
which allows one FPGA to write directly into a peer FPGA’s address space over
PCIe without CPU involvement. The core packetises outbound AXI-Stream frames
into 256-byte AXI4 write bursts via surf.AxiStreamPacketizer2 and
reconstructs inbound AXI4 write bursts back into AXI-Stream via
surf.AxiStreamDepacketizer2.
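The size arithmetic of the outbound segmentation is sketched below in Python. It treats every 256-byte burst as pure payload: the per-packet framing that surf.AxiStreamPacketizer2 adds is deliberately left out, and segment_frame and reassemble are hypothetical helpers rather than anything in the PIP core.

```python
from typing import List

BURST_BYTES = 256  # burst size stated for the PIP write path


def segment_frame(frame: bytes, burst_bytes: int = BURST_BYTES) -> List[bytes]:
    """Split one AXI-Stream frame into chunks of at most burst_bytes.

    Only the size arithmetic is modelled; packetizer framing is omitted.
    """
    return [frame[i:i + burst_bytes] for i in range(0, len(frame), burst_bytes)]


def reassemble(chunks: List[bytes]) -> bytes:
    """Receive side: concatenate the chunks back into the original frame."""
    return b"".join(chunks)


frame = bytes(range(256)) * 2 + b"\x00" * 100  # 612-byte frame
chunks = segment_frame(frame)
print([len(c) for c in chunks])  # [256, 256, 100]
assert reassemble(chunks) == frame
```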
Generics
| Name | Type | Default | Description |
|---|---|---|---|
| | | | Propagation delay for simulation. |
| | | | Number of independent AXI-Stream channels multiplexed over the single PIP AXI4 write path. |
| | | (required) | AXI-Stream configuration for the application-facing PIP stream ports; must match the board … |
Ports
| Name | Dir | Type | Description |
|---|---|---|---|
| AXI4-Lite Control (axilClk domain) | | | |
| | in | | AXI-Lite clock. |
| | in | | AXI-Lite reset. |
| | in | | AXI-Lite read master for PIP control/monitoring registers. |
| | out | | AXI-Lite read slave. |
| | in | | AXI-Lite write master. |
| | out | | AXI-Lite write slave. |
| | out | | Per-channel transmit enable, controlled via AXI-Lite registers. |
| AXI-Stream Interface (axisClk domain) | | | |
| | in | | AXI-Stream clock. |
| | in | | AXI-Stream reset. |
| | in | | Outbound stream masters (application writes to peer FPGA). |
| | out | | Outbound stream flow-control slaves. |
| | out | | Inbound stream masters (data received from peer FPGA). |
| | in | | Inbound stream flow-control slaves. |
| AXI4 PCIe Interface (axiClk domain) | | | |
| | in | | AXI4 clock (typically same as …). |
| | in | | AXI4 reset. |
| | out | | PIP transmit-path ready status. |
| | in | | Inbound AXI4 write master: peer FPGA write arriving from PCIe. |
| | out | | Inbound write flow-control slave. |
| | out | | Outbound AXI4 write master: local FPGA write to peer over PCIe. |
| | in | | Outbound write flow-control slave. |
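The per-channel transmit enable can be pictured as one bit per channel. The Python sketch below assumes bit i gates channel i; the actual register layout is not given on this page, so treat the encoding and the helper names as hypothetical.

```python
from typing import List


def enabled_channels(tx_enable: int, num_channels: int) -> List[int]:
    """Channels whose enable bit is set (assumed encoding: bit i -> channel i)."""
    return [ch for ch in range(num_channels) if (tx_enable >> ch) & 1]


def may_transmit(channel: int, tx_enable: int) -> bool:
    """True if the hypothetical enable mask allows `channel` to transmit."""
    return bool((tx_enable >> channel) & 1)


mask = 0b0101                      # enable channels 0 and 2 of 4
print(enabled_channels(mask, 4))   # [0, 2]
print(may_transmit(1, mask))       # False
```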
AxiGpuAsyncCore
Purpose
VHDL entity name: AxiPcieGpuAsyncCore (in file
protocol/gpuAsync/rtl/AxiPcieGpuAsyncCore.vhd); the PyRogue class drops
the Pcie infix to AxiGpuAsyncCore.
AxiPcieGpuAsyncCore implements a GPU-Direct async data path that bypasses
the CPU for FPGA-to-GPU memory transfers. It wraps
surf.AxiStreamDmaV2Write and surf.AxiStreamDmaV2Read engines,
dynamically demultiplexing inbound streams between the GPU path and the
standard CPU DMA path. An AXI-Lite register block controls path selection
and provides frame-level traffic monitoring.
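A toy Python model of the path selection and frame counting described above. route_to_gpu stands in for the AXI-Lite path-select control, and its initial value plays the role of the power-on routing generic; the name, the CPU-path default, and the per-frame switching granularity are all assumptions, not the core's documented register map.

```python
from dataclasses import dataclass


@dataclass
class GpuAsyncDemux:
    """Toy model of inbound path selection in AxiPcieGpuAsyncCore.

    route_to_gpu is a hypothetical stand-in for the AXI-Lite path select;
    it defaults to the CPU bypass path here only for illustration.
    """
    route_to_gpu: bool = False
    gpu_frames: int = 0     # frame-level counters, like the traffic monitor
    bypass_frames: int = 0
    gpu_bytes: int = 0
    bypass_bytes: int = 0

    def push_frame(self, frame: bytes) -> str:
        # Assumption: the select is sampled once per frame, so a frame is
        # never split between the GPU path and the CPU bypass path.
        if self.route_to_gpu:
            self.gpu_frames += 1
            self.gpu_bytes += len(frame)
            return "gpu"
        self.bypass_frames += 1
        self.bypass_bytes += len(frame)
        return "bypass"


demux = GpuAsyncDemux()
print(demux.push_frame(b"\x01" * 64))         # bypass
demux.route_to_gpu = True                     # software flips the path select
print(demux.push_frame(b"\x02" * 128))        # gpu
print(demux.gpu_frames, demux.bypass_frames)  # 1 1
```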
Generics
| Name | Type | Default | Description |
|---|---|---|---|
| | | | Propagation delay for simulation. |
| | | | Power-on demux routing: … |
| | | | AXI4 burst size (bytes) for DMA write and read engines. |
| | | (required) | AXI-Stream configuration for the application-facing stream ports. |
Ports
| Name | Dir | Type | Description |
|---|---|---|---|
| AXI4-Lite Control (axilClk domain) | | | |
| | in | | AXI-Lite clock. |
| | in | | AXI-Lite reset. |
| | in | | AXI-Lite read master for GPU async control registers. |
| | out | | AXI-Lite read slave. |
| | in | | AXI-Lite write master. |
| | out | | AXI-Lite write slave. |
| AXI-Stream Interface (axisClk domain) | | | |
| | in | | AXI-Stream clock. |
| | in | | AXI-Stream reset. |
| | in | | Inbound stream from application (GPU write source). |
| | out | | Inbound stream flow-control slave. |
| | out | | Outbound stream to application (GPU read destination). |
| | in | | Outbound stream flow-control slave. |
| | out | | CPU bypass stream: frames routed to the CPU DMA path when the demux selects … |
| | in | | Bypass stream flow-control slave. |
| AXI4 PCIe Interface (axiClk domain) | | | |
| | in | | AXI4 clock (typically same as …). |
| | in | | AXI4 reset. |
| | out | | AXI4 write master to PCIe host memory (GPU write path). |
| | in | | AXI4 write slave. |
| | out | | AXI4 read master from PCIe host memory (GPU read path). |
| | in | | AXI4 read slave. |