Clock Domains
A typical PGP target runs three principal clock domains. Understanding which signals live in which domain — and where the synchronisers sit — is the only way to read the RTL without getting lost.
The Three Domains
dmaClk (250 MHz nominal)
Source: <Board>Core.dmaClk output. Derived from the PCIe PHY’s
recovered reference clock and exposed back to the rest of the design.
Lives here:
The DMA engine internals (
AxiPcieDma+AxiStreamDmaV2).The PCIe AXI4 interface to the PHY.
The application side of every
dmaObMasters/dmaIbMasterspair.
The DMA bus width (DMA_AXIS_CONFIG_G) is per-board; the shared
PgpLaneRx / PgpLaneTx auto-resize between this width and the
fixed protocol-layer width (PGP4_AXIS_CONFIG_C).
axilClk (typically 156.25 MHz)
Source: userClk156 from the board (gigabit reference clock divider),
optionally divided down by a ClockManagerUltraScale MMCM in the
target top-level.
Lives here:
The AXI-Lite register tree from BAR0 onwards.
The application-region crossbar.
Most per-lane control + monitor logic.
The boundary between dmaClk and axilClk is handled inside
axi-pcie-core (AxiPcieReg uses surf.AxiLiteAsync).
pgpClk (per-lane, GT-recovered)
Source: surf.Pgp4GtyUs recovered RX clock per lane (PGP3/PGP4
provide an explicit recovered clock per direction). The exact
frequency is line-rate-dependent — for example PGP4 @ 6.25 Gbps yields
156.25 MHz, PGP4 @ 25 Gbps yields 390.625 MHz.
Lives here:
PgpLaneRxingress path (per-VC FIFOs, Mux to a single stream).PgpLaneTxfinal stages after the DMA crossing (SsiInsertSof,AxiStreamDeMuxto VCs, the GT TX side).The per-lane PGP framer state in
surf.Pgp4GtyUs.
Crossings
Every domain crossing in the application layer uses one of two
surf primitives:
Stream crossings —
surf.AxiStreamFifoV2withGEN_SYNC_FIFO_G => false. Used forpgpClk ↔ dmaClk(PGP RX, PGP TX) and for anydmaClk ↔ axilClkstream crossing (rare).Single-bit / vector crossings —
surf.Synchronizerandsurf.SynchronizerVector. Used fordmaBuffGrpPausepropagation intopgpClkand forpgpTxIn.locData(0)toggles.
No combinational paths cross domain boundaries. Every crossing is explicit in the RTL via one of those two primitives.
Why dmaClk is 250 MHz
The 250 MHz figure is set by DMA_CLK_FREQ_C in
axi-pcie-core’s per-board AxiPciePkg.vhd. It is the highest
clock frequency that meets timing on all 14 supported boards with the
standard AxiStreamDmaV2 instantiation. Raising it would require
re-timing the DMA engine and is not a per-target decision.
Implications for Application Design
If you add a custom DMA sink in the application, it must run on
dmaClk(or cross into your own clock with asurf.AxiStreamFifoV2).If you add custom AXI-Lite registers, expose them on
axilClk;AxiPcieRegalready handles the BAR0 ↔axilClkcrossing.Do not introduce a fourth global clock unless you have a documented reason. Adding domains is the most common source of timing failures in this codebase.