Clock Domains

A typical PGP target runs three principal clock domains. Understanding which signals live in which domain — and where the synchronisers sit — is the only way to read the RTL without getting lost.

The Three Domains

dmaClk (250 MHz nominal)

Source: <Board>Core.dmaClk output. Derived from the PCIe PHY’s recovered reference clock and exposed back to the rest of the design.

Lives here:

  • The DMA engine internals (AxiPcieDma + AxiStreamDmaV2).

  • The PCIe AXI4 interface to the PHY.

  • The application side of every dmaObMasters / dmaIbMasters pair.

The DMA bus width (DMA_AXIS_CONFIG_G) is per-board; the shared PgpLaneRx / PgpLaneTx auto-resize between this width and the fixed protocol-layer width (PGP4_AXIS_CONFIG_C).

axilClk (typically 156.25 MHz)

Source: userClk156 from the board (gigabit reference clock divider), optionally divided down by a ClockManagerUltraScale MMCM in the target top-level.

Lives here:

  • The AXI-Lite register tree from BAR0 onwards.

  • The application-region crossbar.

  • Most per-lane control and monitor logic.

The boundary between dmaClk and axilClk is handled inside axi-pcie-core (AxiPcieReg uses surf.AxiLiteAsync).

pgpClk (per-lane, GT-recovered)

Source: surf.Pgp4GtyUs recovered RX clock per lane (PGP3/PGP4 provide an explicit recovered clock per direction). The exact frequency is line-rate-dependent — for example PGP4 @ 6.25 Gbps yields 156.25 MHz, PGP4 @ 25 Gbps yields 390.625 MHz.

Lives here:

  • PgpLaneRx ingress path (per-VC FIFOs, Mux to a single stream).

  • PgpLaneTx final stages after the DMA crossing (SsiInsertSof, AxiStreamDeMux to VCs, the GT TX side).

  • The per-lane PGP framer state in surf.Pgp4GtyUs.

Crossings

Every domain crossing in the application layer uses one of two surf primitives:

  • Stream crossings: surf.AxiStreamFifoV2 with GEN_SYNC_FIFO_G => false. Used for pgpClk ↔ dmaClk (PGP RX, PGP TX) and for any dmaClk ↔ axilClk stream crossing (rare).

  • Single-bit / vector crossings: surf.Synchronizer and surf.SynchronizerVector. Used for dmaBuffGrpPause propagation into pgpClk and for pgpTxIn.locData(0) toggles.

No combinational paths cross domain boundaries. Every crossing is explicit in the RTL via one of those two primitives.
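
The two crossing styles above can be sketched in a single small entity. This is a minimal illustration, not the shared PgpLaneRx: the entity name ExampleRxCrossing, its port names, the FIFO depth, and the use of back-pressure on the pgpClk side are all assumptions made for the example. Only the surf entities, their generics, and PGP4_AXIS_CONFIG_C come from the surf library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library surf;
use surf.StdRtlPkg.all;
use surf.AxiStreamPkg.all;
use surf.Pgp4Pkg.all;

-- Hypothetical example entity (name and ports are illustrative only)
entity ExampleRxCrossing is
   generic (
      TPD_G             : time := 1 ns;
      DMA_AXIS_CONFIG_G : AxiStreamConfigType);  -- per-board DMA bus width
   port (
      -- pgpClk domain
      pgpClk      : in  sl;
      pgpRst      : in  sl;
      pgpRxMaster : in  AxiStreamMasterType;
      pgpRxSlave  : out AxiStreamSlaveType;
      pgpPause    : out sl;
      -- dmaClk domain
      dmaClk      : in  sl;
      dmaRst      : in  sl;
      dmaIbMaster : out AxiStreamMasterType;
      dmaIbSlave  : in  AxiStreamSlaveType;
      dmaPause    : in  sl);
end ExampleRxCrossing;

architecture mapping of ExampleRxCrossing is
begin

   -- Stream crossing: asynchronous FIFO from pgpClk to dmaClk.
   -- Differing slave/master configs also resize the bus from the
   -- protocol-layer width to the DMA width.
   U_StreamCross : entity surf.AxiStreamFifoV2
      generic map (
         TPD_G               => TPD_G,
         GEN_SYNC_FIFO_G     => false,  -- two independent clocks
         FIFO_ADDR_WIDTH_G   => 9,      -- assumed depth, tune per design
         SLAVE_AXI_CONFIG_G  => PGP4_AXIS_CONFIG_C,
         MASTER_AXI_CONFIG_G => DMA_AXIS_CONFIG_G)
      port map (
         sAxisClk    => pgpClk,
         sAxisRst    => pgpRst,
         sAxisMaster => pgpRxMaster,
         sAxisSlave  => pgpRxSlave,
         mAxisClk    => dmaClk,
         mAxisRst    => dmaRst,
         mAxisMaster => dmaIbMaster,
         mAxisSlave  => dmaIbSlave);

   -- Single-bit crossing: a dmaClk pause flag synchronised into pgpClk.
   U_PauseSync : entity surf.Synchronizer
      generic map (
         TPD_G => TPD_G)
      port map (
         clk     => pgpClk,
         dataIn  => dmaPause,
         dataOut => pgpPause);

end mapping;
```

For a multi-bit status vector, surf.SynchronizerVector takes the same clk / dataIn / dataOut ports plus a WIDTH_G generic. It synchronises each bit independently, so it suits quasi-static or one-bit-at-a-time signals rather than buses that must arrive coherently.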

Why dmaClk is 250 MHz

The 250 MHz figure is set by DMA_CLK_FREQ_C in axi-pcie-core’s per-board AxiPciePkg.vhd. It is the highest clock frequency that meets timing on all 14 supported boards with the standard AxiStreamDmaV2 instantiation. Raising it would require re-timing the DMA engine and is not a per-target decision.

Implications for Application Design

  • If you add a custom DMA sink in the application, it must run on dmaClk, or cross into its own clock domain through a surf.AxiStreamFifoV2 with GEN_SYNC_FIFO_G => false.

  • If you add custom AXI-Lite registers, expose them on axilClk; AxiPcieReg already handles the BAR0 ↔ axilClk crossing.

  • Do not introduce a fourth global clock unless you have a documented reason. Adding domains is the most common source of timing failures in this codebase.
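
As a sketch of the AXI-Lite bullet above, the entity below exposes a single read/write register entirely on axilClk, using the usual surf two-process style and the AxiLitePkg endpoint helpers. The entity name ExampleAppRegs, the register name scratch, and its offset 0x000 are hypothetical; no synchroniser appears because the block is assumed to hang off the existing axilClk register tree.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library surf;
use surf.StdRtlPkg.all;
use surf.AxiLitePkg.all;

-- Hypothetical example entity (name and register map are illustrative only)
entity ExampleAppRegs is
   generic (
      TPD_G : time := 1 ns);
   port (
      axilClk         : in  sl;
      axilRst         : in  sl;
      axilReadMaster  : in  AxiLiteReadMasterType;
      axilReadSlave   : out AxiLiteReadSlaveType;
      axilWriteMaster : in  AxiLiteWriteMasterType;
      axilWriteSlave  : out AxiLiteWriteSlaveType);
end ExampleAppRegs;

architecture rtl of ExampleAppRegs is

   type RegType is record
      scratch        : slv(31 downto 0);
      axilReadSlave  : AxiLiteReadSlaveType;
      axilWriteSlave : AxiLiteWriteSlaveType;
   end record;

   constant REG_INIT_C : RegType := (
      scratch        => (others => '0'),
      axilReadSlave  => AXI_LITE_READ_SLAVE_INIT_C,
      axilWriteSlave => AXI_LITE_WRITE_SLAVE_INIT_C);

   signal r   : RegType := REG_INIT_C;
   signal rin : RegType;

begin

   comb : process (axilReadMaster, axilRst, axilWriteMaster, r) is
      variable v      : RegType;
      variable axilEp : AxiLiteEndPointType;
   begin
      v := r;

      -- Decode the current AXI-Lite transaction, if any
      axiSlaveWaitTxn(axilEp, axilWriteMaster, axilReadMaster, v.axilWriteSlave, v.axilReadSlave);

      -- Read/write scratch register at offset 0x000 (assumed address)
      axiSlaveRegister(axilEp, x"000", 0, v.scratch);

      -- Respond with a decode error to unmapped addresses
      axiSlaveDefault(axilEp, v.axilWriteSlave, v.axilReadSlave, AXI_RESP_DECERR_C);

      -- Synchronous reset
      if (axilRst = '1') then
         v := REG_INIT_C;
      end if;

      rin <= v;

      axilReadSlave  <= r.axilReadSlave;
      axilWriteSlave <= r.axilWriteSlave;
   end process comb;

   seq : process (axilClk) is
   begin
      if rising_edge(axilClk) then
         r <= rin after TPD_G;
      end if;
   end process seq;

end rtl;
```

If a register value is needed in dmaClk or pgpClk, pass it through a surf.Synchronizer or surf.SynchronizerVector at the consumer rather than clocking the register block itself from another domain.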