I did a new test with the clock wizard, but this time I used the Vivado independent clock FIFO IP to cross clock domains.
With a 200 MHz clock, an SPI speed of 28 MHz was rock solid in both directions, including the clock domain crossing.
At 300 MHz everything continued to work reliably up to and including 38 MHz SPI.
At 400 MHz I got timing errors when building.
I had previously tried the asynchronous FIFO component for crossing the clock domain, but that proved too slow, both for the SPI speed and the clock domain crossing.
That’s why I tried the Vivado FIFO IP, and with success.
I likewise encountered problems with the Alchitry async IO component. It’s doing Gray code math for every operation, which I think stacks up to quite a bit of delay. The block RAMs get to do it all with baked-in hardware, which is much faster.
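For anyone wondering what the Gray code math is for: async FIFOs commonly pass their read and write pointers across the clock boundary in Gray code, so that only one bit changes per increment and a pointer sampled mid-change is off by at most one. Here is a rough Python sketch of the conversions. The function names are mine, and this is not taken from the Alchitry or Xilinx sources.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to Gray code (one XOR with a shifted copy)."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert Gray code back to binary by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


WIDTH = 4  # hypothetical pointer width, for illustration only

for i in range(2 ** WIDTH):
    nxt = (i + 1) % (2 ** WIDTH)
    # Consecutive Gray-coded pointers differ in exactly one bit,
    # including the wrap-around from the last value back to zero.
    assert bits_changed(bin_to_gray(i), bin_to_gray(nxt)) == 1
    # The conversion round-trips.
    assert gray_to_bin(bin_to_gray(i)) == i
```

In plain binary the step from 7 to 8 flips four bits at once, which is exactly the case a synchronizer can sample half-changed. The Gray-to-binary direction is a chain of XORs, so it is the more expensive conversion in logic.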
Doing a quick search on the latency of the Xilinx FIFO, I don’t think it’s that different. Maybe one cycle.
You should see latencies of 3-4 cycles with the Lucid one, which is entirely due to the synchronizers (Xilinx does the same). The Lucid one uses 3 stages, and you could use 2 for reduced latency at the cost of a minor ding to reliability (no synchronizers are perfectly reliable).
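To make the stage count versus latency trade-off concrete, here is an idealized Python model of a synchronizer as a plain shift register clocked in the destination domain. It is only a sketch with a made-up function name. It ignores metastability and the phase offset between the two clocks, and it is not how either the Lucid or the Xilinx FIFO is actually implemented.

```python
def sync_latency(stages: int) -> int:
    """Count destination clock edges until a changed input reaches the output
    of an idealized synchronizer made of `stages` flip-flops."""
    chain = [0] * stages   # flip-flop chain, all initially 0
    src = 1                # source signal has just changed from 0 to 1
    edges = 0
    while chain[-1] != src:
        # On each destination clock edge every stage takes the previous one.
        chain = [src] + chain[:-1]
        edges += 1
    return edges


for n in (2, 3):
    print(f"{n} stages -> {sync_latency(n)} destination clock edges")
```

In this model the latency equals the number of stages. On real hardware the input arrives at an arbitrary point within the destination clock period, and the FIFO still has to compare pointers and update its flags, which would account for a figure slightly above the stage count.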
It would be interesting to measure them at some point.
The major reasons to use the Lucid version are flexibility and portability. Having to regenerate cores whenever you change the depth is kind of annoying.
In general, any Xilinx IP should be more efficient than a generic version of the same thing.
I agree on the portability and flexibility, but I also think that the Xilinx IP does stuff with the timings, the buffers and the constraints (that are above my head at the moment) that has a substantial impact on performance. Introducing the Lucid asynchronous FIFO for the clock domain crossing made the SPI max speed drop to 16 MHz; replacing it with the Xilinx FIFO gave me 38 MHz. But it is not portable and the source is not available. It would be interesting to repeat this experiment with a Cyclone FPGA.