Alchitry V2 Planning

gc74 · November 12, 2024, 6:18pm

Ok, so if I really need such high res or fps I may look into getting or making an active HDMI or DP board.
It’s not in my plans for the near future, but maybe one day I’ll need to capture some 1080p at 280 fps. I’m not even sure there are chips for this that would fit in a tinkerer’s budget.

alchitry · November 13, 2024, 7:10pm

I opened up our new shop for pre-orders.

If you decide to order anything let me know how the process all goes and if you run into any issues.

Jflanagan · November 13, 2024, 8:34pm

The Pt’s list price is only $200? You weren’t kidding about being pleasantly surprised. The preorder discount is nice on top of that of course, but that’s pretty fantastic.

gc74 · November 14, 2024, 12:37am

I just saw it, sounds pretty insane.
Did you get a mega-discount from Xilinx?
Or did the price drop because the shortage ended?
Or is it because you took back the manufacturing?

Anyway, I’ll prepare my order!

Also, the price in euros reflects the price in dollars. There is no magic +30% out of nowhere, which is awesome!

gc74 · November 14, 2024, 12:55am

I noticed that using PayPal’s express checkout button seems to bypass the shipping fees.

Aside from this I didn’t have any issues. Order placed!

Jflanagan · November 14, 2024, 4:22am

I did have a question on the price of the Ft v2 board. All the other v2 boards are cheaper than the equivalent SparkFun versions except that one, which is pricier than the SparkFun version ($39.99 compared to $32.50). Is there a reason why it’s relatively expensive?

alchitry · November 14, 2024, 3:36pm

I think if you try to use PayPal express checkout it’ll take the address you enter there and come back to the store to have you select your shipping rate after.

According to Shopify…

The customer is taken to PayPal, where PayPal collects their shipping and billing addresses. After that, the shopper returns to Shopify where they pick a shipping rate and enter discount codes (if applicable). The price is updated accordingly and, on that same page, the customer gets to finalize their transaction.

Let me know if that doesn’t seem to be the case. PayPal won’t let me test it as the shop’s PayPal account is my PayPal account and it complains.

As for the Pt pricing… The Au+ was priced high but SparkFun dictated that. I don’t know if they just paid DigiKey’s standard price or what (which is quite high for all Xilinx parts).

I’m also able to source the FGG484 version for substantially less than the FTG256. The -2 speed grade also seems to consistently be available a bit cheaper than the -1. I believe this is because these parts are used much more so the production volume is much higher. This is why the Au+ won’t be coming back. It would be more expensive than the Pt.

For the Ft/Ft+, I haven’t been able to find a good source for the FT600Q/FT601Q. They’re expensive. I reached out to my contacts at SparkFun to see if they had a good source. If I’m able to source them for a better price the price will be updated (and any orders with them refunded).

That’s good to hear about the euro pricing. This is a feature of Shopify. If anyone ever finds it has a bad rate, I can disable it and force USD.

gc74 · November 14, 2024, 5:40pm

Ok, I didn’t know. It’s a bit weird because PayPal behaves like it’s the final price, so the price shown by PayPal is different from the one in the shop, which could feel misleading.
I didn’t use PayPal for my order so I can’t speak to the whole process.

gc74 · November 14, 2024, 5:43pm

So more pins and more speed is cheaper? Nice!
Maybe the FGG484 package being bigger makes it easier to manufacture for Xilinx too?
I wonder if it applies to other packages? Maybe we could get even more pins later

Jflanagan · November 14, 2024, 5:48pm

Thanks for the color! I figured it was either the FT600 chips being hard to come by, or SparkFun actively discounting the Ft because they had stock they wanted to clear out.

It’s possible that the Pt might be good enough by itself, if your 40MB/sec thing pans out on the main usb connection. My project’s initial use case needs about 4MB/sec so I needed something fast enough to handle that.

I’m sure I could find uses for more, but maybe I don’t absolutely need it? If I wind up needing more fabric, or more GPIOs, being able to skip the Ft would help reduce the price impact of upgrading. And done properly, the Ft could even become an optional add on to the project, with the internal logic being set up to direct traffic to the other hardware if available.

gc74 · November 14, 2024, 5:55pm

The choice was made to use the slower solution (20 MB/s) to keep Vivado compatibility, but it’s still more than enough if you only need 4 MB/s.
So it looks like you don’t need an Ft for your project if you use a Pt.

Jflanagan · November 14, 2024, 6:04pm

Indeed. Or another thing I was contemplating was to place and route the FT600Q on my board myself, although that’s quite a bit above my current skill level. I would still be facing the sourcing problems you do of course. It would be cheaper and more compact, assuming I could get it to work at all, which is far from assured.

That’s all a ways off. My prototype with the v1 Au and Ft is going to carry me quite a ways before I need to figure out how to migrate to v2 and figure out these kinds of hardware decisions.

gc74 · November 14, 2024, 6:25pm

AFAIK routing the USB side needs to take care of impedance and length matching of the differential pairs.
The FT600 to FPGA side should be less ticklish as long as the lengths stay close to each other.

gc74 · November 15, 2024, 1:54pm

I just thought about something related to your project:
Instead of using USB to connect the FPGA to the RPi, did you try to use the GPIO?
Using direct memory read/write, you can run it at frequencies ranging from 20 to 130 MHz depending on the RPi model.

There are 17 free pins (26 if you include the dual-function ones), so you could do 16- or 24-bit transfers if needed, which means at least 40 MB/s of bandwidth, and up to 390 MB/s on the RPi4.
Plus you could add the RPi to the sandwich directly.
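
As a rough sanity check on those figures, here is a minimal Python sketch of the arithmetic. It assumes one full bus word is transferred per GPIO clock with no protocol overhead, and the function name is only for illustration:

```python
def gpio_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak parallel-bus throughput in MB/s, assuming one word per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# 16-bit bus at the slowest quoted GPIO rate (20 MHz)
print(gpio_bandwidth_mb_s(16, 20))   # 40.0
# 24-bit bus at the fastest quoted GPIO rate (130 MHz)
print(gpio_bandwidth_mb_s(24, 130))  # 390.0
```

Real throughput would likely be lower once handshaking and software overhead on the Pi side are counted.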

Also, if you implement the GPU part on the FPGA, you could use the Hd module instead, but maybe you use the RPi for something else in your project.

I guess the ultimate solution would be an Alchitry Zn with a Zynq.
It bundles an FPGA and an ARM CPU interconnected with an AXI bus in the same chip, so you can run Linux (or a bare-metal custom app) to combine CPU and FPGA workloads.

Jflanagan · November 15, 2024, 4:01pm

That’s an interesting idea! Last year, before I tried the FPGA, I tried using an iMX 8M Plus, which has 4x 64 bit ARM cores (roughly equivalent to a Pi4), and 1x 32 bit M7 ARM core at 800MHz, on the same die, with the idea being to use the M7 to do the GPIO reading, and shared memory to get data over to the Big Cores. This wound up failing though. The GPIOs were hooked up directly to the transceivers on the Apple II bus (i.e. no FPGA at all). This meant the M7 had an interrupt on every Apple II bus cycle (so basically every microsecond), with only about 70ns of permissible jitter before irretrievable data loss due to missing the bus timing window. But accessing the SDRAM shared memory over the AXI bus had much, much greater jitter than that! It was nonworkable.

This is opposed to my original design, which actually worked. That used a Pico (for PIO to do the precise Apple II bus control) and a Teensy 4.1 (to have enough GPIOs to manage the full Apple II bus, along with 100MHz Ethernet to push it out to a full Pi4/Pi5). The Teensy had a CPU equivalent to the M7, but it didn’t have anything else competing for the memory so it was easily able to meet the jitter requirement. But it was only a microcontroller rather than an FPGA so while it could do some interesting things even on top of just bus event capture, it was still limiting.

However, if it’s an FPGA’s GPIOs connected to the Pi instead of the raw Apple II bus, the FPGA could be providing plenty of buffering flexibility to make this work, rather than needing to service a hard-realtime 1MHz interrupt. So the idea is not impossible! I don’t think it’s the right way to go though.

If anything, I would lean the other direction, to stack an Hd card (or mount my own HDMI interface similar to how I am considering mounting my own FT600Q), and use the muscle available on the Pt to do the HDMI rendering instead of having the Pi5 do it. But the Pi5’s rendering capabilities are pretty awesome, my project partner is doing some pretty cool things with it! It would be very hard to compete with what the Pi5 can do with HDMI, using the FPGA directly. The main argument of eliminating the Pi5 is a cost argument, and it’s not a strong argument. It trades away display quality, which is meant to be one of our main benefits.

gc74 · November 15, 2024, 5:13pm

Yes, that’s the idea: the FPGA would handle the Apple bus on one side and the Pi bus on the other side, with the ability to do anything you want in between like registers, cache, buffers, and you could even make use of the board’s RAM.
So you can freely decide how the FPGA to Pi bus behaves and its timings, and probably avoid a lot of latency introduced by the multiple conversions (FPGA => parallel bus => FT600 (parallel to USB conversion) => USB => FTDI driver => your app).
I don’t know if latency is a concern for your project though.

Maybe you can even use an SPI bus between the FPGA and the Pi?

If the Zn ever exists, thanks to the embedded CPU power, you may be able to move the current Pi’s job to the internal CPU, ending up with a single-board solution.

Jflanagan · November 15, 2024, 5:31pm

Yeah, I could see where you were going with that. I have been thinking about this problem a lot for the last year and a half, and I’ve approached it from a couple of different angles. My very, very first implementation was just a Pico with an integrated Wiznet chip, which is supposedly 100Mbit Ethernet, but since it communicates with the Pico over 1-bit SPI, in practice I was not able to coax it past about 25Mbit throughput. I did get bus capture working within that limit by doing a simple compression technique on my data, but there was no headroom for anything more.

Latency is a complicated topic. The latency over Ethernet or the FT600 is fine for things like the HDMI display and bus capture in general because there’s no feedback. It’s entirely fine to batch up 250 events or so to transmit them as a single UDP packet or USB block.
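
To make the batching idea concrete, here is a minimal Python sketch. The 4-byte event layout (16-bit address, 8-bit data, 8-bit flags) and the `EventBatcher` name are assumptions for illustration only, not the project's actual format:

```python
import struct

EVENTS_PER_PACKET = 250   # batch size mentioned above
EVENT_FORMAT = "<HBB"     # hypothetical layout: 16-bit address, 8-bit data, 8-bit flags
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 4 bytes per event


class EventBatcher:
    """Accumulates bus events and hands back one payload per full batch."""

    def __init__(self, batch_size=EVENTS_PER_PACKET):
        self.batch_size = batch_size
        self._buf = bytearray()
        self._count = 0

    def add(self, address, data, flags):
        """Queue one event; return the payload once the batch is full, else None."""
        self._buf += struct.pack(EVENT_FORMAT, address, data, flags)
        self._count += 1
        if self._count < self.batch_size:
            return None
        payload = bytes(self._buf)
        self._buf.clear()
        self._count = 0
        return payload


batcher = EventBatcher()
payloads = []
for i in range(500):
    p = batcher.add(0xC000 + (i & 0xFF), i & 0xFF, 0)
    if p is not None:
        # A real sender would do something like sock.sendto(p, addr) here.
        payloads.append(p)

print(len(payloads), len(payloads[0]))  # 2 1000
```

With this assumed layout, 250 events make a 1000-byte payload, which fits inside a standard 1500-byte Ethernet frame.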

Anything which does have latency requirements basically needs to get done locally on the card. That was where I was hitting limits with the Teensy. It had a couple hundred Teensy clock cycles to do stuff with for each Apple II bus cycle, which goes a surprisingly long way but was still very limiting.

FPGA ought to be able to do basically anything that any legacy Apple II card could do, including perform as an accelerator card! That’s one of the goals, to implement a 65C02 core on the FPGA and see how fast we can get it to practically go. One of the real wins on this is that we would also have access to the accelerated memory. So if we were also acting as a solid state hard drive card, we could load from the card into the local accelerated memory directly instead of going over the 1MHz bus. Faster-than-lightspeed DMA!

alchitry · November 15, 2024, 6:32pm

For what it’s worth, the routing for the FT600Q isn’t that crazy. Yes, you should control for USB impedance, but if you’re not trying to get USB certified and the overall run is short, it likely doesn’t matter.

At the 100MHz bus rate, you likely have around 2.5ns (1/4 period) of delay tolerance. That’s about 38 cm of PCB trace length difference, quite forgiving. This of course assumes you change data on the falling edge and clock it in on the rising edge (what the FT600Q datasheet shows). I don’t think my Ft example did this, so it deserves a revisit.
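
For anyone who wants to redo that estimate for a different clock, here is a minimal Python sketch of the arithmetic. The propagation speed of roughly 15 cm/ns is an assumption (a common rule of thumb for FR4; the real value depends on the stackup), and the function and parameter names are only illustrative:

```python
def skew_budget_cm(bus_mhz: float, period_fraction: float = 0.25,
                   cm_per_ns: float = 15.0) -> float:
    """Trace length mismatch (cm) that uses up the given fraction of a clock period."""
    period_ns = 1000.0 / bus_mhz                 # 100 MHz -> 10 ns
    tolerance_ns = period_ns * period_fraction   # 1/4 period -> 2.5 ns
    return tolerance_ns * cm_per_ns              # assumed propagation speed


print(skew_budget_cm(100))  # 37.5
```

That lands at 37.5 cm, in line with the "about 38 cm" figure above.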

gc74 · November 15, 2024, 8:51pm

Looks like a team of 3 people managed to do it and make it run at 100 MHz on a Spartan 6 (entry-grade, previous generation of Xilinx FPGA; the Au and Pt have an Artix 7, which is one grade higher and one generation newer).

And it’s fully open-sourced!

I have no idea how hard it would be to bring it to Artix 7 though…

Jflanagan · November 15, 2024, 9:02pm

Oh, it’s a thing that has been done many times! That’s not the interesting part. The interesting part is in making the soft-CPU work as part of the entire system, including the peripheral cards we want to emulate. The end product is meant to be a single card that provides CPU acceleration, expanded memory, solid state hard drive, HDMI-out video, sound, mouse, real time clock, Ethernet, who knows what else.

All those things, somebody’s done before. But nobody’s done them all at the same time.