Image Acquisition with Cypress FX2 USB

As a student, I have had the opportunity to interface with a few camera boards. Typical microcontrollers have neither a dedicated interface for fast data capture nor anywhere near enough on-chip memory to buffer a whole frame, and adding external memory requires an external bus interface. As a result, without custom frame-grabber electronics, an FPGA with external memory is almost the only option.

In a previous project, I did exactly this; the camera at the time had about 640x480 pixels, requiring about 300 kilobytes of memory. This worked just fine with the Spartan-3 Starter Board from Digilent Inc., which features two 256k x 16-bit SRAM chips. Over RS-232 serial, the transfer to the host PC takes anywhere from 15 seconds on up, depending on the baud rate (about 30 seconds is what you get at 115.2k baud). The datasheet for the FTDI TTL-232R cable says it should work at up to 1 megabit per second, but I could never get it running at that sort of speed without the transfer hanging up halfway through the image.
If you have enough spare pins for an FTDI UM245R, then baud rates don't matter, and you can grab a VGA image in less than one second (Python's time module tells me about 0.6 seconds)!

This time, I wanted to grab a frame from an Aptina MT9P031 sensor (for testing and calibration purposes), which produces a whopping 5 megapixels, or about 5 megabytes of data per frame. I don't have an FPGA board handy with enough memory to buffer this, plus enough free I/O pins for both the camera and a UM245R. Given that it takes half a minute to get a 300-kilobyte image over a serial port at 115.2 kbps, how long do you suppose it takes to grab 5 megabytes? Hint: a lot longer than I want to sit and wait.
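For the curious, here's the back-of-envelope arithmetic behind that hint, assuming 8N1 framing (10 bits on the wire per data byte) and the MT9P031's 2592x1944 active array at one byte per pixel; the helper function is just for illustration, not from my actual code:

```python
# Serial transfer time estimates, assuming 8N1 framing: 10 wire bits per byte.
def transfer_seconds(num_bytes, baud, bits_per_byte=10):
    return num_bytes * bits_per_byte / baud

vga_frame = 640 * 480       # ~300 kB at 8 bits/pixel
full_frame = 2592 * 1944    # MT9P031 active pixels, ~5 MB at 8 bits/pixel

print(round(transfer_seconds(vga_frame, 115200), 1))        # ~26.7 seconds
print(round(transfer_seconds(full_frame, 115200) / 60, 1))  # ~7.3 minutes
```

Over seven minutes per frame. No thanks.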

Then, the golden solution occurred to me. Some FPGA boards (unfortunately none that I currently have) come with a Cypress EZ-USB FX2 microcontroller, which is a convenient solution for both FPGA JTAG configuration and data transfer. Theoretically, this controller should be able to pipe through continuous data at a nice rate in the 30-40 MB/sec range. Furthermore, the Cypress chip has onboard FIFOs which can be connected directly to the parallel data bus on the pins, requiring little or no firmware intervention. While the UM245R is much simpler to use (Virtual COM Port driver, no firmware needed), the FX2 is clearly more powerful. Could it be that this little chip is capable of piping an image to a host PC directly from the camera, with no FPGA and nowhere NEAR enough memory to buffer the whole thing? The short answer: yes! Though not without a small amount of effort to get things figured out.

Thankfully, the FX2 is fairly well documented in a couple of places on the net, which made this exercise a million times easier than it would have been with only manufacturer datasheets. I owe many thanks to Wolfgang Wieser, as his very simple example firmware served as a quick starting point. On the host PC side, I was able to grab a quick C source file from this UUUSB board webpage, and modify just a couple of lines to serve my purpose.

A little bit of background on the FX2: endpoint 0 is for control messages, and endpoints 2, 4, 6, and 8 are for data transfers. In Wolfgang's example/benchmark firmware, endpoint 2 carries data coming from the host PC side, and endpoint 6 carries data coming from the peripheral side. This works just fine for me, so that's what I went with.
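As a quick sketch of what the host side of this endpoint layout looks like: USB IN endpoints (device to host) have bit 7 of the endpoint address set, so endpoint 6 IN is address 0x86. My actual host program is the modified C/libusb source linked above; this PyUSB equivalent is just illustrative, and the "grab frame" command byte and device IDs are assumptions (0x04B4:0x8613 is the Cypress FX2 default VID:PID):

```python
# Hypothetical PyUSB sketch of the host side; the real program is C/libusb.
def ep_addr(ep, direction_in):
    """USB endpoint address: bit 7 set for IN (device-to-host) endpoints."""
    return ep | (0x80 if direction_in else 0x00)

EP2_OUT = ep_addr(2, False)  # 0x02: "grab frame" command, host to FX2
EP6_IN  = ep_addr(6, True)   # 0x86: image data, FX2 to host

def grab(nbytes):
    import usb.core  # pyusb; hardware access happens only inside this function
    dev = usb.core.find(idVendor=0x04B4, idProduct=0x8613)
    dev.set_configuration()
    dev.write(EP2_OUT, b"\x01")      # firmware-defined "grab frame" request
    return dev.read(EP6_IN, nbytes)  # bulk-read the frame data
```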

Now, there are a few possible ways to get data to/from the desired peripheral.
One would be to use the Cypress GPIF, a programmable state machine which handles the interface between the external world and the FIFOs inside the FX2. This makes sense when the FX2 is to be the bus master on this side.
In this case it's easier to consider the camera as the master, because the camera outputs a free-running pixel clock, with signals to mark the data on the port as valid or not, and during the "data valid" time, the data is coming continuously. The FX2 happens to have a mode where the external device acts as the master and the FX2's FIFOs are the slave; this is aptly named Slave FIFO mode.

So the goal is to set up Endpoint 6 in slave FIFO mode, to automatically transfer whatever data it gets on the bus over to the host PC. First, the Aptina sensor's outputs are active-high by default, so we configure the FX2's FIFO enable signals for that polarity (the non-default setting). The camera datasheet also says data should be latched on the falling edge of the pixel clock; we can invert the clock on either the camera side or the FX2 side (I chose the FX2 side this time). According to the Aptina image sensor datasheet, the Line Valid signal is high only during the frame-valid period, and not during horizontal or vertical blanking; this is much easier to handle than a typical VGA hsync/vsync configuration, since we get a signal asserted exactly when the data on the port is actually valid. We can therefore connect this signal directly to the FIFO's write enable.
Since we are only using Endpoint 6 for the data coming in from the camera, we can fix the FIFO output enable, read enable, and FIFOADR pins for this data flow.

Now we are left with the task of synchronizing with the image frame. The Frame Valid pin goes from low to high at the start of the frame, and stays high until all of the data in the frame has been transferred. So, my strategy is simply to poll this signal on a general-purpose I/O pin: when the host requests an image, we wait for the start of the next frame before allowing any data transfer back to the host, so we get a perfectly synchronized frame.
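The synchronization logic itself is just two polling loops: first wait for Frame Valid to be low (in case we arrived mid-frame), then wait for it to go high again. In the firmware this is a GPIO polling loop in C; the sketch below simulates the same logic in Python against a fake pin, just to show the idea:

```python
# Simulated frame-start synchronization (the real thing polls a GPIO in firmware).
def wait_for_frame_start(read_fv):
    """read_fv() returns the current FRAME_VALID level (0 or 1)."""
    while read_fv():      # if we arrived mid-frame, wait for it to end...
        pass
    while not read_fv():  # ...then wait for the next rising edge
        pass

# Fake pin: we "arrive" in the middle of a frame, then a new frame begins.
levels = iter([1, 1, 0, 0, 0, 1, 1, 1])
wait_for_frame_start(lambda: next(levels))
# Returns exactly at the rising edge, so data capture starts with pixel 0.
```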

At this point we should enable Endpoint 6 AUTO IN, then disable it again when the frame is over. My first attempt was to simply enable/disable the auto in by setting the appropriate values in the EP6FIFOCFG register. For some reason, this doesn't seem to work correctly, and I don't get any data back with this.

Strategy two was to enable the SLCS (slave FIFO chip select) function, and loop its dedicated pin back to an output pin to control it. When I tried this, I got no data back at all; possibly because de-asserting SLCS also completely disabled Endpoint 2 (which receives the "grab frame" command, and is not AUTO OUT)? If that is not the case, then I'm not really sure why it doesn't work.

Then a nice kludgey fix occurred to me: I am not using EP8 at all, so I don't care if it gets filled with junk data. My final solution was to un-fix the FIFOADR0 pin and drive it from an output pin: set it to '1' (selecting EP8) when not grabbing data, then back to '0' (selecting EP6) once the host has made the request and a new frame starts. With this, I get an image!

The final block diagram is as follows:

Remember that I am configuring my FIFO controls as active-high; this is why SLOE and SLRD are tied low to disable.

And my setup looks like:

Here's an image that the FX2 took of itself (resized for uploading to blog):

Source Code:

FX2 Firmware
This needs the include files, which I got from Wolfgang's FX2 page I linked to above.

Simple host PC program
I redirect stdout to a text file to save the list of pixel values. With the executable saved as "spr", for example, I do
sudo ./spr > dat.txt
(For now I use sudo because I haven't bothered to manually add permissions for this USB device)
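If you want to actually look at the result, the pixel list can be turned into a viewable grayscale image. This little helper is not part of my host program; it assumes dat.txt holds one 8-bit pixel value per line in row-major order, and writes a binary PGM (which most image viewers open directly):

```python
# Hypothetical post-processing: pixel list -> binary PGM grayscale image.
def pixels_to_pgm(pixels, width, height, path):
    assert len(pixels) == width * height
    with open(path, "wb") as f:
        f.write(b"P5\n%d %d\n255\n" % (width, height))  # PGM header
        f.write(bytes(pixels))                          # raw 8-bit pixel data

# Usage, after `sudo ./spr > dat.txt` (MT9P031 full frame is 2592x1944):
#   pixels = [int(line) for line in open("dat.txt")]
#   pixels_to_pgm(pixels, 2592, 1944, "frame.pgm")
```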

Finishing Thoughts:
With the simplest possible host software, the host is apparently not fast enough to sustain the full 30-odd MB per second that the FX2 should be able to push. This can cause data dropouts, resulting in corrupted images, so the pixel clock must be slowed down. With about a 10 MHz pixel clock or less, I can reliably grab frames in less than a second. Looking around the newsgroups and such, it looks like you need to do some multi-threading with multiple asynchronous read requests in order to achieve the maximum possible throughput. This evidently has to do with how the OS handles tasks; the process sleeps while idle during the blocking read, meaning that you can lose many milliseconds while the OS gives the CPU to other tasks. For me, the gain in transfer speed is not really going to be worth the effort of writing the more complicated software, so I'm quite happy with what I have.
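A quick sanity check on those numbers (the figures here are my assumptions: the MT9P031's 2592x1944 active array, one byte out per pixel clock, and ignoring blanking time):

```python
# Rough frame-time estimate at a reduced pixel clock.
active_pixels = 2592 * 1944       # ~5.0 Mpixel per frame
pixel_clock_hz = 10e6             # 10 MHz -> one byte per clock -> 10 MB/s
frame_time = active_pixels / pixel_clock_hz
print(round(frame_time, 2))       # ~0.5 s of active data per frame
```

So at 10 MHz the bus only carries about 10 MB/s, comfortably below the rate at which my simple blocking-read host code starts dropping data.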

With other camera sensors (such as OmniVision, Toshiba, Sony) which may have standard VGA/NTSC style hsync/vsync rather than Aptina's simple Line Valid signal, it may be possible to do this with some modification. One possibility would be a tiny 8-pin microcontroller, or a reasonably small CPLD which could run at the pixel clock frequency and use counter hardware to generate a DATA_VALID signal by counting how many clock cycles have passed since the hsync has been asserted.
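The counter idea amounts to this: DATA_VALID is asserted for a fixed number of pixel clocks, starting a fixed number of clocks after hsync. In hardware this would be a counter in a CPLD or an MCU timer; the Python below just simulates one line with made-up timing numbers (the real back-porch and active-pixel counts come from the particular sensor's datasheet):

```python
# Simulation of generating DATA_VALID by counting clocks after hsync.
BACK_PORCH = 4      # clocks between hsync assertion and first valid pixel (assumed)
ACTIVE_PIXELS = 8   # valid pixels per line (assumed; e.g. 640 for VGA)

def data_valid(clocks_since_hsync):
    return BACK_PORCH <= clocks_since_hsync < BACK_PORCH + ACTIVE_PIXELS

# One simulated line: invalid during the porch, valid for the active pixels.
line = [data_valid(t) for t in range(16)]
```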
Other solutions to grab frames from a camera using a simple MCU have been presented:
CMUcam by Carnegie Mellon University, and this very interesting computer vision project.

The former buffers data from the camera in a FIFO chip before it is read by the MCU; the latter cannot grab all of the pixels as they come in, so a full image transferred to the PC is actually bits of many sequential frames pieced together (apparently during each frame, the timing is right for one pixel per line). The final purpose of these projects is not necessarily to grab full images, but rather to process a region of interest on a self-contained embedded platform, so they pose a very interesting low-cost way to accomplish tasks that interface with camera hardware on a low level.
My solution here is probably nearly the cheapest and simplest way to quickly grab a full frame image with hardware accessible to a hobbyist.

A quick estimate:
Cypress EZ-USB FX2 microcontroller: on the order of $10 for just the chip in a 56-SSOP package, plus a couple of dollars in surrounding parts: a crystal, some capacitors, and a USB connector. This time, I paid about 40 bucks for a breakout board from Strawberry Linux, which I ordered on a Friday and received on Sunday. Shipping in Japan is so nice and fast.

The Aptina image sensors (chip only) are available from Digikey for about $30 or less per piece, so you could probably make a breakout board for them at a fairly cheap per-unit cost (in fact, somebody has done exactly this) if you can handle soldering the leadless packages. It has come to my attention that these sensors can also be interfaced directly with OMAP processors such as the one on the BeagleBoard, which is a pretty nice and potentially very functional embedded solution as well, but still not as quick'n'dirty as the FX2!

Since this requires a USB host (generally a PC, though a single-board computer could probably do it too), it isn't as interesting a fully-embedded project as the cameras I have mentioned above, but it sure is a cheap way to evaluate an image sensor; look what Aptina's full dev kit costs! (Yes, it comes with much more fully functional software, but it's still a pretty penny.)
