Image Acquisition with Cypress FX-2 USB

As a student, I have had the opportunity to interface with a few camera boards. Typical microcontrollers have neither a dedicated interface for fast data capture nor anywhere near enough memory to buffer a whole frame without an external bus interface. As a result, short of building custom frame-grabber electronics, an FPGA with external memory is almost the only option.

In a previous project, I did exactly this; the camera at the time had about 640x480 pixels, requiring about 300 kilobytes of memory. This worked just fine with the Spartan-3 Starter Board from Digilent, which features two 256k x 16-bit SRAM chips. With RS-232 serial, it takes anywhere from 15 seconds on up to transfer a frame to the host PC, depending on the baud rate (30 seconds is about what you get at 115.2k baud). The datasheet for the FTDI TTL-232R cable says it should work at up to 1 megabaud, but I could never get it running at that sort of speed without the transfer hanging up halfway through the image.
If you have enough spare pins for an FTDI UM245R, then baud rates don't matter, and you can grab a VGA image in less than one second (Python's time module tells me about 0.6 seconds)!

This time, I wanted to grab a frame from an Aptina MT9P031 sensor (for testing and calibration purposes), which produces a whopping 5 megabytes of data per frame. I don't have an FPGA board handy with enough memory to buffer this, plus enough free I/O pins for both the camera and a UM245R. Given that it takes half a minute to get a 300-kilobyte image over a serial port at 115.2 kbps, how long do you suppose it takes to grab 5 megabytes? Hint: a lot longer than I want to sit and wait.
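The back-of-the-envelope arithmetic is simple enough to write down (assuming 8N1 framing, so 10 bits on the wire per byte):

```c
#include <assert.h>

/* 8N1 serial framing sends 10 bits on the wire for every data byte
   (1 start + 8 data + 1 stop). */
static double serial_seconds(double bytes, double baud)
{
    return bytes * 10.0 / baud;
}
```

Plugging in the numbers: 640x480 bytes at 115.2 kbaud comes out to about 27 seconds, which matches the half-minute figure above, while 5 megabytes comes out to around 434 seconds, over seven minutes of sitting and waiting.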

Then the golden solution occurred to me. Some FPGA boards (unfortunately none that I currently have) come with a Cypress EZ-USB FX2 microcontroller, which is a convenient solution for both FPGA JTAG configuration and data transfer. Theoretically, this controller should be able to pipe through continuous data at a nice rate in the 30-40 MB/sec range. Furthermore, the Cypress chip has onboard FIFOs which can be connected directly to the parallel data bus on the pins, requiring little or no firmware intervention. While the UM245R is much simpler to use (Virtual COM Port driver, no firmware needed), the FX2 is clearly more powerful. Could it be that this little chip is capable of piping an image to a host PC directly from the camera, with no FPGA and nowhere NEAR enough memory to buffer the whole thing? The short answer: yes! Though not without a small amount of effort to get things figured out.

Thankfully, the FX2 is fairly well documented in a couple of places on the net, which made this exercise a million times easier than it would have been with only manufacturer datasheets. I owe many thanks to Wolfgang Wieser, as his very simple example firmware served as a quick starting point. On the host PC side, I was able to grab a quick C source file from this UUUSB board webpage, and modify just a couple of lines to serve my purpose.

A little bit of background on the FX2. There is endpoint 0, which is for control messages, and there are endpoints 2, 4, 6, and 8 for data transfers. In Wolfgang's example/benchmark firmware, endpoint 2 is used for data coming from the host PC side, and endpoint 6 is used for data coming from the peripheral side. This works just fine for me, so that's what I went with.

Now, there are a few possible ways to get data to/from the desired peripheral.
One would be to use the Cypress GPIF, a programmable state machine which handles the interface between the external world and the FIFOs inside the FX2. This makes sense when the FX2 is to be the bus master on this side.
In this case it's easier to consider the camera as the master, because the camera outputs a free-running pixel clock, with signals to mark the data on the port as valid or not, and during the "data valid" time the data comes continuously. The FX2 happens to have a mode where the external device acts as the master and the FX2's FIFOs are the slave: the aptly named Slave FIFO mode.

So the goal is to set up Endpoint 6 in Slave FIFO mode, to automatically transfer whatever data appears on the bus over to the host PC. First, the Aptina sensor's outputs are active high by default, so we configure the FX2 FIFO enable signals to use this polarity (this is non-default). The camera datasheet also says data should be latched on the falling edge of the pixel clock; we have the choice of inverting the clock on either the camera side or the FX2 side (I have chosen the FX2 side this time). According to the Aptina image sensor datasheet, the Line Valid signal is only high during the frame valid period, and not during horizontal or vertical blanking; this is much easier to deal with than a typical VGA hsync/vsync configuration, since we get a signal asserted specifically when the data on the port is actually valid. We can then simply connect this signal directly to the FIFO's write enable.
Since we are only using Endpoint 6 for the data coming in from the camera, we can fix the FIFO output enable, read enable, and FIFOADR pins for this data flow.
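In firmware terms, this boils down to a handful of register writes at init time. Here is a sketch of what I mean (register names are from the Cypress TRM and the fx2regs.h include files; the specific bit values are my reading of the TRM, so double-check them before trusting them):

```c
#include "fx2regs.h"   /* register definitions, as used in Wolfgang's example firmware */

static void init_slave_fifo(void)
{
    /* External IFCLK (the camera's pixel clock), inverted so data is latched
       on the sensor's falling edge, synchronous Slave FIFO mode (IFCFG = 11). */
    IFCONFIG = 0x13;        /* IFCLKSRC=0, IFCLKPOL=1, ASYNC=0, IFCFG=11 */
    SYNCDELAY;

    /* FIFO control signals active high, to match the sensor's default
       polarity; SLOE and SLRD are simply tied off in hardware. */
    FIFOPINPOLAR = 0x04;    /* SLWR active high */
    SYNCDELAY;

    /* EP6: bulk IN, and AUTOIN so that committed data streams to the
       host with no firmware intervention. */
    EP6CFG = 0xE0;          /* valid, IN, bulk */
    SYNCDELAY;
    EP6FIFOCFG = 0x0C;      /* AUTOIN=1, ZEROLENIN=1, 8-bit bus */
    SYNCDELAY;
}
```

The SYNCDELAY macro (a few NOPs between writes to these synchronized registers) comes from the standard Cypress example code.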

Now we are left with the task of synchronizing with the image frame. The Frame Valid pin goes from low to high at the start of the frame, and stays high until all of the data in the frame has been transferred. So, my strategy is to simply poll this signal on a general purpose I/O pin, and when the host requests an image, we wait for the start of the next frame before allowing any data transfer back to the host, so we get a perfectly synchronized frame.

At this point we should enable Endpoint 6 AUTO IN, then disable it again when the frame is over. My first attempt was to simply enable/disable the auto in by setting the appropriate values in the EP6FIFOCFG register. For some reason, this doesn't seem to work correctly, and I don't get any data back with this.

Strategy two was to enable the SLCS (slave FIFO chip select) function, and loop its dedicated pin back to an output pin to control it. When I tried this, I got no data back at all; possibly because it caused Endpoint 2 (which receives the "grab frame" command, and is not AUTO OUT) to be completely disabled while SLCS was de-asserted? If that's not the case, then I'm not really sure why it doesn't work.

Then a nice kludgey fix occurred to me. I am not using EP8 at all, so I don't care if it gets filled with junk data. So, my final solution was to un-fix the FIFOADR0 pin and drive it with an output pin: set it to '1' (selecting EP8) when not grabbing data, then back to '0' (selecting EP6) when the host has made the request and a new frame starts. With this, I get an image!
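Put together, the grab sequence in the firmware ends up looking roughly like this (FRAME_VALID and FIFO_SEL are my own names for the GPIO bits wired to the sensor's Frame Valid output and to FIFOADR0, respectively):

```c
/* Sketch of the frame-synchronized grab; FRAME_VALID is the input pin
   wired to the sensor's Frame Valid signal, FIFO_SEL drives FIFOADR0. */
static void grab_one_frame(void)
{
    FIFO_SEL = 1;               /* point the FIFO bus at unused EP8 (junk bin) */

    while (FRAME_VALID) ;       /* if we're mid-frame, wait for it to end    */
    while (!FRAME_VALID) ;      /* rising edge: a fresh frame is starting    */

    FIFO_SEL = 0;               /* route pixel data into EP6, on to the host */
    while (FRAME_VALID) ;       /* the camera clocks the whole frame through */
    FIFO_SEL = 1;               /* back to EP8 until the next host request   */
}
```

Since EP6 is AUTOIN, the firmware never touches the pixel data itself; it only steers the FIFO address pins around the frame boundaries.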

The final block diagram is as follows:

Remember that I am configuring my FIFO controls as active-high; this is why SLOE and SLRD are tied low to disable.

And my setup looks like:

Here's an image that the FX2 took of itself (resized for uploading to blog):

Source Code:

FX2 Firmware
This needs the include files, which I got from Wolfgang's FX2 page I linked to above.

Simple host PC program
I redirect stdout to txt file to save list of pixel values. When the executable is saved as "spr" for example, I do
sudo ./spr > dat.txt
(For now I use sudo because I haven't bothered to manually add permissions for this USB device)

Finishing Thoughts:
With the simplest possible host software, the host is apparently not fast enough to keep up with the 30-odd MB per second that the FX2 should be able to push. This can cause data dropouts, resulting in corrupted images, so the pixel clock must be slowed down. With a pixel clock of about 10MHz or less, I can reliably grab frames in less than a second. Looking around the newsgroups and such, it looks like you need to do some multi-threading with multiple asynchronous read requests in order to achieve the maximum possible throughput. This evidently has to do with how the OS handles tasks: the process sleeps while idle during a blocking read, meaning that you can lose many milliseconds while the OS gives the CPU to other tasks. For me, the gain in transfer speed is not really going to be worth the effort of writing the more complicated software, so I'm quite happy with what I have.
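For reference, the multi-request approach would look something like the following with libusb-1.0 (a different library than the one my host program uses, and the 0x86 endpoint address is an assumption, so treat this purely as a sketch): queue several bulk IN transfers up front, and resubmit each one from its completion callback so the bus never sits idle waiting on the process.

```c
#include <libusb-1.0/libusb.h>

#define NBUF   8
#define BUFLEN 16384

/* Completion callback: consume the data, then immediately requeue
   the same transfer so another one is always pending. */
static void LIBUSB_CALL xfer_done(struct libusb_transfer *xfer)
{
    if (xfer->status == LIBUSB_TRANSFER_COMPLETED) {
        /* ... write xfer->buffer (xfer->actual_length bytes) somewhere ... */
        libusb_submit_transfer(xfer);
    }
}

static void stream_ep6(libusb_device_handle *h)
{
    static unsigned char buf[NBUF][BUFLEN];

    /* Prime the queue with several outstanding bulk reads on EP6 IN. */
    for (int i = 0; i < NBUF; i++) {
        struct libusb_transfer *t = libusb_alloc_transfer(0);
        libusb_fill_bulk_transfer(t, h, 0x86 /* EP6 IN, assumed */,
                                  buf[i], BUFLEN, xfer_done, NULL, 1000);
        libusb_submit_transfer(t);
    }
    for (;;)
        libusb_handle_events(NULL);   /* callbacks fire from inside here */
}
```

With several transfers in flight, the host controller keeps pulling data even while the process is scheduled out, which is exactly the dropout scenario the blocking read runs into.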

With other camera sensors (such as OmniVision, Toshiba, Sony) which may have standard VGA/NTSC style hsync/vsync rather than Aptina's simple Line Valid signal, it may be possible to do this with some modification. One possibility would be a tiny 8-pin microcontroller, or a reasonably small CPLD which could run at the pixel clock frequency and use counter hardware to generate a DATA_VALID signal by counting how many clock cycles have passed since the hsync has been asserted.
Other solutions to grab frames from a camera using a simple MCU have been presented:
CMUcam by Carnegie Mellon University, and this very interesting computer vision project.

The former has a FIFO chip which can hold data from the camera before being read by the MCU, and the latter cannot grab all of the pixels as they come in, meaning that a full image transferred to the PC is actually bits of many sequential frames pieced together; apparently during each frame, the timing is right for one pixel per line. The final purpose of these designs is not necessarily to grab full images, but rather to process a region of interest on a self-contained embedded platform, so they are a very interesting low-cost way to accomplish tasks that interface with camera hardware at a low level.
My solution here is probably nearly the cheapest and simplest way to quickly grab a full frame image with hardware accessible to a hobbyist.

A Quick estimate:
Cypress EZ-USB FX2 microcontroller: on the order of $10 for just the chip in a 56-pin SSOP, plus a couple of dollars in surrounding parts. You need a crystal, some capacitors, and a USB connector. This time, I paid about 40 bucks for this breakout board from Strawberry Linux, which I ordered on a Friday and got on Sunday. Shipping in Japan is so nice and fast.

The Aptina image sensors (chip only) are available from Digikey for about $30 or less apiece, so you could probably make a breakout board for them at a fairly cheap per-unit cost (in fact, somebody has done exactly this) if you can handle soldering the leadless packages. It has also come to my attention that these sensors can be interfaced directly with OMAP processors such as the one on the BeagleBoard, which is a pretty nice and potentially very functional embedded solution as well, but still not as quick'n'dirty as the FX2!

Since this requires a USB host (generally a PC, though a single-board computer could probably do it too), this isn't as interesting as the fully-embedded projects I mentioned above, but it sure is a cheap way to evaluate an image sensor; look at what Aptina's full dev kit costs! (Yes, it comes with much more fully functional software, but it's still a pretty penny.)


Turning Back The Clock: Summer 2007

Unfortunately, due to lack of time, and a hardware issue that I haven't figured out, HAC-1 has been dead for a while. I could get bare GCC programs to run just fine, as demonstrated in the previous post, but for some reason, only with the chip's internal oscillator. Switching to the external oscillator fails, and without the external oscillator and PLL we won't have the speed to run the display controller as intended.
I have yet to test this on Havard's board. Bummer.

In the meantime, I thought I would dig up some details on an old CPLD VGA framebuffer experiment I did, at the request of a commenter on a hackaday post I had commented on a couple days ago.
I put this together during summer vacation 2007, mostly for fun, though I was hoping to make a standalone framebuffer device that I could pair with a CPU to make my own full homebrew computer with display capabilities. Unfortunately I never got that far. HAC-1 on this blog is another attempt at basically the same thing, though it is more of an all in one solution.

The Digilent XC2-XL board carries a Xilinx XC9572 CPLD (not used here) and a 256-macrocell CoolRunner-II, which is fairly large in CPLD terms. On DigiKey these cost at least 12 dollars apiece, and come in a 144-pin 0.5 mm pitch QFP package, not trivial for a hobby design if you don't want to use the pre-made eval board.
The analog voltages for each color are generated by a Digilent VGA expansion board I had kicking around, originally intended for the NEXYS FPGA board. Each color gets its own 4-bit binary-weighted DAC made out of one resistor for each bit.
The remaining components are two CY7C1019 (128k x 8-bit static RAM) chips, one PIC16F690 microcontroller to write a pattern into the framebuffer, and a 50 MHz oscillator, which can be divided down with a single flip-flop to 25 MHz. That's not exactly the standard 25.175 MHz VGA pixel frequency, but it's close enough. The entire setup is shown here:

There are a number of ways to approach the memory architecture. One would be to run the memory arbitration at double the pixel clock speed, and read/write from the memory on alternate clock cycles. Since the PIC microcontroller with its internal 4MHz oscillator runs at 1 MIPS, it's much slower than the framebuffer. So, if we do this, we will see the image change slowly as the micro writes data in.
Another approach (the one I chose here) is page switching. The idea is to have double the necessary memory capacity, and write data into one page while reading (displaying) from the other. In this way, we get a smooth transition from frame to frame. Since I have used two external memory chips, each with its own dedicated bus to the CPLD, I can simply physically switch which one is read or written at any given time. To save I/O pins, it would be better to use a single RAM chip with double the capacity, do alternating read/write operations at double the clock speed, and do the page switching strictly by address offset.
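The page-switching idea can be modeled in a few lines of C (a toy model only; in the real design this logic lives in the CPLD's VHDL, and the two "pages" are the two physical SRAM chips):

```c
#include <assert.h>

enum { W = 160, H = 120 };          /* the binned resolution used here */

static unsigned char page[2][W * H];
static int front = 0;               /* page currently being displayed  */

/* The CPU always writes into the hidden (back) page. */
static void write_pixel(unsigned x, unsigned y, unsigned char v)
{
    page[1 - front][y * W + x] = v;
}

/* The display side always reads the front page. */
static unsigned char read_pixel(unsigned x, unsigned y)
{
    return page[front][y * W + x];
}

/* Swapping roles during vertical blanking gives a tear-free update. */
static void swap_pages(void)
{
    front = 1 - front;
}
```

Writes never touch the page being scanned out, so the display only ever shows complete frames.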
Unfortunately this doesn't do much about the fact that the microcontroller I'm using here is slow, so the frame update rate is quite painfully slow either way. To make things worse, the microcontroller has to latch X address, Y address, and 8-bit data separately to complete a write operation.
But this is just for fun, and to show how difficult it can be to fully build something of this nature, depending what tricks you are willing to use. At any rate, here's a simplified block diagram of the system:

And an incremental pattern written into the framebuffer by the PIC. Note that the output is binned in the VHDL code to 160x120 pixels, by shifting the address counters to the right by two bits each.
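The binning itself is just an address trick: dropping the two least significant bits of each counter maps every 4x4 block of screen positions onto one memory location. In C terms (a model of what the shift in the VHDL does):

```c
#include <assert.h>

/* Bin 640x480 screen coordinates down to a 160x120 framebuffer address:
   shifting each coordinate right by two bits makes every 4x4 pixel
   block share a single memory location. */
static unsigned fb_address(unsigned x, unsigned y)
{
    return (y >> 2) * 160u + (x >> 2);
}
```

This needs only 160 x 120 = 19,200 bytes, which fits comfortably in one 128k x 8 SRAM page.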

Full VHDL source code I used in the CPLD is available here.

110 macrocells are used in this design.

Here are the design issues I faced, when trying to think of how to boil this down to one compact unit, which I never got around to finishing.

  • CPLD: Want a device which can hold the required logic, with enough pins to interface the memory. If I were to design a board for one of these things, the fine-pitch QFP that the CoolRunner II comes in is difficult, though not impossible.
  • Memory: Wanted SRAM (for simplicity) with enough capacity (can be expensive!) in a package I could solder by hand. TSOP-II packages are not so bad, so ISSI devices are actually feasible, and available in up to 512k x 16-bit configurations. SDRAM would be much less expensive per bit, but a bit more complicated to interface, since a state machine of some sort would be required. This might go beyond the limits of what the CoolRunner-II can do.
  • CPU: The PIC 16F-series microcontroller simply won't cut it: it's too slow, and the smaller ones don't have much I/O. The ideal thing would be a CPU with an external memory interface, so we could do something cool like treat the framebuffer as a 2D array variable; reading or writing a pixel would then become a single memory access cycle.


LEDs are blinking!

Today I have been successful in getting the two LEDs blinking on the board, as the video will show.

A recap of what I did:

  • Install arm-elf-gcc toolchain, roughly using instructions from Madox.net. I used GCC-4.3.3 instead of 4.3.2. And, instead of using "/usr/local" as the prefix, I chose to use "/usr/local/arm" so that the executables don't get mixed in with other things in my OS. This caused a problem when building newlib because it depends on the gcc executable that should be in the path. Basically, "sudo" commands don't take on the path set in ~/.bashrc so when using "sudo make all install" to install newlib, the arm-elf-gcc is not found. This is solved by changing to root using "su" and manually adding /usr/local/arm/bin to the path. Not the best way to do it, but it works.
  • Install Flash Magic in WINE. This is easy. Download the installer EXE from flashmagictool.com and then run it in Wine and let it install wherever it wants.
  • Add a symbolic link in ~/.wine/dosdevices/ as follows: link com1 to /dev/ttyUSB0 to use USB serial port. With this, Flash Magic successfully uses the serial port to communicate with the LPC's bootloader.
  • Download sample code (by Martin Thomas) for similar processor, available here. This is for LPC23XX but it's very similar to LPC24XX. To be safe, I found the LPC24xx.h header file in a sample application zip folder provided by NXP, and used that instead of the 23xx.h included in the project. I used this sample code to create a template for HAC-1. Martin's webpage says this is set up for WinARM, but the makefile works just fine as-is on a Linux command line as well.
  • For now, I have removed much of what was in Martin's main.c file, such as the timer and UART functions. Martin had some LED blink code in there, but it was set for a particular commercial board where the LED is on a different pin, so I moved it to the pin that one of our LEDs is on and duplicated it for the second LED.
  • In the Common/src folder of the template, the setup routine in target.c is supposed to enable the PLL and start up the external 12MHz oscillator. For some reason the program doesn't seem to work in this case, so for now I have commented out the call to the function ConfigurePLL() and left the CPU running on the internal oscillator. Hopefully we can get this kink ironed out before moving on to the display controller; The 4MHz internal RC oscillator will not be sufficient for that.

It's alive!

It's been a while since we have made a post; not much has been going on until the last few weeks, when we finally got our boards fabbed and all of the parts.

Here is a mostly assembled HAC-1 board:

What has happened so far:

  • Soldering of 208-QFP CPU chip, TSOP DRAM chip on backside. For IC mounting, a soldering iron temperature of 300C was used. One of three attempts at the CPU mounting was a failure due to pins getting bent when applying too much heat trying to wick away excess solder. Next time, 280C might work better.
  • Add 5V and 3.3V regulators, connector for FTDI Chip TTL232R cable.
  • 1st attempt at communicating with CPU. At the time, we hadn't ordered our surface mount resistors and caps, so we wanted to try verifying CPU operation without the passives. We tried connecting using the Flash Magic tool (unfortunately Windows-only, but it looks like it should work with little trouble in WINE under Linux), and sent an "Erase Flash" command. The CPU did not respond.

Here's where a little bit of sloppiness on our part comes in (lessons learned... for the nth time). Our first idea was to check the oscillator, so we probed that with the scope, and were discouraged when we saw no signal on the oscillator circuit. However, according to the LPC2478 User Manual, the chip contains a 4MHz internal RC oscillator which is automatically used at startup. The external oscillator circuit does not start up until it is told to do so in software.

After obtaining and mounting the passives (all 0805 size, soldered at 280C), the CPU did indeed respond to the Flash Magic erase command! My suspicion is now that the lack of the pull-up resistor on the reset line may have been the cause of failure before. My assumption was that unconnected lines should tend to float "high," but I guess this is not necessarily the case.

The plan now:

  • UPDATED: Precompiled binaries from gnuarm.com did not seem to work. Will have to build from source. Not too hard to do, though.
  • Write and test simple standalone CPU code: LED blink, serial port, external memory test, LCD driver.
  • Add SD card interface, figure out how to configure U-boot or some other bootloader for ucLinux. Linux image should be stored on SD card.


PCB development

This week we sat down and did some work on the board. Had a tough time deciding if we should use 4 or 2 layers, but due to costs we are probably going to use a 2 layer board and hope that will be fine.

We redid the processor-memory layout after finding an orientation that let us route the connections without too much trouble, and moved one of the switches to the far end of the board.

Here is the bottom layer
and the top layer

Before we order the board we will probably make space for a breakout board for an Ethernet connection. At the moment we are considering the following part:

So: we should be able to order the board during next week :)


PCB current status

Here is a picture showing the current status of the PCB. It is kind of messy, but as long as everything is connected we should be fine :)

Anyways, we still have some work to do on it, and as Steve said, there is plenty of room for additional components....


Getting closer to HAC-1

The main parts (CPU and memory chips) have been ordered, and we did some work on the PCB design today... lots of interesting (maybe?) updates to come, and hopefully in a few weeks we might have this thing up and running!

So far:

No JTAG connector since we don't have any JTAG hardware to begin with (maybe add one if really necessary).
Programming the CPU's internal flash may be done with the in-system programming option via UART. When the "in-system programming enable" pin is asserted at reset, the hard-coded bootloader takes over the CPU for ISP.
Since MAX232 chips and related circuitry are a big waste of board space, we're going with 6-pin headers for FTDI TTL-232R 3V3 cables instead.

VLSI MP3 chip for audio.
PS/2 port for keyboard input.
12-bit color video output from LCD controller.
SD Card connector added.

There's lots of room left on the board, so we may consider an ethernet interface, and we will definitely add some general purpose IO.

This gentleman down in Nagoya has very recently built something similar to what we are doing, and was kind enough to share schematics and PCB layout.

I was hoping for some insight on how to wire up the memory with the very inconvenient pin assignments, but Havard seems to have done OK with that anyway.