Foveon vs Bayer: theory and practice

My Sigma DP1 has a 14 megapixel Foveon sensor. Since it is rare to find such a sensor on a consumer camera, I am very interested in its performance and will report some of my findings here.

There seem to be a lot of misunderstandings about the Foveon. That's because it's the only digital camera sensor that does not produce a blurry, interpolated image. Unfortunately, some people don't seem to see the difference. Producing lots of blurry pixels rather than fewer, sharper pixels may sound like a cheap marketing ploy, but in the case of Bayer, this marketing ploy seems to work pretty well. The trouble is that a Bayer pixel can hardly be called a pixel. The Foveon advocates try to compensate for this by stating the "Bayer equivalent" number of pixels, which is in some respects somewhat optimistic, and in others, quite accurate. What usually happens next is a heated and not always well-informed debate about lines per millimeter and such. It all reminds me of the recent microprocessor wars, where one type of microprocessor ran at fewer gigahertz but was actually faster. In this text I will try to clarify the issues by means of both a practical and a theoretical approach.

It is often claimed that a 14 megapixel Foveon has about the same "resolution" as an 8-10 megapixel Bayer sensor. I will try to explain why this is a rather pessimistic estimate, and why it is not quite fair to make this comparison without taking into account the relative advantages and disadvantages of both sensing techniques. Let us start with the most basic observation: 14 megapixel simply means that a sensor has 14 million photosites, which are the sites on the sensor that measure light of a particular colour (usually red, green, or blue). This figure has the same meaning for both Foveon and Bayer sensors. A 14 megapixel Bayer and a 14 megapixel Foveon measure the same amount of information, and, looking at it this way, have the same resolution. The difference lies only in the way the photosites are arranged, which results in different trade-offs as regards the final image. There are also some technical details which give either technique particular advantages or disadvantages, which I will try to explain.

A more in-depth comparison between the sensors requires us to define what aspects of resolution we want to compare. There are different kinds of resolution relevant to a photographer:

  1. finest level of detail that can be resolved (spatial resolution)
  2. colour accuracy
  3. dynamic range (basically, the difference in luminance between the brightest and darkest value)

Detail level and colour accuracy

Basically, the Foveon trades off: (1) level of detail for colour accuracy, and (2) level of detail of greyish images for level of detail of colourful images. Let's try to see how this works out.

The "finest level of detail" type of resolution can be expressed numerically as the number of (black-white or similarly contrasty) line pairs that can be fit on a photograph. Theoretically, a 100 pixel wide and high image can record only up to 50 line pairs horizontally and vertically (namely, a regular pattern of alternating white and black pixels). This means that a camera producing a 100x100 image can never produce more than 50 line pairs in either horizontal or vertical direction. In practice, cameras produce fewer, and it's this difference that this story is all about. A camera producing an image with more pixels than another may actually have fewer effective line pairs. To describe this relationship more accurately, we may speak of "line pairs per pixel pair", which can be computed by dividing the maximum number of line pairs of a camera image by half its width or height. The theoretical maximum line pairs per pixel pair is 1, which is attained only by a perfectly sharp non-interpolated image.
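The definition above can be written down as a minimal sketch. The function names are my own, and the 30 line pairs in the usage part are a made-up figure for a hypothetical camera, not a measurement.

```python
def max_line_pairs(pixels_across: int) -> int:
    """Upper bound on line pairs: each pair needs one dark and one light pixel."""
    return pixels_across // 2


def line_pairs_per_pixel_pair(resolved_line_pairs: float, pixels_across: int) -> float:
    """Resolved line pairs divided by half the image width (or height)."""
    return resolved_line_pairs / (pixels_across / 2)


if __name__ == "__main__":
    # A 100 pixel wide image can hold at most 50 line pairs.
    print(max_line_pairs(100))
    # A perfectly sharp, non-interpolated image reaches the maximum ratio.
    print(line_pairs_per_pixel_pair(50, 100))
    # Hypothetical camera resolving only 30 line pairs across 100 pixels.
    print(line_pairs_per_pixel_pair(30, 100))
```

A ratio below 1 means the image has more pixels than it has real detail to fill them with.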

For Bayer sensors, the final image that is obtained is interpolated. That is, at each pixel only one colour channel (red, blue, or green) is measured, and the other colour channels are interpolated from neighbouring measurements. So, for each pixel, one third of the information is measured, and the other two thirds are reconstructed. For Foveon sensors, no interpolation is done, and all colour channels of a pixel are measured directly. A Foveon pixel contains three times the information of a Bayer pixel, which means both higher spatial and colour resolution. The output image size of a Bayer sensor with the same number of photosites is therefore three times that of a Foveon sensor. If we translate this into width and height, remember that area = width*height, so three times the area means sqrt(3) = 1.73 times the width and height.
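The arithmetic can be checked with a small sketch. It assumes, as described above, one output pixel per photosite for Bayer and one output pixel per stack of three photosites for Foveon; the function names are mine.

```python
import math


def output_megapixels(photosite_megapixels: float, sensor: str) -> float:
    """Output image size for a given photosite count (both in megapixels)."""
    if sensor == "bayer":
        # One output pixel per photosite; two thirds of each pixel is interpolated.
        return photosite_megapixels
    if sensor == "foveon":
        # Three stacked photosites make up one fully measured output pixel.
        return photosite_megapixels / 3
    raise ValueError(f"unknown sensor type: {sensor}")


def bayer_to_foveon_linear_ratio(photosite_megapixels: float) -> float:
    """How much wider (and higher) the Bayer image is at equal photosite count."""
    bayer = output_megapixels(photosite_megapixels, "bayer")
    foveon = output_megapixels(photosite_megapixels, "foveon")
    return math.sqrt(bayer / foveon)


if __name__ == "__main__":
    print(output_megapixels(14, "foveon"))      # about 4.67 megapixels
    print(bayer_to_foveon_linear_ratio(14))     # sqrt(3), about 1.73
```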

The larger number of pixels output by a Bayer camera with the same number of photosites makes it theoretically possible to have more total line pairs in the final image. However, the line pairs per pixel pair is effectively less than 1. We can compare the two if we can determine the line pairs per pixel pair of a typical Bayer sensor (see the figure below). The sensor has two green photosites for every red and blue photosite, and it is this non-even distribution of colours that helps the Bayer have a line pair advantage over the Foveon.
Typical Bayer grid

The excess of green pixels can be used to measure fine detail with higher accuracy, at least when we assume there will be some green in our picture. Theoretically, we might also use the red and blue pixels to reconstruct even finer detail, but only if we make strong and increasingly unrealistic assumptions. An assumption under which this will work, for example, is that our scene contains greys only, and no colours, or that the scene is all in a single, non-saturated colour. In real life such assumptions cannot be made. In practice we see that, basically, cameras with this type of Bayer sensor use the green photosites to construct fine detail, and the red and blue ones to construct colour. This can be seen in the many camera images that can be found everywhere. My own experience is that Bayer cameras can at best produce one line pair per three pixels, and often less. See below for some examples.
Bayer output (Canon 300D). Note the line pairs are clear at one line pair per four pixels, and disappear gradually when going towards one line pair per three pixels and beyond.
Foveon output (Sigma DP1). The camera has no trouble producing one line pair per pixel pair.

This amounts to about 0.667 line pairs per pixel pair for the Bayer, as compared to 1 line pair per pixel pair for the Foveon. If we look at it theoretically, we can say that a 14 megapixel Bayer has 7 megapixels of green photosites, which is an upper bound for the detail resolution. The effective resolution in terms of line pairs per pixel pair is therefore 1/sqrt(2) = 0.707, which is pretty close to our measured value. It is in fact not likely that the theoretical value of 0.707 will be attained by real-life Bayer sensors, because of the troubles involved in interpolation, which usually involves (both optical and digital) smoothing. More precise measurements (see for example the lines/mm table featured here) show it's indeed just below 0.707. Now, we can compare the effective line pair resolution of the two sensors. Per megapixel, the Foveon has 2/3 as many green photosites as the Bayer (one in three rather than one in two), and therefore the line pair resolution of a 14 megapixel Foveon is that of a 14 * 2/3 = 9.33 megapixel Bayer sensor, in the situation that is theoretically optimal for the Bayer. Our measurements show that the Bayer does slightly worse than this in practice. This shows that the often cited figure of "8-10 megapixels" is too favourable to the Bayer. "9.5 or more" would be closer to the mark.
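Here is a rough sketch of this comparison. It rests on an assumption of mine that is not spelled out in the text: that the "effective" pixel count of a Bayer image scales with the square of its line pairs per pixel pair, while the Foveon resolves one line pair per pixel pair. The function names are hypothetical.

```python
import math

# Upper bound for a Bayer sensor when only the green half of the photosites
# carries fine detail: sqrt(1/2), about 0.707 line pairs per pixel pair.
BAYER_GREEN_BOUND = math.sqrt(1 / 2)


def bayer_equivalent_megapixels(foveon_photosite_mp: float,
                                bayer_lp_per_pixel_pair: float = BAYER_GREEN_BOUND) -> float:
    """Bayer megapixels needed to match the line pairs of a Foveon sensor.

    Assumes the Foveon output (one pixel per three photosites) is fully
    resolved, and that Bayer detail scales with the square of its linear
    line-pair ratio.
    """
    foveon_output_mp = foveon_photosite_mp / 3
    return foveon_output_mp / bayer_lp_per_pixel_pair ** 2


if __name__ == "__main__":
    # Theoretical best case for the Bayer (0.707): about 9.33 megapixels.
    print(bayer_equivalent_megapixels(14))
    # With the roughly 2/3 line pairs per pixel pair seen in my test shots.
    print(bayer_equivalent_megapixels(14, 2 / 3))
```

The second call shows why anything below the theoretical 0.707 pushes the Bayer equivalent upward.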

The Foveon can also perform better than a Bayer in terms of lines per pixel when colour is involved. To understand this, we need to delve deeper into the details of interpolation. I've been measuring differently coloured line pairs, and was surprised that the Bayer interpolation is able to reproduce near-optimal sharpness for more colour combinations than I previously thought. In fact, it appears that it can also attain the approx. 2/3 lines per pixel for contrasts not involving the green sensors (never more, however). How this is done is not clear to me; it necessarily involves making assumptions about colour changes. From an "information theoretic" viewpoint, I would expect three possible outcomes for coloured line pairs:

  1. high resolution (2/3 lines per pixel), colours accurate. Unless complex heuristics are used, I expect this to happen only if the colour contrast involves the green component only (e.g. black-green or red-yellow).
  2. high resolution (2/3 lines per pixel), colours not accurate because the chromatic component cannot be measured accurately. I expect this to happen when the contrast is between two different colours (neither of them black or white) and involves toggling more than just the green component. The colour given to the interpolated image is likely the averaged colour.
  3. lower resolution (1/2 lines per pixel or less). I expect this to happen when only red and/or blue are involved in the contrast.

I've tried some different colour contrasts, and have seen all three situations. Situation (1) obviously occurs in black-white contrasts, which are the most commonly tested. It would also work for black-green or white-magenta contrasts, which involve toggling the green component only. Surprisingly, I found it also works for white-green (for which only red and blue toggle). It doesn't seem to work for black-magenta, though. Weird.
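To keep track of which colour components "toggle" in a given contrast, a small helper is handy. This is only a bookkeeping sketch: it treats each colour as a fully saturated on/off RGB triple, the table and function name are my own, and it makes no attempt to predict what a particular camera's interpolation will do.

```python
# Fully saturated test colours as (red, green, blue) on/off triples.
COLOURS = {
    "black":   (0, 0, 0),
    "red":     (1, 0, 0),
    "green":   (0, 1, 0),
    "blue":    (0, 0, 1),
    "yellow":  (1, 1, 0),
    "magenta": (1, 0, 1),
    "cyan":    (0, 1, 1),
    "white":   (1, 1, 1),
}


def toggled_channels(colour_a: str, colour_b: str) -> set:
    """Return the set of channels ("R", "G", "B") that differ between two colours."""
    a, b = COLOURS[colour_a], COLOURS[colour_b]
    return {name for name, x, y in zip("RGB", a, b) if x != y}


if __name__ == "__main__":
    for pair in [("black", "green"), ("red", "yellow"), ("white", "magenta"),
                 ("white", "green"), ("black", "magenta"), ("red", "blue")]:
        print(pair, sorted(toggled_channels(*pair)))
```

Contrasts that return only "G" are the ones I would expect to be easiest for a Bayer sensor; contrasts that return only "R" and "B" leave the green photosites with nothing to see.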

Three contrasts which produce both good colour and resolution. The green-white contrast contradicts the theory that green photosites must be involved in the contrast.

The most obvious contrast for which situation (3) occurs is fully saturated red or blue, for which the Bayer only has so many photosites. It also occurs in red/blue contrasts, which seem to produce just about the worst results I've seen.

Resolution is bad, and there are colour problems and "jaggies".

Situation (2) is more interesting, as it reveals what assumptions the camera is making in terms of colour. Some combinations for which it would occur are blue (001) - yellow (110), and purple (101) - cyan (011).

The colour disappears where the lines converge. Apparently the sensor interprets blue-yellow (001-110) as black-white (000-111). Admittedly, it's hard to see the difference :-)

Foveon versions of the same test pattern (note: brightened by several stops).

Colour accuracy

It's kind of curious to see people refer to resolution as just line pairs per image, and nothing else. Unless you're a BW photographer, you probably care somewhat about colour not being totally off. The tests above show that you can have thousands of line pairs of garbled and ugly looking colours.

Personally, I find that subtle noise in colours has a strong visual effect. I don't know if it's just me, but I really hate the "cooltype" or other subpixel-based font rendering techniques. I find the resulting coloration at the edges around my fonts rather disturbing. The same goes for colour reproduction of photos. Bayer sensors tend to produce washed-out colour, strange rims around colour contrasts, inconsistent detail, and other artifacts most of us are so used to. When looking at Foveon images blown up to their Bayer pixel equivalent, I noticed they looked a little blurrier overall but also had visibly more accurate and more natural looking colours. A nice touch is that Foveon's noise also looks blurrier and therefore less conspicuous. It's kind of ironic in fact that Bayer noise sometimes looks sharper than the signal. Foveon just gives us another way of looking at image resolution.

Because colour is measured at the right location, Foveon has an advantage over Bayer in terms of interpolation. The colour reproduced by the Foveon is the actual colour, rather than a necessarily inaccurate reconstruction from nearby photosites. Because colours are more evenly measured, colour gradients and contrasts are more accurate.

However, Foveon has a less well documented disadvantage, namely that its colour separation is weaker. Unlike Bayer, which uses well-optimised colour filters, it relies on the natural colour absorption characteristics of the semiconductor used, which are not quite the same as the desired image output. Basically, the colour image that comes out of a Foveon is less saturated than a normal-looking image, which means that colour has to be re-saturated. This involves amplifying colour information, which implies increasing chromatic noise. In other words, the signal/noise ratio of Foveon colours is relatively low. Luckily, the size and sensitivity of the photosites compensates for this (see next section).
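Why re-saturation costs noise can be illustrated with a toy model. This is emphatically not the actual Foveon colour conversion, which I do not know: it is a generic saturation boost that pushes each channel away from the mean of the three, and it assumes equal, independent noise in each input channel. All names are made up for the illustration.

```python
import math


def saturation_matrix(s: float) -> list:
    """3x3 matrix for out = mean + s * (in - mean), where mean = (R + G + B) / 3."""
    off = (1 - s) / 3
    diag = s + off
    return [[diag if i == j else off for j in range(3)] for i in range(3)]


def channel_noise_gain(matrix: list) -> list:
    """Noise amplification per output channel for equal, independent input noise."""
    return [math.sqrt(sum(c * c for c in row)) for row in matrix]


def luminance_noise_gain(matrix: list) -> float:
    """Noise amplification of the sum R + G + B, relative to the unboosted sum."""
    column_sums = [sum(matrix[i][j] for i in range(3)) for j in range(3)]
    return math.sqrt(sum(c * c for c in column_sums)) / math.sqrt(3)


if __name__ == "__main__":
    boost = saturation_matrix(2.0)        # double the saturation
    print(channel_noise_gain(boost))      # each colour channel gets noisier
    print(luminance_noise_gain(boost))    # the channel sum is left untouched
```

In this toy model the extra noise ends up entirely in the colour differences, while the sum of the three channels is not affected by the boost.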

See the Foveon colour response table here.

Dynamic range

Due to the "megapixel wars", the lines per image rating has been soaring in recent years, beyond that of film. However, one of the main problems for photographers, lack of dynamic range, has gradually become even worse. The advent of cheap DSLRs with their large sensors helped make a more reasonable dynamic range affordable, but even there, cramming more pixels onto the sensor exacerbates the dynamic range problem.

A specification for dynamic range is rarer than one for the number of pixels, probably because dynamic range depends on the signal-to-noise ratio, which is hard to measure accurately. For a linear device such as a camera sensor, dynamic range is simply the signal-to-noise ratio of the sensor's readout circuitry. It is typically specified as a decibel (dB) value. The DP1's Foveon sensor is specified as 62+ dB signal-to-noise ratio. Photography traditionally uses stops to specify dynamic range. Actually, stops are equivalent to bits, and one stop (a factor of two) is about 6 dB. So, a 12-bit colour value has the potential of storing a maximum of 12 stops of dynamic range. 62 dB amounts to about 10.3 bits. The raw file stores these as 12-bit values. Leaving 1-2 bits of slack is normal, as it compensates for A/D conversion noise and quantisation noise. Measurements (for the SD14) come up with values ranging from slightly over 8 to more than 10 stops. Unfortunately, the 8-10 stop range is just the critical range that makes the difference between what is considered a relatively poor DSLR sensor and a good one.
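The conversion between decibels and stops is easy to sketch. It assumes the dB figure is an amplitude (signal level) ratio, so that dB = 20 * log10(ratio); I am taking that as the convention behind the sensor specification rather than knowing it for certain. The function names are mine.

```python
import math

# One stop is a factor of two in signal level: 20 * log10(2), about 6.02 dB.
DB_PER_STOP = 20 * math.log10(2)


def db_to_stops(db: float) -> float:
    """Convert a signal-to-noise ratio in dB to stops (equivalently, bits)."""
    return db / DB_PER_STOP


def stops_to_db(stops: float) -> float:
    """Convert stops (or bits) of dynamic range to dB."""
    return stops * DB_PER_STOP


if __name__ == "__main__":
    print(db_to_stops(62))    # the DP1's specified 62 dB, expressed in stops
    print(stops_to_db(12))    # the most a 12-bit raw value could hold, in dB
```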

Dynamic range is related to the physical size of the photosites, which basically determines how many photons can be absorbed (converted to electricity) before the photosite overflows (aka well capacity). Photosite size is excellent for the Foveon, because its photosites have three times the area of Bayer photosites on a sensor of the same size and photosite count. This is possible because the multilayer design can potentially catch all photons, rather than necessarily blocking some of them with colour filters. So, while my DP1 has 2.22 times the number of pixels of my 300D, and has a roughly 10% larger crop factor (that is, a smaller sensor), the photosite spacing is actually something like 10% wider.

For a more precise comparison, the pixel pitch of the 14 megapixel Foveon is specified as 7.8 microns (see this summary). This is pretty large: see this pixel pitch table for a comparison of different cameras.

Even though the required colour channel amplification annihilates the advantage a Foveon has in terms of dynamic range of individual colours, if we look at the total luminance (the sum of red, green, and blue), the Foveon has optimal sensitivity and dynamic range. This means that, for BW photography, the Foveon will usually have less of the lines-per-image type of detail, but is better in terms of dynamic range. This can clearly be seen because the ugly red-green blotches found in the shadows of an overly pushed image disappear when desaturated, and the resulting luminance noise is significantly more agreeable. This suggests the Foveon is probably well suited for BW photography as well, though its designers may consider that blasphemy :-)

There are aspects of well capacity which remain unclear to me. It is claimed that, when a photosite overflows, its electrons flow to nearby photosites. Some sensors in fact have special ditches or gullies to stop the electrons from doing this. This suggests (though I cannot be sure) that the physical volume of the photosite ultimately determines its well capacity. For the Foveon, the thickness of the photosite then becomes relevant, because it has to be thinner than a Bayer photosite. Actually, the different colour layers have different thicknesses (see also the Foveon X3 patent). This may imply that the blue photosite has less well capacity than the red one. It may also imply that electrons flow from one colour to another when one of the colours overflows. I've tried to see if I could measure any of this with my DP1. So far, the only thing I observed was that an overexposed blue image still has about 0.6 stop of luminance information in the highlights. These highlights are not blue, but grey. A similarly overexposed red image, in contrast, has no such "extra highlights" that can be pulled out. I found this very interesting, but am not sure how to interpret it. It could be that (1) the overflowing blue electrons flow to the green and red photosites; or (2) the larger red and green photosites absorb the remaining blue photons when the blue one is full. Or maybe it's something entirely different. Additionally, circumstances can be imagined under which photosite size is smaller (such as space required by the CMOS readout circuitry), or tricks can be imagined by which photosite capacity can be increased (such as a deeper connected layer into which electrons can flow).

It's hard to find any hard information on any of this, as most of the technical details of the sensors cannot simply be downloaded. Though I've read many sweeping claims by people, some are obviously false, and many of the others cannot be substantiated. Even measurements cannot be trusted, because measuring is a difficult science that is full of snags, especially since there are arbitrary and secret algorithms being run over the sensor data before it enters into a "raw" file. If I find more verifiable information I will try to include it here.

>>>>> Continue on to: 'why do Foveon highlights look better?'

Boris van Schooten