Digital Video and Field Order
This article was written in 2002, so some of the details may be a little
out of date.
There is a certain amount of confusion in the world of desktop video editing
about “Field Order”. Video newsgroups and bulletin boards attract a steady
flow of queries about problems that people are experiencing with field order
settings in their video projects.
Much of this confusion is caused by conflicting terminology - “upper-field-first”,
“field order A” and so on. Hardware and
software manufacturers often give vague or ambiguous advice on this subject,
leaving many users with a feeling that they just don't know what is really
going on, and why this should all be so complicated.
So I decided to get to the bottom of this. This involved going back to the
standards for analogue video from which most digital video formats are derived
and finding out how all of this confusion has arisen, and the correct
definitions for many of these contradictory terms.
The standards themselves make fairly dry reading, but there are some
interesting details there which help to explain the roots of digital video. If
we understand some of this then it all becomes a lot clearer. So, I've
extracted some of these details and presented them in a series of simple
diagrams which help to explain, for example, why certain resolutions are used
and the concept of the notorious rectangular pixels used in the video world.
The PAL television system runs at 25 frames a second. Each frame consists
of 625 horizontal lines, and is made up of two interlaced fields separated in
time by 1/50th of a second. If you number the lines from the top of the frame,
the top line is 1 and the bottom line is 625. The first field contains the
odd-numbered lines and the second field contains the even-numbered lines. But
you'll probably know most of this already, so I'll move on.
Although in desktop video we are dealing with digital video, the roots of
digital video are firmly planted in the original analogue video standards. Most
readers in the UK will be interested in PAL which has 625 horizontal lines, but
we will also be describing NTSC which has 525 lines. In fact it's more
accurate to refer to 625-line and 525-line systems rather than “PAL” and “NTSC”,
so that is what we will do in most of this article. We will start by describing
the 625-line system because it has fewer complications.
Analogue and Digital Active Areas
OK, so let's start with analogue video. “Interlaced” video is a
continuous waveform comprising a stream of fields which are “shot” at
a rate of 50 fields per second. But each field only contains the alternate
lines in the original picture. The first field, “Field 1”, contains the odd
numbered lines (assuming that you start counting from 1 rather than 0), and the
second field, “Field 2”, contains the even numbered lines 1/50th of a
second later - we will refer to these fields as “F1” and “F2” for short.
The analogue waveform of a field carries a 'signature' that effectively
identifies it as an F1 or F2 field. The signature occurs as a pattern of sync
pulses that differs in F1 and F2 fields - this ensures that the lines of either
field appear in the correct position on a television screen. When these two
fields are interlaced you end up with all of the lines in the picture making a
complete frame.
Each field has several lines which do not contain any picture information.
These lines occur during the Vertical Blanking Interval (VBI) when the electron
beam in the TV tube returns from the bottom to the top of the screen. The VBI
also covers several lines which carry data such as Closed Captions and Teletext.
The lines which contain picture information comprise the “active”
area of each field. The ITU-R BT.470-6 recommendation shows where the active
picture area starts and ends in both fields. This is the analogue active area.
ITU-R BT.601-5 and 656-4 describe a digital active area. This is used when
the analogue video signal is converted to a digital format, and it does not
exactly coincide with the analogue active area.
The diagram below shows the analogue and digital active areas for a 625-line
system:
Diagram 1 - 625-line Analogue and Digital Active Areas
You can see how the two fields (F1 and F2) are interlaced together to make a
frame. The frame-based line numbers are shown on the left in black - these
range from 1 to 625 from the top to the bottom of a complete frame.
However, the ITU documents number the lines in temporal (i.e. chronological)
order as they appear in the analogue waveform, so the lines of field 1 are
numbered 1,2,3,4 and so on up to 313, and the lines of the subsequent field 2
are numbered 314,315,316 up to 625. The numbers in brackets show a different
line numbering system used in some standards for field 2 which is 1 to 312.
The solid lines show the analogue active area. Note that it starts and ends
with a half-line. The dotted lines show the Vertical Blanking Interval.
The digital active area is shown between the two grey Digital Blanking areas.
The digital active area starts with line 23 of field 1 and ends with line 623
of field 2 - that's 576 lines in total. The first half of line 23 and the second
half of line 623 are part of the (analogue) VBI and do not contain any picture
data, but they are part of the digital active area all the same.
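As a quick check of the 576-line figure, here is a minimal Python sketch. It assumes the digital active area takes lines 23 to 310 from field 1 and lines 336 to 623 from field 2, which are the capture ranges quoted later in this article for an ITU-compliant card. The variable names are purely illustrative.

```python
# ITU temporal line numbers of the 625-line digital active area.
# Ranges are inclusive; the names are illustrative only.
f1_lines = range(23, 310 + 1)    # field 1 (F1)
f2_lines = range(336, 623 + 1)   # field 2 (F2)

lines_per_field = (len(f1_lines), len(f2_lines))
total_lines = sum(lines_per_field)

print(lines_per_field)  # (288, 288)
print(total_lines)      # 576
```

Each field contributes 288 lines, which is why the two together give the familiar 576-line frame.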
Converting the analogue video waveform to digital involves sampling the
analogue waveform at regular intervals. ITU-R BT.601-5 states that both 525-line
and 625-line systems are sampled at the same rate of 13.5 MHz - that's 13.5
million samples per second. This common sampling rate is partly to keep
equipment costs down.
Now, interlaced PAL has 25 frames per second, each with 625 lines. So the
number of samples on a single line must be:
13500000 / 25 / 625 = 864 samples per line
And for NTSC which has 29.97 frames per sec, each with 525 lines:
13500000 / 29.97 / 525 = 858 samples per line
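The two divisions above can be reproduced with a short Python sketch. One assumption is added here that is not spelled out in the text: 29.97 is treated as shorthand for the exact NTSC frame rate of 30000/1001, which makes the 525-line result come out at exactly 858 rather than 857.99. The function name is just for illustration.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 13_500_000

def samples_per_line(frame_rate, lines_per_frame):
    """Total samples on one line, including horizontal blanking."""
    return Fraction(SAMPLE_RATE_HZ) / Fraction(frame_rate) / lines_per_frame

pal = samples_per_line(25, 625)
# Assumption: 29.97 stands for the exact rate of 30000/1001 frames per second.
ntsc = samples_per_line(Fraction(30000, 1001), 525)

print(pal, ntsc)  # 864 858
```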
However, each line in the analogue video signal contains an area of
horizontal blanking before and after the picture data which is used to keep the
TV display synchronised. And we have already explained that each field has
several complete lines of blanking (the VBI) before and after the picture data.
The sampling process is a continuous one which covers the whole video waveform
including the horizontal and vertical blanking regions, not just the actual
picture area. So we can represent a complete frame (of two interlaced fields) as
an area of samples 625 lines in height by 864 samples in width, and within this
is the window which contains the actual picture - the digital active area.
The process of digitally capturing the analogue signal involves storing only
the samples which contain picture data - the samples from the horizontal and
vertical blanking regions are usually ignored.
Diagram 2 - 625-line Digital Sampling Space at 13.5MHz
The ITU documents define the location of the Digital Active Area by
specifying its first and last line number and the first and last sample number
on the line. These numbers are shown on the diagram above. The area is therefore
720 samples wide and 576 lines high. This forms the image area which is stored
in digital video files which is of course the well-known 720x576 pixels used in
DV, DVD and other digital video formats.
OH is the location of the line sync pulse which marks the start of an
analogue line. (For clarity on this diagram I have numbered the samples starting
at 1 from the position of OH; however the ITU start numbering samples from 0 at
the start of the Digital Active Area.)
Now, ITU-R BT.470-6 states that the active part of an analogue line (i.e. the
part of the analogue line which contains the picture) lasts for 52 microseconds.
Using the 13.5 MHz sampling rate, this means that the width of the analogue
active area only covers 702 samples. But the digital active area is wider, at
720 samples. Why the difference? Well, the extra 9 pixels on either side are to
accommodate the growth and decay of the analogue waveform so that there is no
abrupt clipping which may cause “ringing”. 720 also happens to be an
exact multiple of 8 which helps in MPEG and DV encoding - but we won't go
into that here.
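The 702 and 9 figures follow directly from the numbers already given, as this small Python sketch shows. The constant names are my own and not taken from the ITU documents.

```python
SAMPLE_RATE_HZ = 13_500_000
ACTIVE_LINE_US = 52          # analogue active line, in microseconds
DIGITAL_ACTIVE_WIDTH = 720   # samples in the digital active line

analogue_samples = SAMPLE_RATE_HZ * ACTIVE_LINE_US // 1_000_000
margin_each_side = (DIGITAL_ACTIVE_WIDTH - analogue_samples) // 2

print(analogue_samples)   # 702
print(margin_each_side)   # 9
```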
Some analogue video capture cards only capture 704 samples per line which is
also an exact multiple of 8. However, many digital video standards specify 720
samples/pixels width. So if you have some video “footage” which is 704
pixels wide and you want to convert it to a 720 pixel wide format then you
really need to add 8 black pixels to either side so that the correct
proportions of objects in the picture are accurately preserved.
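A rough sketch of that padding step in Python is shown below. It treats one line of video as a plain list of sample values and uses 0 as a stand-in for black; both are simplifying assumptions (real formats have their own black level and storage layout), and `pad_line` is a hypothetical helper, not part of any video library.

```python
def pad_line(line, target_width=720, fill=0):
    """Centre a line of samples in target_width by padding both sides."""
    extra = target_width - len(line)
    if extra < 0 or extra % 2:
        raise ValueError("line must be narrower than the target by an even amount")
    side = extra // 2
    return [fill] * side + list(line) + [fill] * side

captured = [128] * 704          # one 704-sample line of mid-grey
padded = pad_line(captured)
print(len(padded), padded[:9])  # 720 [0, 0, 0, 0, 0, 0, 0, 0, 128]
```

Padding rather than stretching keeps every original sample at its original width, which is what preserves the proportions.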
The analogue active area represents the area of the picture which can be
displayed on a television screen. The aspect ratio of a TV picture is 4:3, that's
4 units wide and 3 units high (for the sake of clarity we'll leave out
widescreen). We know that the picture has 576 lines, so we might think that its
width in pixels should be
576 * 4 / 3 = 768 square pixels
but we know that the width of the analogue active area is just 702 samples
(which are effectively pixels). Wouldn't 702 pixels make the picture too narrow?
If that is the question you are asking then the problem is that you are thinking
in terms of “square” pixels which are used in the computer graphics world.
Square pixels take up the same amount of space horizontally and vertically on
a screen. For example, 1000 pixels across a computer screen takes up the same
space on the screen (measured in inches, centimetres or any other spatial units)
as 1000 pixels down the screen. A 500x500 pixel image will appear exactly square
on a perfect computer screen.
But in the world of digital video we have rectangular pixels, and the width
of the pixel arises from the sampling rate. As we said earlier, the length of
the analogue active part of a line is fixed at 52 microseconds, and if we sample
it at 13500000 times a second we end up with 702 samples covering the complete
52 microsecond time period. If instead we sampled at twice this rate, i.e.
27000000 times a second we would end up with twice the number of samples (1404)
but they would be covering the same 52 microsecond period. We know that in 52
microseconds the electron beam must cover a particular distance across the TV
screen, so we can see that there can be any number of samples across this
distance depending on the sampling rate we choose. A higher sampling rate means
more samples across the same screen width.
In digital video each sample is effectively a pixel, so the number of pixels
across the screen width can vary depending on the sampling rate. So you can see
that if we increase or decrease the sampling rate we are effectively squeezing
or stretching the pixels horizontally so that in total they will occupy the same
spatial distance across the screen.
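To make the squeezing and stretching concrete, this Python sketch counts the samples that fall inside the fixed 52 microsecond active line at two sampling rates and prints the width of each sample relative to the 13.5 MHz case. The function name is illustrative only.

```python
ACTIVE_LINE_US = 52  # analogue active line duration in microseconds

def active_samples(sample_rate_hz):
    """Samples that fall within the fixed 52 microsecond active line."""
    return sample_rate_hz * ACTIVE_LINE_US // 1_000_000

base = active_samples(13_500_000)
for rate in (13_500_000, 27_000_000):
    n = active_samples(rate)
    # The same screen width is shared between n samples,
    # so the width of one sample scales as base / n.
    print(rate, n, base / n)
```

Doubling the rate doubles the sample count to 1404 and halves the width of each pixel, while the picture itself stays the same width on screen.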
ITU-R BT.601-5 states that the sampling rate is 13.5MHz, so at this rate the
pixels turn out to be slightly wider than their height for 625-line systems. In
fact the aspect ratio of this rectangular pixel is 54:59. Unfortunately the
convention for stating the pixel aspect ratio is height:width (or y/x) which is
the inverse of the convention for stating the frame aspect ratio (i.e. the
aspect ratio of the picture) which is width:height (or x/y).
These rectangular pixels are often referred to as “Rec 601” pixels and
they apply to D1, DV and DVD digital formats.
If we do a little maths on this we find that 576 lines comprising 702 of
these rectangular pixels would make a picture with the frame aspect ratio of:
702 * 59 / 54 / 576 = 1.332
which is approximately our ideal 4:3 (1.33333) frame aspect ratio.
The diagram below compares the pixel aspect ratios of square (computer
screen) pixels and the Rec 601 rectangular pixels for 625-line systems. It is
based on the width of the digital active area which is 720 Rec 601 pixels wide.
Note that 720x576 is spatially wider than the 4:3 frame aspect ratio, and
corresponds to a width of 786.667 square pixels, i.e.:
720 * 59 / 54 = 786.667 square pixels
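Both 625-line calculations can be checked with exact fractions, as in this Python sketch. It takes the 54:59 (height:width) figure from the text and expresses it as a pixel width of 59/54 relative to the pixel height; the names are my own.

```python
from fractions import Fraction

# Width of one 625-line Rec 601 pixel relative to its height
# (59/54, from the 54:59 height:width ratio quoted in the text).
PIXEL_WIDTH_625 = Fraction(59, 54)

def square_pixel_width(samples):
    """Equivalent width in square pixels of a run of Rec 601 samples."""
    return samples * PIXEL_WIDTH_625

analogue = square_pixel_width(702)   # analogue active width
digital = square_pixel_width(720)    # digital active width

print(float(analogue / 576))  # frame aspect ratio, about 1.3316
print(float(digital))         # about 786.667
```

The 702-sample analogue width works out to exactly 767 square pixels, one short of the 768 that a perfect 4:3 picture of 576 lines would need.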
Diagram 3 - Pixel Aspect Ratios
The third type of pixels shown are for SVCD. SVCD stands for “Super
Video Compact Disc” and is an official extension of the CD standard which
can be used for storing interlaced digital video on standard compact discs. At
the moment this medium is quite popular among video hobbyists but will probably
become less so as the cost of writing DVDs reduces. Compact discs are relatively
small for the use of digital video, so a much lower sampling rate is used. This
results in only 480 pixels covering the same spatial width as the 720 Rec 601
pixels. SVCD pixels are therefore much wider.
525-line (NTSC) Systems
Let's now look at the 525-line system used in NTSC. The diagram below shows
the analogue and digital active lines for 525-line systems:
Diagram 4 - 525-line Analogue and Digital Active Areas
OE2 is the field sync pulse which marks the start of field 2.
The ITU standards show the analogue active picture area starting half way
through line 282 of field 2 and ending half way through line 263 of field 1.
However, over the years the FCC have allocated the first few of these lines to
non-picture data such as lines 20 and 283 carrying source identification data,
and lines 21 and 284 carrying closed caption and program rating information.
Consequently the analogue active area is now specified by the FCC to start at
line 22 of field 1, with all the preceding lines being part of the Vertical
Blanking Interval (VBI). In addition, line 22 which is in the FCC's visible
picture area has been authorised to carry “electronic verification of
television broadcasts” to track which programmes and adverts are actually
broadcast.
The ITU's digital active area starts on line 20 of F1 and ends on line 263
of F1, which seems to give 487 digital lines, although the last line
only has picture data in its first half. However, in practice one of these lines
is dropped and only 486 lines form part of the digital active area - but which
one? Chris Pirazzi in his excellent and oft-linked-to “Lurker's Guide to
Video” says that SGI hardware drops line 20 of field 1. However the working
group of the ITU who are responsible for the ITU-R BT.656-4 document believe
that line 263 should be omitted, perhaps because the start of the digital
blanking period for field 2 can't be on line 263, because this line begins in
field 1.
Most digital formats such as DV, DVD and SVCD specify 480 lines as this is
exactly divisible by 16 for MPEG encoding.
This means that 6 lines must be
dropped from the ITU's 486 - but which 6? In practice this too seems to vary,
but it would appear prudent to drop an even number of lines above and below to
preserve the field order. Because the FCC specify that the (analogue) picture
area starts on line 22, we could drop the four lines from line 20 of field 1
to line 284 of field 2, and drop the remaining two lines from line 262 of
field 1. Or we could also drop line 22 of field 1 and 285 of field 2 and keep
line 262 of field 1 and 525 of field 2. Some implementations do not even use
a full 480 lines of picture data. Confusing, isn't it?
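The line counting for 525-line systems can be sketched in Python as follows. This is only one reading of the numbers: it assumes the ITU digital active area covers lines 20 to 263 in field 1 and lines 283 to 525 in field 2 (the ranges that give 487 lines), omits line 263 to reach 486, and then drops the top two and the bottom one line of each field to reach 480. As the text says, real implementations differ on which lines they drop.

```python
f1 = list(range(20, 263 + 1))    # field 1: 244 lines (263 is a half line)
f2 = list(range(283, 525 + 1))   # field 2: 243 lines
print(len(f1) + len(f2))         # 487

# One option discussed in the text: omit line 263 to leave 486 lines.
f1_486 = [n for n in f1 if n != 263]
print(len(f1_486) + len(f2))     # 486

# One way to reach 480: drop 3 more lines per field. Which lines go is
# implementation dependent; here the top two and the bottom one of each field.
f1_480 = f1_486[2:-1]
f2_480 = f2[2:-1]
print(len(f1_480) + len(f2_480), f1_480[0], f2_480[0])  # 480 22 285
```

Dropping the same number of lines from each field at the top is what keeps the relationship between upper and lower fields unchanged.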
Nevertheless, the following diagram shows the ITU's digital sampling space
for 525-line systems.
Diagram 5 - 525-line Digital Sampling Space at 13.5MHz
As shown earlier, the 525-line 29.97 frames per sec system sampled at 13.5
MHz gives 858 samples (Rec-601 rectangular pixels) per line. In this case the
pixel aspect ratio is 11:10 which is narrower than the 625-line system's 54:59,
and also narrower than square pixels. Consequently the pixel aspect ratios of
Rec-601 rectangular pixels for 525 and 625 line systems are not the same.
The ITU standards specify that the analogue active line length for M/NTSC
(which is used in the US) is approximately 52.66 microseconds. At the 13.5MHz
sampling rate this gives a width of the analogue active area of almost 711
samples.
So, 486 lines comprising 711 of these rectangular pixels would make a picture
with the frame aspect ratio of:
711 * 10 / 11 / 486 = 1.330
which is roughly our ideal 4:3 (1.33333) frame ratio.
The diagram below compares the pixel aspect ratios of square (computer
screen) pixels and the Rec-601 13.5MHz rectangular pixels for 525-line systems.
It is based on the width of the digital active area which is 720 Rec-601 pixels.
Note that 720x480 is spatially wider than the 4:3 frame aspect ratio, and
corresponds to a width of 654.545 square pixels, i.e.:
720 * 10 / 11 = 654.545 square pixels
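The 525-line figures can be checked the same way. This Python sketch uses the 11:10 (height:width) ratio from the text, expressed as a pixel width of 10/11 relative to the pixel height; the names are illustrative.

```python
from fractions import Fraction

PIXEL_WIDTH_525 = Fraction(10, 11)   # width / height of a 525-line Rec 601 pixel

analogue_samples = 52.66e-6 * 13.5e6          # active line length times rate
frame_aspect = 711 * PIXEL_WIDTH_525 / 486    # 711 samples over 486 lines
square_width_720 = 720 * PIXEL_WIDTH_525      # digital active width

print(round(analogue_samples, 1))          # 710.9
print(round(float(frame_aspect), 3))       # 1.33
print(round(float(square_width_720), 3))   # 654.545
```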
Diagram 6 - Pixel Aspect Ratios
When two digital fields are interleaved we end up with a complete 576 line
frame. The convention for referring to these two fields is the “upper” field
(which contains the top line of the frame) and the “lower” or “bottom” field.
The two fields are 1/50th of a second apart, so if the upper field is
temporally earlier than the lower field then we say that the field order is
“upper field first”. If the lower field is temporally earlier than the upper
field then the field order is “lower field first”.
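Here is a minimal Python sketch of those two ideas: interleaving an upper and a lower field into a frame, and naming the field order from the times at which the two fields were captured. The function names and the toy line labels are hypothetical and used only for illustration.

```python
def weave(upper_field, lower_field):
    """Interleave two fields into one frame; the upper field supplies the top line."""
    frame = []
    for upper_line, lower_line in zip(upper_field, lower_field):
        frame.append(upper_line)
        frame.append(lower_line)
    return frame

def field_order(upper_time, lower_time):
    """Name the field order from the capture times of the two fields."""
    return "upper field first" if upper_time < lower_time else "lower field first"

upper = ["U0", "U1", "U2"]   # lines of the upper field, top to bottom
lower = ["L0", "L1", "L2"]   # lines of the lower field
print(weave(upper, lower))   # ['U0', 'L0', 'U1', 'L1', 'U2', 'L2']
print(field_order(upper_time=0.00, lower_time=0.02))  # upper field first
```

Note that the spatial layout (which field holds the top line) and the temporal order (which field came first) are two separate properties of the frame.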
In diagram 1 the top line (and alternate lines) of the digital active area is
in F1. We know that F2 follows F1 so we might conclude that 625 line systems are
“upper field first” - but this is not necessarily the case. It is not
the video standard that defines the field order, it is the operation of the
hardware used to capture and store the analogue video waveform.
Let's assume that we have an analogue video capture card which complies
with the ITU standards. It must therefore capture lines in the range 23 to 310
of field 1, and lines 336 to 623 of field 2. When these are interlaced to make a
frame then line 23 is the top line. This means that the capture card (and its
drivers) must slot F1 into the upper field and F2 into the lower field of the
frame. But this still does not tell you whether the field order of the resulting
frames is upper or lower field first.
Remember that the video waveform is a continuous stream of alternating
fields. If the capture card starts from an F1 field, then the F1 field is fed
into the upper field, and the subsequent F2 field is fed into the lower field
of the frame. In this case the field order is indeed upper field first.
However, if the card starts from an F2 field, the F2 field is fed into the
lower field and the subsequent F1 field is fed into the upper field of the
frame. We now find that the field order is lower field first, because within
the interlaced frame it is the lower field which is temporally earlier than the
upper field.
The diagram below illustrates an analogue video waveform being digitised by a
video capture card. It is of course vastly simplified. To make it easier to see
what is going on, the picture carried by the waveform starts off completely
white, and then fades quickly to black in just 5/50ths of a second. So the
waveform begins with several “white” fields, and then 5 fields getting
progressively darker, and then several “black” fields. The diagram
shows an instant where this fade-out is being digitised. Each pair of fields is
interlaced and stored in an AVI file as a frame.
To make it easier to see what is happening only the top 8 lines of each
interlaced frame are shown. If you bear in mind that lighter greys occur
“earlier” than darker greys, then you can easily see the field order
of the interlaced frames.
Diagram 7 - Field Order
Both of the capture cards shown are ITU compliant so they produce interlaced
frames whose top line is line 23 of F1. However, the first card processes
F1 first and then F2; the second card processes F2 first and then F1. You can
therefore see the results of digitising an F1 or an F2 field first.
However it is not uncommon for some hardware to start capturing on another
line, for example line 336 of F2. In this case it is an F2 field which is fed
into the upper field and not an F1 field. So now if the capture card
starts from an F1 field, the F1 field is fed into the lower field, then the
subsequent F2 field is fed into the upper field of the frame, and the field
order is therefore lower field first - the opposite of the previous case.
So you can see that the field order of the frames in the captured AVI file
depends on the line number where capturing starts, and which field, F1 or F2,
is captured first.
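The four combinations described above can be tabulated with a few lines of Python. This is a simplified model of my own, not a description of any particular card: the starting line number is reduced to the question of which analogue field ends up in the upper field of the frame.

```python
from itertools import product

def captured_field_order(upper_source, first_captured):
    """Field order of captured frames.

    upper_source:   which analogue field ("F1" or "F2") the card places
                    in the upper field of each frame.
    first_captured: which analogue field the card starts capturing from.
    """
    return "upper" if upper_source == first_captured else "lower"

for upper_source, first_captured in product(("F1", "F2"), repeat=2):
    order = captured_field_order(upper_source, first_captured)
    print(f"{upper_source} in upper field, capture starts on {first_captured}: "
          f"{order} field first")
```

The frames are upper field first only when the field that lands in the upper position is also the one captured first.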
As a PC user, both of these things are usually hidden from you. If you are
lucky the documentation for the capture card will tell you whether the card is
upper or lower field first - i.e. the field order of the digital video files it
creates. If not then you are completely in the dark and you must examine the
files yourself to find out what their field order is - but you will only have to
do this once as the card will always produce files with the same field order
(unless it has buggy drivers!).
Many video file formats do not carry information which tells programs the
field order of the interlaced video material they contain. Consequently you need
to know the field order of your video clips and tell the video applications what
it is, although some programs will have a stab at trying to work out the
field order by analysing the picture content - but they can get it wrong!
The problem really arises when you have a video editing project on your PC
which contains clips from multiple sources which have differing field
orders, or you want to use another program perhaps to convert your video to
another file format.
Suppose your capture card generates AVI files which are upper field first, and
you cut to an AVI file which is lower field first. Now when this is output from
your capture card to a TV monitor, the first clip will play fine as the fields
will be output upper field first (which is the same order as the fields were
captured); but the second clip will also be output upper field first which is
the opposite order to which its fields were captured. The result is that when
the playback reaches the second clip motion will appear jerky because the fields
are being displayed in the wrong order, effectively zig-zagging back and forth
in time.
The solution is to change the field order of the second clip. This
effectively shifts the frame boundary by 1 field. In this example, the new first
frame will contain the upper field of the old first frame interlaced with the
lower field of the old second frame; the new second frame will contain the upper
field of the old second frame interlaced with the lower field of the old third
frame, and so on. This would result in the old first field and old last field of
the clip being discarded, but it would make the clip 2 fields shorter in
duration. An alternative would be to discard only the first field and reuse
the last lower field which would then be paired with the final upper field -
this would cause a temporal discontinuity in the final frame of the clip but
would maintain its duration.
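Both solutions can be sketched in Python by treating a clip as a list of (upper, lower) field pairs. This is an illustration under my own assumptions (the clip is lower field first, and the field labels are made up), not the method used by any particular editing program.

```python
def reverse_field_order(frames, keep_duration=False):
    """Re-pair fields so that a lower-field-first clip becomes upper field first.

    frames is a list of (upper, lower) pairs whose lower field is the
    earlier one in time.
    """
    uppers = [u for u, _ in frames]
    lowers = [l for _, l in frames]
    # Each upper field is now paired with the lower field that follows it.
    shifted = list(zip(uppers[:-1], lowers[1:]))
    if keep_duration:
        # Reuse the last lower field: keeps the length, but the final
        # frame pairs an upper field with an earlier lower field.
        shifted.append((uppers[-1], lowers[-1]))
    return shifted

clip = [("U1", "L1"), ("U2", "L2"), ("U3", "L3")]
print(reverse_field_order(clip))
# [('U1', 'L2'), ('U2', 'L3')]
print(reverse_field_order(clip, keep_duration=True))
# [('U1', 'L2'), ('U2', 'L3'), ('U3', 'L3')]
```

In the first result the fields L1 and U3 are discarded and the clip is one frame shorter; in the second the length is kept at the cost of a final frame whose fields are out of temporal order.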
For example, the field order can be changed in Premiere by selecting the clip
in the timeline and changing its video properties by ticking both “reverse
field dominance” and “interlace consecutive frames”. This corresponds to
the second solution above. Note that you must click both of these options.
“Reverse field dominance” just swaps the odd and even lines within
each frame which on its own is completely wrong as it effectively causes a
vertical spatial “scrambling” of alternate lines. “Interlace consecutive frames”
swaps the lower fields between each pair of frames, so that the new frame 1
contains the upper field of the old frame 1 interlaced with the lower field of
the old frame 2, and the new frame 2 contains the upper field of the old frame 2
interlaced with the lower field of the old frame 1. The combination of these two
options properly changes the field order of the clip.
An alternative (and simpler) solution is simply to shift all of the lines in
the frame up or down by 1 line. This effectively shifts all of the lines that
are in the upper field to the lower field, and vice versa, but without causing
any spatial scrambling. It does however mean that you lose 1 line of picture
detail from the top or bottom of the frame.
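The one-line shift is easy to see on a toy frame. In this Python sketch a frame is just a list of line labels from top to bottom, and the string "black" stands in for the blank line that fills the gap; both are illustrative assumptions.

```python
def shift_down_one_line(frame, fill="black"):
    """Move every line down by one; the bottom line is lost."""
    return [fill] + frame[:-1]

frame = ["U0", "L0", "U1", "L1", "U2", "L2"]
shifted = shift_down_one_line(frame)
print(shifted)        # ['black', 'U0', 'L0', 'U1', 'L1', 'U2']
print(shifted[0::2])  # new upper field: ['black', 'L0', 'L1']
print(shifted[1::2])  # new lower field: ['U0', 'U1', 'U2']
```

The old lower-field lines now sit in the upper field and vice versa, with the lines still in their original vertical sequence, and one line (L2 here) has dropped off the bottom.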
Upper, Top, Odd, 1 or A ?!
And so we come to the source of so much confusion in the world of digital
video - the naming conventions for fields.
As we now know, an interlaced frame consists of two fields. One field
contains the top line of the frame and all of the alternate lines beneath, and
the other field contains the line-below-the-top-line and the alternate lines
beneath. Notice that I have avoided using numbers in this description (such as
line 1 or first line), and there is a reason for this which will become clear.
We need to give these two fields different names so that we can easily
distinguish between them. There are several naming conventions used, but almost
all of them are open to misinterpretation, and should therefore be avoided if at
all possible.
The only names which seem to be unequivocal are “upper field” and “lower
field” (or “top field” and “bottom field”). The upper field
contains the top line of the frame and the lower field contains the
line-below-the-top line of the frame. The meaning of the terms “upper”
and “lower” becomes obvious if you just imagine the top two lines of a
frame.
You can now also express the field order as “upper field first” or “lower
field first” without fear of confusion. Consequently we will
use the terms “upper field” and “lower field” as our naming
convention.
Here are some of the other terms often used and why I think they should be
avoided:
Odd field / Even field - If you look at the frame-based line numbers you
can see that one field occupies the odd-numbered lines and the other field
occupies all the even numbered lines. This seems like an ideal system for naming
fields, except that some specifications start numbering the lines from 0
instead of 1. So the “odd” field could be either the upper or lower field
depending on how you start numbering the lines. Consequently the term “odd
field first” is not very helpful.
Field A / Field B - There is disagreement over whether field A refers to the
upper or the lower field. And some programs such as TMPGEnc take “Field
Order A” to mean upper field first, while others such as Ulead's products
take it to mean lower field first.
Field 1 / Field 2 - The analogue video waveform contains two types of field
which have different sequences of sync pulses. These two fields are called in
the ITU specs “Field 1” and “Field 2” and the video waveform alternates
between the two which produces the interlaced picture on a TV screen. However as
we have seen it does not necessarily follow that field 1 is the upper field.
Depending on how the waveform is captured, field 2 may be the upper field. This
may be further confused by some people simply assuming that field 1 must always
mean the upper field or the earlier of the two fields in a frame.
Field Dominance is not the same as Field Order, although it is often used as
such. It has nothing to do with whether the upper or lower field is
earlier within a frame.
In the studio, analogue video is stored on tape as a series of alternating
fields - there are no frames as such. When this material is
edited, separate clips must be joined together, but we must be careful to
preserve the alternating F1-F2-F1-F2 sequence of fields across the “joins”.
We must therefore decide whether a new scene begins on an F1 or F2 field, and
provided we stick to that rule then the sequence of fields will be correct for
the complete project. This “rule” is the field dominance.
If our scenes (i.e. our edits) start on F1, then our editing system is
F1 Dominant; and if the scenes start on F2 then our system is F2 Dominant.
However, when this video is transferred to our PC we no longer know which field
was F1 or F2. All we see is our AVI file containing an upper and lower
field, and we usually don't know which was an F1 or F2 field in the original
waveform.
Similarly, computer-based video editing programs use the frame boundary for
their editing “joins”. A cut from one scene to another occurs on a frame
boundary. Because we know that these frames are either upper or lower field
first then we could say that our video material is “upper field dominant” or
“lower field dominant”. And some people do use these terms, although field
dominance is really meant to be applied to the F1 or F2 analogue fields.
Furthermore, if we are capturing some material which has the opposite field
dominance to that of our capture card, we could end up with a clip which has
scene changes in mid-frame. In this case we could have a clip with “upper
field first” but with “lower field dominance”.
Some Digital File Formats
Let's conclude by taking a quick look at a couple of the common pro-sumer
digital video formats and their field order.
Most video formats stored on computer do not contain any information about
the field order of the video material they contain. This means that we usually
have to study the file contents to work this out ourselves.
DV & DVCAM
DV & DVCAM are used in consumer and semi-professional camcorders. In
625-line systems they have a resolution of 720x576, and in 525-line systems
720x480.
DV uses intra-frame compression, which means that each frame is compressed
separately, so it is very easy to edit. It also makes it easy to start recording
over material on the camcorder at any particular frame.
The DV standards say that the digital active area of DV for 625-line systems
is from line 335 of field 2 to line 310 of field 1. If you take a look at
diagram 1 you'll notice that this range is offset from the ITU specs by 1
line.
Similarly the digital active area for 525-line systems is from line 285 of
field 2 to line 262 of field 1.
Consequently in both systems field 1 is the “lower” field. F1
dominance effectively means that the field order of DV material is “lower
field first” for both PAL and NTSC, which is nice!
DVD uses MPEG1 or MPEG2 compression, but only MPEG2 can store interlaced
video.
MPEG uses inter-frame compression, which means that most frames are encoded
by storing only the features in the image which have changed compared to
previous (or later) frames. This makes it more difficult to edit because you
usually have to build up a complete frame by accumulating the changes over
several previous frames.
The MPEG specs say that interlaced material is encoded as a series of fields,
each identified as a “top” or “bottom” field.
Alternatively it may be encoded as a series of frames containing two interlaced
fields, and each frame has a flag which states whether it is “top field
first”.
In either case the field order is effectively contained within the MPEG file.
A DVD player would have to output the fields in the specified order.
Analogue and Digital TV:
Rec. ITU-R BT.470-6
Rec. ITU-R BT.601-5
Rec. ITU-R BT.656-4
Copyright © 2002-2010 DVMP - All rights reserved.