DVD subtitles
         ---------------


  0. Introduction
  1. Basics
  2. The data structure
  3. Reading the control header
  4. Decoding the graphics
  5. What I do not know yet / What I need
  6. Thanks


0. Introduction

  One of the last things we missed in DVD decoding under my system was the
decoding of subtitles. I found no information on the web or Usenet about them,
apart from a few words on them being run-length encoded in the DVD FAQ.

  So we decided to reverse-engineer their format (it's completely legal in
France, since we did it on interoperability purposes), and managed to get
almost all of it.


1. Basics

  DVD subtitles are hidden in private PS packets (0x000001ba), just like AC3
streams are.

  Within the PS packet, there are PES packets, and like AC3, the header for the
ones containing subtitles have a 0x000001bd header.
  As for AC3, where there's an ID like (0x80 + x), there's a subtitle ID equal
to (0b001x xxxx), where the last 5 bits are the subtitle ID. There are 32 
possible different subtitles on a DVD (my Taxi Driver copy has 16).

  I'll suppose you know how to extract AC3 from a DVD, and jump to the
interesting part of this documentation. Anyway you're unlikely to have
understood what I said without already being familiar with MPEG2.


2. The data structure

A subtitle packet, after its parts have been collected and appended, looks
like this :

   +----------------------------------------------------------+    
   |                                                          |
   |   0    2                                         size    |
   |   +----+------------------------+-----------------+      |
   |   |size|       data packet      |     control     |      |
   |   +----+------------------------+-----------------+      |
   |                                                          |
   |                     a subtitle packet                    |
   |                                                          |
   +----------------------------------------------------------+    

size is a 2 bytes word, and data packet and control may have any size.


Here is the structure of the data packet :

   +----------------------------------------------------------+    
   |                                                          |
   |   2    4                                        x0+2     |
   |   +----+------------------------------------------+      |
   |   | x0 |                  data                    |      |
   |   +----+------------------------------------------+      |
   |                                                          |
   |                      the data packet                     |
   |                                                          |
   +----------------------------------------------------------+    

x0, the data packet size, is a 2 bytes word.


Finally, here's the structure of the control packet :

   +----------------------------------------------------------+    
   |                                                          |
   | x0+2  x0+4                                 x1       size |
   |   +----+---------+---------+--+---------+--+---------+   |
   |   | x1 |ctrl seq |ctrl seq |..|ctrl seq |ff| end seq |   |
   |   +----+---------+---------+--+---------+--+---------+   |
   |                                                          |
   |                     the control packet                   |
   |                                                          |
   +----------------------------------------------------------+    

To summarize :

 - x1, at offset x0+2, the position of the end sequence
 - several control sequences
 - the 'ff' byte
 - the end sequence


3. Reading the control header

The first thing to read is the control sequences. There are several
types of them, and each type is determined by its first byte. As far
as I know, each type has a fixed length.

 * type 0x00 : '00' - 1 byte
   this identifies the subpicture stream in a menu.
   A menu subpicture stream control sequence has the following structure:
   0x00 0x03 0x00 0x00 0x04 0x00 0x00
   0x06 xx xx xx xx 0x05 xx xx xx xx xx xx 0xff (0xff)
   One or two 0xff's in the end to make the length even.
   
 * type 0x01 : '01' - 1 byte
   seems to say "start displaying"

 * type 0x03 : '03wxyz' - 3 bytes
   this one has the palette information ; it basically says
   encoded color 0 is the wth color of the palette, encoded color
   1 is the xth color, aso.

 * type 0x04 : '04wxyz' - 3 bytes
   this is the alpha channel information (mixer key); this entries are
   reversed (compared to the palette information), which means:
	A[0]<->P[3]
	A[1]<->P[2]
	A[2]<->P[1]
	A[3]<->P[0]
   each entry is one nibble.

 * type 0x05 : '05xxxXXXyyyYYY' - 7 bytes
  the coordinates of the subtitle on the screen :
   xxx is the first column of the subtitle
   XXX is the last column of the subtitle
   yyy is the first line of the subtitle
   YYY is the last line of the subtitle
  thus the subtitle's size is (XXX-xxx+1) x (YYY-yyy+1)

 * type 0x06 : '06xxxxyyyy' - 5 bytes
  xxxx is the position of the first graphic line, and yyyy is the position of
 the second one (the graphics are interlaced, so it helps a lot :p)

The end sequence has this structure:

 xxxx yyyy 02 ff (ff)

 it ends with 'ff' or 'ffff', to make the whole packet have an even length.

 xxxx is the display duration in units of frames.

 yyyy is equal to x1 (see picture).


Example of a control header :
----
0A 0C 01 03 02 31 04 0F F0 05 00 02 CF 00 22 3E 06 00 06 04 E9 FF
00 93 0A 0C 02 FF
----
Let's decode it. First of all, x1 = 0x0a0c.

The control sequences are :
 01
   Nothing to say about this one
 03 02 31
   Color 0 is 0, color 1 is 2, color 2 is 3, and color 3 is 1.
 04 0F F0
	 Colors 0 and 3 are transparent, and colors 2 and 3 are opaque
	 (not sure of this one)
 05 00 02 CF 00 22 3E
	 The first column is 0x000, the last one is 0x2cf, the first
	 line is 0x002, and the last line is 0x23e. Thus the subtitle's
	 size is 0x2d0 x 0x23d.
 06 00 06 04 E9
	 The first encoded image starts at offset 0x006, and the second
	 one starts at 0x04e9.

And the end sequence is :
 00 93 0A 0C 02 FF
	 Which means... well, not many things now. We can at least
	 verify that x1 (0x0a0c) is there.

4. Decoding the graphics

	 The graphics are rather easy to decode (at least, when you
	 know how to do it - it took us one whole week to figure out
	 what the encoding was :p).

   The picture is interlaced, for instance for a 40 lines picture:

  line 0  ---------------#----------
  line 2  ------#-------------------
   ...
  line 38 ------------#-------------
  line 1  ------------------#-------
  line 3  --------#-----------------
   ...
  line 39 -------------#------------

   When decoding you should get:

  line 0  ---------------#----------
  line 1  ------------------#-------
  line 2  ------#-------------------
  line 3  --------#-----------------
   ...
  line 38 ------------#-------------
  line 39 -------------#------------

   Computers with weak processors could choose only to decode even lines
  in order to gain some time, for instance.


   The encoding is run-length encoded, with the following alphabet:

   0xf
   0xe
   0xd
   0xc
   0xb
   0xa
   0x9
   0x8
   0x7
   0x6
   0x5
   0x4
   0x3-
   0x2-
   0x1-
   0x0f-
   0x0e-
   0x0d-
   0x0c-
   0x0b-
   0x0a-
   0x09-
   0x08-
   0x07-
   0x06-
   0x05-
   0x04-
   0x03--
   0x02--
   0x01--
   0x0000

	 '-' stands for any other nibble. Once a sequence X of this
	 alphabet has been read, the pixels can be displayed : (X >> 2)
	 is the number of pixels to display, and (X & 0x3) is the color
	 of the pixel.

	 For instance, 0x23 means "8 pixels of color 3".

	 "0000" has a special meaning : it's a carriage return. The
	 decoder should do a carriage return when reaching the end of
	 the line, or when encountering this "0000" sequence. When
	 doing a carriage return, the parser should be reset to the
	 next even position (it cannot be nibble-aligned at the start
	 of a line).

<NOTE> (thanks to Sham Gardner <livid@risctaker.de>):
I came across a subtitle in the region 1 version of "Stand By Me",
which contains the RLE sequence 0x0003, which would be illegal according to
the spec currently in the LiViD archive. As far as I can tell, it's not just
0x0000 that signifies the end of a line, but 0x000-.

Interpreting it this way provides an acceptable result. Presumeably the
bottom two bits indicate the colour with which to fill the rest of the line
as in the other codes. Unfortunately in this particular case the code occurs
very close to the actuall end of the line, so it's hard to tell whether
filling with 3 or 0 is correct.
</NOTE> After a carriage return, the parser should read a line on the
	 other interlaced picture, and swap like this after each
	 carriage return.

	 Perhaps I don't explain this very well, so you'd better have a
	 look at the enclosed source.


5. What I do not know yet / What I need

I don't know if there are other types of control sequences (in my
programs I consider 0xff as a control sequence type, as well as
0x02. I don't know if it's correct or not, so please comment on
this).

So what I need is you :

 - if you can, patch this document or my programs to fix strange
	 behaviour with your subtitles.

 - send me your subtitles (there's a program to extract them
	 enclosed) ; the first 10 KB of subtitles in a VOB should be
	 enough, but it would be cool if you sent me one subtitle file
	 per language.


6. Thanks

Thanks to Michel Lespinasse <walken@via.ecp.fr> for his great
help on understanding the RLE stuff, and for all the ideas he
had.

Thanks to mass and taaz (sorry guys, I don't know your real
names) from irc at openprojects.net for sending me their
subtitles.


-- 
Paris, January 16th 2000
Samuel Hocevar <sam@via.ecp.fr>

Minor changes:
Ottawa, January 17th 2000
Aaron Holtzman <aholtzma@ess.engr.uvic.ca>

Minor changes/adds:
Salzburg, January 25th 2000
Thomas Mirlacher <dent@linuxvideo.org>

Minor adds:
Providence, June 18th 2000
Yuqing Deng <Yuqing_Deng@brown.edu>


see also:
http://www.mpeg.org/MPEG/DVD/Book_B/Subpic.html