Note that in this paper, discussion of the CSS 2.1 Candidate Recommendation is made in reference to the latest revision (2007-07-19) at the publication date indicated above.
The specification concerning stacking contexts and z-index in CSS 2.1 Candidate Recommendation (Chapter 9 (Visual formatting model) – primarily Section 9.9 (Layered presentation) – and Appendix E (Elaborate description of Stacking Contexts)) contains ambiguities, errors and self-contradictions. In this paper, the problems are identified and discussed, and two proposals are presented for the clarification or replacement of the sections in question. Additionally, §4 identifies a certain part of the specification where the behaviour of all major graphical desktop browsers for Microsoft Windows deviates from that prescribed.
The requirements imposed by the 1998 CSS2 specification on z-axis layering were basic: the root element and positioned elements with integer z-index would form “stacking contexts” responsible for rendering their background, border and contents as an atomic unit, and the position of a stacking context on the z-axis would be given by its integer z-index value. This allowed user agents to divide and delegate the task of page rendering to the stacking contexts, and to then merely layer the stacking contexts according to their z-index values. However, as web authors’ understanding of the specification deepened, their innovations – particularly as regards negative margins, overflow and floats – revealed that z-axis layering behaviour, whilst superficially easy to understand, was underspecified by CSS2. It was unclear how backgrounds and borders, line boxes and floats should be layered relative to one another on the z-axis, with the result that UA implementations varied.
The current CSS 2.1 Candidate Recommendation (CSS 2.1 CR) devotes considerably more space to the definition and discussion of stacking contexts, stack layers and the z-index property than CSS2, defining the layering order within a stacking context in addition to the relative layering of the stacking contexts themselves. It also changed the layering behaviour of stacking contexts with negative z-index: whilst still layered beneath the non-positioned content of the stacking context to which they belong (henceforth “the containing stacking context”), they are layered above that stacking context’s background and borders. Effectively the specification makes it impossible for a box to be rendered beneath the background and borders of any ancestor stacking context.
CSS 2.1 CR summarizes z-axis layering behaviour in Section 9.9 (Layered presentation), and describes it in greater depth in the normative Appendix E (Elaborate description of Stacking Contexts). The latter description is rather thorough and provides details essential to the understanding of the simplified description in Section 9.9. Unfortunately, even with these extra details, Section 9.9 remains ambiguous, with the definition of z-index flawed and the exposition nonsensical in places. In part, this is due to the preservation of language used in CSS2, some of which is inaccurate or meaningless in CSS 2.1 CR.
Floats, inline blocks, inline tables, and positioned elements whose value of z-index is ‘auto’ should be treated as if they form stacking contexts except in that any descendant which actually creates a new stacking context should be considered part of the parent stacking context.
Stacking context–like elements have special behaviour but no term is defined to describe them. This hinders the understanding of the exposition, not least because this special behaviour is only introduced at the very end of Section 9.9 (Layered presentation) and in Appendix E (Elaborate description of Stacking Contexts), requiring the reader to backtrack in order to re-evaluate his understanding of the topic.
Moreover, the failure to distinguish between stacking contexts and stacking context–like elements introduces ambiguities; the term “stacking context” in Section 9.9 (Layered presentation) is usually intended to refer both to elements which actually generate stacking contexts and to stacking context–like elements. However, in the statements which specify that stacking context descendants of stacking context–like elements “take part in the parent stacking context”, the “parent” stacking context in question refers to an actual stacking context and not a stacking context–like element; indeed, painting responsibility for such descendants may pass through several ancestor stacking context–like elements before an actual stacking context is reached.
Define a term to describe stacking context–like elements. Alternatively, given that stacking context–like elements possess much of the behaviour specified by CSS 2.1 CR for actual stacking contexts, include these elements in the definition of stacking context and explicitly indicate the few occasions where the discussion concerns only the root element and positioned elements with integer z-index to the exclusion of stacking context–like elements.
Each box belongs to one stacking context. Each stacking context consists of several stacking levels, including (for example) a stacking level for in-flow inline-level descendants.
The stacking context in which a given box participates is not explicitly specified and cannot be logically deduced (although the only sensible interpretation of the specification is the one that its authors intended). This makes it very difficult to express the specification – and indeed this paper – rigorously.
Specify that a box which does not itself generate a stacking context participates in the stacking context generated by the closest ancestor box which either generates a stacking context or which is a stacking context–like element, while a box which does itself generate a stacking context participates in the closest ancestor stacking context. This definition is adopted in this paper. A formal distinction between stacking contexts and stacking context–like elements (as discussed in §2.1 above) would aid the exposition of this resolution.
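The resolution above can be read as a simple walk up the document tree. The following is a minimal sketch of that reading, not part of the specification: the `Box` class, its field names and the helper functions are hypothetical, and `z_index=None` is assumed to model a z-index of ‘auto’.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    # Hypothetical model of a box; none of these names come from CSS 2.1 CR.
    name: str
    parent: Optional["Box"] = None
    positioned: bool = False
    z_index: Optional[int] = None  # None models z-index: auto
    floated: bool = False
    display: str = "block"


def generates_stacking_context(box: Box) -> bool:
    # The root element, or a positioned element with integer z-index.
    return box.parent is None or (box.positioned and box.z_index is not None)


def is_context_like(box: Box) -> bool:
    # Floats, inline blocks, inline tables, and positioned elements
    # whose z-index is 'auto'.
    if generates_stacking_context(box):
        return False
    return (box.floated
            or box.display in ("inline-block", "inline-table")
            or box.positioned)


def containing_stacking_context(box: Box) -> Optional[Box]:
    # A box that generates a stacking context skips context-like ancestors;
    # any other box stops at the first ancestor of either kind.
    skip_context_like = generates_stacking_context(box)
    ancestor = box.parent
    while ancestor is not None:
        if generates_stacking_context(ancestor):
            return ancestor
        if not skip_context_like and is_context_like(ancestor):
            return ancestor
        ancestor = ancestor.parent
    return None  # only the root element has no containing stacking context


root = Box("root")
flt = Box("float", parent=root, floated=True)
div = Box("div", parent=flt)
pos = Box("abs", parent=div, positioned=True, z_index=1)

print(containing_stacking_context(div).name)  # the float
print(containing_stacking_context(pos).name)  # the root, skipping the float
```

In this sketch the non-positioned `div` participates in the float, whereas the positioned box with integer z-index passes through the float and participates in the root stacking context.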
Section 9.9 (Layered presentation) and Appendix E (Elaborate description of Stacking Contexts) make several references to the “parent stacking context” of a box.
The term “parent stacking context” of a box is undefined, but is intended to mean the stacking context to which the box belongs (the “containing stacking context”, which may be an ancestor of the box’s parent element in the document tree).
Replace the phrase “parent stacking context” with “the stacking context to which [the box] belongs”, or simply “containing stacking context” (once this has been defined).
Within the description in Section 9.9 (Layered presentation) of the stacking layers that comprise a stacking context, and within Appendix E (Elaborate description of Stacking Contexts), the term “descendant” is frequently used.
Where the term “descendant” is used in the description of the stacking layers that comprise a stacking context C, something more precise is intended, namely that the descendant be a box for which C is the containing stacking context.
Define a term to describe a descendant of a stacking context C for which C is the containing stacking context, and use this term instead of “descendant”. In this paper, the term “dependant” is adopted for this purpose.
In addition to their horizontal and vertical positions, boxes lie along a "z-axis" and are formatted one on top of the other. Section 9.9 (Layered presentation) claims that it “discusses how boxes may be positioned along the z-axis”.
These statements imply that all boxes are positioned along a global z-axis. While this can be made technically correct with the addition of omitted mathematical details, it is neither useful, necessary nor enlightening. Indeed, it immediately conjures up an incorrect model in the reader’s mind: that of a three-dimensional space instead of “nested three-dimensional spaces”.
Explain with greater clarity that each stacking context and stacking context–like element is individually thought of as a three-dimensional space with a local z-axis along which its dependant boxes lie, but that each stacking context is thought of as “atomic” (projected, or ‘flattened’) when it is itself positioned on the local z-axis of its containing stacking context.
A stacking context is atomic from the point of view of its parent stacking context; boxes in other stacking contexts may not come between any of its boxes.
The second clause of this statement is poorly worded; it is not clear which other stacking contexts are referred to, nor whether the term “other stacking contexts” is used in contrast to the original stacking context or to its parent stacking context. Furthermore, it is not just boxes in other stacking contexts that may not come between the original stacking context’s boxes (or indeed between its background and its boxes), but also boxes which are stacking contexts.
Atomicity should be defined more carefully, such as by stating that a stacking context sees a dependant stacking context C as a single ‘flattened’ box, and may not insert any other dependant box between C’s own dependant boxes, nor between C’s background and its own dependant boxes.
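Atomicity as described here can be illustrated by a recursive painting routine in which each dependant stacking context contributes one contiguous run to the paint order. The sketch below is a deliberate simplification and rests on assumptions: the `Context` class and `paint_order` function are hypothetical, only stacking contexts are modelled, and all non-positioned content of a context is collapsed into a single “content” entry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Context:
    # Hypothetical model of a stacking context and its dependant contexts.
    name: str
    stack_level: int = 0
    dependants: List["Context"] = field(default_factory=list)


def paint_order(ctx: Context) -> List[str]:
    # sorted() is stable, so equal stack levels keep document tree order.
    ordered = sorted(ctx.dependants, key=lambda c: c.stack_level)
    order = [f"{ctx.name}:background"]
    # Negative stack levels: above the background, beneath the content.
    for child in ordered:
        if child.stack_level < 0:
            order.extend(paint_order(child))  # one contiguous, 'flattened' run
    order.append(f"{ctx.name}:content")
    for child in ordered:
        if child.stack_level >= 0:
            order.extend(paint_order(child))
    return order


inner = Context("A1", stack_level=100)
a = Context("A", stack_level=1, dependants=[inner])
b = Context("B", stack_level=2)
root = Context("root", dependants=[a, b])

print(paint_order(root))
```

Although A1 has stack level 100, it is painted beneath B, because A1 lies on the local z-axis of A and A as a whole lies beneath B in the root stacking context.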
Section 9.9 (Layered presentation) defines the “stack level” of a box to be its position on the z-axis relative to other boxes in the same stacking context. The stack level of stacking contexts is given by their z-index value. In the definition of z-index, a stacking context is also defined to have a stack level of 0 for the purposes of calculating the stack levels of its dependants, and the term “local stacking context” is used to describe a stacking context in this role.
It is unnecessarily complex that a stacking context should have two stack levels. Whilst its first stack level provides its z-axis position as an atomic box in its containing stacking context, its assigned stack level of 0 as a local stacking context plays no part in the specification other than to impart this same value to its non-positioned dependants (a process which is itself flawed as described in §2.8 below). This part of the specification would seem to have been carried over from the CSS2 specification where the z-axis layering behaviour of descendant stacking contexts with negative z-index differed from that specified by CSS 2.1 CR, and where (the background and non-context-generating dependants of) a stacking context was thought of as lying on a plane through the origin (stack level 0) of a local z-axis. By contrast, in CSS 2.1 CR a stacking context provides a local z-axis but has no position on it; it can be thought of as a three-dimensional space with a z-axis along which its dependants lie.
The term “local stacking context” is not explicitly defined, and is superfluous since it means nothing more than a stacking context which is a descendant of another stacking context; the term simply attempts to distinguish between the two stack levels of a stacking context, and it is not used outside of the definition of z-index.
It would be clearer to not assign a stack level to a stacking context for the purposes of calculating the stack levels of its dependants, and instead explicitly define the stack levels of boxes without reference to the containing stacking context. The term “local stacking context” can be removed.
The stack level of an element in the current stacking context whose value of z-index is ‘auto’ is the same as its parent’s box. Boxes with greater stack levels are always formatted in front of boxes with lower stack levels. Boxes with the same stack level in a stacking context are stacked back-to-front according to document tree order.
The stack level of non-positioned non-element boxes is not explicitly defined (although the computed value of z-index for a non-positioned element is defined to be ‘auto’).
Moreover, since a specific relative layering order is later imposed on all non-positioned non-floated boxes, floats, and inline boxes which belong to a common containing stacking context, the stack levels of these three different types of box need to differ if the statement that “boxes with the same stack level in a stacking context are stacked back-to-front according to document tree order” is to hold true. Hence the three types of box cannot all have the same stack level as their parent’s box, no matter what that is defined to be.
Define positioned boxes whose value of z-index is ‘auto’ to have stack level 0. (In the current specification this is equivalent to assigning them the same stack level as their parent box since a stacking context is defined to have a stack level of 0 for the purposes of calculating the stack levels of its dependants, a definition which is itself problematic as described in §2.7 above.) Then to non-positioned non-floated boxes, floats, and inline boxes assign stack level i, j, k respectively where, if boxes with greater stack levels are always to be rendered in front of boxes with lower stack levels and if the formatting order of boxes defined later in Section 9.9 (Layered presentation) is to be adhered to, we must have -1 < i < j < k < 0. (The equivalence just described then no longer holds, and is no longer relevant.)
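The effect of this resolution can be checked with a small ordering exercise. The following sketch assumes arbitrary values for i, j and k (any values satisfying -1 < i < j < k < 0 would serve) and a hypothetical `stack_level` helper; a stable sort then reproduces both the layering order and the document tree order rule for equal stack levels.

```python
from fractions import Fraction

# Hypothetical values chosen only to satisfy -1 < i < j < k < 0.
I = Fraction(-3, 4)  # non-positioned, non-floated (block-level) boxes
J = Fraction(-1, 2)  # floats
K = Fraction(-1, 4)  # inline boxes


def stack_level(kind, z_index=None):
    # Positioned boxes: integer z-index, or 0 when z-index is 'auto' (None).
    if kind == "positioned":
        return 0 if z_index is None else z_index
    return {"block": I, "float": J, "inline": K}[kind]


# Boxes listed in document tree order: (name, kind, z-index).
boxes = [
    ("inline text", "inline", None),
    ("abs z=-1", "positioned", -1),
    ("float", "float", None),
    ("rel auto", "positioned", None),
    ("block", "block", None),
    ("abs z=1", "positioned", 1),
]

# sorted() is stable, so boxes with equal stack levels stay in tree order.
back_to_front = sorted(boxes, key=lambda b: stack_level(b[1], b[2]))
print([name for name, _, _ in back_to_front])
```

Because every negative integer z-index is at most -1 and every positioned box with z-index ‘auto’ has stack level 0, the three fractional levels fall strictly between the negative stacking contexts and the positioned layer, as the formatting order requires.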
Section 9.9 (Layered presentation) states that a stacking context consists of seven stacking levels, which it proceeds to list.
The phrase “stacking level” (which is introduced without definition or discussion but which refers to a formatting layer on which boxes are painted) is very easily confused with the term “stack level” (which is a number). The possibility for confusion is made greater by the use of Arabic numerals to label these seven layers.
Replace the phrase “stacking level” with one which more clearly describes its nature and which is less easily confused with the term “stack level”. Use a list style other than Arabic numerals to label the seven layers.
Section 9.9 (Layered presentation) states that one of the seven stacking levels within a stacking context is for floats and their contents.
It is not specified whether this stacking level is for all floated dependants or only for non-positioned floated dependants. Windows Internet Explorer 8 beta 1 for Developers places all floats on this stacking level, whereas all other major graphical desktop browsers currently available for Microsoft Windows (Firefox 2, Firefox 3 beta 5, Opera 9.27, Opera 9.5 beta 1/Win, Safari 3.1/Win, Internet Explorer 6, Internet Explorer 7) place only non-positioned floats on this stacking level.
The phrase “and their contents” is ambiguous and leads to overspecification: the contents of a float are individually subject to the rules described in Section 9.9 (Layered presentation), and so their handling should not be respecified here.
Specify whether the stacking layer for floats is for all floated dependants or for non-positioned floated dependants only. (The existing behaviour of the major browsers is the likely intended interpretation of the current specification.) Remove the phrase “and their contents”.
Section 9.5 (Floats) states the following. “The contents of floats are stacked as if floats generated new stacking contexts, except that any elements that actually create new stacking contexts take part in the float’s parent’s stacking context. A float can overlap other boxes in the normal flow (e.g., when a normal flow box next to a float has negative margins). When this happens, floats are rendered in front of non-positioned in-flow blocks, but behind in-flow inlines.”
This paragraph of the specification merely restates what is specified in Appendix E (Elaborate description of Stacking Contexts) and unfortunately introduces an error: the phrase “float’s parent’s stacking context” should read “float’s parent stacking context” since these two stacking contexts are different if the float’s parent is itself a stacking context. Moreover, the latter phrase is itself problematic as described in §2.3 above.
Revise or replace this paragraph with a link to Section 9.9 (Layered presentation).
The problem discussed in §2.11 above is easily addressed using the possible resolution presented there.
Section 9.9 (Layered presentation) should be rewritten to address the problems discussed in §2.1–2.10 above. This paper presents the following two proposals as possible revisions. The only difference between the two is that in the first proposal, stacking contexts and stacking context–like elements are defined as “painting contexts”, of which the root element and positioned elements with integer ‘z-index’ remain defined as “stacking contexts”; while in the second proposal, stacking context–like elements are included in the definition of “stacking context”, and the root element and positioned elements with integer ‘z-index’ are distinguished as “strong stacking contexts”.
Appendix E (Elaborate description of Stacking Contexts) states (subject to the terminology problems described in §2 above) that a float, inline-block or inline-table should be treated as if it created a new stacking context except in that any descendant which actually creates a new stacking context should be considered part of the parent stacking context. For inline-blocks and inline-tables this statement is echoed in Section 9.9 (Layered presentation); for floats it is echoed in Section 9.5 (Floats), and implied in the list of stacking layers presented in Section 9.9. However, all major graphical desktop browsers currently available for Microsoft Windows (Firefox 2, Firefox 3 beta 5, Opera 9.27, Opera 9.5 beta 1/Win, Safari 3.1/Win, Internet Explorer 6, Internet Explorer 7 and Internet Explorer 8 beta 1) fail to render these types (where supported) atomically in this way; every positioned child P of a box T of type float (test case), inline-block (test case) or inline-table (test case) is considered part of T’s containing stacking context, even if P does not itself form a stacking context (that is, its value of z-index is ‘auto’).
Before this paper reached publication, the issue discussed above was raised independently on the www-style mailing list, and approval was granted for revising Appendix E (Elaborate description of Stacking Contexts) to describe the behaviour exhibited by these browsers by stating that the containing stacking context of any positioned element is that generated by its closest positioned ancestor with integer z-index. Section 9.5 (Floats) and Section 9.9 (Layered presentation) will also need revising to reflect this change.
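The revised rule for positioned elements can be sketched as follows. This is an illustration under stated assumptions rather than the wording agreed on www-style: the `Box` class and `containing_context_revised` function are hypothetical, `z_index=None` models a z-index of ‘auto’, and the root element is assumed to act as the fallback when no positioned ancestor with integer z-index exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    # Hypothetical model of a box; the names are illustrative only.
    name: str
    parent: Optional["Box"] = None
    positioned: bool = False
    z_index: Optional[int] = None  # None models z-index: auto
    floated: bool = False


def containing_context_revised(box: Box) -> Optional[Box]:
    # Revised rule for a positioned box: walk up to the closest positioned
    # ancestor with integer z-index, ignoring floats, inline-blocks and
    # inline-tables along the way. The root is assumed to be the fallback.
    ancestor = box.parent
    while ancestor is not None:
        is_root = ancestor.parent is None
        if is_root or (ancestor.positioned and ancestor.z_index is not None):
            return ancestor
        ancestor = ancestor.parent
    return None


root = Box("root")
t = Box("T (float)", parent=root, floated=True)
p = Box("P (relative, z-index: auto)", parent=t, positioned=True)

print(containing_context_revised(p).name)  # the root, not the float T
```

Under the unrevised Appendix E, P would be painted as part of the float T; under the revised rule it belongs to T’s containing stacking context, which matches the browser behaviour reported above.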