5.2.1 Character sets

212 Two sets of characters and their associated collating sequences shall be defined: the set in which source files are written (the source character set), and the set interpreted in the execution environment (the execution character set).

213 Each set is further divided into a basic character set, whose contents are given by this subclause, and a set of zero or more locale-specific members (which are not members of the basic character set) called extended characters.

214 The combined set is also called the extended character set.

215 The values of the members of the execution character set are implementation-defined.

216 In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters.
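As an illustration of sentence 216, the following is a minimal sketch of escape sequences standing in for execution characters. The function names escaped_literal_length and octal_matches_hex are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: length of a string literal spelled with three
   escape sequences. */
size_t escaped_literal_length(void)
{
    /* \" denotes a double quote, \\ a backslash, and \x41 the
       character whose value is 0x41. */
    const char text[] = "\"\\\x41";
    return strlen(text);
}

/* Octal and hexadecimal escapes name a value directly, so the two
   spellings below agree whatever the execution character set is. */
int octal_matches_hex(void)
{
    return '\101' == '\x41';
}
```

Here escaped_literal_length() returns 3, one for each escape sequence. Which character the value 0x41 denotes depends on the execution character set (sentence 215).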

217 A byte with all bits set to 0, called the null character, shall exist in the basic execution character set;

218 it is used to terminate a character string.
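The role of the null character described in sentences 217 and 218 can be sketched with a hand-written length function. The name count_until_null is hypothetical; the standard library function strlen performs the same job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters that precede the
   terminating null character. */
size_t count_until_null(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* '\0' is the byte with all bits set to 0 */
        n++;
    return n;
}
```

For example, count_until_null("abc") yields 3, and the array holding "abc" occupies 4 bytes because of the terminator.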

219 Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet

        A  B  C  D  E  F  G  H  I  J  K  L  M
        N  O  P  Q  R  S  T  U  V  W  X  Y  Z

the 26 lowercase letters of the Latin alphabet

        a  b  c  d  e  f  g  h  i  j  k  l  m
        n  o  p  q  r  s  t  u  v  w  x  y  z

the 10 decimal digits

        0  1  2  3  4  5  6  7  8  9

the following 29 graphic characters

        !  "  #  %  &  '  (  )  *  +  ,  -  .  /  :
        ;  <  =  >  ?  [  \  ]  ^  _  {  |  }  ~

the space character, and control characters representing horizontal tab, vertical tab, and form feed.
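The members listed in sentence 219 can be written out as a single string literal, which gives a quick check of the count: 26 + 26 + 10 + 29 graphic characters, plus the space and three control characters, for 95 in total. This is only a sketch; the function name basic_listed_count is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns how many characters sentence 219 lists. */
size_t basic_listed_count(void)
{
    static const char basic[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"            /* 26 uppercase letters  */
        "abcdefghijklmnopqrstuvwxyz"            /* 26 lowercase letters  */
        "0123456789"                            /* 10 decimal digits     */
        "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"       /* 29 graphic characters */
        " \t\v\f";                              /* space, HT, VT, FF     */

    /* strlen stops at the first null character, so a full count also
       shows that none of the listed members is the null character. */
    return strlen(basic);
}
```

The double quote and the backslash have to be spelled as \" and \\ inside the literal, which is the escape mechanism of sentence 216 at work.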

220 The representation of each member of the source and execution basic character sets shall fit in a byte.

221 In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
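Sentence 221 is what makes the familiar digit-conversion idiom portable. A minimal sketch follows, with the hypothetical name digit_value.

```c
#include <assert.h>

/* Hypothetical helper: returns 0..9 for a decimal digit character,
   or -1 for anything else. */
int digit_value(int c)
{
    /* The digits are guaranteed to have consecutive values, so both
       the range test and the subtraction are well defined. */
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;
}
```

No comparable guarantee is given for the letters, so an expression such as c - 'a' is not portable in the same way.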

222 In source files, there shall be some way of indicating the end of each line of text;

223 this International Standard treats such an end-of-line indicator as if it were a single new-line character.

224 In the basic execution character set, there shall be control characters representing alert, backspace, carriage return, and new line.
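The control characters of sentences 219 and 224 are written in source code with the escape sequences \a, \b, \r, \n, \t, \v and \f. The sketch below checks that an implementation gives them distinct, nonzero values. The name control_escapes_distinct is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the seven control-character escapes
   all have distinct, nonzero values, and 0 otherwise. */
int control_escapes_distinct(void)
{
    const char ctl[] = { '\a', '\b', '\r', '\n', '\t', '\v', '\f' };
    const size_t n = sizeof ctl / sizeof ctl[0];

    for (size_t i = 0; i < n; i++) {
        if (ctl[i] == '\0')              /* must differ from the null character */
            return 0;
        for (size_t j = i + 1; j < n; j++) {
            if (ctl[i] == ctl[j])        /* must differ from each other */
                return 0;
        }
    }
    return 1;
}
```

The actual values are implementation-defined; only their distinctness is relied on here.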

225 If any other characters are encountered in a source file (except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token), the behavior is undefined.

226 A letter is an uppercase letter or a lowercase letter as defined above;

227 in this International Standard the term does not include other characters that are letters in other alphabets.

228 The universal character name construct provides a way to name other characters.
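A universal character name can be sketched in a string literal as follows. The example assumes a compiler that accepts \u escapes and an execution character set that can represent U+00E9 (LATIN SMALL LETTER E WITH ACUTE); the name e_acute_length is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns how many bytes the execution character
   set uses to represent U+00E9. */
size_t e_acute_length(void)
{
    /* \u00E9 names the character without requiring it to appear in
       the source file itself. */
    const char s[] = "\u00E9";
    return strlen(s);
}
```

The result depends on the implementation: an execution character set with a single-byte encoding for this character would give 1, while a UTF-8 execution character set would give 2.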

229 Forward references: universal character names (6.4.3), character constants (6.4.4.4), preprocessing directives (6.10), string literals (6.4.5), comments (6.4.9), string (7.1.1).