Group: comp.lang.tcl


Subject: Is there a way to dynamically determine (in C) if a Tcl_Obj is in fact a Unicode object?
From: Joe English
Date: 11/25/2007 3:35:36 PM
Todd Helfter wrote:
> [...]
> Is there a way to tell if the result from Tcl_GetUnicodeFromObj() is
> actually a unicode object, meaning that it cannot be represented
> accurately by a Tcl_GetStringFromObj().

The question as posed makes no sense; I suspect a misunderstanding somewhere.

Tcl string values are *always* Unicode. More precisely: every character in a Tcl string is interpreted as a code point in the Unicode character repertoire.

Internally, the primary representation of a string (the one you get back from Tcl_GetStringFromObj()) is encoded in UTF-8, which is a multibyte encoding. Tcl also uses a secondary, wide-character representation for some operations. The API and man pages incorrectly call this the "Unicode" representation, but it's really UCS-2. Tcl_GetUnicodeFromObj() returns a pointer to a (wide-character) string in this representation.

Any sequence of Unicode characters drawn from the BMP (Basic Multilingual Plane, U+0000 through U+FFFF) can be represented equally well in either of these representations.

> Inversely, if I have a pointer to an area of memory.. casting it as
> either as a (uchar) or a (char) is easy, but again is there an easy
> way to tell if such a thing is needed.

It depends on where the contents of the memory came from, but you probably never need to cast. You may, however, need to perform an encoding translation. Tcl uses ByteArray objects to represent text in non-Unicode encodings. You can use Tcl_ExternalToUtf to convert these into Tcl string values.

--Joe English
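To make the point about the two representations concrete, here is a minimal, self-contained C sketch. It is not Tcl's own code, and the helper names utf8_to_ucs2 and ucs2_to_utf8 are hypothetical. It decodes BMP-only UTF-8 into 16-bit code units (the same shape as a UCS-2 Tcl_UniChar buffer) and encodes them back, so you can see that any BMP string survives the round trip unchanged; there is no "non-Unicode" case to detect. Assumptions: it does not reject overlong forms or surrogate code points, and it refuses 4-byte (non-BMP) sequences because UCS-2 cannot hold them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode BMP-only UTF-8 into 16-bit code units
 * (the same shape as a UCS-2 Tcl_UniChar buffer). Returns the number
 * of units written, or -1 for malformed input, a 4-byte (non-BMP)
 * sequence, or a full output buffer. Overlong forms and surrogate
 * code points are NOT rejected; this is only a sketch. */
static long utf8_to_ucs2(const unsigned char *src, size_t len,
                         uint16_t *dst, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = src[i];
        uint32_t cp;
        size_t extra; /* number of continuation bytes that follow */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else return -1;  /* 4-byte lead (outside BMP) or stray byte */
        if (i + extra >= len || n >= cap)
            return -1;   /* truncated sequence or no room left */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = src[i + k];
            if ((c & 0xC0) != 0x80)
                return -1;  /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        dst[n++] = (uint16_t)cp;
        i += extra + 1;
    }
    return (long)n;
}

/* Hypothetical inverse: encode 16-bit BMP code units back to UTF-8
 * (1 to 3 bytes each). Returns bytes written, or -1 if dst is too
 * small. No NUL terminator is appended. */
static long ucs2_to_utf8(const uint16_t *src, size_t len,
                         unsigned char *dst, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t cp = src[i];
        if (cp < 0x80) {
            if (n + 1 > cap) return -1;
            dst[n++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            if (n + 2 > cap) return -1;
            dst[n++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            if (n + 3 > cap) return -1;
            dst[n++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return (long)n;
}
```

For example, the UTF-8 bytes of "café €" (9 bytes) decode to 6 code units, and encoding those units reproduces the same 9 bytes. A 4-byte sequence such as an emoji is refused, which is exactly the part of Unicode that a UCS-2 buffer cannot represent.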
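The second answer (translate rather than cast) can be illustrated the same way. The hypothetical latin1_to_utf8 helper below is a stand-in for the kind of translation Tcl_ExternalToUtf() performs for the "iso8859-1" encoding; it is not Tcl's implementation and ignores the state and flag handling the real call has. It shows why a cast is not enough: every Latin-1 byte is the code point U+0000 to U+00FF, but any byte at or above 0x80 needs two bytes in UTF-8, so the same memory reinterpreted as UTF-8 would be wrong.

```c
#include <stddef.h>

/* Hypothetical helper, an illustration only: translate ISO-8859-1
 * bytes to UTF-8. Bytes below 0x80 are copied as-is; bytes 0x80..0xFF
 * become a two-byte UTF-8 sequence. Returns bytes written, or -1 if
 * dst is too small. No NUL terminator is appended. */
static long latin1_to_utf8(const unsigned char *src, size_t len,
                           unsigned char *dst, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = src[i];
        if (b < 0x80) {
            if (n + 1 > cap) return -1;
            dst[n++] = b;
        } else {
            if (n + 2 > cap) return -1;
            dst[n++] = (unsigned char)(0xC0 | (b >> 6));
            dst[n++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return (long)n;
}
```

For example, the 4 Latin-1 bytes of "café" (with é stored as 0xE9) become 5 UTF-8 bytes, the last two being 0xC3 0xA9. In real extension code you would not hand-roll this; the usual route is roughly: get the raw bytes with Tcl_GetByteArrayFromObj(), look up the encoding with Tcl_GetEncoding(interp, "iso8859-1"), convert with Tcl_ExternalToUtfDString(), build the string value with Tcl_NewStringObj(), and then release resources with Tcl_DStringFree() and Tcl_FreeEncoding().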