Subject: Is there a way to dynamically determine (in 'c') if an Tcl_Obj is in fact a Unicode object.
From: Joe English
Date: 11/25/2007 3:35:36 PM
Todd Helfter wrote:
> [...]
> Is there a way to tell if the result from Tcl_GetUnicodeFromObj() is
> actually a unicode object, meaning that it cannot be represented
> accurately by a Tcl_GetStringFromObj().

The question as posed makes no sense; I suspect a
misunderstanding somewhere.

Tcl string values are *always* Unicode. More precisely: every
character in a Tcl string is interpreted as a code point
in the Unicode character repertoire.

Internally, the primary representation of a string --
the one you get back from Tcl_GetStringFromObj() --
is encoded in UTF-8, which is a multibyte encoding.

Tcl also uses a secondary, wide character representation
for some operations. The API and man pages incorrectly call
this the "Unicode" representation, but it's really UCS-2.
Tcl_GetUnicodeFromObj() returns a pointer to a (wide-character)
string in this representation.

Any sequence of Unicode characters drawn from the BMP can be
represented equally well in either of these representations.
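
To make the equivalence concrete, here is a minimal sketch in plain C,
independent of the Tcl library. The function name utf8_to_ucs2 is
made up for this illustration and is not part of the Tcl API. It
assumes reasonably well-formed input and does not reject overlong
forms or surrogate values; it only shows that one-, two- and
three-byte UTF-8 sequences (the BMP) each map to exactly one 16-bit
unit, while four-byte sequences do not fit.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a Tcl function: decode a NUL-terminated
 * UTF-8 string into 16-bit UCS-2 units.
 *
 * Returns the number of units written, or -1 if the input contains a
 * code point outside the BMP (a 4-byte sequence), a stray or truncated
 * sequence, or if the output buffer is too small.
 */
long utf8_to_ucs2(const char *src, uint16_t *out, size_t outCap)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;

    while (*p != 0) {
        uint32_t cp;
        int extra;              /* continuation bytes still expected */

        if (*p < 0x80) {
            cp = *p;            /* 1 byte:  U+0000..U+007F */
            extra = 0;
        } else if ((*p & 0xE0) == 0xC0) {
            cp = *p & 0x1F;     /* 2 bytes: U+0080..U+07FF */
            extra = 1;
        } else if ((*p & 0xF0) == 0xE0) {
            cp = *p & 0x0F;     /* 3 bytes: U+0800..U+FFFF */
            extra = 2;
        } else {
            return -1;          /* 4-byte lead (beyond the BMP) or stray byte */
        }
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80) {
                return -1;      /* missing continuation byte */
            }
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
            p++;
        }

        if (n >= outCap) {
            return -1;          /* output buffer full */
        }
        out[n++] = (uint16_t)cp;
    }
    return (long)n;
}
```

Given the three UTF-8 bytes E2 82 AC (the euro sign), this yields the
single unit 0x20AC. A string containing a character outside the BMP
makes it return -1, which is the one case where the two
representations are not interchangeable.
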
> Inversely, if I have a pointer to an area of memory.. casting it as
> either as a (uchar) or a (char) is easy, but again is there an easy
> way to tell if such a thing is needed.

Depends on where the contents of the memory came from,
but you probably never need to cast. You may, however,
need to perform an encoding translation.

Tcl uses ByteArray objects to represent text in non-Unicode
encodings. You can use Tcl_ExternalToUtf to convert these
into Tcl string values.
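
As a rough illustration of why this is a translation and not a cast,
here is a hand-rolled sketch for one simple case, ISO 8859-1 to UTF-8,
in plain C. The function latin1_to_utf8 is hypothetical and exists
only for this example; in real Tcl code you would let the encoding
subsystem do this for you instead of writing it by hand.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not a Tcl function: translate srcLen bytes of
 * ISO 8859-1 text into NUL-terminated UTF-8.
 *
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * dst is too small.
 */
long latin1_to_utf8(const unsigned char *src, size_t srcLen,
                    char *dst, size_t dstCap)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < srcLen; i++) {
        unsigned char b = src[i];

        if (b < 0x80) {
            /* ASCII range: the byte is unchanged in UTF-8. */
            if (n + 1 >= dstCap) {
                return -1;
            }
            dst[n++] = (char)b;
        } else {
            /* U+0080..U+00FF: becomes a two-byte UTF-8 sequence. */
            if (n + 2 >= dstCap) {
                return -1;
            }
            dst[n++] = (char)(0xC0 | (b >> 6));
            dst[n++] = (char)(0x80 | (b & 0x3F));
        }
    }

    if (n >= dstCap) {
        return -1;              /* no room for the terminating NUL */
    }
    dst[n] = '\0';
    return (long)n;
}
```

The single Latin-1 byte 0xE9 (e with acute accent) comes out as the
two bytes C3 A9. Reinterpreting the original buffer through a cast
would leave the byte as 0xE9, which is not valid UTF-8 on its own.
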
--Joe English