TSpython post

Author: Bruce.
Date: 2006/4/24 (12:42)

This is Bruce.

# Argh, Firefox crashed on me out of nowhere, so I'm writing this again.

機械伯爵 writes:

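> > That aside, I seem to recall that what Python calls the Unicode encoding
> > is internally UCS-4 or UCS-2 (should I write UTF-32 and UTF-16?).
> > Am I wrong?
> 
> Hmm, I don't know about the internals, but I remember getting an error
> a while back when I fed it UTF-16 without using a codec.
> 
> With UTF-8 encoding, single-byte characters basically stay as they are,
> so I assumed that was what it used.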
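
As an aside, the UTF-8 property mentioned in the quote is about the encoded byte form, and it is easy to compare it against UTF-16 and UTF-32 from Python. This is only a minimal sketch: `encoded_sizes` is a name I made up for illustration, and it assumes the `utf-32-le` codec is available (it is not in Python 2.4, only in later releases).

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of `text` for three encodings.

    `encoded_sizes` is a hypothetical helper, not part of any library.
    The "-le" codecs are used so that no byte order mark is prepended.
    """
    names = ("utf-8", "utf-16-le", "utf-32-le")
    return dict((name, len(text.encode(name))) for name in names)


# An ASCII character stays a single byte in UTF-8 only.
ascii_sizes = encoded_sizes(u"A")

# HIRAGANA LETTER A (U+3042) needs three bytes in UTF-8.
kana_sizes = encoded_sizes(u"\u3042")

print(ascii_sizes)
print(kana_sizes)
```

The point is that UTF-8 is variable width, while the other two use 2-byte or 4-byte units.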

I checked the Python 2.4.2 source.

unicodeobject.h
> #ifndef PY_UNICODE_TYPE
>
> /* Windows has a usable wchar_t type (unless we're using UCS-4) */
> # if defined(MS_WIN32) && Py_UNICODE_SIZE == 2
> #  define HAVE_USABLE_WCHAR_T
> #  define PY_UNICODE_TYPE wchar_t
> # endif
>
> # if defined(Py_UNICODE_WIDE)
> #  define PY_UNICODE_TYPE Py_UCS4
> # endif
>
> #endif

(snip)

> /*
> * Use this typedef when you need to represent a UTF-16 surrogate pair
> * as single unsigned integer.
> */
> # if SIZEOF_INT >= 4 
> typedef unsigned int Py_UCS4; 
> #elif SIZEOF_LONG >= 4
> typedef unsigned long Py_UCS4; 
> #endif
>
> typedef PY_UNICODE_TYPE Py_UNICODE;


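So Py_UNICODE is a fixed-width type, either a 2-byte unit (wchar_t on Windows) or Py_UCS4, and I think it's certain that the internal representation is not UTF-8.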
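
If you'd rather check a given interpreter than read the header, `sys.maxunicode` should tell a narrow build from a wide one. A rough sketch, assuming a Python 2.x style build; `unicode_build_width` is a hypothetical helper name, not something from the source above.

```python
import sys


def unicode_build_width():
    """Guess the Py_UNICODE width in bytes from sys.maxunicode.

    Hypothetical helper for illustration only.
    Narrow (UCS-2 / UTF-16 code unit) builds report 0xFFFF,
    wide (UCS-4) builds report 0x10FFFF.
    Python 3.3 and later always report 0x10FFFF, whatever the
    internal storage is, so the answer is only meaningful on older builds.
    """
    if sys.maxunicode == 0xFFFF:
        return 2
    return 4


print("sys.maxunicode = %#x" % sys.maxunicode)
print("Py_UNICODE width in bytes: %d" % unicode_build_width())
```

Either way the unit is 2 or 4 bytes, never the 1-byte units UTF-8 would need.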

That's all.
