TSfree post

Author: davi
Date: 2005/12/16(01:23)

Bruce.  <  Good ??, this is Debi

On Thu, 15 Dec 2005 23:05:06 +0900
"Bruce." <kbk@...> wrote:

> The procedure is to first convert to the UCS-4 representation
> and then turn that into the UTF-8 representation.

Huh...

RFC 3629

｜   In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16
｜   accessible range) are encoded using sequences of 1 to 4 octets. 

I didn't know that~
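
As a rough sketch of the procedure in the quoted message (code point first, then UTF-8), here is how a surrogate pair would end up as a 4-octet sequence. The function names are made up for illustration and do not come from the RFC or any library.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one code point (the UCS-4 value)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_utf8(cp: int) -> bytes:
    """Encode one code point in U+0000..U+10FFFF as 1 to 4 octets."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point cannot be encoded in UTF-8")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# The surrogate pair D84C DFB4 stands for U+233B4.
cp = combine_surrogates(0xD84C, 0xDFB4)
print(hex(cp))                          # 0x233b4
print(encode_utf8(cp).hex(" ").upper())  # F0 A3 8E B4
```

Encoding each surrogate separately as a 3-octet sequence would give ED A1 8C ED BE B4 instead, which is 6 octets for one character.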

｜ Implementations of the decoding algorithm above MUST protect against
｜decoding invalid sequences. For instance, a naive implementation may
｜decode the overlong UTF-8 sequence C0 80 into the character U+0000,
｜or the surrogate pair ED A1 8C ED BE B4 into U+233B4.

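"Even if it is sent as 6 bytes, it has to be possible to decode it..."
So that means this is the wrong way to build the data.

But requiring that such data can be decoded as well
sounds tough on implementers.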
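
For what it's worth, the RFC sentence "MUST protect against decoding invalid sequences" can also be read the other way round: the decoder is expected to refuse such input, not to accept it. Under that assumption, a minimal check of the two sequences from the quote might look like this. It leans on Python 3's built-in strict UTF-8 decoder, and `decode_strict` is just a hypothetical helper name.

```python
def decode_strict(data: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")  # default error handling is "strict"
    except UnicodeDecodeError:
        return None


overlong_nul = bytes.fromhex("C080")             # overlong form of U+0000
surrogate_bytes = bytes.fromhex("EDA18CEDBEB4")  # surrogates encoded one by one
proper = bytes.fromhex("F0A38EB4")               # 4-octet form of U+233B4

print(decode_strict(overlong_nul))     # None
print(decode_strict(surrogate_bytes))  # None
print(decode_strict(proper) == chr(0x233B4))  # True
```

Read this way, the implementer only has to detect and reject the 6-byte form, which is less work than supporting it.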

Debi  http://homepage1.nifty.com/davi/

Previous post:

1399. Re: Do you use surrogate pairs? [Bruce.] 2005/12/16(00:00)
Next post:

1401. Re: Do you use surrogate pairs? [Bruce.] 2005/12/16(12:49)
Parent post:

1398. Re: Do you use surrogate pairs? [Bruce.] 2005/12/15(23:05)
Child post:

1401. Re: Do you use surrogate pairs? [Bruce.] 2005/12/16(12:49)