TSpython post

Author: Ueta Masayuki
Date: 2008/9/24 (21:08)

I did a little testing to see how Japanese text handling
works in Python 3.0.

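Unlike versions up to 2.5, it looks like you no longer have to
mark a string as unicode: len counts the number of Japanese
characters, Japanese text splits correctly one character at a
time, and regular expressions treat it the same way as
alphanumeric characters.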

Python 3.0rc1 (r30rc1:66507, Sep 18 2008, 14:47:08) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '試験とテストtest'
# c:\Python30\lib\encodings\utf_16_be.pyc matches c:\Python30\lib\encodings\utf_16_be.py
import encodings.utf_16_be # precompiled from c:\Python30\lib\encodings\utf_16_be.pyc
>>> s
'試験とテストtest'
>>> print s
  File "<stdin>", line 1
    print s
          ^
SyntaxError: invalid syntax
>>> print (s)
試験とテストtest
>>> len(s)
10
>>> list(s)
['試', '験', 'と', 'テ', 'ス', 'ト', 't', 'e', 's', 't']
>>> import re
>>> for i in list(s):
...   if re.search('[^あ-んア-ン]+', i):
...     print (i)
...
import array # builtin
試
験
t
e
s
t
>>>
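
For comparison, here is a rough sketch of the same checks written as a
script, with a byte count added to show what len would see if the string
were encoded bytes rather than str. The variable names and the choice of
UTF-8 are my own assumptions for illustration; they are not part of the
session above.

```python
import re

s = '試験とテストtest'

# In Python 3, str is a sequence of Unicode code points,
# so len() counts characters, not bytes.
char_count = len(s)

# Encoding gives a bytes object. UTF-8 is an assumption here;
# it uses 3 bytes for each of these Japanese characters and 1 per ASCII letter.
byte_count = len(s.encode('utf-8'))

# Same character class as in the session: keep every character
# that is neither hiragana nor katakana.
non_kana = [ch for ch in s if re.search('[^あ-んア-ン]+', ch)]

# For str patterns, \w matches Unicode word characters by default,
# so kanji and kana are treated like alphanumerics.
words = re.findall(r'\w+', s)

print(char_count, byte_count)
print(non_kana)
print(words)
```

With UTF-8 the byte count comes out larger than the character count,
which is roughly the kind of mismatch you ran into with plain byte
strings in 2.5 and earlier.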

That's all.
