You can't change your fate, but you can change your probabilities.
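Oh, it's already March.
I'd been translating this video in my spare time, and I've finally finished. It took an insane amount of time.
Someone asked me, "What does VTR mean in Japanese?", and I told them, "It means video footage." But come to think of it, these days people say VTR even when there's no tape involved.
But the stove is hot.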
So nobody wants to hear my rant about Unicode character widths, sigh...
The "character width" of a character is the number of columns it occupies
when displayed on a terminal. Traditionally, most Japanese hiragana and kanji
were considered "full-width (zenkaku)", while ASCII characters were "half-width
(hankaku)". The concepts of "zenkaku" and "hankaku" date back to the old DOS era.
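A minimal Python sketch of that traditional rule, using only the standard library's `unicodedata.east_asian_width()`. The `display_width` name and the rule "Wide or Fullwidth means 2 columns, anything else means 1" are my own simplifying assumptions; real terminals also have to deal with control characters, combining marks and the Ambiguous category.

```python
import unicodedata


def display_width(s: str) -> int:
    """Hypothetical helper: columns needed to show s on a terminal.

    Assumed zenkaku/hankaku rule: East Asian Wide ("W") and Fullwidth ("F")
    characters take 2 columns, every other character takes 1.
    """
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in s)


text = "日本語abc"
print(len(text))            # 6 code points
print(display_width(text))  # 9 columns: 3 kanji x 2 + 3 ASCII x 1
```

This is why padding text with `len()` falls apart as soon as kanji show up: Python counts code points, while the terminal counts columns.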
But actually, there's so much more to it. According to this Standard
(Unicode Standard Annex #11, "East Asian Width"), Unicode defines six width
categories: "Fullwidth", "Halfwidth", "Wide", "Narrow", "Ambiguous", and a
catch-all "Neutral" for characters that never appeared in legacy East Asian
character sets. What the heck. The per-character assignments are listed in
EastAsianWidth.txt in the Unicode Character Database, and you'll find them
kinda random, because they were decided mostly on a historical basis.
Apparently nobody bothered to follow them completely when writing their
terminals or terminal-based apps. Since I run Emacs on tmux on xterm using
Kappa-UCS, the character widths that each program expects must agree
perfectly; otherwise you get garbled output in the window.
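Python's `unicodedata` module ships the same East_Asian_Width data, so the categories can be poked at directly. A small sketch; the sample characters are my own picks, one per category, including Neutral ("N"). The values reported come from whatever Unicode version your Python bundles.

```python
import unicodedata

# One sample character per East_Asian_Width value (my own picks).
samples = [
    "A",       # U+0041 LATIN CAPITAL LETTER A
    "\u3042",  # U+3042 HIRAGANA LETTER A
    "\uff21",  # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
    "\uff71",  # U+FF71 HALFWIDTH KATAKANA LETTER A
    "\u03b1",  # U+03B1 GREEK SMALL LETTER ALPHA
    "\u0628",  # U+0628 ARABIC LETTER BEH
]

for ch in samples:
    # east_asian_width() returns one of "F", "H", "W", "Na", "A", "N".
    print(f"U+{ord(ch):04X}  {unicodedata.east_asian_width(ch):2}  {unicodedata.name(ch)}")
```

This prints Na, W, F, H, A and N, in that order. Note that a plain Greek alpha is "Ambiguous": whether it takes one column or two is left up to each program.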
As you can see, it's unrealistic to expect all four of these
implementations to agree on the width of every single character.
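A toy model of why the disagreement garbles the screen. The `width` helper and its `ambiguous_wide` flag are hypothetical; the flag stands in for the one decision that most often differs between programs, namely whether "Ambiguous" characters take one column or two. Real programs differ in more ways than this single flag.

```python
import unicodedata


def width(s: str, ambiguous_wide: bool) -> int:
    """Hypothetical column count for s under one program's convention."""
    total = 0
    for ch in s:
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F") or (eaw == "A" and ambiguous_wide):
            total += 2
        else:
            total += 1
    return total


row = "\u03b1\u03b2\u03b3 ok"  # "αβγ ok": three Ambiguous Greek letters + ASCII

app_col = width(row, ambiguous_wide=False)  # what the application assumes
term_col = width(row, ambiguous_wide=True)  # what a CJK-style terminal draws
print(app_col, term_col)  # 6 9
```

Once the application thinks the row ends at column 6 while the terminal actually drew it up to column 9, every cursor movement it computes afterwards lands three columns off.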
Kappa probably follows the old DOS convention, but unfortunately some of its
characters fall into the "Ambiguous" category in today's Standard, so their
treatment varies from program to program. Xterm has a few options for
handling character widths, and some of them ultimately rely on the system's
(glibc's) wcwidth(3). Where glibc's width data comes from is unclear, but this
SO question and the current source code suggest it's based on the Standard.
And tmux apparently uses rather inaccurate data based on this wcwidth
implementation, or something like it. Then there's this Emacs dude. Its
character widths are determined by the char-width-table elisp variable, which
is set up in share/lisp/international/characters.el, but it changes depending
on current-language-environment.
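For reference, here's a rough Python sketch of the rules a POSIX-style wcwidth(3)/wcswidth(3) pair typically applies. It is loosely modeled on Markus Kuhn's widely copied public-domain wcwidth.c (my choice of reference; I'm not claiming glibc or tmux use exactly this) and simplified: it reads Unicode categories from Python's `unicodedata` instead of hard-coded tables, and skips special cases such as Hangul medial vowels. As in Kuhn's default variant, Ambiguous characters count as one column; his CJK variant counts them as two.

```python
import unicodedata


def wcwidth(ch: str) -> int:
    """Sketch of wcwidth(3): column width of one character, or -1."""
    cp = ord(ch)
    if cp == 0:
        return 0   # NUL takes no space
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return -1  # C0/C1 controls and DEL are not printable
    if unicodedata.category(ch) in ("Mn", "Me", "Cf") and cp != 0x00AD:
        return 0   # combining marks and format chars (soft hyphen excepted)
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2   # Wide / Fullwidth
    return 1       # everything else, Ambiguous included


def wcswidth(s: str) -> int:
    """Sketch of wcswidth(3): total width, or -1 if any char is unprintable."""
    total = 0
    for ch in s:
        w = wcwidth(ch)
        if w < 0:
            return -1
        total += w
    return total


print(wcswidth("e\u0301"))  # 1: "e" plus a zero-width combining acute accent
print(wcswidth("日本"))      # 4
print(wcswidth("a\tb"))     # -1: TAB is a control character
```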
This is too much crap to handle.
By the way, what kind of list has "ASCII", "Arabic", "Chinese-BIG5", "Chinese-CNS", "Cyrillic-ALT", "English", "Japanese", "Latin-1", "Russian", "UTF-8" and "Windows-1255" in the same place? I'ma go ahead and smash my head against the wall for a while.
P.S. And I forgot about other ncurses apps, such as Lynx and WeeChat.