
Feb. 2016

Last Modified: Fri Mar 11 07:26:07 UTC 2016

Title 2016-03-11 [Fri] 16:25

You can't change your fate, but you can change your probabilities.

Title 2016-03-05 [Sat] 10:10

Oh, it's already March.

I had been translating this video in my spare time, and it's finally done. It took a ridiculously long time.

Title 2016-02-21 [Sun] 23:16

Someone asked me "What does VTR mean in Japanese?" and I answered "It means video footage", but come to think of it, these days people say VTR even when it isn't on tape.

A free day off for the first time in a while 2016-02-11 [Thu] 11:07

But the stove is hot.

Charあらららcter Widths 2016-02-07 [Sun] 10:15

So nobody wants to hear my rant about the Unicode character widths, sigh...

"Character width" is the number of columns (cells) that one character takes up when it's displayed on a terminal. Traditionally, most Japanese hiragana and kanji were considered "full-width (zenkaku)" while ASCII characters were "half-width (hankaku)". The concepts of "zenkaku" and "hankaku" date back to the old DOS era. But actually, there's so much more. According to this Standard, there are six types of character widths defined in Unicode. Namely, "Full-width", "Half-width", "Wide", "Narrow", "Ambiguous", and "Neutral". What the heck. The full per-character specification is found here, and you'll find it kinda random because the assignments are determined mostly on a historical basis.

Apparently no one cared to follow this completely when writing their terminals or terminal-based apps. Since I use Emacs on tmux on xterm with Kappa-UCS, the character widths that each program expects have to agree perfectly; otherwise you get garbled output on the window. And you can see it's unrealistic to expect all four of the above to agree on the width of every character.

Kappa probably follows the old DOS convention, but unfortunately some of its characters are in the "Ambiguous" category in today's Standard, so their treatment is inconsistent. Xterm has a few options to handle character widths, and some of them eventually rely on the system (glibc)'s wcwidth(3). Its source is unclear, but this SO question and the looks of the current source code suggest it's based on the Standard. Now, tmux apparently uses rather inaccurate data based on this wcwidth implementation or something. Then, there's this Emacs dude. Its character width is determined by the char-width-table elisp variable, which is defined in share/lisp/international/characters.el, but it changes depending on the current-language-environment.
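To make the mess a bit more concrete, here is a minimal Python sketch of how a program might guess a column width from the Unicode East Asian Width property. It is not how xterm, tmux, glibc, or Emacs actually do it. The names `char_width`, `column_width`, and the `ambiguous_wide` switch are made up for illustration, and control characters are simply counted as one column. The only real API used is the standard `unicodedata` module, which reports the property as F, H, W, Na, A, or N.

```python
import unicodedata

# East Asian Width values that are assumed to take two columns.
WIDE_CLASSES = {"F", "W"}


def char_width(ch, ambiguous_wide=False):
    """Guess how many terminal columns a single character takes.

    Hypothetical helper: real terminals ship their own tables.
    """
    # Combining marks are drawn on top of the previous character.
    if unicodedata.combining(ch):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    if eaw in WIDE_CLASSES:
        return 2
    # "Ambiguous" characters are where programs start to disagree:
    # a CJK-oriented setup may want 2 columns, everything else 1.
    if eaw == "A":
        return 2 if ambiguous_wide else 1
    # Na (narrow), H (half-width), and N (neutral) are treated as 1.
    return 1


def column_width(text, ambiguous_wide=False):
    """Sum the guessed widths of every character in text."""
    return sum(char_width(ch, ambiguous_wide) for ch in text)


if __name__ == "__main__":
    # ASCII, hiragana, full-width Latin, half-width katakana, Greek.
    for ch in "aあＡｱα":
        print(
            repr(ch),
            unicodedata.east_asian_width(ch),
            char_width(ch),
            char_width(ch, ambiguous_wide=True),
        )
```

The last two columns of the output differ only for the "Ambiguous" characters (Greek alpha in this sample), which is exactly the case where two programs with different assumptions end up misaligned.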

This is too much crap to handle.

By the way, what kind of list has "ASCII", "Arabic", "Chinese-BIG5", "Chinese-CNS", "Cyrillic-ALT", "English", "Japanese", "Latin-1", "Russian", "UTF-8" and "Windows-1255" in the same place? I'ma go ahead and bang my head against the wall for a while. P.S. And I forgot about the other ncurses apps, such as Lynx and WeeChat.


Yusuke Shinyama

Document ID: 7b9d1e5cecc907bcc427592facaf7ac4