Group: pgsql.patches


Subject: initdb of regression test failed.
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 10/2/2007 1:30:05 PM
"Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> writes: > The database cluster will be initialized with locale Japanese_Japan.932. > initdb: could not find suitable encoding for locale "Japanese_Japan.932" So, what encoding *should* we use for that locale? > I think this is required.... We are certainly not going to disable pg_regress's ability to test in non-C locales. ISTM a proper fix is an addition to the table in src/port/chklocale.c. This example suggests actually that we need a boatload more table entries to handle Windows locale names :-( (count on Microsoft to ignore standards...) regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq

Subject: initdb of regression test failed.
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 10/4/2007 1:44:35 AM
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> In fact, we can accept options like:
>     initdb -E UTF8 --locale=Japanese_Japan.932 -- CP932 is SJIS in nature

Hmm, but does that really work safely?  I think varstr_cmp() does work,
because it forces our data into wchar format and then calls wcscoll().
The thing that scares me is that various random other operating-system
calls might deliver strings in an unexpected encoding.  We've been
through similar problems with timezone names reported by strftime, for
example.

			regards, tom lane

Subject: initdb of regression test failed.
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 10/4/2007 3:26:53 PM
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> The attached is the second plan. It uses UTF-8 and locale=C when
> the default locale encoding is not supported and none of encoding and
> locale are passed to initdb. It would help users who use the default
> settings (including regression test).

I'm not very happy with this proposal, because for people who don't
actually care about non-ASCII data (which is still a lot of people),
forcing UTF-8 as the default encoding will impose pretty substantial
overhead compared to SQL_ASCII --- it turns on all those
multibyte-encoding checks.  Implicitly selecting --no-locale doesn't
seem like a big step forward either, since then you've just given up
whatever you might have learned from the locale setting.  Besides, if
that's the behavior the user wants, he can specify it.

I still think that what we should try to do in the default case is find
a locale that is the same language but UTF-8 encoding.

> At the moment, it reset all of lc_* variables, but it might be possible
> use the default locale at lc_messages, lc_monetary, lc_numeric and lc_time
> even if lc_collate and lc_ctype are reset to C.

Well, that just leaves me wondering what encoding the localized messages
would be presented in ...

			regards, tom lane
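
One possible reading of "same language but UTF-8 encoding" is sketched
below for POSIX-style names of the form language_TERRITORY.codeset.  The
function name utf8_variant_of_locale is hypothetical, and Windows names
such as Japanese_Japan.932 do not follow this pattern, so this only
shows the idea rather than a fix for the reported case.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: keep the language_TERRITORY part of a locale
 * name and replace the codeset with UTF-8.  Any "@modifier" is dropped
 * in this sketch.
 * Returns 1 and fills buf on success; returns 0 for "C"/"POSIX", for an
 * empty base name, or if the result would not fit in buf.
 */
static int
utf8_variant_of_locale(const char *locale, char *buf, size_t buflen)
{
	size_t		baselen = strcspn(locale, ".@");
	int			n;

	if (baselen == 0)
		return 0;
	if (strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
		return 0;

	n = snprintf(buf, buflen, "%.*s.UTF-8", (int) baselen, locale);
	return n > 0 && (size_t) n < buflen;
}
```

The result is only a candidate name.  A caller would still have to
confirm that the system provides it, for example by checking that
setlocale(LC_CTYPE, candidate) does not return NULL, before using it as
the default.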