Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/7/2007 11:34:52 AM
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> [ bad estimate for LIKE ]
Hmmm ... what locale are you working in? I notice that the range
estimator for this pattern would be "ancestors >= '1062/' AND
ancestors < '10620'", which will do the right thing in C locale
but maybe not so much elsewhere.
> Version is PostgreSQL 8.1.8 on i686-redhat-linux-gnu,
You'd probably get better results with 8.2, which has a noticeably
smarter LIKE-estimator, at least for histogram sizes of 100 or more.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/7/2007 11:58:49 AM
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> On 11/7/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmmm ... what locale are you working in? I notice that the range
>> estimator for this pattern would be "ancestors >= '1062/' AND
>> ancestors < '10620'", which will do the right thing in C locale
>> but maybe not so much elsewhere.
> Sorry for not having mentioned it before. Locale is UTF-8.
I wanted the locale (lc_collate), not the encoding.
> I suppose my best bet is to remove the pg_statistic line and to set
> the statistics to 0 for this column so that the stats are never
> generated again for this column?
That would optimize this particular query and probably pessimize
a lot of others. I have another LIKE-estimation bug to go look at
today too; let me see if this one is fixable or not.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/7/2007 6:14:08 PM
I wrote:
> "Guillaume Smet" <guillaume.smet@gmail.com> writes:
>> [ bad estimate for LIKE ]
> Hmmm ... what locale are you working in? I notice that the range
> estimator for this pattern would be "ancestors >= '1062/' AND
> ancestors < '10620'", which will do the right thing in C locale
> but maybe not so much elsewhere.
I've applied a patch that might help you:
http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/8/2007 10:01:09 AM
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> On Nov 8, 2007 12:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've applied a patch that might help you:
>> http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php
> AFAICS, it doesn't seem to fix the problem.
Hmm, can we see the pg_stats row for the ancestors column?
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/8/2007 12:22:28 PM
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> On Nov 8, 2007 12:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've applied a patch that might help you:
>> http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php
> AFAICS, it doesn't seem to fix the problem. I just compiled
> REL8_1_STABLE branch and I still has the following behaviour:
OK, I tried it in fr_FR locale and what I find is that
regression=# select '123/' < '1230'::text;
?column?
----------
t
(1 row)
so make_greater_string() will still think that its first try at
generating an upper-bound string is good enough. However
regression=# select '123/1' < '1230'::text;
?column?
----------
f
(1 row)
so the data starting with '123/' is still outside the generated range,
leading to a wrong estimate. I didn't see this behavior yesterday but
I was experimenting with en_US which I guess has different rules.
What I am tempted to do about this is have make_greater_string tack "zz"
onto the supplied prefix, so that it would have to find a string that
compares greater than "123/zz" before reporting success. This is
getting pretty klugy though, so cc'ing to pgsql-hackers to see if anyone
has a better idea.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/8/2007 7:40:20 PM
Gregory Stark <stark@enterprisedb.com> writes:
> Doesn't really strike at the core reason that this is so klugy though. Surely
> the "right" thing is to push the concept of open versus closed end-points
> through deeper into the estimation logic?
No, the right thing is to take the folk who defined "dictionary sort
order" out behind the barn and shoot 'em ;-). This has got nothing to
do with open/closed endpoints and everything to do with the bizarre
sorting rules used by some locales. In particular the reason I want to
append a letter is that some locales discriminate against non-letter
characters in the first pass of sorting.
I did do some experimentation and found that among the ASCII characters
(ie, codes 32-126), nearly all the non-C locales on my Fedora machine
sort Z last and z next-to-last or vice versa. Most of the remainder
sort digits last and z or Z as the last non-digit character. Since Z is
not that close to the end of the sort order in C locale, however, z
seems the best bet.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
|