Group: pgsql.performance


Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/7/2007 11:34:52 AM
"Guillaume Smet" <guillaume.smet@gmail.com> writes: > [ bad estimate for LIKE ] Hmmm ... what locale are you working in? I notice that the range estimator for this pattern would be "ancestors >= '1062/' AND ancestors < '10620'", which will do the right thing in C locale but maybe not so much elsewhere. > Version is PostgreSQL 8.1.8 on i686-redhat-linux-gnu, You'd probably get better results with 8.2, which has a noticeably smarter LIKE-estimator, at least for histogram sizes of 100 or more. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq

Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/7/2007 11:58:49 AM
"Guillaume Smet" <guillaume.smet@gmail.com> writes: > On 11/7/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Hmmm ... what locale are you working in? I notice that the range >> estimator for this pattern would be "ancestors >= '1062/' AND >> ancestors < '10620'", which will do the right thing in C locale >> but maybe not so much elsewhere. > Sorry for not having mentioned it before. Locale is UTF-8. I wanted the locale (lc_collate), not the encoding. > I suppose my best bet is to remove the pg_statistic line and to set > the statistics to 0 for this column so that the stats are never > generated again for this column? That would optimize this particular query and probably pessimize a lot of others. I have another LIKE-estimation bug to go look at today too; let me see if this one is fixable or not. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend

Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/7/2007 6:14:08 PM
I wrote: > "Guillaume Smet" <guillaume.smet@gmail.com> writes: >> [ bad estimate for LIKE ] > Hmmm ... what locale are you working in? I notice that the range > estimator for this pattern would be "ancestors >= '1062/' AND > ancestors < '10620'", which will do the right thing in C locale > but maybe not so much elsewhere. I've applied a patch that might help you: http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org

Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/8/2007 10:01:09 AM
"Guillaume Smet" <guillaume.smet@gmail.com> writes: > On Nov 8, 2007 12:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I've applied a patch that might help you: >> http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php > AFAICS, it doesn't seem to fix the problem. Hmm, can we see the pg_stats row for the ancestors column? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq

Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/8/2007 12:22:28 PM
"Guillaume Smet" <guillaume.smet@gmail.com> writes: > On Nov 8, 2007 12:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I've applied a patch that might help you: >> http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php > AFAICS, it doesn't seem to fix the problem. I just compiled > REL8_1_STABLE branch and I still has the following behaviour: OK, I tried it in fr_FR locale and what I find is that regression=# select '123/' < '1230'::text; ?column? ---------- t (1 row) so make_greater_string() will still think that its first try at generating an upper-bound string is good enough. However regression=# select '123/1' < '1230'::text; ?column? ---------- f (1 row) so the data starting with '123/' is still outside the generated range, leading to a wrong estimate. I didn't see this behavior yesterday but I was experimenting with en_US which I guess has different rules. What I am tempted to do about this is have make_greater_string tack "zz" onto the supplied prefix, so that it would have to find a string that compares greater than "123/zz" before reporting success. This is getting pretty klugy though, so cc'ing to pgsql-hackers to see if anyone has a better idea. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org

Subject: Estimation problem with a LIKE clause containing a /
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 11/8/2007 7:40:20 PM
Gregory Stark <stark@enterprisedb.com> writes: > Doesn't really strike at the core reason that this is so klugy though. Surely > the "right" thing is to push the concept of open versus closed end-points > through deeper into the estimation logic? No, the right thing is to take the folk who defined "dictionary sort order" out behind the barn and shoot 'em ;-). This has got nothing to do with open/closed endpoints and everything to do with the bizarre sorting rules used by some locales. In particular the reason I want to append a letter is that some locales discriminate against non-letter characters in the first pass of sorting. I did do some experimentation and found that among the ASCII characters (ie, codes 32-126), nearly all the non-C locales on my Fedora machine sort Z last and z next-to-last or vice versa. Most of the remainder sort digits last and z or Z as the last non-digit character. Since Z is not that close to the end of the sort order in C locale, however, z seems the best bet. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org