Group: pgsql.general


Subject: Slow PITR restore
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 12/11/2007 10:21:16 PM
"Joshua D. Drake" <jd@commandprompt.com> writes: > ... Now I understand that restoring log files can be slow but this is a big > machine. Yeah, restoring is known to be less than speedy, because essentially zero optimization work has been done on it. Heikki has improved matters a bit in 8.3, but I suspect it's still not great. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq

Subject: Slow PITR restore
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 12/12/2007 12:13:58 PM
Simon Riggs <simon@2ndquadrant.com> writes: > On Tue, 2007-12-11 at 22:21 -0500, Tom Lane wrote: >> Yeah, restoring is known to be less than speedy, because essentially >> zero optimization work has been done on it. > If there was a patch to improve this, would it be applied to 8.3? Good grief, no. We have not even done the research to find out where the bottleneck(s) is/are. We're not holding up 8.3 while we go back into development mode, especially not when this problem has existed for seven or eight years (even if JD failed to notice before) and there are already some improvements for it in 8.3. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org/

Subject: Slow PITR restore
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 12/12/2007 12:56:04 PM
Jeff Trout <threshar@threshar.is-a-geek.com> writes: > I've seen this on my PITR restores (thankfully, they were for > fetching some old data, not because we expoded). On a 2.4ghz opteron > it took 5-50 seconds per wal segment, and there were a LOT of > segments (replay took hours and hours). I asked a few folks and was > told it is the nature of the beast. Hopefully something in 8.4 can > be done. Before we get all panicked about that, someone should try to measure the restore speed on 8.3. It's possible that this patch already alleviated the problem: http://archives.postgresql.org/pgsql-committers/2007-05/msg00041.php regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend

Subject: Slow PITR restore
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 12/12/2007 8:32:19 PM
"Joshua D. Drake" <jd@commandprompt.com> writes: > Jeff Trout <threshar@threshar.is-a-geek.com> wrote: >> in this case it was 24hrs of data - about 1500 wal segments. During >> this time the machine was nearly complete idle and there wasn't very >> much IO going on (few megs/sec). > Exactly. Which is the point I am making. Five minutes of transactions > is nothing (speaking generally).. In short, if we are in recovery, and > we are not saturated the I/O and at least a single CPU, there is a huge > amount of optimization *somewhere* to be done. You sure about that? I tested CVS HEAD just now, by setting the checkpoint_ parameters really high, running pgbench for awhile, and then killing the bgwriter to force a recovery cycle over all the WAL generated by the pgbench run. What I saw was that the machine was 100% disk write bound. Increasing shared_buffers helped, not in that the write rate got less according to vmstat, but the completion time did. shared_buffers 32MB 100MB pgbench -c 5 -t 40000 bench 7m23s 2m20s subsequent recovery 4m26s 2m21s Typical "vmstat 1" lines during recovery: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 0 9 70024 29232 19876 824368 0 0 0 3152 1447 233 0 1 0 99 0 0 9 70024 29232 19876 824368 0 0 0 3660 1474 252 0 1 0 99 0 0 8 70024 28960 19876 824404 0 0 0 3176 1448 265 0 2 1 97 0 I don't see the machine sitting around doing nothing ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq

Subject: Slow PITR restore
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 12/13/2007 1:41:32 AM
"Joshua D. Drake" <jd@commandprompt.com> writes: > Tom Lane wrote: >> You sure about that? I tested CVS HEAD just now, by setting the >> checkpoint_ parameters really high, > ... And: >> 2007-12-13 00:55:20 EST LOG: restored log file "00000001000007E10000006B" from archive Hmm --- I was testing a straight crash-recovery scenario, not restoring from archive. Are you sure your restore_command script isn't responsible for a lot of the delay? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly