Group: pgsql.hackers


Subject: [GENERAL] Slow PITR restore
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 12/13/2007 4:41:55 PM
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Hmm. That assumes that nothing else than the WAL replay will read
> pages into shared buffers. I guess that's true at the moment, but it
> doesn't seem impossible that something like Florian's read-only queries
> on a stand by server would change that.

A general comment on this thread: the idea of putting any sort of asynchronous behavior into WAL recovery gives me the willies.

Recovery is inherently one of the least-exercised parts of the system, and it gets more so with each robustness improvement we make elsewhere. Moreover, because it is fairly dumb, anything that does go wrong will likely result in silent data corruption that may not be noted until much later. Any bugs we introduce into recovery will be very hard to find ... and timing-dependent ones will be damn near impossible.

So in my mind the watchword has got to be KISS. If that means that recovery isn't terribly speedy, so be it. I'd far rather get the right answer slower.

Also, I have not seen anyone provide a very credible argument why we should spend a lot of effort on optimizing a part of the system that is so little-exercised. Don't tell me about warm standby systems --- they are fine as long as recovery is at least as fast as the original transactions, and no evidence has been provided to suggest that it's not.

regards, tom lane

Subject: [GENERAL] Slow PITR restore
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 12/13/2007 5:10:44 PM
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Koichi showed me & Simon graphs of DBT-2 runs in their test lab back in
> May. They had setup two identical systems, one running the benchmark,
> and another one as a warm stand-by. The stand-by couldn't keep up; it
> couldn't replay the WAL as quickly as the primary server produced it.
> IIRC, replaying WAL generated in a 1h benchmark run took 6 hours.

[ shrug... ] This is not consistent with my experience. I can't help suspecting misconfiguration; perhaps shared_buffers much smaller on the backup, for example.

> One KISS approach would be to just do full page writes more often. It
> would obviously bloat the WAL, but it would make the replay faster.

... at the cost of making the primary lots slower.

regards, tom lane
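
For a sense of scale, the figures Heikki recalls (1h of WAL taking 6 hours to replay) can be turned into a rough lag estimate. The sketch below is not from the thread: the function names and the constant-rate model are assumptions made purely for illustration, and real replay speed varies with workload and configuration.

```python
def wal_backlog_hours(load_hours: float, replay_ratio: float) -> float:
    """Hours' worth of WAL not yet replayed after `load_hours` of primary load.

    replay_ratio is (time to replay WAL) / (time it took to generate it).
    Assumes both rates are constant, which is a simplification.
    """
    if replay_ratio <= 1.0:
        # The standby replays at least as fast as the primary generates WAL,
        # so under this model it never falls behind.
        return 0.0
    # In load_hours of wall-clock time the standby gets through only
    # load_hours / replay_ratio hours' worth of WAL.
    return load_hours - load_hours / replay_ratio


def catch_up_hours(load_hours: float, replay_ratio: float) -> float:
    """Wall-clock time the standby needs to drain its backlog once load stops."""
    return wal_backlog_hours(load_hours, replay_ratio) * replay_ratio


if __name__ == "__main__":
    # Figures as recalled in the quoted message: 1h of load, 6h to replay.
    print(catch_up_hours(1.0, 6.0))
    # A standby that replays as fast as the primary generates never lags.
    print(catch_up_hours(1.0, 1.0))
```

Under this model a ratio of 1.0 or below means the standby keeps up, which is the condition under which warm standby is "fine"; any ratio above 1.0 means the backlog grows for as long as the load continues.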

Subject: [GENERAL] Slow PITR restore
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 12/13/2007 7:37:39 PM
Josh Berkus <josh@agliodbs.com> writes:
> Tom,
>> [ shrug... ] This is not consistent with my experience. I can't help
>> suspecting misconfiguration; perhaps shared_buffers much smaller on the
>> backup, for example.

> You're only going to see it on SMP systems which have a high degree of CPU
> utilization. That is, when you have 16 cores processing flat-out, then
> the *single* core which will replay that log could certainly have trouble
> keeping up.

You are supposing that replay takes as much CPU as live query processing, which is nonsense (at least as long as we don't load it down with a bunch of added complexity ;-)).

The argument that Heikki actually made was that multiple parallel queries could use more of the I/O bandwidth of a multi-disk array than recovery could. Which I believe, but I question how much of a real-world problem it is. For it to be an issue, you'd need a workload that is almost all updates (else recovery wins by not having to replicate reads of pages that don't get modified) and the updates have to range over a working set significantly larger than physical RAM (else I/O bandwidth won't be the bottleneck anyway). I think we're talking about an extremely small population of real users.

regards, tom lane