Group: pgsql.patches


Subject: prevent invalidly encoded input
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 9/11/2007 2:06:09 PM
Andrew Dunstan <andrew@dunslane.net> writes:
> Attached is a patch to the scanner and the COPY code that checks for
> invalidly encoded data that can currently leak into our system via \
> escapes in quoted literals or text mode copy fields, as recently
> discussed. That would still leave holes via chr(), convert() and
> possibly other functions, but these two paths are the biggest holes that
> need plugging.

The COPY code looks sane. On the scan.l change, I believe two out of
three of those calls are useless, because we do not do backslash
processing in dollar-quoted strings nor in quoted identifiers. Also,
I'd kinda like to have the check-for-high-bit optimization in scan.l
too --- some people do throw big literals at the thing.

			regards, tom lane
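
For readers following along, the check-for-high-bit optimization can be
sketched roughly as below. This is an illustration under stated
assumptions, not the code from the patch: check_literal() and
verify_utf8() are hypothetical names, verify_utf8() is a simplified
structural UTF-8 check standing in for the server's real encoding
verifier, and IS_HIGHBIT_SET is redefined locally so the sketch compiles
on its own. The idea is that a literal in which no escape produced a
byte with the high bit set cannot have introduced a bad multibyte
sequence, so the expensive scan can be skipped for it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the high-bit test so the sketch is self-contained. */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/* Counts how often the expensive check runs (for illustration only). */
static int verify_calls = 0;

/*
 * Hypothetical stand-in for the real encoding verifier: a simplified
 * structural UTF-8 check (lead byte followed by the right number of
 * continuation bytes). It does not reject overlong forms or surrogates.
 */
static bool
verify_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    verify_calls++;
    while (i < len)
    {
        size_t need;

        if (s[i] < 0x80)
            need = 0;
        else if ((s[i] & 0xE0) == 0xC0)
            need = 1;
        else if ((s[i] & 0xF0) == 0xE0)
            need = 2;
        else if ((s[i] & 0xF8) == 0xF0)
            need = 3;
        else
            return false;       /* stray continuation or invalid lead byte */

        if (need > len - i - 1)
            return false;       /* sequence truncated at end of literal */

        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;   /* expected a continuation byte */

        i += need + 1;
    }
    return true;
}

/*
 * Hypothetical end-of-literal hook: run the full check only when escape
 * processing reported a high-bit byte; otherwise take the fast path.
 */
static bool
check_literal(const unsigned char *s, size_t len, bool saw_high_bit)
{
    if (!saw_high_bit)
        return true;            /* fast path: nothing suspicious was added */
    return verify_utf8(s, len);
}
```

With this shape, a large literal with no high-bit escapes costs only the
flag test, which is the point of asking for the optimization in scan.l.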

Subject: prevent invalidly encoded input
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 9/11/2007 11:05:16 PM
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Also, I'd kinda like to have the check-for-high-bit optimization in
>> scan.l too --- some people do throw big literals at the thing.
>>
> OK, will do. Am I correct in thinking I don't need to worry about the
> <xeescape> case, only the <xeoctesc> and <xehexesc> cases?

[ squint ... ] Hm, wouldn't bet on it. That leads to
unescape_single_char(), which is fine for the cases that it explicitly
knows about (\b and so on), but what if the following byte has the high
bit set? Not only would that pass through a high bit to the output, but
very possibly this results in disassembling a multibyte input character.
So it looks like you need to recheck if unescape_single_char sees a
high-bit-set char.

You should take a second look at the COPY code to see if there's a
similar case there --- I forget what it does with backslash followed by
non-digit.

			regards, tom lane
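
A small sketch of the <xeescape> concern, for illustration only. The
unescape_single_char() here is a simplified guess at the function's
shape (known escapes translated, everything else passed through), and
unescape_literal() is a hypothetical driver, not anything in scan.l. It
shows a backslash followed by the first byte of a UTF-8 character: the
lead byte comes back from unescape_single_char() unchanged, so the
caller has to notice the high bit and schedule a recheck.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the high-bit test so the sketch is self-contained. */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/* Simplified guess at the escape table: anything unknown passes through. */
static unsigned char
unescape_single_char(unsigned char c)
{
    switch (c)
    {
        case 'b':
            return '\b';
        case 'f':
            return '\f';
        case 'n':
            return '\n';
        case 'r':
            return '\r';
        case 't':
            return '\t';
        default:
            return c;           /* includes bytes with the high bit set */
    }
}

/*
 * Hypothetical driver: copy src to dst, resolving backslash escapes.
 * dst must have room for at least len bytes. Sets *saw_high_bit when an
 * escape yields a high-bit byte, meaning the result needs reverifying.
 * Returns the number of bytes written.
 */
static size_t
unescape_literal(const unsigned char *src, size_t len,
                 unsigned char *dst, bool *saw_high_bit)
{
    size_t out = 0;

    *saw_high_bit = false;
    for (size_t i = 0; i < len; i++)
    {
        if (src[i] == '\\' && i + 1 < len)
        {
            unsigned char c = unescape_single_char(src[++i]);

            if (IS_HIGHBIT_SET(c))
                *saw_high_bit = true;   /* escape took one byte of a multibyte char */
            dst[out++] = c;
        }
        else
            dst[out++] = src[i];
    }
    return out;
}
```

Feeding it the three bytes '\\', 0xC3, 0xA9 sets the flag, because the
escape consumed only the lead byte of the two-byte character.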

Subject: prevent invalidly encoded input
From: tgl@sss.pgh.pa.us (Tom Lane)
Date: 9/12/2007 12:19:18 PM
Andrew Dunstan <andrew@dunslane.net> writes:
>   addlitchar(unescape_single_char(yytext[1]));
> + if (IS_HIGHBIT_SET(literalbuf[literallen]))
> +     saw_high_bit = true;

Isn't that array subscript off-by-one? Probably better to put the test
inside unescape_single_char(), anyway.

Otherwise it looks sane, though maybe shy a comment or so.

			regards, tom lane
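
To make the subscript question concrete, here is a sketch that assumes
addlitchar() stores the byte, advances literallen, and keeps the buffer
NUL-terminated. The fixed-size buffer and the two helper functions
(last_byte_high_as_posted, last_byte_high_fixed) are hypothetical and
exist only for this illustration. Under that assumption,
literalbuf[literallen] is always the terminator after the call, so the
test as posted could never fire; the byte just added sits at
literallen - 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the high-bit test so the sketch is self-contained. */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/* Simplified, fixed-size stand-ins for the scanner's literal buffer. */
static char   literalbuf[64];
static size_t literallen = 0;

/* Assumed behaviour: store the byte, advance the length, re-terminate. */
static void
addlitchar(unsigned char c)
{
    if (literallen + 1 >= sizeof(literalbuf))
        return;                 /* sketch only: silently drop on overflow */
    literalbuf[literallen++] = (char) c;
    literalbuf[literallen] = '\0';
}

/* The subscript as posted: inspects the terminator, one past the new byte. */
static bool
last_byte_high_as_posted(void)
{
    return IS_HIGHBIT_SET(literalbuf[literallen]) != 0;
}

/* Corrected subscript: inspects the byte that was just appended. */
static bool
last_byte_high_fixed(void)
{
    return literallen > 0 &&
        IS_HIGHBIT_SET(literalbuf[literallen - 1]) != 0;
}
```

Doing the test inside unescape_single_char() instead sidesteps the
indexing question entirely, since the function already has the byte in
hand.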