Date: Wed, 10 Mar 2004 13:18:17 +0100
From: Gabriele Bartolini
To: htdig-general@lists.sourceforge.net
Subject: [htdig] Another small patch for ht://Dig 3.1.6

Hi guys,

just to let you know that I discovered a small bug in the parsing of Unicode characters written as numeric entities (e.g. &#...;) whose code is greater than 255 and therefore does not fit in a 'char'. This small patch simply replaces those characters with a space, so they are effectively ignored, preventing an overflow.

I have made it available on my personal website: http://www.prato.linux.it/~gbartolini/en/view-a/23/

Ciao and thanks,
-Gabriele

The ht://Dig search engine is probably one of the most widely used search engines in the open-source community. I found a very small bug in its source code, and this patch fixes it.

HTML authors can specify Unicode characters using the &#digit; syntax; since ht://Dig can handle only single-byte characters, an overflow may occur when the specified character code is greater than 255. This patch prevents that from happening.

How to install the patch

To install the patch you first need the original sources, so download the 3.1.6 version of ht://Dig from the main site. You can apply the regular expressions and cookies patch either before or after this one (they are compatible). Then download the patch for correct SGML codes handling, put it into the sources directory and run:

    zcat sgml.patch.1.gz | patch -p1

If you get no errors ... the patch is applied and you can follow the normal configure + make + make install instructions.

diff -3upNr htdig-3.1.6/htdig/SGMLEntities.cc htdig-3.1.6-sgml/htdig/SGMLEntities.cc
--- htdig-3.1.6/htdig/SGMLEntities.cc	Fri Feb  1 00:47:17 2002
+++ htdig-3.1.6-sgml/htdig/SGMLEntities.cc	Wed Mar 10 08:55:25 2004
@@ -161,7 +161,11 @@ SGMLEntities::translate(char *entity)
         //
         // This looks like a numeric entity.  That's fine.
         //
-        return atoi(entity + 1);
+        unsigned int c = atoi(entity + 1);
+        if (c>255)
+            return ' '; // Not ASCII - Change it to a space and ignore it
+        else
+            return (unsigned char) c;
     }
     else
     {
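
The range check introduced by the patch can be tried in isolation. The following is only a minimal sketch, not the ht://Dig source: the free function translate_numeric_entity and the helper demo_checks are hypothetical names, and the sketch assumes that the entity string starts with '#' followed by decimal digits, as the "entity + 1" in the patch suggests.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-alone version of the patched branch of
// SGMLEntities::translate(). Assumption: 'entity' points at the '#'
// of a decimal numeric entity such as "#233" (without '&' and ';').
unsigned char translate_numeric_entity(const char *entity)
{
    // Skip the leading '#' and parse the decimal code, as the patch does.
    // A negative result wraps to a large unsigned value and is rejected below.
    unsigned int c = std::atoi(entity + 1);

    if (c > 255)
        return ' ';                        // does not fit in one byte: use a space
    return static_cast<unsigned char>(c);  // fits: return the code unchanged
}

// Hypothetical helper with a few example inputs.
void demo_checks()
{
    assert(translate_numeric_entity("#65") == 'A');    // plain ASCII
    assert(translate_numeric_entity("#233") == 233);   // still fits in one byte
    assert(translate_numeric_entity("#8364") == ' ');  // too large, becomes a space
}
```

Without the check, a code such as 8364 would be truncated when converted to the function's single-byte return type, so the indexer would see an unrelated character instead of a blank.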