Date: Thu, 13 Nov 2003 17:17:24 -0600 (CST)
From: Gilles Detillieux
To: Joe R. Jah
Cc: "ht://Dig mailing list"
Subject: [htdig] Re: Porting fileSpace.1 from 3.1.6 to 3.2.0b5;)

According to Joe R. Jah:
> Hi Gilles,
>
> I have been trying to port fileSpace.1 from 3.1.6 to 3.2.0b5. At first it
> appeared to be relatively simple, attached, but:
>
> URL.cc: In method `URL::URL(const String &, const URL &)':
> URL.cc:147: `config' undeclared (first use this function)
> URL.cc:147: (Each undeclared identifier is reported only once
> URL.cc:147: for each function it appears in.)
> URL.cc:149: `ref' undeclared (first use this function)
> URL.cc: In method `void URL::parse(const String &)':
> URL.cc:341: ambiguous overload for `const String & + int'
> URL.cc:341: candidates are: operator +(const char *, int)
> ../htlib/htString.h:162: class String operator +(const
> String &, const String &)
> URL.cc:342: passing `char **' to argument 1 of `__istype(int, long
> unsigned int)' lacks a cast
> URL.cc:349: no `operator ++ (int)' declared for postfix `++', trying
> prefix operator instead
> URL.cc:349: no match for `++const String &'
> gmake[1]: *** [URL.lo] Error 1
>
> Please advise.

There are a few important internal changes in 3.2 that seem to have
tripped you up:

1) config is no longer a globally declared object, so instead each
   function must get and use a pointer to the HtConfiguration object.

2) most argument passing of char * types was replaced by String &, so
   if you need to go back to char * handling you need to explicitly
   declare and set the pointer.

3) liberal use of the const type modifier to preserve an idea of what
   will remain constant, so you have to be careful about using this
   correctly when declaring and setting pointers.

Here's a corrected patch, which at least compiles cleanly, though I haven't
tested it beyond that...
---------------------------------------------------------------
This patch adds an allow_space_in_url attribute to htdig 3.2.0b5, so that
you can get htdig to handle URLs that contain embedded spaces. Technically,
this is a violation of RFC 2396, which says spaces should be stripped out
(as htdig does by default). However, many web browsers and HTML code
generators violate this standard already, so enabling this attribute allows
htdig to handle these non-compliant URLs. Even with this attribute set,
htdig still strips out all white space (leading, trailing and embedded),
except that space characters embedded within the URL will be encoded as %20.

--- htcommon/URL.cc.orig	2003-07-21 07:40:16.000000000 -0500
+++ htcommon/URL.cc	2003-11-13 16:50:03.000000000 -0600
@@ -144,8 +144,26 @@ URL::URL(const String &url, const URL &p
     _signature(parent._signature),
     _user(parent._user)
 {
-    String temp(url);
-    temp.remove(" \r\n\t");
+    HtConfiguration* config= HtConfiguration::config();
+    int allowspace = config->Boolean("allow_space_in_url", 0);
+    String temp;
+    const char *urp = url.get();
+    while (*urp)
+    {
+        if (*urp == ' ' && temp.length() > 0 && allowspace)
+        {
+            // Replace space character with %20 if there's more non-space
+            // characters to come...
+            const char *s = urp+1;
+            while (*s && isspace(*s))
+                s++;
+            if (*s)
+                temp << "%20";
+        }
+        else if (!isspace(*urp))
+            temp << *urp;
+        urp++;
+    }
     char* ref = temp;
 
     //
@@ -314,8 +332,26 @@ void URL::rewrite()
 //
 void URL::parse(const String &u)
 {
-    String temp(u);
-    temp.remove(" \t\r\n");
+    HtConfiguration* config= HtConfiguration::config();
+    int allowspace = config->Boolean("allow_space_in_url", 0);
+    String temp;
+    const char *urp = u.get();
+    while (*urp)
+    {
+        if (*urp == ' ' && temp.length() > 0 && allowspace)
+        {
+            // Replace space character with %20 if there's more non-space
+            // characters to come...
+            const char *s = urp+1;
+            while (*s && isspace(*s))
+                s++;
+            if (*s)
+                temp << "%20";
+        }
+        else if (!isspace(*urp))
+            temp << *urp;
+        urp++;
+    }
     char *nurl = temp;
 
     //

-- 
Gilles R. Detillieux              E-mail:
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba
Winnipeg, MB R3E 3J7 (Canada)
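
For anyone who wants to check the whitespace handling outside of htdig, here
is a rough standalone sketch of the same loop. It is only an illustration
under some assumptions: it uses std::string instead of htdig's String class,
the function name strip_url_whitespace and its allow_space parameter are
made up for this example (in the patch the flag comes from the
allow_space_in_url attribute), and it has not been run against htdig itself.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the loop in the patch, with std::string
// standing in for htdig's String. allow_space plays the role of the
// allow_space_in_url attribute.
std::string strip_url_whitespace(const std::string &url, bool allow_space)
{
    std::string out;
    for (std::size_t i = 0; i < url.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(url[i]);
        if (c == ' ' && !out.empty() && allow_space)
        {
            // Encode the space as %20 only if a non-space character
            // follows, so trailing white space is still dropped.
            std::size_t j = i + 1;
            while (j < url.size() &&
                   std::isspace(static_cast<unsigned char>(url[j])))
                ++j;
            if (j < url.size())
                out += "%20";
        }
        else if (!std::isspace(c))
        {
            // Copy everything that is not white space; tabs, CR, LF and
            // leading spaces fall through and are stripped.
            out += url[i];
        }
    }
    return out;
}
```

With the flag off this behaves like the old temp.remove() call and drops all
white space. With it on, each embedded space character becomes its own %20
(so two spaces in a row give %20%20), while leading and trailing white space
and any tabs or line breaks are still removed.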