Date: Wed, 21 Apr 2004 23:13:27 +1000
From: Lachlan Andrew
To: Gilles Detillieux, Christopher Murtagh
Cc: htdig-dev@lists.sourceforge.net
Subject: [htdig-dev] Re: Performance issue with exclude_urls

Greetings Gilles + all,

Yes, I agree that we need a more "polished" patch for the distribution.

I still like my intermediate path: If *any* server blocks or URL
blocks are used, then the user takes the performance hit and we
re-parse each time. If *no* server/URL blocks are used, we use
Chris's patch. This should be just as fast as Chris's patch (in the
"3.1-compatible mode" without server/URL blocks), and just as flexible
as the current status (if blocks are used).

If that can get ht://Dig fast enough to get into sarge, then I suggest
we implement it first, and then work on Gilles's more complete
solution at more leisure.

A first hack at this (not even compile-tested) is attached, patched
relative to Chris's patched version, so you can see what I mean. If
people are in favour, I'll try to work on it over the weekend.

One issue with caching input strings is that we would have to have
some sort of cache-flushing, or just let the storage grow as HtRegEx
is called repeatedly.

Cheers,
Lachlan

On Wed, 21 Apr 2004 07:45 am, Gilles Detillieux wrote:
> Hi, Chris and other developers. The problem with this fix is that
> exclude_urls and bad_querystr can no longer be used in server
> blocks or URL blocks, as they'll only be parsed once regardless of
> how they're used.

--- htcommon/conf_parser.h  2003-09-26 22:22:57.000000000 +1000
+++ htcommon/conf_parser.h  2004-04-21 22:56:50.000000000 +1000
@@ -71,3 +71,4 @@
+extern bool config_server_URL_blocks;
--- htcommon/conf_parser.cxx  2003-11-22 15:15:40.000000000 +1100
+++ htcommon/conf_parser.cxx  2004-04-21 22:56:32.000000000 +1000
@@ -99,6 +99,8 @@
 #include "htconfig.h"
 #endif /* HAVE_CONFIG_H */
 
+bool config_server_URL_blocks = false;
+
 /* Bison version > 1.25 needed */
 /* TODO: 1. Better error handling
@@ -1131,6 +1133,7 @@
   case 11:
 {
+    config_server_URL_blocks=true;
     // check if " ... " are equal
     if (strcmp(yyvsp[-10].str,yyvsp[-2].str)!=0) {
         // todo: setup error string, return with error.
--- htdig/Retriever.cc  2004-04-21 22:58:07.000000000 +1000
+++ htdig/Retriever.cc  2004-04-21 22:58:39.000000000 +1000
@@ -996,7 +996,7 @@
     // mark it as invalid
     //
-    if(!(exclude_parsed)){
+    if(config_server_URL_blocks || !(exclude_parsed)){
         //only parse this once and store into global variable
         tmpList.Destroy();
         tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t");
@@ -1016,7 +1016,7 @@
     // mark it as invalid
     //
-    if(!(badquerystr_parsed)){
+    if(config_server_URL_blocks || !(badquerystr_parsed)){
         //only parse this once and store into global variable
         tmpList.Destroy();
         tmpList.Create(config->Find(&aUrl, "bad_querystr"), " \t");
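
For anyone who wants to play with the idea outside the ht://Dig tree, here
is a minimal standalone sketch of the "re-parse only when blocks are in use"
rule that the patch applies to exclude_urls and bad_querystr. It is not
ht://Dig code: SketchConfig, ExcludeFilter and split_patterns are
hypothetical stand-ins for the real Configuration, Retriever and StringList
classes, and plain substring matching is assumed in place of HtRegEx.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the configuration: one global exclude_urls
// value plus optional per-URL-prefix overrides (modelling URL blocks).
struct SketchConfig {
    std::string global_exclude;
    std::map<std::string, std::string> url_block_exclude;  // prefix -> value

    // Plays the role of the config_server_URL_blocks flag in the patch.
    bool has_blocks() const { return !url_block_exclude.empty(); }

    // Return the exclude_urls value that applies to this URL.
    const std::string& find(const std::string& url) const {
        for (const auto& kv : url_block_exclude)
            if (url.compare(0, kv.first.size(), kv.first) == 0)
                return kv.second;
        return global_exclude;
    }
};

// Split a pattern list on whitespace.
std::vector<std::string> split_patterns(const std::string& s) {
    std::vector<std::string> out;
    std::istringstream in(s);
    std::string tok;
    while (in >> tok) out.push_back(tok);
    return out;
}

class ExcludeFilter {
public:
    explicit ExcludeFilter(const SketchConfig& c) : config_(c) {}

    bool is_excluded(const std::string& url) {
        // Same condition as the patch: re-parse on every call if any
        // blocks exist, otherwise parse once and reuse the cached list.
        if (config_.has_blocks() || !parsed_) {
            patterns_ = split_patterns(config_.find(url));
            parsed_ = true;
            ++parse_count_;
        }
        // Assumption: substring match, for brevity only.
        for (const auto& p : patterns_)
            if (url.find(p) != std::string::npos) return true;
        return false;
    }

    // How many times the pattern list has been (re)built.
    int parse_count() const { return parse_count_; }

private:
    const SketchConfig& config_;
    std::vector<std::string> patterns_;
    bool parsed_ = false;
    int parse_count_ = 0;
};
```

With no blocks configured, parse_count() stays at 1 however many URLs are
checked; as soon as one block exists, every call re-parses, which is the
trade-off described above.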