From urs at tin.org Tue Sep 20 14:27:25 2022
From: urs at tin.org (Urs Janßen)
Date: Tue, 20 Sep 2022 14:27:25 +0200
Subject: [tin-dev] URL_REGEX update
Message-ID:

The following update for URL_REGEX should stop it from capturing illegal
path components (non-ASCII chars), as in the 2nd URL, and takes into
account that non-punycode TLDs exist with 18 chars *sigh*

=== modified file 'include/tin.h'
--- include/tin.h 2022-08-29 13:27:10 +0000
+++ include/tin.h 2022-09-20 12:06:30 +0000
@@ -707,7 +707,7 @@
  * - test IDNA (RFC 3490) case
  * - adjust to follow RFC 3986 (section 2.3)
  */
-#define URL_REGEX "\\b(?:https?|ftp|gopher)://(?:[^:@/\\s]*(?::[^:@/\\s]*)?@)?(?:(?:(?:[^\\W_](?:(?:-|[^\\W_]){0,61}(?