Re: [misc] [PATCH] Update UTF-8 base code

On Wed, 29 Nov 2017 at 22:19:10, Lars Henriksen wrote:
> UTF-8 encodes characters in one to four bytes (since 2003).
> 
> Because 0 is a valid code point, the decode function utf8_ord()
> should return -1, not 0, on error. As a consequence utf8_width()
> should return 0 for a continuation byte (as it did previously).
> ---
>  src/calcurse.h |  9 +++------
>  src/utf8.c     | 28 +++++++++-------------------
>  2 files changed, 12 insertions(+), 25 deletions(-)
> [...]

Thanks for working on this! The changes look good, apart from the
comment below...

> @@ -326,13 +320,9 @@ int utf8_width(char *s)
>  /* Get the width of a UTF-8 string. */
>  int utf8_strwidth(char *s)
>  {
> -       int width = 0;
> -
> -       for (; s && *s; s++) {
> -               if (!UTF8_ISCONT(*s))
> -                       width += utf8_width(s);
> -       }
> -
> +       int width;
> +       for (width = 0; *s; width += utf8_width(s++))
> +               ;

I find this to be much less readable than the original code. Doing the
increment of both "width" and "s" in a single statement like this is
quite tricky. Can we write

    for (width = 0; *s; s++)
            width += utf8_width(s);

instead to improve readability? I am also unsure about removing the
safeguard (NULL check), but I can see there are arguments for it :)

>         return width;
>  }
>  
> -- 
> 2.14.2.666.gea220ee40
> 
> 
