Message-ID: <CADDzAfO4_fcrefSR5OVHipoB0QCx7t+jMffd5m9Zotxqbyxp-w@mail.gmail.com>
Date: Tue, 29 Apr 2025 17:14:54 +0800
From: Kang-Che Sung <explorer09@...il.com>
To: musl@...ts.openwall.com
Cc: Alejandro Colomar <alx@...nel.org>
Subject: mbsnrtowcs(3) behavior not compatible with POSIX.1-2024

Hi, musl libc developers,

I just tested the mbsnrtowcs function in musl libc and discovered there is one behavior that is not compatible with the new POSIX.1-2024 standard.

It's this thing:

POSIX.1-2017 stated "If the input buffer ends with an incomplete character, it is unspecified whether conversion stops at the end of the previous character (if any), or at the end of the input buffer. [...] A future version may require that when the input buffer ends with an incomplete character, conversion stops at the end of the input buffer."
(Reference: https://pubs.opengroup.org/onlinepubs/9699919799/functions/mbsrtowcs.html)

POSIX.1-2024 now requires the conversion stop at the end of the input buffer in that case.
(https://pubs.opengroup.org/onlinepubs/9799919799/functions/mbsrtowcs.html)
(https://www.austingroupbugs.net/view.php?id=616)

Test code

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

wchar_t wcs[100];
char mbs[100];

int main() {
    mbstate_t state;
    const char *s;

    setlocale(LC_CTYPE, "en_US.UTF-8");

    memset(&state, 0, sizeof(state));
    // U+754C U+7DDA
    memcpy(mbs, "\xe7\x95\x8c\xe7\xb7\x9a", 7);
    s = mbs;
    printf("%zu, ", mbsnrtowcs(wcs, &s, 5, 100, &state));
    printf("%td\n", s - mbs);
    // Expected output: "1, 5". Actual output in musl: "1, 3".

    memset(&state, 0, sizeof(state));
    memcpy(mbs, "\xe7\x95\x8c\xe7\xb7", 6);
    s = mbs;
    printf("%zu, ", mbsnrtowcs(wcs, &s, 6, 100, &state));
    printf("%td\n", s - mbs);
    // Expected output: "18446744073709551615, 3"
}
```

By the way, I Cc'd the Linux man pages' maintainer as I plan to suggest a patch to the mbsnrtowcs(3) man page.
And it would be good to see the behaviors of mbsnrtowcs consistent between glibc and musl libc.
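
Until the two behaviors converge, a caller that feeds input in chunks can stay independent of where conversion stops by carrying over whatever bytes were left unconsumed. The following is only a minimal sketch under that assumption: `convert_chunk` and its `leftover` output parameter are hypothetical names made up for illustration, not part of musl, glibc, or POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * convert_chunk() is a hypothetical helper, not a libc function.
 *
 * It converts up to chunk_len bytes of multibyte input into at most
 * dst_len wide characters and returns what mbsnrtowcs() returned.
 * On return, *leftover holds the number of trailing bytes of the chunk
 * that were NOT consumed; the caller should prepend those bytes to the
 * next chunk before calling again with the same state object.
 */
size_t convert_chunk(wchar_t *dst, size_t dst_len,
                     const char *chunk, size_t chunk_len,
                     mbstate_t *state, size_t *leftover)
{
    /* dst must be non-null: with a null dst, *src is not updated. */
    assert(dst && chunk && state && leftover);

    const char *src = chunk;
    size_t n = mbsnrtowcs(dst, &src, chunk_len, dst_len, state);

    if (n == (size_t)-1 || src == NULL) {
        /* Encoding error, or the terminating null character was
         * converted: there is nothing to carry over. */
        *leftover = 0;
        return n;
    }

    /* Bytes between the stop position and the end of the chunk.
     * If conversion stopped at the end of the input buffer, this is 0
     * and any partial character is remembered in *state. If it stopped
     * at the end of the previous character, this is the length of the
     * incomplete sequence. It is also nonzero when dst filled up. */
    *leftover = chunk_len - (size_t)(src - chunk);
    return n;
}
```

For the first case in the test code above (5 input bytes), a stop position of 5 would give a leftover of 0, while the stop position of 3 observed in musl would give a leftover of 2; in both cases the caller moves the leftover bytes to the front of its buffer and appends the next input after them.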