Commit Graph

4 Commits

SHA1, message, and date for each commit (newest first):
05ecfe005c Add robust UTF-8 character validation and length check
Implemented the `is_valid_utf8_char` procedure to safely validate UTF-8
sequences and return their byte length (1-4, or 0 if invalid).

This routine implements strict Unicode compliance checks, including:
- Rejection of overlong encodings (lead bytes 0xC0/0xC1, and strict
second-byte bounds after 0xE0/0xF0).
- Rejection of UTF-16 surrogate halves (restricted second-byte bounds
after 0xED).
- Enforcement of the maximum Unicode scalar value (U+10FFFF).
- Safe handling of null terminators and truncated sequences.

This provides a secure foundation for upgrading the codepoint and
grapheme counting functions in upcoming commits.
2026-03-21 12:37:43 +05:30
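The validation rules listed in commit 05ecfe005c can be modelled in Python as a quick cross-check. This is a minimal sketch, not the FASM routine. The name `utf8_char_len`, the byte-string interface, and treating a 0x00 byte (or the end of the buffer) as a terminator are assumptions made for illustration.

```python
def utf8_char_len(buf: bytes, i: int = 0) -> int:
    """Return the length (1-4) of the UTF-8 sequence starting at buf[i], or 0.

    Hypothetical model of the checks listed for is_valid_utf8_char.
    Assumption: a 0x00 byte acts as a terminator. At buf[i] it means
    "no character"; inside a sequence it means the sequence is truncated.
    """
    def at(k: int) -> int:
        # Reading past the end of buf behaves like hitting a NUL terminator.
        return buf[k] if k < len(buf) else 0

    b0 = at(i)
    if b0 == 0x00:
        return 0                      # terminator: nothing to decode
    if b0 < 0x80:
        return 1                      # ASCII
    if b0 < 0xC2:
        return 0                      # stray continuation byte or overlong 0xC0/0xC1
    lo, hi = 0x80, 0xBF               # default allowed range for the second byte
    if b0 < 0xE0:
        length = 2
    elif b0 < 0xF0:
        length = 3
        if b0 == 0xE0:
            lo = 0xA0                 # reject overlong 3-byte forms
        elif b0 == 0xED:
            hi = 0x9F                 # reject surrogates U+D800..U+DFFF
    elif b0 < 0xF5:
        length = 4
        if b0 == 0xF0:
            lo = 0x90                 # reject overlong 4-byte forms
        elif b0 == 0xF4:
            hi = 0x8F                 # reject code points above U+10FFFF
    else:
        return 0                      # 0xF5..0xFF never occur in UTF-8
    if not (lo <= at(i + 1) <= hi):
        return 0                      # bad or missing second byte (NUL fails here)
    for k in range(i + 2, i + length):
        if (at(k) & 0xC0) != 0x80:
            return 0                  # truncated or malformed tail
    return length


samples = [
    b"A",                  # U+0041
    b"\xc3\xa9",           # U+00E9
    b"\xe2\x82\xac",       # U+20AC
    b"\xf0\x9f\x98\x80",   # U+1F600
    b"\xc0\xaf",           # overlong encoding of '/'
    b"\xed\xa0\x80",       # surrogate U+D800
    b"\xf4\x90\x80\x80",   # U+110000, above the limit
    b"\xe2\x82",           # truncated 3-byte sequence
]
print([utf8_char_len(s) for s in samples])  # prints: [1, 2, 3, 4, 0, 0, 0, 0]
```

Narrowing the allowed range of the second byte per lead byte handles the overlong, surrogate, and U+10FFFF checks in one comparison, which maps naturally onto a pair of `cmp`/`jb`/`ja` instructions in assembly.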
92a0deabad Convert Unicode procedures to the stdcall calling convention
1. Use the cinvoke macro instead of invoke for con_printf, since con_printf is a cdecl function
2. Rename the `end:` label to `P_END:`, since `end` is a FASM keyword
3. Convert all functions in libunicode.asm to the stdcall convention
2026-03-20 20:42:37 +05:30
45a85015c1 Convert initial libunicode into library structure
Update the example file to use the new library structure
2026-03-20 17:52:59 +05:30
8de745c811 Add initial libunicode for UTF-8 parsing and example
- Added libunicode.asm to parse UTF-8 strings.
- Implemented count_utf8_codepoints, which counts code points by skipping continuation bytes (10xxxxxx).
- Implemented count_utf8_graphemes to handle ZWJ (E2 80 8D) and combining marks (lead bytes CC/CD).
- Added console.asm to the examples folder to test and print the results.
- Submitted as a GSoC qualification task.
2026-03-15 23:22:08 +05:30
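The counting approach described in commit 8de745c811 can be approximated as follows. This is a hypothetical Python sketch, not the libunicode.asm code. The names `count_codepoints` and `count_graphemes`, stopping at the first 0x00 byte, and the exact joining rule (a ZWJ and the code point right after it do not start a new cluster) are assumptions about how the assembly behaves.

```python
ZWJ = b"\xe2\x80\x8d"  # U+200D ZERO WIDTH JOINER


def count_codepoints(buf: bytes) -> int:
    """Count code points by counting every byte that is not a continuation byte."""
    count = 0
    for b in buf:
        if b == 0x00:
            break                          # assumption: stop at a NUL terminator
        if (b & 0xC0) != 0x80:             # continuation bytes look like 10xxxxxx
            count += 1
    return count


def count_graphemes(buf: bytes) -> int:
    """Rough grapheme count: combining marks (lead byte 0xCC/0xCD), a ZWJ,
    and the code point after a ZWJ do not start a new cluster.
    This is a heuristic, not full UAX #29 segmentation."""
    count = 0
    i = 0
    joined = False                         # True if the previous code point was a ZWJ
    while i < len(buf) and buf[i] != 0x00:
        lead = buf[i]
        # Step over the lead byte and any continuation bytes that follow it.
        j = i + 1
        while j < len(buf) and (buf[j] & 0xC0) == 0x80:
            j += 1
        is_zwj = buf[i:j] == ZWJ
        is_mark = lead in (0xCC, 0xCD)     # 2-byte sequences U+0300..U+037F
        if not (is_zwj or is_mark or joined):
            count += 1                     # this code point starts a new cluster
        joined = is_zwj
        i = j
    return count


text = "e\u0301 \U0001F468\u200d\U0001F469\u200d\U0001F467".encode("utf-8")
print(count_codepoints(text), count_graphemes(text))  # prints: 8 3
```

Testing only the lead byte 0xCC/0xCD is cheap but coarse: it covers U+0300..U+037F, which includes the Greek characters U+0370..U+037F as well as the Combining Diacritical Marks block (U+0300..U+036F), and it misses combining marks in other ranges.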