[GSoC] Add initial libunicode parser and example #356

Open
codewithchill wants to merge 1 commit from codewithchill/kolibrios:libutf into main
First-time contributor

Hi Ivan and team,

This PR introduces the initial setup for libunicode as part of my GSoC qualification task.

What is included:

  • /programs/develop/libraries/libunicode/libunicode.asm: Contains the core parsing logic.
    • count_utf8_codepoints: Counts Unicode code points by skipping UTF-8 continuation bytes.
    • count_utf8_graphemes: Approximates the number of visible characters (grapheme clusters) by subtracting Zero-Width Joiners and basic combining marks from the count.
  • /programs/develop/libraries/libunicode/examples/console.asm: A test application that passes a complex UTF-8 string to the functions and prints the results to the console.
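
For reviewers who want a quick model of what code-point counting means here, the idea can be sketched in Python. This is only a reference sketch, not the assembly in this PR: the function name mirrors `count_utf8_codepoints`, but the signature and the assumption of well-formed UTF-8 input are illustrative.

```python
def count_utf8_codepoints(data: bytes) -> int:
    """Count code points in well-formed UTF-8 (reference sketch, not the PR's asm).

    Every code point has exactly one lead byte. Continuation bytes have the
    bit pattern 10xxxxxx, so counting the bytes that do NOT match that
    pattern gives the number of code points.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# ASCII: one byte per code point.
print(count_utf8_codepoints(b"abc"))                        # 3
# Cyrillic: two bytes per code point.
print(count_utf8_codepoints("Привет".encode("utf-8")))      # 6
# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
print(count_utf8_codepoints("e\u0301".encode("utf-8")))     # 2
```

The sketch does no validation, so malformed input (for example a stray continuation byte) is silently miscounted.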

I have tested this against standard ASCII, Russian text, complex ZWJ emojis, and accented characters.

I am looking forward to your feedback on KolibriOS code style, formatting, and best practices so I can update this to match the official OS standards!

codewithchill added 1 commit 2026-03-09 07:40:59 +00:00
Add initial libunicode for UTF-8 parsing and example
Some checks are pending
Build system / Check kernel codestyle (pull_request) Blocked by required conditions
Build system / Build (pull_request) Blocked by required conditions
6de88d5fd0
- Added libunicode.asm to parse UTF-8 strings.
- Implemented count_utf8_codepoints to skip continuation bytes.
- Implemented count_utf8_graphemes to handle ZWJ (E2 80 8D) and combining marks (CC/CD).
- Added console.asm to the examples folder to test and print the results.
- Submitted for GSoC qualification task.
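
The grapheme heuristic described in the commit message can be modeled roughly as below. This is a hedged Python sketch, not the code in `libunicode.asm`: the commit only names the ZWJ byte sequence (E2 80 8D) and the CC/CD lead bytes, so the rule that the code point following a ZWJ is absorbed into the current cluster is an assumption made for illustration.

```python
ZWJ = b"\xE2\x80\x8D"  # UTF-8 encoding of U+200D ZERO WIDTH JOINER


def count_utf8_graphemes(data: bytes) -> int:
    """Approximate grapheme count for well-formed UTF-8 (illustrative sketch).

    Assumed rules, which may differ from the PR's implementation:
      * a ZWJ is not counted, and neither is the code point right after it;
      * a code point whose lead byte is 0xCC or 0xCD is treated as a
        combining mark and not counted. Those lead bytes cover
        U+0300..U+037F, slightly more than the combining block
        U+0300..U+036F.
    """
    count = 0
    after_zwj = False
    for i, b in enumerate(data):
        if (b & 0xC0) == 0x80:
            continue                # continuation byte, not a code point start
        if data[i:i + 3] == ZWJ:
            after_zwj = True        # the next code point joins this cluster
            continue
        if b in (0xCC, 0xCD):
            continue                # basic combining mark
        if after_zwj:
            after_zwj = False       # absorbed by the preceding ZWJ
            continue
        count += 1
    return count


# 'e' + U+0301 renders as one accented character.
print(count_utf8_graphemes("e\u0301".encode("utf-8")))      # 1
# man + ZWJ + woman + ZWJ + girl renders as one family emoji.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(count_utf8_graphemes(family.encode("utf-8")))         # 1
```

This is not full UAX #29 segmentation. Sequences such as emoji with skin-tone modifiers or regional-indicator flags would still be counted as more than one grapheme under these rules.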
This pull request doesn't have enough required approvals yet. 0 of 2 official approvals granted.

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u libutf:codewithchill-libutf
git checkout codewithchill-libutf
Reference: KolibriOS/kolibrios#356