ICU - Did U C Me?

The more observant of you out there may have noticed me popping up in the background of the recent KDE PIM Sprint, my first real KDE appearance in a while. Work has left me with very little time this year to do KDE and Qt stuff: we're not quite in a Death March, but it's getting close :-) However, I made time to attend the PIM sprint to try to make progress on some of my pet projects.

I had plans to discuss a number of items at the sprint, including Plasma/Akonadi integration and ideas for new QML calendar widgets, but mainly I came to work on the new ICU localisation backend for Qt 5.1 which we plan to use in Frameworks 5 to replace KLocale, KCalendarSystem, KTimeZone and KDateTime. I've been working on the design for this on and off for quite a while now, but have only just got around to getting code written down and needed some time to just focus on getting it done. I've now pushed the first results to Gitorious for anyone interested in having a look (Warning: I ruthlessly rebase and push --force!). The code is still a little rough and broken in places but close enough to done to be worth looking at. Comments very welcome.

The core design was dictated by the somewhat opposing designs of QLocale and ICU. QLocale has the concept of Default and System locales, which are shared objects in every application and consequently need to be immutable and thread-safe. ICU provides formatters that are created with a fixed locale and style, but all the other settings can be changed in a non-thread-safe way, and indeed must be changed to obtain anything other than the default formats. Even simple things like changing the number of decimal places or the padding require mutation.

As a consequence I've defined two new classes, QNumberFormatter and QDateTimeFormatter, which wrap the ICU formatter object for a particular locale and style in an immutable Qt-ish API. Calling this API with options for non-default settings causes the formatter to be temporarily cloned and modified to return the desired results. Two more classes, QCustomNumberFormatter and QCustomDateTimeFormatter, extend the base classes with setters to allow permanent changes without cloning or messing with the default/system objects.
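
To make the clone-on-demand idea concrete, here is a minimal sketch in plain C++. It is not the actual Qt code: MutableNumberFormat is a made-up stand-in for an ICU formatter (fixed locale, mutable settings), and NumberFormatter and CustomNumberFormatter are hypothetical, simplified versions of the classes described above.

```cpp
#include <cassert>
#include <iomanip>
#include <locale>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for an ICU formatter: the locale is fixed at
// construction, everything else is changed through non-thread-safe setters.
class MutableNumberFormat {
public:
    explicit MutableNumberFormat(std::string locale) : m_locale(std::move(locale)) {}

    void setDecimalPlaces(int places) { assert(places >= 0); m_places = places; }

    std::unique_ptr<MutableNumberFormat> clone() const {
        return std::make_unique<MutableNumberFormat>(*this);
    }

    std::string format(double value) const {
        std::ostringstream out;
        out.imbue(std::locale::classic()); // the locale name is ignored in this sketch
        out << std::fixed << std::setprecision(m_places) << value;
        return out.str();
    }

private:
    std::string m_locale;
    int m_places = 2;
};

// Immutable wrapper: the shared formatter is never modified after creation,
// so one instance can safely serve as a default or system object.
class NumberFormatter {
public:
    explicit NumberFormatter(const std::string &locale)
        : m_format(std::make_shared<MutableNumberFormat>(locale)) {}

    // Default settings: use the shared formatter directly.
    std::string toString(double value) const { return m_format->format(value); }

    // Non-default settings: clone, modify the clone, format, discard.
    std::string toString(double value, int decimalPlaces) const {
        auto copy = m_format->clone();
        copy->setDecimalPlaces(decimalPlaces);
        return copy->format(value);
    }

private:
    std::shared_ptr<const MutableNumberFormat> m_format;
};

// Custom variant: owns a private formatter, so setters change it permanently
// without a clone per call and without touching any shared object.
class CustomNumberFormatter {
public:
    explicit CustomNumberFormatter(const std::string &locale) : m_format(locale) {}

    void setDecimalPlaces(int places) { m_format.setDecimalPlaces(places); }
    std::string toString(double value) const { return m_format.format(value); }

private:
    MutableNumberFormat m_format;
};
```

With this sketch, NumberFormatter("en_US").toString(10.0, 0) returns "10" while a later toString(10.0) on the same object still returns "10.00". The price is one clone per non-default call, which is what the custom variant avoids.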

The QNumberFormatter and QCustomNumberFormatter classes support the following styles:

  • DecimalStyle // 10.00
  • ScientificStyle // 1.0E1
  • PercentageStyle // 10.00%
  • OrdinalStyle // 10th
  • SpelloutStyle // ten
  • DurationStyle // 00:00:10
  • CurrencyStyle // $10.00
  • CurrencyCodeStyle // USD10.00
  • CurrencyNameStyle // 10.00 US dollars
  • DecimalPatternStyle // User defined pattern and symbols
  • RuleBasedStyle // User defined rules

The QDateTimeFormatter and QCustomDateTimeFormatter classes support the following styles:

  • FullStyle
  • LongStyle
  • MediumStyle
  • ShortStyle
  • FullRelativeStyle    // Today
  • LongRelativeStyle
  • MediumRelativeStyle
  • ShortRelativeStyle
  • IsoStyle    // 2012-01-01
  • IsoWeekStyle    // 2012-W01-1
  • IsoOrdinalStyle    // 2012-001
  • Rfc2822Style
  • Rfc3339Style
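
For anyone unfamiliar with the three ISO forms in that list, here is a small sketch that produces them using only the standard C library's strftime rather than ICU or the new classes. The helper names (formatDate, isoDate, isoWeekDate, isoOrdinalDate) are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: format a Gregorian date with a strftime pattern.
std::string formatDate(int year, int month, int day, const char *pattern)
{
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    std::tm date = {};
    date.tm_year = year - 1900;
    date.tm_mon = month - 1;
    date.tm_mday = day;
    date.tm_hour = 12;   // midday, so a DST shift cannot move the date
    date.tm_isdst = -1;  // let the C library work out DST
    std::mktime(&date);  // fills in tm_wday and tm_yday
    char buffer[32];
    const std::size_t length = std::strftime(buffer, sizeof buffer, pattern, &date);
    return std::string(buffer, length);
}

// Calendar date: year-month-day.
std::string isoDate(int y, int m, int d) { return formatDate(y, m, d, "%Y-%m-%d"); }

// Week date: ISO week-based year, week number, weekday (1 = Monday).
std::string isoWeekDate(int y, int m, int d) { return formatDate(y, m, d, "%G-W%V-%u"); }

// Ordinal date: year and day of year.
std::string isoOrdinalDate(int y, int m, int d) { return formatDate(y, m, d, "%Y-%j"); }
```

For Monday 2 January 2012 this gives 2012-01-02, 2012-W01-1 and 2012-002. Note that 1 January 2012 was a Sunday and so belongs to the last ISO week of 2011: isoWeekDate(2012, 1, 1) returns 2011-W52-7.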

These new classes provide all the code required to replace the existing QLocale backend implementation, as well as some new features like the Ordinal and Spell-out number formats. However, that's not what really interests us in KDE or why I was at the PIM sprint. It's the next step of adding full Calendar System and Time Zone support into Qt that is the real pay-off. The ICU formatter already implicitly supports the default calendar system and time zone, whatever they may be, but my current implementation does not yet expose the ability to manipulate the calendar or time zone, and that is the next step needed before integrating it all into QLocale and QDateTime. However, if you ask those sat next to me during the sprint who suffered my dark mutterings and hair-tearing, this is not going to be easy given the brain-dead design of ICU.

You see, ICU is notorious for being difficult to build against (don't ask packagers about the crazy library numbering and naming scheme), difficult to use and very poorly documented. It started life as a Java library, and it shows :-) The Java API was then ported to C++, but the C++ API comes with no binary compatibility guarantee as it's supposedly "too hard" to do. For an application that may be an inconvenience, but for a library like Qt it makes the C++ API basically unusable. Instead we have to use the C API which is a wrapper around the C++ API.

The problem is that this C API is only a limited subset of the C++ API, and while there is a partial API for Calendars with some Time Zone features, there is no C API provided for just working with Time Zones. You apparently can't even do something as simple as query what Time Zone a Calendar or Formatter is using. Then there is also the restriction that we need to support older versions of OS X, which ship ancient versions of ICU missing important features. It's going to take some ingenuity to work around these shortcomings, and naturally that is not going to be as efficient or reliable as I might like. Which got me wondering out loud about smarter ways of dealing with ICU, and this is where sitting in a room full of very clever people really comes in handy :-) After some discussions with Kevin and Sune I may have a proposal that allows me to use the full C++ API, but that's something for the Qt mailing list.