Wednesday, January 9, 2008

Tiburón Will Get New UnicodeString Type

Last year I talked about Unicode here, here and here. At that point CodeGear wasn't publicly talking about the implementation. But today Allen Bauer shared a lot of insightful information in this post.

One question he answers, and one I've been asked a lot, is "Are you just using WideString?" Here is a quote from Allen's post:

"...There is a new data type. UnicodeString. It will be reference-counted just like AnsiString and unlike WideString which is a BSTR. This new data type will be mapped to string which is currently an AnsiString underlying type depending on how it is declared. Any string declared using the [] operator (string[64]) will still be a ShortString which is a length prefixed AnsiChar array. The UnicodeString payload will match that of the OS, which is UTF16. This means you can, at times, have surrogate pairs for characters. For characters that fall outside the Basic Multilingual Plane (BMP)."

So there you have it. You still have the old faithful AnsiString type, but you also get a new UnicodeString type that holds UTF-16 and that string will be aliased to. This means you won't have to change much of your code, which is great news. More to come about UnicodeString.
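To make the surrogate pair point from Allen's quote a little more concrete, here is a small sketch. It is written in Python rather than Delphi, since UnicodeString isn't available yet, and the helper names (utf16_code_units, count_code_points) are made up for this illustration. It only shows how UTF-16 itself behaves: a character inside the BMP takes one 16-bit code unit, while a character outside it takes two.

```python
def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units that encode `text`."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def count_code_points(units):
    """Count characters (code points) in a list of UTF-16 code units."""
    count = 0
    i = 0
    while i < len(units):
        # A high surrogate followed by a low surrogate is one character.
        if (is_high_surrogate(units[i]) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            i += 2
        else:
            i += 1
        count += 1
    return count


bmp_char = "A"              # U+0041, inside the BMP
g_clef = "\U0001D11E"       # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

print([hex(u) for u in utf16_code_units(bmp_char)])  # one code unit
print([hex(u) for u in utf16_code_units(g_clef)])    # two code units (a surrogate pair)

units = utf16_code_units(bmp_char + g_clef)
print(len(units), count_code_points(units))          # code units vs. characters
```

The practical takeaway is that the length of a UTF-16 string in code units is not always the number of characters, which is worth keeping in mind for any code that indexes into strings one element at a time.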

5 comments:

Aivars said...

Are there any project options or compiler flags planned that will let strings function as AnsiStrings for certain more problematic projects or units? Or at least a refactoring feature, "convert all string types to AnsiStrings", so we don't have to search and replace in every unit of the project and its components...

stanleyxu said...

Generally speaking, upgrading to Unicode is not so painful. But some cases (e.g. some WinAPI structures, the Unicode clipboard) must be checked thoroughly.

I don't agree with the new name UnicodeString. Keeping WideString as it is will retain maximal compatibility with BSTR, but it is really confusing. What do WideChar and PWideChar stand for? Shouldn't they be UnicodeChar and PUnicodeChar as well?

In my opinion, UnicodeString should be named WideString. Then create a new type (BSTR or whatever) for the old WideString, and provide two functions, BSTRToStr() and StrToBSTR(). All BSTR variables that are declared as WideString should be changed to BSTR.

When string=AnsiString, WideString=Old_WideString (D4-D2007), BSTR=WideString, nothing is broken.

BTW: Hi Chris, please delete my previous comment.

stanleyxu said...

Hi Chris, here is my opinion on this topic.
http://stanleyxu2005.blogspot.com/2008/01/random-thoughts-on-unicode_10.html
Sorry, I am not a native English speaker.

Chris Bensen said...

aivars,

There has been talk of a compiler switch, but there are too many problems with it.

Chris Bensen said...

stanleyxu,

Allen Bauer has some blog posts explaining why the UnicodeString type was chosen. They make a good read.
