Monday, November 26, 2007

Unicode

At some point more details will come out on how Delphi and C++Builder are going to handle Unicode in the next release, codenamed Tiburón. I have started writing heads-up posts (see this post) about problems I've encountered. They have nothing to do with how the Tiburón implementation will work; they are general guidelines and advice. I will continue to do that, but I wanted to let everyone know about my current experience.

We do have an implementation of Unicode, and we have been going through our code to take advantage of the new Unicode support. I just completed the tlibxxx.bpl package, which is one of the nastiest packages in the product because it interfaces with RTTI, and it really wasn't bad. In fact, it was easy. We'll get more information out to everyone as we can, but I just wanted to let everyone know this really isn't scary. Sure, depending on your code, you could be doing some things that make it more difficult, but in general I think it'll be very straightforward.

9 comments:

Namık Kemal KARASU said...

Hi Chris;

I have a question: will Unicode support be an optional compiler directive, will it replace the native strings/WideStrings throughout the VCL, or will it be a new type such as "unistring"?

Something like this:
{$UNICODE ON}
//or
var
s: unistring;

Cheers

Chris Bensen said...

Namık Kemal Karasu,

I can't talk about implementation details yet, especially since nothing is finalized. Nick Hodges has mentioned a few things publicly, so your best source of public information is to search for what he has said about Unicode.

Joe White said...

Will the Unicode strings be refcounted like AnsiStrings are? My understanding is that WideString is not refcounted, but is always copied.

Chris Bensen said...

Joe White,

That is an implementation detail, but I think it's safe to answer, so I will. Yes. Unicode strings will be nearly identical to AnsiString, only Unicode enabled. You are correct that WideString, aka BSTR, is not reference counted and is copied unless you pass it by var or const. With that in mind, you still almost always want to pass an AnsiString as var or const so the compiler doesn't generate a try-finally block.
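
A minimal sketch of the var/const point, using today's AnsiString rather than anything Tiburón-specific. The program and function names (ConstParamSketch, LengthByValue, LengthByConst) are made up for illustration; the only claim is the general compiler behavior for by-value versus const string parameters.

```pascal
program ConstParamSketch;

{$APPTYPE CONSOLE}

// By-value string parameter: the compiler adds a reference on entry and
// releases it on exit, which means an implicit try-finally around the body.
function LengthByValue(S: AnsiString): Integer;
begin
  Result := Length(S);
end;

// const string parameter: the reference count is left alone, so no
// implicit try-finally is needed for the parameter.
function LengthByConst(const S: AnsiString): Integer;
begin
  Result := Length(S);
end;

var
  Sample: AnsiString;
begin
  Sample := 'Tiburon';
  // Both calls return the same length; only the generated code differs.
  WriteLn(LengthByValue(Sample));
  WriteLn(LengthByConst(Sample));
end.
```

Comparing the two functions in the CPU view should show the extra reference counting and exception frame in the by-value version only.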

Anonymous said...

Unicode support: that is great news for us!

Kryvich said...

> Unicode strings will be nearly identical to AnsiString, only Unicode enabled.

Very good! I'd like this behavior!

Kryvich said...

Just curious: which Unicode encoding have you selected, UTF-16 or UTF-32? I'd prefer UTF-16 by default, with the possibility to set UTF-32 in the compiler options.

m. Th. said...

Hi Chris,

Your blog posts about Unicode are much appreciated! Keep them coming.
But can you enlighten us about string I/O operations? Perhaps you can cover the following cases: the 'string' data is in A.) a file or B.) a DB String/Blob field, and we need it 1. to be converted to Unicode or 2. to stay as is. Also take into account that in the case of a DB field we can have a much smaller maintenance window. Perhaps in the case of Load/SaveToStream you can do a nice extension like Load/SaveToStream(aStream: TStream; aFormat: TIOFormat = iofDefault); where TIOFormat = (iofDefault, iofUnicode, iofAnsi {, etc.}); where iofDefault is a little bit nasty, I admit. While in the case of Load/SaveToStream you can do the above trick to ensure a smooth conversion, the case of {Field}.AsString is a little bit trickier. I am thinking of {Field}.SaveFormat: TIOFormat; {Field}.LoadFormat: TIOFormat; to achieve the same effect as above. What is your approach?

Also, please say a few words about TStringStream. This nice class really begs you to put binary data in it... ;-)

Anonymous said...

Isn't the code name Tiburón, rather than Tiberon?
