Wednesday, January 23, 2008

Unicode: SizeOf(Char) and SizeOf(Byte)

New in Tiburon: SizeOf(Char) will no longer equal SizeOf(Byte). This means that any pointer arithmetic currently done by casting something to a PChar should be changed to use a PByte. This is a change in the language, both because no current version allows pointer arithmetic on PByte and, of course, because Char will be mapped to WideChar instead of AnsiChar.
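As a rough illustration of byte-based stepping that does not depend on the size of Char, here is a minimal sketch. The helper name ByteAt and the sample buffer are hypothetical, not from any shipping code. It uses Inc rather than the + operator, since Inc advances a typed pointer by the size of the type it points to and, as far as I know, works on PByte in current compilers as well.

```pascal
program ByteStepDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper for illustration: reads the byte located
// Offset bytes past Base, independent of SizeOf(Char).
function ByteAt(Base: Pointer; Offset: Integer): Byte;
var
  P: PByte;
begin
  P := Base;
  Inc(P, Offset);  // advances Offset * SizeOf(Byte) bytes, i.e. Offset bytes
  Result := P^;
end;

var
  Buffer: array[0..7] of Byte;
begin
  FillChar(Buffer, SizeOf(Buffer), 0);
  Buffer[3] := 42;

  // 1 on an ANSI compiler; expected to differ once Char maps to WideChar.
  WriteLn('SizeOf(Char) = ', SizeOf(Char));

  // Reads Buffer[3] regardless of what Char maps to.
  WriteLn('Byte at offset 3 = ', ByteAt(@Buffer, 3));
end.
```

The same helper written with PChar(Base) + Offset would read the wrong byte once Char is two bytes wide, which is the breakage described above.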

So if you are going through your code now to get it ready for Unicode, I suggest adding {$IFDEF UNICODE} or something equivalent around the code that is ANSI-only, so you can test it and mark it to be looked at later.
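A minimal sketch of what such a marker might look like, assuming the new compiler defines a UNICODE symbol (the symbol name is the one suggested above, and the final name could differ). The helper OffsetBytes and the warning text are hypothetical. The Unicode branch casts to PAnsiChar so the arithmetic stays byte-based, and emits a compile-time warning so the spot gets revisited.

```pascal
program AnsiOnlyMarker;
{$APPTYPE CONSOLE}

// Hypothetical helper for illustration: returns Base advanced by ByteCount bytes.
function OffsetBytes(Base: Pointer; ByteCount: Integer): Pointer;
begin
{$IFDEF UNICODE}
  // Assumption: UNICODE is defined when Char is WideChar.
  // PChar would step two bytes at a time here, so use PAnsiChar instead
  // and flag the code for a later review.
  {$MESSAGE WARN 'OffsetBytes: ANSI-era pointer arithmetic, review for Unicode'}
  Result := PAnsiChar(Base) + ByteCount;
{$ELSE}
  // Current compilers: Char is AnsiChar, so PChar steps one byte at a time.
  Result := PChar(Base) + ByteCount;
{$ENDIF}
end;

var
  Data: array[0..3] of Byte;
begin
  Data[0] := 10;
  Data[1] := 20;
  Data[2] := 30;
  Data[3] := 40;

  // Reads the byte two positions past the start of Data.
  WriteLn(PByte(OffsetBytes(@Data, 2))^);
end.
```

Only one branch can be compiled and tested with today's compiler, so the other branch still needs a second look once the new compiler is available.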

I'm thinking of creating a Unicode FAQ where I gather all the Unicode information into one location. Would that be useful for everyone?

11 comments:

Anonymous said...

Seems useful to me.

stanleyxu (2nd) said...

I am waiting for the FAQ. I already have two questions:
1. Will there be a compiler switch available in the next release, like {$IFDEF UNICODE}?
2. When does WideString really behave as a COM BSTR?
[yes] When a WideString is assigned to an OleVariant.
[no] When being passed as a parameter to a Windows API.
[?] When being passed as a parameter to a method of an interface.

Fernando Madruga said...

"Would that be useful for everyone?"

Of course! I'm not considering Unicode in the near future, but I can see that resource being useful to everyone! An example is the code break that you just pointed out!

Having a FAQ to bookmark and consult sure beats looking up a dozen blogs! :)

Anonymous said...

Will FillChar be marked deprecated and replaced with (say) FillBytes? It may sound minor, but the current naming of such a basic routine is the sort of thing that might lead to confusion.

Rob Kennedy said...

If we're casting to PChar today to do byte-based pointer arithmetic, an ifdef doesn't sound like the best solution. Rather, we should change our code to cast to PAnsiChar instead. Then we'll continue to have byte-based arithmetic, and the code will compile and run properly today and tomorrow.

With an ifdef, we need to look at the code twice. Once today, to put the ifdef in all the right places, and then again tomorrow when the new compiler makes the ifdef select the other branch that we weren't able to test before the new compiler arrived.

Aside from that, I'd like you to clarify: Did you say that the next version will support pointer arithmetic on non-character types, such as PByte?

Anonymous said...

@stanleyxu

As Allen Bauer mentioned in his recent posts about Unicode:

1) There will not be a compiler switch.
2) WideString will remain BSTR-compatible as it always has been (i.e. not reference counted). There is a new string type called UnicodeString, which is reference counted like String.

Anonymous said...

"Would that be useful for everyone?"

Absolutely! I know precious little about Unicode, but my customers want my app to be Unicode. Any best practices, "how to" or details would be welcomed.

Chris Bensen said...

Rob,

The only reason I suggested the ifdef is so you would know to go look at your code in the future. PAnsiChar is the more direct choice to make now, however.

Correct, other types will support pointer arithmetic. It should be a cool feature, and something I've been asking for since starting the Unicode work.

Chris Bensen said...

cr,

We probably will need to add a FillBytes method.

Unknown said...

If I've got a variable P of type PChar, and I do Inc(P), will that move one byte or one character? I think that C's ++ operator does the latter, but I'm not sure about Delphi's Inc.

Rob Kennedy said...

Joe, for any pointer type P, Inc(P) advances by SizeOf(P^) bytes. So if P is a PChar, it advances by one Char, just like C++'s increment operator.
