Unicode database concepts
Firebird is a great database server that supports both single-byte and unicode databases and clients. But if you have been using single-byte databases for some time and you want to migrate to unicode, you better be aware of several conceptual changes.
Firebird uses ICU libraries to handle Unicode data. This is a good idea since Unicode is a complex topic and ICU libraries are free software developed and maintained by IBM. The thing is that ICU libraries are advancing very fast and different ICU versions generate different keys used for Firebird indexes.
This means that if you generate your collation according to any locale, your Unicode database is bound to the ICU version you have used at the time of creation and cannot be used with another version (older or newer).
You can move single-byte databases freely from server to server (plain copy) and they always work. For Unicode databases this may not be the case. Using default collations (UCS_BASIC, UNICODE, UNICODE_CI) may be safe but I have not tested these cases extensively. The situation changes if you are using national collations. You can still move your database freely but only when the same ICU version is used on both servers.
If you move the database to a server with different ICU, you can connect to the database but every select returns en error
collation is not installed
The good news is that GBAK dump and restore always works and this way you can copy data from any server to any server. Another good news is that Firebird can use several ICU versions at the same time.
Bundled and system ICU
If you download Firebird from the the project website, it uses so-called bundled ICU. It means that it carries an ICU library and uses always this one. All the Firebird 2.x versions come with ICU version 3.0, it is located in the firebird/lib directory and consists of three files (libicudata, libiuui18n, libicuuc).
But this is not a full-featured ICU. In order to keep it small, almost all locales are stripped (original libicudata is about 9MB while the bundled one is about 1.5MB). In order to be able to create your collations according to locales you will need to replace these libraries with full versions (see below).
Linux distributions (Debian, Ubuntu for sure) compile Firebird wirh so-called system ICU. This means that it uses the ICU library that comes with your system. This is always the full version of the library but the version numbers may differ wildly.
I prefer to use Firebird release from the project website and to configure ICU libraries myself as I describe below.
Please note that Firebird 3.0 will bring in some changes. It is reported that the bundled ICU version may be abandoned. I have no information on configuration options' changes.
Before we start
The next steps presume that you have Firebird 2.5.1 downloaded and installed in the /opt directory. These steps are tested with Firebird release from the Firebird project website, on Ubuntu 11.10-64.
Note that Firebird 2.5.0 may not work with ICU versions newer than 4.2 since there was a naming change in the ICU APIs with the next release. Firebird 2.5.1 works fine with both old and new ICU versions.
Step 1: Downloading and Setting Up the ICU
ICU can be downloaded from the ICU-project web site at http://site.icu-project.org/download
There are binaries for many systems as well as sources. In this example we will download the binaries for ICU 4.8 (icu4c-4_8_1_1-RHEL6-x64.tgz) and sources for ICU 3.0 since there are no binaries for 64bit Linux (icu-3.0.tgz).
It is simple to install the binaries. Decompress the file, go to the usr/local/lib directory and copy the needed files into /opt/firebird/lib directory.
cd icu4c-4_8_1_1-RHEL6-x64/usr/local/lib
cp libicudata.so.48* /opt/firebird/lib/
cp libicui18n.so.48* /opt/firebird/lib/
cp libicuuc.so.48* /opt/firebird/lib/
Then open /opt/firebird/intl/fbintl.conf and set up the ICU versions you want to enable. We will tell Firebird to use both the default and the new ICU 4.8 version.
Before:
<intl_module builtin>
icu_versions default
icu_versions default
</intl_module>
After:
<intl_module builtin>
icu_versions default 4.8
</intl_module>
icu_versions default 4.8
</intl_module>
It is important to make changes in the builtin section not in the fbintl section. The changes in the fbintl section have no influence.
More examples how the icu_versions line may look like (these all work):
icu_versions default
icu_versions default 4.8
icu_versions 3.0
icu_versions 3.0 4.8
When you have saved the changes, restart the Firebird service.
/etc/init.d/firebird restart
Step 1a: compiling ICU 3.0 from sources
If you need to compile ICU 3.0, it is quite easy, there is just one little catch.
Decompress the downloaded file and do the usual thing:
cd icu/sources
./configure
make
The catch is that the compilation may stop with an error. It is usually just a small mistake in the file layoutex/ParagraphLayout.cpp. If this is the case, open the file layoutex/ParagraphLayout.cpp, go the line 748 and change the #elif directive to #else. Save and make again.
When the make is finished you will find the libraries in the icu/sources/lib directory. Copy them the same way as described above, replacing the Firebird original ICU libraries version 3.0.
Step 2: Create the database
create database
'localhost:/path/to/database.fdb'
default character set UTF8;
Step 3: Create the national collation
First connect to the database. Then:
create collation UNICODE_CSCZ_CI
for UTF8
from UNICODE
case insensitive
'LOCALE=cs_CZ;ICU-VERSION=4.8';
This command creates a czech (cs_CZ) case insensitive collation using ICU version 4.8.
- Note: quoted attributes on the last line are case sensitive
- Note: if the given locale is not present in the ICU library, you get an error invalid collation attributes
- Note: if the given ICU version is not configured in Firebird or libraries are missing, you get an error invalid collation attributes
- Note: ICU libraries must be located in the firebird/lib directory.
Step 4: Use the national collation
This command sets up the collation as the default for the database.
alter character set UTF8
set default collation UNICODE_CSCZ_CI;
Conclusion
And this is all there is to it. It really boils down to just a 3 simple commands (create database, create collation, alter character set) provided that the correct versions of ICU libraries are present in the right directories and correctly set up ;-)
εὕρηκα
Useful links
1 comment:
There was also an issue with
icu 52.1 in Debian and Ubuntu
but now is solved at least in Debian/Ubuntu packages
http://www.firebirdnews.org/?p=9201
Post a Comment