The NLS_CHARACTERSET of an Oracle database defines what characters can be stored in the database using the CHAR, VARCHAR2, LONG and CLOB datatypes. This blog gives a quick overview of a method to change the database character set or to check before migrating data between databases with a different NLS_CHARACTERSET.

If you change character sets there is a possibility that characters that you currently use are not defined in the new character set or that the current setup is not correctly used and therefore you could lose data when changing the NLS_CHARACTERSET.

In previous blog “2014-06 – Database Character Set Scanner”, we addressed how to install a great Oracle tool called CSSCAN. Always check data migration by using the Character Set Scanner (CSSCAN) before making any changes to your character set.

This blog will show you how to validate a data migration process prior to change the NLS_CHARACTERSET.

The current NLS_CHARACTERSET is seen in NLS_DATABASE_PARAMETERS:

SQL> SELECT value FROM NLS_DATABASE_PARAMETERS 
      WHERE parameter = 'NLS_CHARACTERSET';

VALUE
----------------------------------------
WE8MSWIN1252

In this example, our database has “WE8MSWIN1252″ character set. Our goal is to change the current NLS_CHARACTERSET from WE8MSWIN1252 to WE8ISO8859P15 character set.

Changing WE8MSWIN1252 TO WE8ISO8859P15

You can’t simply go from WE8MSWIN1252 to WE8ISO8859P15 because these aren’t binary supersets. This means that *NOT ALL* characters included in the WE8MSWIN1252 character set are included in WE8ISO8859P15 so that means going to WE8ISO8859P15 could result in a loss of data.

Actions to take

1.) Install and configure CSSCAN

Please refer to blog “2014-06 – Database Character Set Scanner”.

2.) Check if you have no invalid code points in the current character set

In this example, run csscan for “SYSMAN” schema with the following syntax:

csscan FULL=N USER='SYSMAN' FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252 LOG=1252check CAPTURE=Y ARRAY=1000000 PROCESS=2?

A csscan output run will look like this:

Enumerating tables to scan...

. process 1 scanning SYSMAN.AQ$_MGMT_NOTIFY_QTABLE_T
. process 2 scanning SYSMAN.AQ$_MGMT_LOADER_QTABLE_I
.......
. process 2 scanning SYSMAN.MGMT_HTTP_SESSION_CALLBACKS[AAAT/eAACAAAJNoAAA]
. process 1 scanning SYSMAN.MGMT_CREDENTIAL_TYPES[AAAT0uAACAAAJXYAAA]
. process 2 scanning SYSMAN.EUME2E_ASSOCS_LOOKUP[AAAUAqAACAAAJfoAAA]
. process 1 scanning SYSMAN.MGMT_ECM_GEN_SNAPSHOT[AAAT11AACAAAO1IAAA]

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.

This will create 3 files:

  • 1252check.out – a logging of the output of csscan
  • 1252check.txt – a Database Scan Summary Report
  • 1252check.err – a log file that normally should contain the rowid of the rows of the tables reported in 1252check.txt

Because you’ve entered the TO and FROM character sets as the same you would not have any “convertible” data.

Verify output results in P1252check.txt file:

Datatype                    Changeless      Convertible       Truncation
--------------------- ---------------- ---------------- ----------------
VARCHAR2                     1,236,441                0                0 
CHAR                             1,882                0                0
LONG                                 0                0                0
CLOB                             1,996                0                0
VARRAY                           5,373                0                0
--------------------- ---------------- ---------------- ---------------- 
Total                        1,245,692                0                0 
Total in percentage            100.000%           0.000%           0.000%

If you have NO entries in P1252check.txt under the Convertible or Exceptional columns then proceed to the next point.

3.) Check what rows contains data that needs to be changed from code point

Run CSSCAN with the following syntax:

csscan FULL=N user='SYSMAN' FROMCHAR=WE8MSWIN1252 TOCHAR=WE8ISO8859P15 LOG=P1252TOP15 CAPTURE=Y ARRAY=1000000 PROCESS=2

This will create 3 files:

  • P1252TOP15.out – a logging of the output of csscan
  • P1252TOP15.txt – a Database Scan Summary Report
  • P1252TOP15.err – a log file that normally should contain the rowid of the rows of the tables reported in P1252TOP15.txt
Datatype                    Changeless      Convertible       Truncation
--------------------- ---------------- ---------------- ---------------- 
VARCHAR2                     1,232,510                0                0 
CHAR                             1,882                0                0 
LONG                                 0                0                0 
CLOB                             1,996                0                0 
VARRAY                           5,373                0                0  
--------------------- ---------------- ---------------- ---------------- 
Total                        1,241,761                0                0   
Total in percentage            100.000%           0.000%           0.000%

If you have NO entries in P1252TOP15.txt under the Convertible or Exceptional columns then proceed to step 5 but if you have entries go to step 4.

4.) If you have Convertible or Exceptional entries, you need to process that data:

* any “EXEPTIONAL” data contains characters that are known in the characterset WE8MSWIN1252 but not in the WE8ISO8895P15. This data will be lost. It needs to be deleted before going further.

* any “CONVERTIBLE” data needs to be exported and then deleted. This data can be translated via the export / import process after the database character set is converted to preserve the data.

* If you have indexes on any tables with Convertible or Exceptional data, you will get entries in P1252TOP15.txt under the “[Indexes to be Rebuilt]” section. Deleting the rows will also clear the need for a rebuild in most cases.

Run csscan again to see if you have missed some data. If this reports now no Convertible or Exceptional data any more, then proceed.

5.) Perform the character set change

Perform a backup of the database and then use the “alter database” command, this changes the current database character set definition WITHOUT changing the actual stored data.

ALTER DATABASE CHARACTER SET INTERNAL_USE WE8ISO8859P15;

A future blog will address a case where the data that is already in the database would need to be converted.

Link to script that contains the examples in this post.

 

 

 

1 Comment