Monday, July 15, 2013

Strange characters in your Web apps? It could be Oracle's Unicode handling.

Do you have strange characters in your Web app? Like a reversed question mark? Or characters like these: ⁿ‼↨♫☼◙ Sometimes it happens where you should see an apostrophe or a quote character like : Daniel◙s Bar-B-Q .

Here's one reason why: Oracle and Unicode. My scenario went like this:
  1. A user has a Document with some Unicode characters on it like the fancy "smart quote" characters (a Unicode u2019 for apostrophe)
  2. The user does a copy & paste to put the text into the Web app's comments field (a Java String)
  3. The web app saves the String into an Oracle VARCHAR2(4000) column called comments
  4. The Oracle instance has a popular character setting: US-ASCII-7, which does not support Unicode characters
  5. Oracle converts the character by truncating the leftmost bits, rendering an unprintable character ASCII(19) into the comments column
  6. Later, the Webapp reads the Oracle column, displays it to the browser, and displays the unprintable ASCII(19) character. as something Strange like ♫ or ◙.

Truncate Bits!!! What the...? WHY??
Yeah, that's right. Oracle truncated the bits, losing the original character - silently killing those Unicode characters without so much as a "by your leave" . Isn't "character assassination" illegal? How could Oracle DO such a thing?! The answer is not a simple one. First you need to understand Unicode character sets, and then realize that Oracle has tons of options and support for different character sets. But we can imagine what Oracle designers were thinking when converting a big 16-bit character into a tiny 7-bit space:
  • Throw an error. This would rollback transactions and force developers to handle or prevent the errors - pbbbt... what else you got?
  • Remove the Unicode character(s). That would change the size of the value stored.
  • Convert the character to something else, quietly. This is actually what happens, but it has it's own pitfalls: silent but deadly... pitfalls
Oracle can actually do some impressive up-converts : like from Windows-1252 to UTF-8. But unfortunately, the down-convert smashing of "smart quote" chars into ASCII uses the most brain-dead algorithm - truncating the left most bits - and u2019 becomes x19, which is unprintable. And if you try to convert that to XML, x19 is not even a valid XML character and the parser crashes.

Why Unicode? What are Smart Quotes?
So how did these Unicode characters get into my App anyhow? "My app only uses English, it's a simple app", you might be saying. The answer is that popular apps like Microsoft Office will automatically change your " or ' characters into more fancy, curly quote characters as you type. Nice, eh? Why not make your text look better? Who wouldn't want that? So now millions of MS Word documents with Unicode chars are floating around the planet, even your users are typing them, and they are copy & pasting those strings into your app. So THAT's where the Unicode came from for my scenario above.
And truly this should be OK, because tons of systems support Unicode already, Java and other languages use Unicode for their Strings internally already. And lots of apps like your Web browser support Unicode too. Oracle supports Unicode as well, but you do have to be aware at installation time. Oracle's US7ASCII setting is very popular, including on my Oracle instance, but I wouldn't recommend it. In today's world, you need Unicode, UTF-8 is well supported, and would have helped me here.

Now What? How do I Fix It?
You have a few options:

1. Long-term, switch to UTF-8. UTF-8 is well-supported and it's a super-set of ASCII, so all of the old ASCII text should convert in all of your VARCHAR2 columns. For almost all characters, the size is still just 8-bits, but can be larger for smart-quotes and others. However, converting an entire Oracle instance is nuclear - if you have a large database, this can take time, and the conversion your DBA recommends may be akin to backing up everything, and re-importing all of your data: hours or days of effort.

2. Use NVARCHAR2. You might consider the Oracle NVARCHAR2 column type just for your large comment-style fields that need free-form text. This will only require an ALTER TABLE on that column. But you will have to remember this for new Tables with similar needs.

3. Use a BLOB. If you store the field in a BLOB column, Oracle will not do any conversion. For columns where you don't care about indexes and search, like a comments section on your app, this should be OK. In Java, for example, you may not have to change any code at all. This is because you can still directly use this column as a String, i.e. JDBC Statement.setString() and getString() methods work fine without the messy JDBC BLOB handling. Or, JPA's @Lob annotation can also be placed on a String for minimum coding effort. As an added benefit, your users can paste the entire contents of "War and Peace" without having a size limit of 4000 in VARCHAR2. Note: Oracle's CLOB column type will still do character conversion, so it will not help here.

4. Filter the characters. In the short term I ended up using a regex to filter out and remove any unprintable characters. Unfortunately this lost the character, turning "Dan's Bar-B-Q" into "Dans Bar-B-Q". Users were not too happy with that, but it did stop the bleeding until I can implement one of the above solutions.

Conclusion, Recommendation
If you have strange characters appearing in your application, check the character set in your database engine. Oracle, SQL Server, Postgres, MySQL - they can all support a wide range of settings. Also, think about the design of your fields, do you really want VARCHAR2(4000) for a comments field? Maybe BLOB would be better?
  • Use UTF-8 in your database engine
  • Use a BLOB (careful, not CLOB)for a comments column. You'll get a big size, and still use a simple String in memory
  • Make a separate table for all of the comment fields, you probably have many comment fields in your app:
    CREATE TABLE COMMENT(
      ID VARCHAR2(20) PRIMARY KEY,
      COMMENT_TEXT BLOB
    )
    


Other Solutions
Windows-1252. This Microsoft-created character set looks attractive because it is still 8-bit, and it includes the curvy smart-quote characters and many other useful punctuation characters. Oracle supports it too, calling it WE8MSWIN1252 in their settings. However, this is really just a stop-gap, a hack, until good Unicode support arrived. So I cannot recommend this unless you are desperate. Technically it would work, but now that UTF-8 support is so good everywhere, and with disk space at pennies per GB, there's no need for it, and IMHO, Windows-1252 character set support should start to fade.

Monday, May 27, 2013

Using Oracle Stored Procedures and Java ResultSet

Oracle stored procedures are not my favorite slice of technology, but I have to use them sometimes. In this case I had a stored procedure that I needed to call: it returns a set of rows - a result set. But Oracle Stored procedures do not return anything by design - other database systems like SQLServer and Postgres can, but not Oracle. So what's a Git-R-Done pragmatic developer to do?

The stored procedure in question used an OUT parameter with a "cursor" type, this is how Oracle can "return" the results of a select statement. (I can feel my CS professor grading this now: "D-, NEVER use Out Params!" some languages like Java do not even have out params). And then the NEXT challenge: How do you call that from Java? Good question - this case doesn't happen in the abridged JDBC docs, nor in the Hibernate nor Spring frameworks docs either.

JDBC has standard support for ResultSets returned from a stored procedure CallableStatment, this is how many DB vendors like Postgres and SQLServer work. In those drivers, Statement.exeecuteQuery() returns a ResultSet. But Oracle has a long history of having backwards and low-compatibility issues for JDBC details,and this is another case of Oracle jamming their square-peg stored procedures into the round hole of JDBC.  But you can make it work.

Here are the "challenges":
* Oracle Stored Procedures CANNOT return a result set (it's an old design?)
* Oracle "gives" you SYS_REFCURSOR on an OUT param (not a JDBC standard, so that's an odd feeling)
* Oracle's JDBC driver does not have a JDBC-standard way to do this, they require OracleType.CURSOR
* Oracle's driver rejects the "OTHER" SQL Type, so you cannot use this JDBC Standard either

Step 1: Your stored procedure

Oracle stored procedures can use an OUT param with type SYS_REFCURSOR to return results of a SQL Select:
 CREATE PROCEDURE "JAY_PROC" ( st_cursor OUT SYS_REFCURSOR  ).... --details below

Step 2: CallableStatment

Of course, been there, done that - the ol' JDBC way to call stored procedures:
CallableStatement cs = connection.prepareCall("call jay_proc(?)");

Step 3: register the OUT param

Feeling squeemish? It's OK... we are almost finished ...you'll feel a small prick in your good programming sense:
cs.registerOutParameter(1, OracleType.CURSOR);
cs.execute();

Step 4: blink... cast the ResultSet ??!

err.. Yes, it's true, the object returned is actually a ResultSet, but you have to cast it:
ResultSet rs = (ResultSet) cs.getObject(1);

Step 5: Use the ResultSet as normal, JDBC calls

Whew! It's over.. Now we have what we wanted, a ResultSet:
while( rs.next()){

Conclusion

Oracle does not directly support stored procedures that return ResultSet, but with these 3 or 4 steps, you can get the stored procedure with a cursor to work. I assume that Oracle's JDBC driver does the work of mapping their cursor object to a JDBC-compliant ResultSet, which is nice. But if I had a preference, I would wish for Oracle to support procedures that can return rows from a select. Then you could use the execute() method to return a ResultSet directly without the odd casting operation in step 4.

Ideally you would use a SQL Select statement and the PreparedStatement JDBC object, but I was saddled with an existing stored procedure in this case. I do not recommend using stored procedures for almost any use case, if you can avoid it. Instead I prefer to use Java's standard JPA as much as possible.

Here is the working code:

package com.harpoontech.test.dao;

import java.sql.CallableStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.springframework.jdbc.datasource.DriverManagerDataSource;


public class TestOracleStoredProcedure {
 
 /**
  * The SQL TYPE for an Oracle CURSOR in a Oracle Stored Procedure.
  * This duplicates the OracleTypes.CURSOR, but with this constant 
  * we do not need to import com.oracle.* jars into this project.
  * However this class is still 100% dependent on Oracle at runtime 
  * and cannot be unit tested without Oracle.
  */
 private int ORACLE_CURSOR_TYPE=-10;
 
 public static void main(String[] args) throws SQLException {
  TestOracleStoredProcedure t = new TestOracleStoredProcedure();
  t.test();
 }

 public void test() throws SQLException {
  
  //let Spring create a Data source for the Oracle connections
  DriverManagerDataSource ds = new DriverManagerDataSource();
  String url = "jdbc:oracle:thin:@myhost:1525:DEVDB";
  ds.setUrl(url);
  ds.setUsername("scott");
  ds.setPassword("tiger");

  CallableStatement cs = ds.getConnection()
    .prepareCall("call jay_proc(?)");
  cs.registerOutParameter(1, ORACLE_CURSOR_TYPE);
  cs.execute();
  ResultSet rs = (ResultSet) cs.getObject(1);
  while( rs.next()){
              System.out.println("title:"+rs.getString(1)+" author:"
                            +rs.getString(2));
  }
  rs.close();
  cs.close();

 }
 
 /*
 --the Oracle 9+ stored proc with 1 OUT parameter for the results of a select:
 CREATE OR REPLACE PROCEDURE "JAY_PROC" ( st_cursor OUT SYS_REFCURSOR  )
     is
 BEGIN
  OPEN st_cursor FOR
  SELECT TITLE, AUTHOR, ISBN_ID 
  FROM books;
 end JAY_PROC;
 */
}

Notes

* I used the Spring DriverManagerDataSource, but you could replace that with straight JDBC calls

* Notice I used my own constant ORACLE_CURSOR_TYPE to avoid importing the oracle dependencies.

* You can pass the ResultSet object to another framework like Spring-JDBC, if you like

* I found most of this by using the debugger to show what Oracle drivers were really doing