Oracle SQL: Normalize the UnNormalized

What is the problem?

Sometimes you are faced with a bad (poorly normalized) schema design. For example:

1

But, you may need the output to be in the following normalized form:

2

The correct solution may be to fix the schema design, but in most cases this will not be feasible.

Test Case 1:

 

We need to write our query to present the data as it should be stored, in spite of how it is being actually stored.

Let us build the following test case:

CREATE TABLE Cust_Phones

(

CustomerID INT PRIMARY KEY,

Phone1 VARCHAR(32),

Phone2 VARCHAR(32),

Phone3 VARCHAR(32)

);

INSERT into Cust_Phones  (CustomerID, Phone1, Phone2, Phone3)

VALUES   (1,’705-491-1111′, ‘705-491-1110′, NULL);

INSERT into Cust_Phones   (CustomerID, Phone1, Phone2, Phone3)

VALUES  (2,’613-492-2222′, NULL, NULL);

INSERT into Cust_Phones   (CustomerID, Phone1, Phone2, Phone3)

VALUES  (3,’416-493-3333’, ‘416-493-3330’, ‘416-493-3339’);

Commit;

Select * from cust_phones;

1

Now to normalize the output, we will write our query to use the unpivot operator as follows:

SELECT CustomerID, Phone

FROM

( SELECT CustomerID, Phone1, Phone2, Phone3  FROM Cust_Phones )

UNPIVOT ( Phone FOR Phones IN (Phone1, Phone2, Phone3));

2

The magic operator here is the “unpivot”

The “Phones” alias  is saying: “extract a new row for every value you find in the columns Phone1, Phone2, and Phone3.

Test Case 2:

What if we have a more complicated case of 3 phone numbers with different types, like:

3

And we need to normalize the output to be as follows:

4

Let us build the following test case:

CREATE TABLE Cust_Phones2

(

CustomerID INT PRIMARY KEY,

Phone1 VARCHAR(32),

PhoneType1 CHAR(4),

Phone2 VARCHAR(32),

PhoneType2 CHAR(4),

Phone3 VARCHAR(32),

PhoneType3 CHAR(4)

);

INSERT into Cust_Phones2 VALUES (1,’705-491-1111′, ‘cell’, ‘705-491-1110’, ‘home’, NULL,NULL);

INSERT into Cust_Phones2 VALUES  (2,’613-492-2222′, ‘home’, NULL, NULL, NULL, NULL);

INSERT into Cust_Phones2 VALUES   (3,’416-493-3333′, ‘work’, ‘416-493-3330’, ‘cell’,’416-493-3339′, ‘home’);

commit;

select * from Cust_Phones2;

3

Now to normalize the output, we will write our query to use the unpivot operator as follows:

SELECT CustomerID, Phone, PhoneType

FROM

(  SELECT CustomerID, Phone, PhoneType, Phones , PhoneTypes

, SUBSTR(Phones, LENGTH(Phones) – REGEXP_INSTR( REVERSE(Phones) , ‘[^0-9]’) + 2, 32) as idp,

SUBSTR(PhoneTypes,  LENGTH(PhoneTypes) – REGEXP_INSTR(REVERSE(PhoneTypes) , ‘[^0-9]’) + 2, 32) as idpt

FROM ( SELECT CustomerID, Phone1, Phone2, Phone3, PhoneType1, PhoneType2, PhoneType3 FROM Cust_Phones2 )

UNPIVOT (Phone FOR Phones IN (PHONE1, PHONE2, PHONE3))

  UNPIVOT (PhoneType FOR PhoneTypes IN (PhoneType1, PhoneType2, PhoneType3)) )

WHERE idp = idpt;

4

We used one “unpivot” operator to normalize the “phone” è produced number of rows = number of phones of each customer.

We used another “unpivot” operator to normalize the “PhoneType” è produced number of rows = number of phone types of each customer.

Note that if you run only the middle select query:

SELECT CustomerID, Phone, PhoneType, Phones , PhoneTypes

, SUBSTR(Phones, LENGTH(Phones) – REGEXP_INSTR( REVERSE(Phones) , ‘[^0-9]’) + 2, 32) as idp,

SUBSTR(PhoneTypes,  LENGTH(PhoneTypes) – REGEXP_INSTR(REVERSE(PhoneTypes) , ‘[^0-9]’) + 2, 32) as idpt

FROM ( SELECT CustomerID, Phone1, Phone2, Phone3, PhoneType1, PhoneType2, PhoneType3 FROM Cust_Phones2 )

UNPIVOT (Phone FOR Phones IN (PHONE1, PHONE2, PHONE3))

  UNPIVOT (PhoneType FOR PhoneTypes IN (PhoneType1, PhoneType2, PhoneType3))

 You will get the following output:

5.jpg

Which is like a Cartesian product of the number of phones * the number of corresponding phone types

To get only the relevant rows, we added the last where condition:

WHERE idp = idpt;

The following part of the query:

    SUBSTR(Phones, LENGTH(Phones) – REGEXP_INSTR( REVERSE(Phones) , ‘[^0-9]’) + 2, 32) as idp,

Tills us it is phone1 or phone2 or phone3. For example:

select substr(‘PHONE1’,LENGTH(‘PHONE1’) – REGEXP_INSTR(REVERSE(‘PHONE1′),'[^0-9]’)+2,32) from dual;

Will produce 1

While:

select substr(‘PHONE2’,LENGTH(‘PHONE2’) – REGEXP_INSTR(REVERSE(‘PHONE2′),'[^0-9]’)+2,32) from dual;

Will produce 2

and so on.

The following part of the query:

      SUBSTR(PhoneTypes,  LENGTH(PhoneTypes) – REGEXP_INSTR(REVERSE(PhoneTypes) , ‘[^0-9]’) + 2, 32) as idpt

Tills us it is phonetype1 or phonetype2 or phonetype3. For example:

select substr(‘PHONETYPE1’,LENGTH(‘PHONETYPE1’) – REGEXP_INSTR(REVERSE(‘PHONETYPE1′),'[^0-9]’)+2,32) from dual;

  • Will produce 1

 

Select REGEXP_INSTR(REVERSE(‘PHONETYPE1′),'[^0-9]’) from dual;

  • Will return 2, which is the position of the first non-numeric character from the end of the string.

 

Test Case 3:

What if you add a new phone number to the above table “Phone4” & “PhoneType4”,in this case you have to modify your queries. But is there a more dynamic way to build the UNPIVOT queries without advanced knowledge of the number of Phone/PhoneTypes? . This may be a subject of another future blog !

References:

A similar SQL Server case:

https://www.mssqltips.com/sqlservertip/3000/use-sql-servers-unpivot-operator-to-help-normalize-output/

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s