Oracle SQL: Normalize the UnNormalized

What is the problem?

Sometimes you are faced with a bad (poorly normalized) schema design. For example:

1

But, you may need the output to be in the following normalized form:

2

The correct solution may be to fix the schema design, but in most cases this will not be feasible.

Test Case 1:

 

We need to write our query to present the data as it should be stored, in spite of how it is being actually stored.

Let us build the following test case:

CREATE TABLE Cust_Phones

(

CustomerID INT PRIMARY KEY,

Phone1 VARCHAR(32),

Phone2 VARCHAR(32),

Phone3 VARCHAR(32)

);

INSERT into Cust_Phones  (CustomerID, Phone1, Phone2, Phone3)

VALUES   (1,’705-491-1111′, ‘705-491-1110′, NULL);

INSERT into Cust_Phones   (CustomerID, Phone1, Phone2, Phone3)

VALUES  (2,’613-492-2222′, NULL, NULL);

INSERT into Cust_Phones   (CustomerID, Phone1, Phone2, Phone3)

VALUES  (3,’416-493-3333’, ‘416-493-3330’, ‘416-493-3339’);

Commit;

Select * from cust_phones;

1

Now to normalize the output, we will write our query to use the unpivot operator as follows:

SELECT CustomerID, Phone

FROM

( SELECT CustomerID, Phone1, Phone2, Phone3  FROM Cust_Phones )

UNPIVOT ( Phone FOR Phones IN (Phone1, Phone2, Phone3));

2

The magic operator here is the “unpivot”

The “Phones” alias  is saying: “extract a new row for every value you find in the columns Phone1, Phone2, and Phone3.

Test Case 2:

What if we have a more complicated case of 3 phone numbers with different types, like:

3

And we need to normalize the output to be as follows:

4

Let us build the following test case:

CREATE TABLE Cust_Phones2

(

CustomerID INT PRIMARY KEY,

Phone1 VARCHAR(32),

PhoneType1 CHAR(4),

Phone2 VARCHAR(32),

PhoneType2 CHAR(4),

Phone3 VARCHAR(32),

PhoneType3 CHAR(4)

);

INSERT into Cust_Phones2 VALUES (1,’705-491-1111′, ‘cell’, ‘705-491-1110’, ‘home’, NULL,NULL);

INSERT into Cust_Phones2 VALUES  (2,’613-492-2222′, ‘home’, NULL, NULL, NULL, NULL);

INSERT into Cust_Phones2 VALUES   (3,’416-493-3333′, ‘work’, ‘416-493-3330’, ‘cell’,’416-493-3339′, ‘home’);

commit;

select * from Cust_Phones2;

3

Now to normalize the output, we will write our query to use the unpivot operator as follows:

SELECT CustomerID, Phone, PhoneType

FROM

(  SELECT CustomerID, Phone, PhoneType, Phones , PhoneTypes

, SUBSTR(Phones, LENGTH(Phones) – REGEXP_INSTR( REVERSE(Phones) , ‘[^0-9]’) + 2, 32) as idp,

SUBSTR(PhoneTypes,  LENGTH(PhoneTypes) – REGEXP_INSTR(REVERSE(PhoneTypes) , ‘[^0-9]’) + 2, 32) as idpt

FROM ( SELECT CustomerID, Phone1, Phone2, Phone3, PhoneType1, PhoneType2, PhoneType3 FROM Cust_Phones2 )

UNPIVOT (Phone FOR Phones IN (PHONE1, PHONE2, PHONE3))

  UNPIVOT (PhoneType FOR PhoneTypes IN (PhoneType1, PhoneType2, PhoneType3)) )

WHERE idp = idpt;

4

We used one “unpivot” operator to normalize the “phone” è produced number of rows = number of phones of each customer.

We used another “unpivot” operator to normalize the “PhoneType” è produced number of rows = number of phone types of each customer.

Note that if you run only the middle select query:

SELECT CustomerID, Phone, PhoneType, Phones , PhoneTypes

, SUBSTR(Phones, LENGTH(Phones) – REGEXP_INSTR( REVERSE(Phones) , ‘[^0-9]’) + 2, 32) as idp,

SUBSTR(PhoneTypes,  LENGTH(PhoneTypes) – REGEXP_INSTR(REVERSE(PhoneTypes) , ‘[^0-9]’) + 2, 32) as idpt

FROM ( SELECT CustomerID, Phone1, Phone2, Phone3, PhoneType1, PhoneType2, PhoneType3 FROM Cust_Phones2 )

UNPIVOT (Phone FOR Phones IN (PHONE1, PHONE2, PHONE3))

  UNPIVOT (PhoneType FOR PhoneTypes IN (PhoneType1, PhoneType2, PhoneType3))

 You will get the following output:

5.jpg

Which is like a Cartesian product of the number of phones * the number of corresponding phone types

To get only the relevant rows, we added the last where condition:

WHERE idp = idpt;

The following part of the query:

    SUBSTR(Phones, LENGTH(Phones) – REGEXP_INSTR( REVERSE(Phones) , ‘[^0-9]’) + 2, 32) as idp,

Tills us it is phone1 or phone2 or phone3. For example:

select substr(‘PHONE1’,LENGTH(‘PHONE1’) – REGEXP_INSTR(REVERSE(‘PHONE1′),'[^0-9]’)+2,32) from dual;

Will produce 1

While:

select substr(‘PHONE2’,LENGTH(‘PHONE2’) – REGEXP_INSTR(REVERSE(‘PHONE2′),'[^0-9]’)+2,32) from dual;

Will produce 2

and so on.

The following part of the query:

      SUBSTR(PhoneTypes,  LENGTH(PhoneTypes) – REGEXP_INSTR(REVERSE(PhoneTypes) , ‘[^0-9]’) + 2, 32) as idpt

Tills us it is phonetype1 or phonetype2 or phonetype3. For example:

select substr(‘PHONETYPE1’,LENGTH(‘PHONETYPE1’) – REGEXP_INSTR(REVERSE(‘PHONETYPE1′),'[^0-9]’)+2,32) from dual;

  • Will produce 1

 

Select REGEXP_INSTR(REVERSE(‘PHONETYPE1′),'[^0-9]’) from dual;

  • Will return 2, which is the position of the first non-numeric character from the end of the string.

 

Test Case 3:

What if you add a new phone number to the above table “Phone4” & “PhoneType4”,in this case you have to modify your queries. But is there a more dynamic way to build the UNPIVOT queries without advanced knowledge of the number of Phone/PhoneTypes? . This may be a subject of another future blog !

References:

A similar SQL Server case:

https://www.mssqltips.com/sqlservertip/3000/use-sql-servers-unpivot-operator-to-help-normalize-output/

Formatting a Wrapped Execution Plan

If you received or produced an execution plan like the following unreadable one:

——————————————————————————–

——————-

 

| Id  | Operation                   | Name                    | E-Rows |  OMem |

1Mem | Used-Mem |

 

——————————————————————————–

——————-

 

|   0 | SELECT STATEMENT            |                         |        |       |

|          |

 

|   1 |  SORT AGGREGATE             |                         |      1 |       |

|          |

 

|*  2 |   HASH JOIN                 |                         |     13 |  1102K|

1102K|  355K (0)|

 

|*  3 |    HASH JOIN                |                         |     13 |   988K|

988K |  367K (0)|

 

|*  4 |     HASH JOIN               |                         |     13 |   921K|

921K |  621K (0)|

 

|*  5 |      HASH JOIN OUTER        |                         |     13 |   836K|

836K | 1224K (0)|

 

|*  6 |       HASH JOIN             |                         |     13 |   821K|

821K |  501K (0)|

 

|*  7 |        HASH JOIN            |                         |     13 |  1102K|

1102K|  501K (0)|

 

|   8 |         MERGE JOIN CARTESIAN|                         |      1 |       |

|          |

 

|*  9 |          TABLE ACCESS FULL  | PROFILE$                |      1 |       |

|          |

 

|  10 |          BUFFER SORT        |                         |      1 | 73728 |

73728|          |

 

|* 11 |           TABLE ACCESS FULL | PROFILE$                |      1 |       |

|          |

 

|* 12 |         TABLE ACCESS FULL   | USER$                   |     36 |       |

|          |

 

|  13 |        TABLE ACCESS FULL    | PROFNAME$               |      1 |       |

|          |

 

|* 14 |       TABLE ACCESS FULL     | RESOURCE_GROUP_MAPPING$ |      1 |       |

|          |

 

|  15 |      TABLE ACCESS FULL      | TS$                     |      7 |       |

|          |

 

|  16 |     TABLE ACCESS FULL       | TS$                     |      7 |       |

|          |

 

|  17 |    TABLE ACCESS FULL        | USER_ASTATUS_MAP        |      9 |       |

|          |

 

——————————————————————————–

——————-

How we can reformat this plan into a more readable format?

We are going to use the “awk” Unix/Linux utility to format the plan. In Windows , you may install the cygwin64 terminal utility which has also awk utility available.

 

Procedure:

Copy the un-formatted execution plan to a text file “plan1.txt”

Open cygwin64 terminal and move to the directory containing the plan1.txt file.

Run the following awk program:

cat plan1.txt | awk ‘{ printf “%s”, $0 ; if (NR % 3 == 0) print “” }’

plan1

Note that:

$0           represents the entire line

NR          Row number

 

Note that the above AWK command is simply printing the carriage return (or Enter) at only rows that are multiple of 3 ( rows #: 3, 6, 9,…etc). Note that this may be different from one plan to another.

After that, you may do some minor manual corrections to add some spaces and produce the following final plan:

plan2

Another example:

| Id  | Operation                         | Name               | Starts |

E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |

———————————————————————————————————————————————

 

|   0 | SELECT STATEMENT                  |                    |      1 |

|      1 |00:00:10.97 |      36 |       |       |          |

|   1 |  SORT AGGREGATE                   |                    |      1 |

1 |      1 |00:00:10.97 |      36 |       |       |          |

|   2 |   PX COORDINATOR                  |                    |      1 |

|     12 |00:00:10.97 |      36 |       |       |          |

|   3 |    PX SEND QC (RANDOM)            | :TQ10000           |      0 |

1 |      0 |00:00:00.01 |       0 |       |       |          |

|   4 |     SORT AGGREGATE                |                    |      0 |

1 |      0 |00:00:00.01 |       0 |       |       |          |

|   5 |      PX PARTITION RANGE ITERATOR  |                    |      0 |

96M|      0 |00:00:00.01 |       0 |       |       |          |

|   6 |       BITMAP CONVERSION COUNT     |                    |      0 |

96M|      0 |00:00:00.01 |       0 |       |       |          |

|   7 |        BITMAP AND                 |                    |      0 |

|      0 |00:00:00.01 |       0 |       |       |          |

|   8 |         BITMAP OR                 |                    |      0 |

|      0 |00:00:00.01 |       0 |       |       |          |

|*  9 |          BITMAP INDEX SINGLE VALUE| F1_GDS_ID_IDX      |      0 |        |

0 |00:00:00.01 |       0 |       |       |          |

|* 10 |          BITMAP INDEX SINGLE VALUE| F1_GDS_ID_IDX      |      0 |        |

0 |00:00:00.01 |       0 |       |       |          |

|  11 |         BITMAP MERGE              |                    |      0 |

|      0 |00:00:00.01 |       0 |  1024K|   512K|          |

|* 12 |          BITMAP INDEX RANGE SCAN  | F1_MONTH_ID_IDX    |      0 |        |

0 |00:00:00.01 |       0 |       |       |          |

 

In this case I need only to consider the carriage returns at lines (2, 4, ……etc)

cat plan2.txt | awk ‘{ printf “%s”, $0 ; if (NR % 2 == 0) print “” }’

plan3

And with some minor manual modifications to the plan2.txt file, we can reach the following output:

plan4

 

References:

http://blog.tanelpoder.com/2010/01/18/sometimes-things-are-easy-part-1-how-to-fix-wrapped-execution-plan-text/