Tuesday, February 10, 2015

Oracle to Teradata Migration Guidelines

Primary Index creation rules: 


Oracle
Teradata
PRIMARY KEY
Make UPI and change those columns to NOT NULL.
No PRIMARY KEY but one or more UNIQUE indexes
Create UPI with unique index with the least number of columns. Maintain NULL/NOT NULL constraint as is.
No PRIMARY KEY, no Unique indexes, but only non-unique indexes
Pick a good candidate for PI and create NUPI. Maintain NULL/NOT NULL constraint as is.
No PRIMARY KEY, no Unique indexes, no non-unique indexes
Pick a good candidate for PI and create NUPI. Maintain NULL/NOT NULL constraint as is.

Table SET/MULTISET: All tables should be created as MULTISET.
Secondary Index Creation rules: Typically, any indexes provided in Oracle should not be converted as is to Secondary Indexes in Teradata. SI choices should be made by looking at queries and analyzing a real need for SIs to improve performance. PPI creation rules: Partitioning on Teradata tables needs to be done after analyzing queries and finding suitable candidate columns for partitioning. 

Data Type conversion rules: 


Oracle
Teradata
VARCHAR2(N)
If N <= 3 then CHAR(N)
Else VARCHAR(N)
VARCHAR2(N BYTE) – This stores N bytes of character data.
VARCHAR(N) – Check if UNICODE data.
If yes, then VARCHAR(2*N).
If not, then treat as VARCHAR2(N) and apply conversion for VARCHAR2(N)
NUMBER
FLOAT. Change to DECIMAL(x,y) after data profiling. Depending on min and max values of data, can change this to one of the integer data types. See conversion for NUMBER(N) below.
NUMBER(N) or NUMBER(N,0)
DECIMAL(N,0)  
NUMBER(X,Y)
DECIMAL(X,Y)
CLOB
CLOB. Check size. Can change to VARCHAR(N) if size <= 64000.
LONG
CLOB. Check size. Can change to VARCHAR(N) if size <= 64000.
RAW
BLOB. Check size
ROWID(N)
VARCHAR(N)
BIT
Example: AccountFlag bit NULL 
BYTEINT. Check data values.
If character, then
AccountFlag CHAR (5) CHECK (ACCOUNTFLAG in ('TRUE','FALSE', NULL))
If numeric, then
BYTEINT CHECK (ACCOUNTFLAG in (1,0, NULL))
DATE
If column name has _DT, use DATE.
Else use TIMESTAMP(0).
Once data is available, choose DATE or TIMESTAMP(n) as appropriate.

Default values:

Oracle
Teradata
DEFAULT sysdate
DEFAULT CURRENT_TIMESTAMP(N), if datatype is TIMESTAMP(N).
DEFAULT CURRENT_DATE if datatype is DATE.

UDFs:


 There are certain Oracle SQL functions that do not have a Teradata SQL equivalent. For example, replace. For those, install and use C UDFs from the Oracle UDF library which can be obtained from the Teradata UDF oreplace or  otranslate

SQL Functions:


Oracle
Teradata
DECODE
Use CASE if DECODEs are not deeply nested. Else, use UDF DECODE.
REPLACE
Use UDF OREPLACE.
GREATER
Use UDF GREATER

Some Timestamp Conversion From Oracle to TD:

To get Date at Midnight

Oracle :

SELECT TRUNC(SYSDATE, ‘HH’) FROM DUAL;
SELECT TRUNC(SYSDATE, ‘MI’) FROM DUAL;

TD :

SEL CAST(CAST(CAST(CAST( CURRENT_TIMESTAMP AS DATE FORMAT 'yyyy-mm-dd') AS CHAR(10))|| ' 00:00:00' AS CHAR(19)) AS TIMESTAMP(0));

SEL cast(CAST(CURRENT_TIMESTAMP(0) AS DATE) as timestamp(0));

SELECT CAST((CURRENT_TIMESTAMP(0 - CAST(CAST(CAST(CURRENT_TIMESTAMP(0
AS TIME(0) ) AS CHAR( 8)) AS INTERVAL HOUR TO SECOND ) ) AS TIMESTAMP(0) );

First day of the month:

 Oracle:

SELECT TO_CHAR(TRUNC(SYSDATE, 'MM'), 'DD-MON-YYYY HH:MI:SS')
FROM dual;

SELECT TO_CHAR(
TRUNC(SYSDATE, 'MON'), 'DD-MON-YYYY HH:MI:SS')
FROM dual;

SELECT TO_CHAR(
TRUNC(SYSDATE, 'MONTH'), 'DD-MON-YYYY HH:MI:SS')
FROM dual;

TD:

 SEL  (ADD_MONTHS(CURRENT_DATE,1) - EXTRACT(DAY
FROM ADD_MONTHS(CURRENT_DATE,1))) +1;

SEL (CURRENT_DATE- EXTRACT(DAY FROM CURRENT_DATE)) +1 ;
/* from Timestamp Column */
SEL CAST(CAST(CAST(((ADD_MONTHS(CAST(LOGDATE AS DATE FORMAT 'yyyy-mm-dd'),0) - EXTRACT(DAY
FROM ADD_MONTHS(CAST(LOGDATE AS DATE FORMAT 'yyyy-mm-dd'),0))) +1) AS DATE FORMAT 'YYYY-MM-DD') AS CHAR(10)) || ' 00:00:00.000000'  AS TIMESTAMP(6))

Last Day of a Month in TD:

sel add_months(current_date,1) - extract(day from add_months(current_date,1));

SELECT ADD_MONTHS((CAST(CURRENT_DATE AS DATE) - EXTRACT(DAY FROM CAST(CURRENT_DATE AS DATE))) +1, 1) -1;

Just a note: the "last day of month" TD function suggested will fail on dates where the following month doesn't have an equivalent "day number", such as Jan-30 or Jan-31, Mar 31, etc (try it with 2014-03-31 instead of current_date).
A suggested fix would be:
SELECT ADD_MONTHS((CAST(CURRENT_DATE AS DATE) - EXTRACT(DAY FROM CAST(CURRENT_DATE AS DATE))) +1, 1) -1;
This will also work with other data/time formats, such as CURRENT_TIMESTAMP, etc.

First day of the year:

ORACLE:
SELECT TO_CHAR(
TRUNC(SYSDATE, 'YYYY'), 'DD-MON-YYYY HH:MI:SS')
FROM dual;

SELECT TO_CHAR(
TRUNC(SYSDATE, 'YEAR'), 'DD-MON-YYYY HH:MI:SS')
FROM dual;

TD:
sel (current_date - cast((current_date (format 'ddd') (char(3))) as smallint))+1 ;

Julian Date:

ORACLE:

SELECT TO_NUMBER(TO_CHAR(SYSDATE,'J')) FROM DUAL

TD:

SELECT CAST(CURRENT_DATE - CAST(0101 AS DATE) AS INTEGER) + 2415021;


Some OLAP functions in ORACLE:

ROWS COLUMN PRECEDING 
ORACLE:

SELECT
   FIRST_NAME,
   SALARY,
   SALARY*.9 LOW,
   SALARY*1.1 HIGH,
   COUNT(*) OVER (ORDER BY SALARY ROWS SALARY*.1 PRECEDING ) COUNT
FROM    EMP;



FIRST_NAME
SALARY
LOW
HIGH
COUNT
Diana
4,200.00
3,780.00
4,620.00
1
David
4,800.00
4,320.00
5,280.00
2
Valli
4,800.00
4,320.00
5,280.00
3
Bruce
6,000.00
5,400.00
6,600.00
4
Daniel
9,000.00
8,100.00
9,900.00
5
Alexander
9,000.00
8,100.00
9,900.00
6
Nancy
12,000.00
10,800.00
13,200.00
7
Neena
17,000.00
15,300.00
18,700.00
8
Lex
17,000.00
15,300.00
18,700.00
9
Steven
24,000.00
21,600.00
26,400.00
10

TD Equivalent :

SELECT
   FIRST_NAME,
   SALARY,
   SALARY*.9 LOW,
   SALARY*1.1 HIGH,
   COUNT(*) OVER (ORDER BY SALARY
   RESET WHEN SALARY <=  SALARY*.1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW "COUNT"
FROM
   EMP
   order by salary,"COUNT";

FIRST_NAME
SALARY
LOW
HIGH
COUNT
Diana
4,200.00
3,780.00
4,620.00
1
David
4,800.00
4,320.00
5,280.00
2
Valli
4,800.00
4,320.00
5,280.00
3
Bruce
6,000.00
5,400.00
6,600.00
4
Daniel
9,000.00
8,100.00
9,900.00
5
Alexander
9,000.00
8,100.00
9,900.00
6
Nancy
12,000.00
10,800.00
13,200.00
7
Neena
17,000.00
15,300.00
18,700.00
8
Lex
17,000.00
15,300.00
18,700.00
9
Steven
24,000.00
21,600.00
26,400.00
10


 RANGE BETWEEN INTERVAL 

ORACLE:

select id ,first_name,city,start_date,salary, sum(salary)
over(partition by city order by start_date
range between interval '36' MONTH preceding and interval '0' month preceding) as sum_sal
from  employee
order by start_date


ID
FIRST_NAME
CITY
START_DATE
SALARY
SUM_SAL
1
2
Alison
Vancouver
3/21/1976 0:00
6,661.78
6,661.78
2
3
James
Vancouver
12/12/1978 0:00
6,544.78
13,206.56
3
4
Celia
Vancouver
10/24/1982 0:00
2,344.78
2,344.78
4
5
Robert
Vancouver
1/15/1984 0:00
2,334.78
4,679.56
5
6
Linda
New York
7/30/1987 0:00
4,322.78
4,322.78
6
7
David
New York
12/31/1990 0:00
7,897.78
7,897.78
7
1
Jason
Toronto
7/25/1996 0:00
1,234.56
1,234.56
8
8
James
Vancouver
9/17/1996 0:00
1,232.78
1,232.78


TD Equivalent:

select id,first_name,city,start_date,
 salary,
 sum(salary) over (partition by City order by start_date RESET WHEN
 (start_date - min(start_date)
 over (partition by city order by start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)) month(4) not between 0 and 36
 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )
 --min (start_date) over ( )
from employee
order by  start_date


ID
FIRST_NAME
CITY
START_DATE
SALARY
SUM_SAL
1
2
Alison
Vancouver
3/21/1976 0:00
6,661.78
6,661.78
2
3
James
Vancouver
12/12/1978 0:00
6,544.78
13,206.56
3
4
Celia
Vancouver
10/24/1982 0:00
2,344.78
2,344.78
4
5
Robert
Vancouver
1/15/1984 0:00
2,334.78
4,679.56
5
6
Linda
New York
7/30/1987 0:00
4,322.78
4,322.78
6
7
David
New York
12/31/1990 0:00
7,897.78
7,897.78
7
1
Jason
Toronto
7/25/1996 0:00
1,234.56
1,234.56
8
8
James
Vancouver
9/17/1996 0:00
1,232.78
1,232.78



Some Oracle Hierarchical Function:

Oracle provides SQL features that can give a hierarchical paths of parent-child values such as Manager-Employee. It can give a hierarchy from a manager to the bottom-most employees in his/her organization. For example,

Employee   Manager
E1         M1
E2         E1

Hierarchy: M1->E1->E2.

Oracle functions SYS_CONNECT_BY_PATH, and clauses CONNECT BY and PRIOR allow this.

This can be achieved in Teradata using RECRUSIVE queries or views.


Eg for LEVEL

Oracle:
create table corporate_slaves (
       slave_id            integer primary key,
       supervisor_id       references corporate_slaves,
       name                varchar(100)
);

insert into corporate_slaves values (1, NULL, 'Big Boss Man');
insert into corporate_slaves values (2, 1, 'VP Marketing');
insert into corporate_slaves values (3, 1, 'VP Sales');
insert into corporate_slaves values (4, 3, 'Joe Sales Guy');
insert into corporate_slaves values (5, 4, 'Bill Sales Assistant');
insert into corporate_slaves values (6, 1, 'VP Engineering');
insert into corporate_slaves values (7, 6, 'Jane Nerd');
insert into corporate_slaves values (8, 6, 'Bob Nerd');

select
  lpad(' ', (level - 1) * 2) || name as padded_name,
  slave_id,
  supervisor_id,
  level
from corporate_slaves
connect by prior slave_id = supervisor_id
start with slave_id = 1;


PADDED_NAME                       SLAVE_ID    SUPERVISOR_ID      LEVEL
------------------------------                  ---------- -            ------------                 ----------
  Big Boss Man                                     1                                              1
  VP Marketing                                      2                1                            2
  VP Sales                                            3                1                            2
  Joe Sales Guy                                    4                3                            3
  Bill Sales Assistant                              5                4                            4
  VP Engineering                                   6                1                            2
  Jane Nerd                                           7                6                            3
  Bob Nerd                                            8                6                            3


Equivalent TD implementation using Recursive table:

CREATE SET TABLE KOCH.corporate_slaves ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      slave_id INTEGER,
      supervisor_id INTEGER,
      name VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( slave_id );
insert into corporate_slaves values (1, NULL, 'Big Boss Man');
insert into corporate_slaves values (2, 1, 'VP Marketing');
insert into corporate_slaves values (3, 1, 'VP Sales');
insert into corporate_slaves values (4, 3, 'Joe Sales Guy');
insert into corporate_slaves values (5, 4, 'Bill Sales Assistant');
insert into corporate_slaves values (6, 1, 'VP Engineering');
insert into corporate_slaves values (7, 6, 'Jane Nerd');
insert into corporate_slaves values (8, 6, 'Bob Nerd');


with recursive temp(slave_id,supervisor_id,name,level) as
(select slave_id,supervisor_id,name,1 as level 
from corporate_slaves where
supervisor_id is null
union  all
select e2.slave_id,e2.supervisor_id,e2.name,level+1
from corporate_slaves e2,temp e3
where e2.supervisor_id=e3.slave_id
)
select lpad(' ',level ,'__')||name,slave_id,supervisor_id,level  from temp
order by slave_id


(lpad(' ',level,'__')||name)         slave_id  supervisor_id           level
 Big Boss Man                          1                      ?                      1
 VP Marketing                          2                      1                      2
 VP Sales                                 3                      1                      2
 Joe Sales Guy                          4                      3                      3
 Bill Sales Assistant                   5                      4                      4
 VP Engineering                       6                      1                      2
 Jane Nerd                               7                      6                      3
 Bob Nerd                                8                      6                      3

Eg For NULLS LAST:

Oracle:

SELECT employee_id, department_id, hire_date,manager_id,
ROW_NUMBER( )  OVER (PARTITION BY department_id ORDER BY manager_id NULLS LAST)  SRLNO
FROM emp
where department_id in(60,90)
ORDER BY department_id, SRLNO;

EMPLOYEE_ID
DEPARTMENT_ID
HIRE_DATE
MANAGER_ID
Rank(MANAGER_ID ASC)
102
90
1/13/1993
100
2
101
90
9/21/1989
100
2
103
60
1/3/1990
102
1
106
60
2/5/1998
103
2
107
60
2/7/1999
103
2
105
60
6/25/1997
103
2
104
60
5/21/1991
103
2
100
90
6/17/1987
-
1

Equivalent TD:

SELECT   employee_iddepartment_idhire_datemanager_id
,rank() over (PARTITION BY department_id ORDER BY manager_id)
FROM emp
WHERE department_id IN (6090)
order by case when manager_id is null then 999 else manager_id end


EMPLOYEE_ID
DEPARTMENT_ID
HIRE_DATE
MANAGER_ID
Rank(MANAGER_ID ASC)
102
90
1/13/1993
100
2
101
90
9/21/1989
100
2
103
60
1/3/1990
102
1
106
60
2/5/1998
103
2
107
60
2/7/1999
103
2
105
60
6/25/1997
103
2
104
60
5/21/1991
103
2
100
90
6/17/1987
?
1


TD Equivalent for some ORACLE Grouping functions 

TD GROUPING is similar to ORACLE Grouping:

GROUPING can be used to represent the nulls that represent the empty set, by reporting instances of the empty set
using a character string instead of the QUESTION MARK character. In this example, the
character string representing the empty set is the phrase (-all-):

SELECT CASE GROUPING(state) WHEN 1 THEN '(-all-)' ELSE state END AS state,
CASE GROUPING(county)
WHEN 1
THEN '(-all-)'
ELSE county
END AS county,
CASE GROUPING(city) WHEN 1 THEN '(-all-)' ELSE city END AS city,
SUM(margin)
FROM sales_view
GROUP BY ROLLUP (state, county, city);

State
 County
City
SUM(margin)




CA
San Diego
San Diego 
19500
CA
San Diego
?
3000
CA
Los Angeles
Avalon
14400
CA
Los Angeles
Long Beach
24300
CA
Los Angeles
(-all-)
38700
CA
San Diego
(-all-)
22500
CA
(-all-)
(-all-)
61200
(-all-)
(-all-)
(-all-)
61200


GROUPING_ID() function in oracle:

SELECT
city, description,
GROUPING_ID(city, description) AS grp_id,
SUM(salary)
FROM employee
GROUP BY CUBE(city, description)
HAVING GROUPING_ID(city, description) > 0;


CITY
DESCRIPTION
GRP_ID
SUM(SALARY)
1
?
?
3
32,574.02
2
?
Tester
2
21,096.90
3
?
Manager
2
10,242.56
4
?
Programmer
2
1,234.56
5
Toronto
?
1
1,234.56
6
New York
?
1
12,220.56
7
Vancouver
?
1
19,118.90

TD Equivalent:

SELECT
city, description,
--GROUPING_ID(city, description) AS grp_id,
(grouping(city)*2+grouping(description)*1) as grp_id
,SUM(salary)
FROM employee
GROUP BY CUBE(city, description)
HAVING grp_id > 0
order by 1,2;



CITY
DESCRIPTION
GRP_ID
SUM(SALARY)
1
?
?
3
32,574.02
2
?
Tester
2
21,096.90
3
?
Manager
2
10,242.56
4
?
Programmer
2
1,234.56
5
Toronto
?
1
1,234.56
6
New York
?
1
12,220.56
7
Vancouver
?
1
19,118.90

In case the level of grouping increases by 1 more level will have to multiply by 4

Eg

      ( GROUPING(generic_product_desc) * 4.0 + GROUPING(consolidating_product_desc) * 2.0 + GROUPING(product_desc) * 1.0) product_grouping_id,

GROUP_ID Oracle function:
Group_id() = 0 is used to eliminate duplicate rows

SELECT city, description,GROUP_ID()
,SUM(salary)
FROM employee
GROUP BY city, ROLLUP(city, description)
HAVING GROUP_ID() = 0
order by 1,2;


CITY
DESCRIPTION
GROUP_ID()
SUM(SALARY)
1
New York
Manager
0
7,897.78
2
New York
Tester
0
4,322.78
3
New York
?
0
12,220.56
4
Toronto
Programmer
0
1,234.56
5
Toronto
?
0
1,234.56
6
Vancouver
Manager
0
2,344.78
7
Vancouver
Tester
0
16,774.12
8
Vancouver
?
0
19,118.90

TD Equivalent

SELECT city, description
,SUM(salary)
,row_number() over (partition by city,description
order by city,description) as rn
FROM employee
GROUP BY city,  ROLLUP(city, description)
qualify row_number() over (partition by city,description
order by city,description) = 1
order by 1,2;


City
Description
Sum(Salary)
rn
1
New York
?
12,220.56
1.00
2
New York
Manager
7,897.78
1.00
3
New York
Tester
4,322.78
1.00
4
Toronto
?
1,234.56
1.00
5
Toronto
Programmer
1,234.56
1.00
6
Vancouver
?
19,118.90
1.00
7
Vancouver
Manager
2,344.78
1.00
8
Vancouver
Tester
16,774.12
1.00

KEEP DENSE RANK IN ORACLE :

SELECT city,
       last_name,
       salary,
       MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY salary) OVER (PARTITION BY city) "Lowest",
       MAX(salary) KEEP (DENSE_RANK LAST ORDER BY salary) OVER (PARTITION BY city) "Highest"
FROM   employee
ORDER BY city, salary;

CITY
LAST_NAME
SALARY
Lowest
Highest

New York
Green
4322.78
4322.78
7897.78

New York
Larry
7897.78
4322.78
7897.78

Toronto
Martin
1234.56
1234.56
1234.56

Vancouver
Cat
1232.78
1232.78
6661.78

Vancouver
Black
2334.78
1232.78
6661.78

Vancouver
Rice
2344.78
1232.78
6661.78

Vancouver
Smith
6544.78
1232.78
6661.78

Vancouver
Mathews
6661.78
1232.78
6661.78



TD Equivalent :

SELECT city,
       last_name,
       salary,
       MIN(salary) OVER (PARTITION BY city order by salary DESC) "Lowest",
       MAX(salary) OVER (PARTITION BY city order by salary DESC) "Highest"
FROM   employee
ORDER BY city, salary;


CITY
LAST_NAME
SALARY
Lowest
Highest
1
New York
Green
4,322.78
4,322.78
7,897.78
2
New York
Larry
7,897.78
4,322.78
7,897.78
3
Toronto
Martin
1,234.56
1,234.56
1,234.56
4
Vancouver
Cat
1,232.78
1,232.78
6,661.78
5
Vancouver
Black
2,334.78
1,232.78
6,661.78
6
Vancouver
Rice
2,344.78
1,232.78
6,661.78
7
Vancouver
Smith
6,544.78
1,232.78
6,661.78
8
Vancouver
Mathews
6,661.78
1,232.78
6,661.78

 SYS_GUID()

Some Oracle tables may have SYS_GUID() function used. Example:

FINANCIALS_USER_DEF_FIELD_ID  VARCHAR(32 BYTE) DEFAULT SYS_GUID()

We may need an IDENTITY column in Teradata.


Teradata Keywords

If tables names or column names in Oracle use Teradata keywords, use the same names with double quotes around them. For example, ZONE should be used as “ZONE”. Queries need to be modified accordingly to use double quoted words.

Materialized Views:

The equivalent in Teradata is JOIN INDEX. These need not be converted as is, to JOIN INDEXes, unless a customer specifically wants to do that. Instead, analyze your queries and if needed, create Join indexes to improve performance. Remember that Join indexes take up a lot of space and cause performance degradation during data loading to the underlying tables.

No comments:

Post a Comment