Sreedhar's Blog: Oracle to Teradata Migration Guidelines

Primary Index creation rules:

Oracle	Teradata
PRIMARY KEY	Make UPI and change those columns to NOT NULL.
No PRIMARY KEY but one or more UNIQUE indexes	Create UPI with unique index with the least number of columns. Maintain NULL/NOT NULL constraint as is.
No PRIMARY KEY, no Unique indexes, but only non-unique indexes	Pick a good candidate for PI and create NUPI. Maintain NULL/NOT NULL constraint as is.
No PRIMARY KEY, no Unique indexes, no non-unique indexes	Pick a good candidate for PI and create NUPI. Maintain NULL/NOT NULL constraint as is.

Table SET/MULTISET: All tables should be created as MULTISET.
Secondary Index Creation rules: Typically, any indexes provided in Oracle should not be converted as is to Secondary Indexes in Teradata. SI choices should be made by looking at queries and analyzing a real need for SIs to improve performance. PPI creation rules: Partitioning on Teradata tables needs to be done after analyzing queries and finding suitable candidate columns for partitioning.

Data Type conversion rules:

Oracle	Teradata
VARCHAR2(N)	If N <= 3 then CHAR(N) Else VARCHAR(N)
VARCHAR2(N BYTE) – This stores N bytes of character data.	VARCHAR(N) – Check if UNICODE data. If yes, then VARCHAR(2*N). If not, then treat as VARCHAR2(N) and apply conversion for VARCHAR2(N)
NUMBER	FLOAT. Change to DECIMAL(x,y) after data profiling. Depending on min and max values of data, can change this to one of the integer data types. See conversion for NUMBER(N) below.
NUMBER(N) or NUMBER(N,0)	DECIMAL(N,0)
NUMBER(X,Y)	DECIMAL(X,Y)
CLOB	CLOB. Check size. Can change to VARCHAR(N) if size <= 64000.
LONG	CLOB. Check size. Can change to VARCHAR(N) if size <= 64000.
RAW	BLOB. Check size
ROWID(N)	VARCHAR(N)
BIT Example: AccountFlag bit NULL	BYTEINT. Check data values. If character, then AccountFlag CHAR (5) CHECK (ACCOUNTFLAG in ('TRUE','FALSE', NULL)) If numeric, then BYTEINT CHECK (ACCOUNTFLAG in (1,0, NULL))
DATE	If column name has _DT, use DATE. Else use TIMESTAMP(0). Once data is available, choose DATE or TIMESTAMP(n) as appropriate.

Default values:

Oracle	Teradata
DEFAULT sysdate	DEFAULT CURRENT_TIMESTAMP(N), if datatype is TIMESTAMP(N). DEFAULT CURRENT_DATE if datatype is DATE.

UDFs:

There are certain Oracle SQL functions that do not have a Teradata SQL equivalent. For example, replace. For those, install and use C UDFs from the Oracle UDF library which can be obtained from the Teradata UDF oreplace or otranslate

SQL Functions:

Oracle	Teradata
DECODE	Use CASE if DECODEs are not deeply nested. Else, use UDF DECODE.
REPLACE	Use UDF OREPLACE.
GREATER	Use UDF GREATER

Some Timestamp Conversion From Oracle to TD:

To get Date at Midnight

Oracle :

SELECT TRUNC(SYSDATE, ‘HH’) FROM DUAL;

SELECT TRUNC(SYSDATE, ‘MI’) FROM DUAL;

TD :

SEL CAST(CAST(CAST(CAST( CURRENT_TIMESTAMP AS DATE FORMAT 'yyyy-mm-dd') AS CHAR(10))|| ' 00:00:00' AS CHAR(19)) AS TIMESTAMP(0));

SEL cast(CAST(CURRENT_TIMESTAMP(0) AS DATE) as timestamp(0));

SELECT CAST((CURRENT_TIMESTAMP(0) - CAST(CAST(CAST(CURRENT_TIMESTAMP(0)

AS TIME(0) ) AS CHAR( 8)) AS INTERVAL HOUR TO SECOND ) ) AS TIMESTAMP(0) );

First day of the month:

Oracle:

SELECT TO_CHAR(TRUNC(SYSDATE, 'MM'), 'DD-MON-YYYY HH:MI:SS')
FROM dual;

SELECT TO_CHAR(TRUNC(SYSDATE, 'MON'), 'DD-MON-YYYY HH:MI:SS')
FROM dual;

SELECT TO_CHAR(TRUNC(SYSDATE, 'MONTH'), 'DD-MON-YYYY HH:MI:SS')
FROM dual;

TD:

SEL (ADD_MONTHS(CURRENT_DATE,1) - EXTRACT(DAY

FROM ADD_MONTHS(CURRENT_DATE,1))) +1;

SEL (CURRENT_DATE- EXTRACT(DAY FROM CURRENT_DATE)) +1 ;

/* from Timestamp Column */

SEL CAST(CAST(CAST(((ADD_MONTHS(CAST(LOGDATE AS DATE FORMAT 'yyyy-mm-dd'),0) - EXTRACT(DAY

FROM ADD_MONTHS(CAST(LOGDATE AS DATE FORMAT 'yyyy-mm-dd'),0))) +1) AS DATE FORMAT 'YYYY-MM-DD') AS CHAR(10)) || ' 00:00:00.000000' AS TIMESTAMP(6))

Last Day of a Month in TD:

sel add_months(current_date,1) - extract(day from add_months(current_date,1));

SELECT ADD_MONTHS((CAST(CURRENT_DATE AS DATE) - EXTRACT(DAY FROM CAST(CURRENT_DATE AS DATE))) +1, 1) -1;

Just a note: the "last day of month" TD function suggested will fail on dates where the following month doesn't have an equivalent "day number", such as Jan-30 or Jan-31, Mar 31, etc (try it with 2014-03-31 instead of current_date).
A suggested fix would be:
SELECT ADD_MONTHS((CAST(CURRENT_DATE AS DATE) - EXTRACT(DAY FROM CAST(CURRENT_DATE AS DATE))) +1, 1) -1;
This will also work with other data/time formats, such as CURRENT_TIMESTAMP, etc.

First day of the year:

ORACLE:
SELECT TO_CHAR(TRUNC(SYSDATE, 'YYYY'), 'DD-MON-YYYY HH:MI:SS')
FROM dual;

SELECT TO_CHAR(TRUNC(SYSDATE, 'YEAR'), 'DD-MON-YYYY HH:MI:SS')
FROM dual;

TD:

sel (current_date - cast((current_date (format 'ddd') (char(3))) as smallint))+1 ;

Julian Date:

ORACLE:

SELECT TO_NUMBER(TO_CHAR(SYSDATE,'J')) FROM DUAL

TD:

SELECT CAST(CURRENT_DATE - CAST(0101 AS DATE) AS INTEGER) + 2415021;

Some OLAP functions in ORACLE:

ROWS COLUMN PRECEDING

ORACLE:

SELECT

FIRST_NAME,

SALARY,

SALARY*.9 LOW,

SALARY*1.1 HIGH,

COUNT(*) OVER (ORDER BY SALARY ROWS SALARY*.1 PRECEDING ) COUNT

FROM EMP;

FIRST_NAME	SALARY	LOW	HIGH	COUNT
Diana	4,200.00	3,780.00	4,620.00	1
David	4,800.00	4,320.00	5,280.00	2
Valli	4,800.00	4,320.00	5,280.00	3
Bruce	6,000.00	5,400.00	6,600.00	4
Daniel	9,000.00	8,100.00	9,900.00	5
Alexander	9,000.00	8,100.00	9,900.00	6
Nancy	12,000.00	10,800.00	13,200.00	7
Neena	17,000.00	15,300.00	18,700.00	8
Lex	17,000.00	15,300.00	18,700.00	9
Steven	24,000.00	21,600.00	26,400.00	10

TD Equivalent :

SELECT

FIRST_NAME,

SALARY,

SALARY*.9 LOW,

SALARY*1.1 HIGH,

COUNT(*) OVER (ORDER BY SALARY

RESET WHEN SALARY <= SALARY*.1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) "COUNT"

FROM

EMP

order by salary,"COUNT";

FIRST_NAME	SALARY	LOW	HIGH	COUNT
Diana	4,200.00	3,780.00	4,620.00	1
David	4,800.00	4,320.00	5,280.00	2
Valli	4,800.00	4,320.00	5,280.00	3
Bruce	6,000.00	5,400.00	6,600.00	4
Daniel	9,000.00	8,100.00	9,900.00	5
Alexander	9,000.00	8,100.00	9,900.00	6
Nancy	12,000.00	10,800.00	13,200.00	7
Neena	17,000.00	15,300.00	18,700.00	8
Lex	17,000.00	15,300.00	18,700.00	9
Steven	24,000.00	21,600.00	26,400.00	10

RANGE BETWEEN INTERVAL

ORACLE:

select id ,first_name,city,start_date,salary, sum(salary)

over(partition by city order by start_date

range between interval '36' MONTH preceding and interval '0' month preceding) as sum_sal

from employee

order by start_date

	ID	FIRST_NAME	CITY	START_DATE	SALARY	SUM_SAL
1	2	Alison	Vancouver	3/21/1976 0:00	6,661.78	6,661.78
2	3	James	Vancouver	12/12/1978 0:00	6,544.78	13,206.56
3	4	Celia	Vancouver	10/24/1982 0:00	2,344.78	2,344.78
4	5	Robert	Vancouver	1/15/1984 0:00	2,334.78	4,679.56
5	6	Linda	New York	7/30/1987 0:00	4,322.78	4,322.78
6	7	David	New York	12/31/1990 0:00	7,897.78	7,897.78
7	1	Jason	Toronto	7/25/1996 0:00	1,234.56	1,234.56
8	8	James	Vancouver	9/17/1996 0:00	1,232.78	1,232.78

TD Equivalent:

select id,first_name,city,start_date,

salary,

sum(salary) over (partition by City order by start_date RESET WHEN

(start_date - min(start_date)

over (partition by city order by start_date ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)) month(4) not between 0 and 36

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )

--min (start_date) over ( )

from employee

order by start_date

	ID	FIRST_NAME	CITY	START_DATE	SALARY	SUM_SAL
1	2	Alison	Vancouver	3/21/1976 0:00	6,661.78	6,661.78
2	3	James	Vancouver	12/12/1978 0:00	6,544.78	13,206.56
3	4	Celia	Vancouver	10/24/1982 0:00	2,344.78	2,344.78
4	5	Robert	Vancouver	1/15/1984 0:00	2,334.78	4,679.56
5	6	Linda	New York	7/30/1987 0:00	4,322.78	4,322.78
6	7	David	New York	12/31/1990 0:00	7,897.78	7,897.78
7	1	Jason	Toronto	7/25/1996 0:00	1,234.56	1,234.56
8	8	James	Vancouver	9/17/1996 0:00	1,232.78	1,232.78

Some Oracle Hierarchical Function:

Oracle provides SQL features that can give a hierarchical paths of parent-child values such as Manager-Employee. It can give a hierarchy from a manager to the bottom-most employees in his/her organization. For example,

Employee Manager

E1 M1

E2 E1

Hierarchy: M1->E1->E2.

Oracle functions SYS_CONNECT_BY_PATH, and clauses CONNECT BY and PRIOR allow this.

This can be achieved in Teradata using RECRUSIVE queries or views.

Eg for LEVEL

Oracle:

create table corporate_slaves (

slave_id integer primary key,

supervisor_id references corporate_slaves,

name varchar(100)

);

insert into corporate_slaves values (1, NULL, 'Big Boss Man');

insert into corporate_slaves values (2, 1, 'VP Marketing');

insert into corporate_slaves values (3, 1, 'VP Sales');

insert into corporate_slaves values (4, 3, 'Joe Sales Guy');

insert into corporate_slaves values (5, 4, 'Bill Sales Assistant');

insert into corporate_slaves values (6, 1, 'VP Engineering');

insert into corporate_slaves values (7, 6, 'Jane Nerd');

insert into corporate_slaves values (8, 6, 'Bob Nerd');

select

lpad(' ', (level - 1) * 2) || name as padded_name,

slave_id,

supervisor_id,

level

from corporate_slaves

connect by prior slave_id = supervisor_id

start with slave_id = 1;

PADDED_NAME SLAVE_ID SUPERVISOR_ID LEVEL

------------------------------ ---------- - ------------ ----------

Big Boss Man 1 1

VP Marketing 2 1 2

VP Sales 3 1 2

Joe Sales Guy 4 3 3

Bill Sales Assistant 5 4 4

VP Engineering 6 1 2

Jane Nerd 7 6 3

Bob Nerd 8 6 3

Equivalent TD implementation using Recursive table:

CREATE SET TABLE KOCH.corporate_slaves ,NO FALLBACK ,

NO BEFORE JOURNAL,

NO AFTER JOURNAL,

CHECKSUM = DEFAULT

(

slave_id INTEGER,

supervisor_id INTEGER,

name VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC)

PRIMARY INDEX ( slave_id );

insert into corporate_slaves values (1, NULL, 'Big Boss Man');

insert into corporate_slaves values (2, 1, 'VP Marketing');

insert into corporate_slaves values (3, 1, 'VP Sales');

insert into corporate_slaves values (4, 3, 'Joe Sales Guy');

insert into corporate_slaves values (5, 4, 'Bill Sales Assistant');

insert into corporate_slaves values (6, 1, 'VP Engineering');

insert into corporate_slaves values (7, 6, 'Jane Nerd');

insert into corporate_slaves values (8, 6, 'Bob Nerd');

with recursive temp(slave_id,supervisor_id,name,level) as

(select slave_id,supervisor_id,name,1 as level

from corporate_slaves where

supervisor_id is null

union all

select e2.slave_id,e2.supervisor_id,e2.name,level+1

from corporate_slaves e2,temp e3

where e2.supervisor_id=e3.slave_id

)

select lpad(' ',level ,'__')||name,slave_id,supervisor_id,level from temp

order by slave_id

(lpad(' ',level,'__')||name) slave_id supervisor_id level

Big Boss Man 1 ? 1

VP Marketing 2 1 2

VP Sales 3 1 2

Joe Sales Guy 4 3 3

Bill Sales Assistant 5 4 4

VP Engineering 6 1 2

Jane Nerd 7 6 3

Bob Nerd 8 6 3

Eg For NULLS LAST:

Oracle:

SELECT employee_id, department_id, hire_date,manager_id,

ROW_NUMBER( ) OVER (PARTITION BY department_id ORDER BY manager_id NULLS LAST) SRLNO

FROM emp

where department_id in(60,90)

ORDER BY department_id, SRLNO;

EMPLOYEE_ID	DEPARTMENT_ID	HIRE_DATE	MANAGER_ID	Rank(MANAGER_ID ASC)
102	90	1/13/1993	100	2
101	90	9/21/1989	100	2
103	60	1/3/1990	102	1
106	60	2/5/1998	103	2
107	60	2/7/1999	103	2
105	60	6/25/1997	103	2
104	60	5/21/1991	103	2
100	90	6/17/1987	-	1

Equivalent TD:

SELECT employee_id, department_id, hire_date, manager_id

,rank() over (PARTITION BY department_id ORDER BY manager_id)

FROM emp

WHERE department_id IN (60, 90)

order by case when manager_id is null then 999 else manager_id end

EMPLOYEE_ID	DEPARTMENT_ID	HIRE_DATE	MANAGER_ID	Rank(MANAGER_ID ASC)
102	90	1/13/1993	100	2
101	90	9/21/1989	100	2
103	60	1/3/1990	102	1
106	60	2/5/1998	103	2
107	60	2/7/1999	103	2
105	60	6/25/1997	103	2
104	60	5/21/1991	103	2
100	90	6/17/1987	?	1

TD Equivalent for some ORACLE Grouping functions

TD GROUPING is similar to ORACLE Grouping:

GROUPING can be used to represent the nulls that represent the empty set, by reporting instances of the empty set

using a character string instead of the QUESTION MARK character. In this example, the

character string representing the empty set is the phrase (-all-):

SELECT CASE GROUPING(state) WHEN 1 THEN '(-all-)' ELSE state END AS state,

CASE GROUPING(county)

WHEN 1

THEN '(-all-)'

ELSE county

END AS county,

CASE GROUPING(city) WHEN 1 THEN '(-all-)' ELSE city END AS city,

SUM(margin)

FROM sales_view

GROUP BY ROLLUP (state, county, city);

State	County	City	SUM(margin)

CA	San Diego	San Diego	19500
CA	San Diego	?	3000
CA	Los Angeles	Avalon	14400
CA	Los Angeles	Long Beach	24300
CA	Los Angeles	(-all-)	38700
CA	San Diego	(-all-)	22500
CA	(-all-)	(-all-)	61200
(-all-)	(-all-)	(-all-)	61200

GROUPING_ID() function in oracle:

SELECT

city, description,

GROUPING_ID(city, description) AS grp_id,

SUM(salary)

FROM employee

GROUP BY CUBE(city, description)

HAVING GROUPING_ID(city, description) > 0;

	CITY	DESCRIPTION	GRP_ID	SUM(SALARY)
1	?	?	3	32,574.02
2	?	Tester	2	21,096.90
3	?	Manager	2	10,242.56
4	?	Programmer	2	1,234.56
5	Toronto	?	1	1,234.56
6	New York	?	1	12,220.56
7	Vancouver	?	1	19,118.90

TD Equivalent:

SELECT

city, description,

--GROUPING_ID(city, description) AS grp_id,

(grouping(city)*2+grouping(description)*1) as grp_id

,SUM(salary)

FROM employee

GROUP BY CUBE(city, description)

HAVING grp_id > 0

order by 1,2;

	CITY	DESCRIPTION	GRP_ID	SUM(SALARY)
1	?	?	3	32,574.02
2	?	Tester	2	21,096.90
3	?	Manager	2	10,242.56
4	?	Programmer	2	1,234.56
5	Toronto	?	1	1,234.56
6	New York	?	1	12,220.56
7	Vancouver	?	1	19,118.90

In case the level of grouping increases by 1 more level will have to multiply by 4

( GROUPING(generic_product_desc) * 4.0 + GROUPING(consolidating_product_desc) * 2.0 + GROUPING(product_desc) * 1.0) product_grouping_id,

GROUP_ID Oracle function:

Group_id() = 0 is used to eliminate duplicate rows

SELECT city, description,GROUP_ID()

,SUM(salary)

FROM employee

GROUP BY city, ROLLUP(city, description)

HAVING GROUP_ID() = 0

order by 1,2;

	CITY	DESCRIPTION	GROUP_ID()	SUM(SALARY)
1	New York	Manager	0	7,897.78
2	New York	Tester	0	4,322.78
3	New York	?	0	12,220.56
4	Toronto	Programmer	0	1,234.56
5	Toronto	?	0	1,234.56
6	Vancouver	Manager	0	2,344.78
7	Vancouver	Tester	0	16,774.12
8	Vancouver	?	0	19,118.90

TD Equivalent

SELECT city, description

,SUM(salary)

,row_number() over (partition by city,description

order by city,description) as rn

FROM employee

GROUP BY city, ROLLUP(city, description)

qualify row_number() over (partition by city,description

order by city,description) = 1

order by 1,2;

	City	Description	Sum(Salary)	rn
1	New York	?	12,220.56	1.00
2	New York	Manager	7,897.78	1.00
3	New York	Tester	4,322.78	1.00
4	Toronto	?	1,234.56	1.00
5	Toronto	Programmer	1,234.56	1.00
6	Vancouver	?	19,118.90	1.00
7	Vancouver	Manager	2,344.78	1.00
8	Vancouver	Tester	16,774.12	1.00

KEEP DENSE RANK IN ORACLE :

SELECT city,

last_name,

salary,

MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY salary) OVER (PARTITION BY city) "Lowest",

MAX(salary) KEEP (DENSE_RANK LAST ORDER BY salary) OVER (PARTITION BY city) "Highest"

FROM employee

ORDER BY city, salary;

CITY	LAST_NAME	SALARY	Lowest	Highest
New York	Green	4322.78	4322.78	7897.78
New York	Larry	7897.78	4322.78	7897.78
Toronto	Martin	1234.56	1234.56	1234.56
Vancouver	Cat	1232.78	1232.78	6661.78
Vancouver	Black	2334.78	1232.78	6661.78
Vancouver	Rice	2344.78	1232.78	6661.78
Vancouver	Smith	6544.78	1232.78	6661.78
Vancouver	Mathews	6661.78	1232.78	6661.78

TD Equivalent :

SELECT city,

last_name,

salary,

MIN(salary) OVER (PARTITION BY city order by salary DESC) "Lowest",

MAX(salary) OVER (PARTITION BY city order by salary DESC) "Highest"

FROM employee

ORDER BY city, salary;

	CITY	LAST_NAME	SALARY	Lowest	Highest
1	New York	Green	4,322.78	4,322.78	7,897.78
2	New York	Larry	7,897.78	4,322.78	7,897.78
3	Toronto	Martin	1,234.56	1,234.56	1,234.56
4	Vancouver	Cat	1,232.78	1,232.78	6,661.78
5	Vancouver	Black	2,334.78	1,232.78	6,661.78
6	Vancouver	Rice	2,344.78	1,232.78	6,661.78
7	Vancouver	Smith	6,544.78	1,232.78	6,661.78
8	Vancouver	Mathews	6,661.78	1,232.78	6,661.78

SYS_GUID()

Some Oracle tables may have SYS_GUID() function used. Example:

FINANCIALS_USER_DEF_FIELD_ID VARCHAR(32 BYTE) DEFAULT SYS_GUID()

We may need an IDENTITY column in Teradata.

Teradata Keywords

If tables names or column names in Oracle use Teradata keywords, use the same names with double quotes around them. For example, ZONE should be used as “ZONE”. Queries need to be modified accordingly to use double quoted words.

Materialized Views:

The equivalent in Teradata is JOIN INDEX. These need not be converted as is, to JOIN INDEXes, unless a customer specifically wants to do that. Instead, analyze your queries and if needed, create Join indexes to improve performance. Remember that Join indexes take up a lot of space and cause performance degradation during data loading to the underlying tables.

Sreedhar's Blog

Tuesday, February 10, 2015

Oracle to Teradata Migration Guidelines

Primary Index creation rules:

Oracle:

TD:

Julian Date:

ORACLE:

No comments:

Post a Comment

About Me

Blog Archive