Monday, February 16, 2015

Handling informatica failure with an MLoad job

Some of you may have experienced a failure in an MLoad job.  Due to the way the Informatica session was set up, when the job was re-run, a new error similar to the following was given:

[ERROR] Type:(Teradata DBS Error), Error: (Worktable is missing for xxx during MLoad restart.)

This happens because the "Drop Log/Error/Work Tables" attribute on the target in the session is checked.  This attribute tells Informatica to drop those tables before it starts the MLoad, regardless of whether an MLoad was in progress in the prior run.  Unfortunately, once the work and log tables are dropped, an MLoad that needs to be restarted cannot be.  This is when the DBA gets called to drop and recreate the target table.  Well, we shouldn't need to do that anymore!!!

Here are the steps that should be followed when a failure occurs:

1)      Do a "select *" from the target table with a "sample 1" to see if a lock exists on the table.  Note: a "select count(*)" will not work for this purpose.
a.       If the query returns no error, then the job can be restarted.  Proceed to #2.
b.      If the query returns an error that a lock exists on the table, then a DBA needs to be engaged to release that lock.  While waiting for that, you can do #2, but do not do #3 until the lock has been released.
Ex: RELEASE MLOAD <tablename>;           -- if it failed in the acquisition phase
    RELEASE MLOAD <tablename> IN APPLY;  -- if it failed in the application phase

2)      Ready the job - Copy the job to the support folder and make sure the "Drop Log/Error/Work Tables" attribute is unchecked.  The development standard will change to have this box unchecked.  If the session already has this box unchecked, then there is no need to copy it to the support folder first.

3)      Re-run the job - Remember, Informatica has no idea that an MLoad was in progress and that an MLoad restart will commence.  Informatica simply submits the MLoad to Teradata.  Teradata, on the other hand, knows from the data in the log table and work table to do an MLoad restart.


4)      If there is another failure after this, start at #1 again.
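Put together, the check and release in step #1 might look like the following sketch; the database and table names are placeholders, and error 2652 is the code Teradata typically raises when a table is still under an MLoad lock:

SELECT * FROM MY_DB.MY_TARGET_TBL SAMPLE 1;
/* Error 2652 ("Operation not allowed: ... is being MLoaded")
   means the MLoad lock is still in place. */

RELEASE MLOAD MY_DB.MY_TARGET_TBL;           -- failed in acquisition phase
RELEASE MLOAD MY_DB.MY_TARGET_TBL IN APPLY;  -- failed in application phase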

Friday, February 13, 2015

Identifying Used, Unused and Missing Statistics in Teradata 14.10

New DBQL logging options USECOUNT and STATSUSAGE, introduced in Teradata Database 14.10, enable the logging of used and missing statistics.  The output of this logging can be utilized to find used, unused, and missing statistics globally (for all queries) or for just a subset of queries.
The USECOUNT logging option, which starting in Teradata Database 14.10 supports the object use count (OUC) feature, is associated with the database that owns the table.  USECOUNT logs usage from all requests against that table irrespective of the user who submitted the query.  This option logs the object usage from such objects as databases, tables, indexes, join indexes, and statistics. Additionally, it logs insert, update, and delete counts against the database’s tables.  The STATSUSAGE logging option, similar to other DBQL logging options, is associated directly to a user and logs the used and missing statistics at the query level within an XML document.
In this article, we focus on the logging of used statistics, how to retrieve them, and how to join them to other dictionary tables to find the global list of used and unused statistics, as well as the missing statistics at the query level.
Enabling the DBQL USECOUNT Option
Each statistic you define becomes an object that USECOUNT logging can track.  Each time the optimizer actually makes use of a specific statistic, an access counter in the new DBC.ObjectUsage table gets incremented.  So at any point in time you can look in the data dictionary and evaluate how frequently (or even if ever) a particular statistic is being used.
The descriptions of the statements to enable object use count (OUC) logging on a given database or a user are given below.  It is especially useful to enable the USECOUNT option on the databases that own permanent tables.  It can be enabled on users or user accounts also, but you would do that only when these accounts have tables on which you want the logging enabled. 
To enable on a database (which is not a user; note that other DBQL options are not allowed to be enabled on a database):
    BEGIN QUERY LOGGING WITH USECOUNT ON <Database Name>;
To enable on a user with no DBQL options enabled:
    BEGIN QUERY LOGGING WITH USECOUNT ON <User Name>;
To enable on a user having some existing DBQL options enabled (use SHOW QUERY LOGGING ON <User> to find the current DBQL options):
    REPLACE QUERY LOGGING WITH <current options>, USECOUNT ON <User Name>;
 
The following are exceptions and special scenarios in OUC logging:
  1. OUC logging is not applicable for DBC tables.  In other words, the OUC feature doesn’t log the use counts for statistics you may be collecting on dictionary objects.
  2. OUC logging is not applicable for temporary tables.
  3. Dropping statistics also drops the corresponding entries in DBC.ObjectUsage table.  So, if you are dropping and recollecting statistics, the use counts reflect usage from the last collection only.
  4. Unlike other DBQL options, disabling USECOUNT invalidates (resets the counts and timestamp) the corresponding rows in DBC.ObjectUsage. It is not recommended to disable this option unless there is a specific reason to do so.
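As an example, enabling USECOUNT on a database named PNR_DB (the name is just a placeholder for a database that owns permanent tables):

BEGIN QUERY LOGGING WITH USECOUNT ON PNR_DB;

The new rule can be confirmed by looking for the corresponding row in DBC.DBQLRuleTbl (or the DBC.DBQLRules view).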
The following sections describe how to find these statistics for each category along with examples.
Identifying Used Statistics
After enabling DBQL USECOUNT and letting some time pass, the following query can be used to query the dictionary table DBC.ObjectUsage to get the use counts of the existing statistics.  The number of days a statistic has been continuously under the control of USECOUNT logging is given by the column DaysStatLogged; a value of -1 for this column indicates that USECOUNT option has been disabled after being enabled for some time in the past.  You need to consider the number of days a statistic was a candidate for logging along with use counts to normalize the comparison across different statistics.
SELECT DBC.DBase.DatabaseName AS DatabaseName
      ,DBC.TVM.TVMName        AS TableName
      ,COALESCE(DBC.StatsTbl.StatsName
         ,DBC.StatsTbl.ExpressionList
               ,'SUMMARY')    AS StatName
      ,OU.UserAccessCnt       AS UseCount
      ,CAST(OU.LastAccessTimeStamp AS DATE) AS DateLastUsed
      ,CASE
       WHEN DBC.DBQLRuleTbl.TimeCreated IS NULL
       THEN -1  -- Logging disabled
       WHEN DBC.DBQLRuleTbl.TimeCreated > DBC.StatsTbl.CreateTimeStamp
       THEN CURRENT_DATE - CAST(DBC.DBQLRuleTbl.TimeCreated AS DATE)
       ELSE CURRENT_DATE - CAST(DBC.StatsTbl.CreateTimeStamp AS DATE)
       END AS DaysStatLogged
FROM DBC.ObjectUsage OU
    ,DBC.Dbase
    ,DBC.TVM
    ,DBC.StatsTbl LEFT JOIN DBC.DBQLRuleTbl
           ON DBC.StatsTbl.DatabaseId = DBC.DBQLRuleTbl.UserID
          AND DBQLRuleTbl.ExtraField5 = 'T'   /* Comment this line if TD 15.0 or above   */
        /*AND DBQLRuleTbl.ObjectUsage = 'T'*/ /* Uncomment this line if TD 15.0 or above */
WHERE DBC.Dbase.DatabaseId    = OU.DatabaseId
  AND DBC.TVM.TVMId           = OU.ObjectId
  AND DBC.StatsTbl.DatabaseId = OU.DatabaseId
  AND DBC.StatsTbl.ObjectId   = OU.ObjectId
  AND DBC.StatsTbl.StatsId    = OU.FieldId
  AND OU.UsageType            = 'STA'
ORDER BY 1, 2, 3;
Customize the above query for your databases and tables of interest, the number of days a statistic is logged, and so on.  A sample output of the above query is given below.
DatabaseName  TableName       StatName                                                        UseCount  DateLastUsed  DaysStatLogged
DBA_TEST      PARTY_DATA      CURR_FLAG                                                       209       11/15/2014    -1
PNR_DB        CODE_SHARE      FK_ITIODPAD_ITINERARY_INFO_ODP                                  253       11/1/2014     34
PNR_DB        ITINERARY_TBL   AD_ITINERARY_INFO_ODP                                           203       11/1/2014     34
PNR_DB        ITINERARY_TBL   CD_ID_PRODUCT                                                   203       11/1/2014     34
PNR_DB        ITINERARY_TBL   CD_ORIGINAL_STATUS_CODE,CD_STATUS,FK_PNRODP_AD_PNR_HEADER_ODP   203       11/2/2014     34
PNR_DB        PNR_HEADER_ODP  AD_PNR_HEADER_ODP                                               600       11/28/2014    34
PNR_DB        PNR_HEADER_ODP  AD_PNR_HEADER_ODP,CD_CONTROL_NUMBER_LOC                         700       11/28/2014    34
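For example, to restrict the used-statistics query above to a single database and to statistics that have been under logging long enough for a fair comparison, predicates such as these (the database name is a placeholder) could be added just before the ORDER BY:

  AND DBC.Dbase.DatabaseName = 'PNR_DB'
  AND DaysStatLogged >= 30

Teradata permits referencing the DaysStatLogged alias in a predicate, as the unused-statistics query later in this article also does.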

Identifying Unused Statistics
Using USECOUNT to find unused statistics requires consideration of multiple time dimensions.  The basic question being answered here is: am I collecting statistics that have not been used for some duration?  To answer it, you need to consider not only the usage counts, but also how long USECOUNT logging has been active, the most recent usage timestamp, and the age of the statistics.  For example, if USECOUNT logging has been enabled only for the past few days, some statistics that are used by a monthly or quarterly workload may not yet have been logged as used.  Similarly, if you collect new statistics (not re-collections), you need to let them be exposed to different workloads for a certain period of time before you can properly identify whether they are being used.
The following query is designed to consider these aspects and to list out the statistics that meet the following criteria.
  1. The age of the statistic is more than N days (first collected more than N days ago).
  2. DBQL USECOUNT logging is active for more than N days on the owning database.
  3. The statistic has not been used in the last N days.
The value of N can be customized based on your requirements. In the example query given below, the value of N is set to 30 (note that N is used in two predicates; both need to be updated if you customize this value).
SELECT DBC.DBase.DatabaseName AS DatabaseName
      ,DBC.TVM.TVMName        AS TableName
      ,COALESCE(DBC.StatsTbl.StatsName
               ,DBC.StatsTbl.ExpressionList
               ,'SUMMARY')    AS StatName
      ,CURRENT_DATE - CAST(DBC.StatsTbl.CreateTimeStamp AS DATE) AS StatAge
      ,CASE
       WHEN DatabaseName = 'DBC'
       THEN -2  -- Logging Not Applicable
       WHEN DBC.StatsTbl.StatsType IN ('B', 'M')
       THEN -2  -- Logging Not Applicable on Temp tables (base and materialized)
       WHEN DBC.DBQLRuleTbl.TimeCreated IS NULL
       THEN -1  -- Logging Not Enabled
       WHEN DBC.DBQLRuleTbl.TimeCreated > DBC.StatsTbl.CreateTimeStamp
       THEN CURRENT_DATE - CAST(DBC.DBQLRuleTbl.TimeCreated AS DATE)
       ELSE CURRENT_DATE - CAST(DBC.StatsTbl.CreateTimeStamp AS DATE)
 END AS DaysStatLogged 
FROM   DBC.StatsTbl LEFT JOIN  DBC.DBQLRuleTbl
           ON DBC.StatsTbl.DatabaseId = DBC.DBQLRuleTbl.UserID
          AND DBQLRuleTbl.ExtraField5 = 'T'    /* Comment this line if TD 15.0 or above   */
        /*AND DBQLRuleTbl.ObjectUsage = 'T'*/  /* Uncomment this line if TD 15.0 or above */
      ,DBC.Dbase
      ,DBC.TVM     
WHERE DBC.StatsTbl.DatabaseId = DBC.DBASE.DatabaseId
  AND DBC.StatsTbl.ObjectId   = DBC.TVM.TVMId
  AND NOT EXISTS (SELECT 100 FROM DBC.ObjectUsage OU
                  WHERE OU.UsageType  = 'STA'
                    AND OU.DatabaseId = DBC.StatsTbl.DatabaseId
                    AND OU.ObjectId   = DBC.StatsTbl.ObjectId
                    AND OU.FieldId    = DBC.StatsTbl.StatsId
                    AND CURRENT_DATE - CAST(OU.LastAccessTimeStamp AS DATE) < 30
                 )
  AND DaysStatLogged > 30
  AND DBC.StatsTbl.StatsId <> 0  -- Do not qualify table-level SUMMARY statistics as unused
                                 -- May get implicitly used but not recorded as used
ORDER BY 1, 2, 3;

You can customize the above query to adjust the number of days the statistic has not been used, the databases and tables of your interest, the age of the statistic, and so on.
DatabaseName  TableName       StatName                              StatAge  DaysStatLogged
PNR_DB        ITINERARY_TBL   PARTITION                             34       34
PNR_DB        ITINERARY_TBL   ST_281020293276_0_ITINERARY_INFO_ODP  34       34
PNR_DB        ITINERARY_TBL   TS_ULT_ALZ                            34       34
PNR_DB        PNR_HEADER_ODP  CD_CREATION_OFFICE_ID                 34       34
PNR_DB        PNR_HEADER_ODP  CD_CREATOR_IATA_CODE                  34       34
PNR_DB        PNR_HEADER_ODP  PARTITION                             34       34
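If your workloads have long cycles, the value of N in the unused-statistics query can be raised accordingly; for example, to report only statistics unused for the last 90 days, both predicates must change together:

  AND CURRENT_DATE - CAST(OU.LastAccessTimeStamp AS DATE) < 90
  AND DaysStatLogged > 90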


Identifying Missing Statistics
Two additional DBQL logging options, STATSUSAGE and XMLPLAN, create XML documents that identify which statistics were used, and ones that the optimizer looked for but did not find.  Contrary to USECOUNT, these two logging options should be turned on temporarily and only as needed for analysis.  These two options are enabled the same way as other DBQL logging is, by User or Account.  Their output can be found in DBC.DBQLXMLTbl.
STATSUSAGE logs the usage of existing statistics within a query, as well as recommendations for new statistics that were found to be missing when the query was optimized.  It does this without tying the recommendation to a particular query step in the plan.  The relationship to query steps can only be seen if XMLPLAN logging is also enabled.  XMLPLAN logs detailed step data in XML format.
When looking for missing statistics, always enable STATSUSAGE, with or without XMLPLAN.  If you enable XMLPLAN by itself without STATSUSAGE, no statistics-related information of any kind is logged into DBQLXMLTbl.  STATSUSAGE provides all the statistics recommendations and, if both are being logged, those recommendations can be attached to specific steps in the plan.
Because XMLPLAN comes with increased overhead, for the purposes of identifying missing statistics it is usually sufficient to start with STATSUSAGE logging without XMLPLAN.  The step-level information available in XMLPLAN is more useful once you are in analysis mode.
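A minimal sketch of a temporary STATSUSAGE logging session for one user; the user name ETL_USER is a placeholder, and XMLTextInfo is the assumed name of the XML column in DBC.DBQLXMLTbl:

BEGIN QUERY LOGGING WITH STATSUSAGE ON ETL_USER;

/* ...run the workload to be analyzed... */

SELECT QueryID, CAST(XMLTextInfo AS VARCHAR(31000)) AS StatsXML
FROM DBC.DBQLXMLTbl;   -- inspect the XML for used and missing statistics

END QUERY LOGGING ON ETL_USER;  -- remove the DBQL rules for this user when done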

(Courtesy: Teradata Forum )