Wednesday, 8 February 2012

performance tuning

This is the first of the number of articles on the series of Data Warehouse Application performance tuning scheduled to come every week. This one is on Informatica performance tuning.

Source Query/ General Query Tuning

1.1 Calculate original query cost
1.2 Can the query be re-written to reduce cost?
  • Can IN clause be changed with EXISTS?
  • Can a UNION be replaced with UNION ALL if we are not using any DISTINCT cluase in query?
  • Is there a redundant table join that can be avoided?
  • Can we include additional WHERE clause to further limit data volume?
  • Is there a redundant column used in GROUP BY that can be removed?
  • Is there a redundant column selected in the query but not used anywhere in mapping?
1.3 Check if all the major joining columns are indexed
1.4 Check if all the major filter conditions (WHERE clause) are indexed
  • Can a function-based index improve performance further?
1.5 Check if any exclusive query hint reduce query cost
  • Check if parallel hint improves performance and reduce cost
1.6 Recalculate query cost
  • If query cost is reduced, use the changed query

Tuning Informatica LookUp

2.1 Redundant Lookup transformation
  • Is there a lookup which is no longer used in the mapping?
  • If there are consecutive lookups, can those be replaced inside a single lookup override?
2.2 LookUp conditions
  • Are all the lookup conditions indexed in database? (Uncached lookup only)
  • An unequal condition should always be mentioned after an equal condition
2.3 LookUp override query
  • Should follow all guidelines from 1. Source Query part above
2.4 There is no unnecessary column selected in lookup (to reduce cache size)
2.5 Cached/Uncached
  • Carefully consider whether the lookup should be cached or uncached
  • General Guidelines
  • Generally don't use cached lookup if lookup table size is > 300MB
  • Generally don't use cached lookup if lookup table row count > 20,000,00
  • Generally don't use cached lookup if driving table (source table) row count < 1000
2.6 Persistent Cache
  • If found out that a same lookup is cached and used in different mappings, Consider persistent cache
2.7 Lookup cache building
  • Consider "Additional Concurrent Pipeline" in session property to build cache concurrently
  • "Prebuild Lookup Cache" should be enabled, only if the lookup is surely called in the mapping

Tuning Informatica Joiner

3.1 Unless unavoidable, join database tables in database only (homogeneous join) and don't use joiner
3.2 If Informatica joiner is used, always use Sorter Rows and try to sort it in SQ Query itself using Order By (If Sorter Transformation is used then make sure Sorter has enough cache to perform 1-pass sort)
3.3 Smaller of two joining tables should be master

Tuning Informatica Aggregator

4.1 When possible, sort the input for aggregator from database end (Order By Clause)
4.2 If Input is not already sorted, use SORTER. If possible use SQ query to Sort the records.

Tuning Informatica Filter

5.1 Unless unavoidable, use filteration at source query in source qualifier
5.2 Use filter as much near to source as possible

Tuning Informatica Sequence Generator

6.1 Cache the sequence generator

Setting Correct Informatica Session Level Properties

7.1 Disable "High Precision" if not required (High Precision allows decimal upto 28 decimal points)
7.2 Use "Terse" mode for tracing level
7.3 Enable pipeline partitioning (Thumb Rule: Maximum No. of partitions = No. of CPU/1.2) (Also remember increasing partitions will multiply the cache memory requirement accordingly)

Tuning Informatica Expression

8.1 Use Variable to reduce the redundant calculation
8.2 Remove Default value " ERROR('transformation error')" for Output Column.
8.3 Try to reduce the Code complexity like Nested If etc.
8.4 Try to reduce the Unneccessary Type Conversion in Calculation

tracing level


                 
                  Tracing level represents the amount of information that informatcia server writes in a log file.
Types of tracing level
Normal
Verbose
Verbose init
Verbose data
When you configure a transformation, you can set the amount of detail the Integration Service writes in the session log.
The session log tracing levels are:

1. Normal: Integration Service logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. Summarizes session results, but not at the level of individual rows.

2. Terse: Integration Service logs initialization information and error messages and notification of rejected data.

3. Verbose Initialization: In addition to normal tracing, Integration Service logs additional initialization details, names of index and data files used, and detailed transformation statistics.

4. Verbose Data: In addition to verbose initialization tracing, Integration Service logs each row that passes into the mapping. Also notes where the Integration Service truncates string data to fit the precision of a column and provides detailed transformation statistics.
By default, the tracing level for every transformation is Normal.