This is the first in a series of articles on Data Warehouse application performance tuning, scheduled to appear every week. This one covers Informatica performance tuning.
Source Query / General Query Tuning
1.1 Calculate original query cost
1.2 Can the query be re-written to reduce cost?
- Can an IN clause be replaced with EXISTS?
- Can a UNION be replaced with UNION ALL when the query does not need DISTINCT rows?
- Is there a redundant table join that can be avoided?
- Can we add a WHERE clause to further limit data volume?
- Is there a redundant column in GROUP BY that can be removed?
- Is there a redundant column selected in the query but not used anywhere in the mapping?
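The first two rewrites in the list above can be tried on a small scale before touching the real source query. The sketch below uses Python's built-in sqlite3 module and two hypothetical tables (customers, orders) that are not part of the original article. It only shows that the rewritten queries return the same rows; whether the cost actually drops has to be checked on your own database.

```python
import sqlite3

# Hypothetical tables, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 2, 20.0);
""")

# Original form: IN with a subquery.
in_rows = conn.execute("""
    SELECT c.id FROM customers c
    WHERE c.id IN (SELECT o.customer_id FROM orders o)
    ORDER BY c.id
""").fetchall()

# Rewritten form: correlated EXISTS, which returns the same rows here.
exists_rows = conn.execute("""
    SELECT c.id FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

# UNION removes duplicates (extra work); UNION ALL keeps every row.
# The two are interchangeable only when the branches cannot overlap.
union_rows = conn.execute("""
    SELECT id FROM customers WHERE region = 'EU'
    UNION
    SELECT id FROM customers WHERE region = 'US'
    ORDER BY id
""").fetchall()
union_all_rows = conn.execute("""
    SELECT id FROM customers WHERE region = 'EU'
    UNION ALL
    SELECT id FROM customers WHERE region = 'US'
    ORDER BY id
""").fetchall()

print(in_rows == exists_rows, union_rows == union_all_rows)
```

Comparing result sets like this is a cheap safety check to run before comparing the query cost of the two versions.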
1.4 Check if all the major filter conditions (WHERE clause) are indexed
- Can a function-based index improve performance further?
- Check if a parallel hint improves performance and reduces cost
- If the query cost is reduced, use the changed query
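To see what "is the filter condition indexed?" looks like in practice, here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN from Python. The table, column and index names are hypothetical, and SQLite stands in for whatever database your source query runs on; the real check should use your own database's explain plan. SQLite calls a function-based index an expression index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, customer_name TEXT)"
)

def plan(sql):
    """Return the plan detail strings SQLite reports for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

filter_sql = "SELECT * FROM orders WHERE status = 'OPEN'"

# Without an index the filter needs a full table scan.
plan_before = plan(filter_sql)

# After indexing the filter column, the plan searches the index instead.
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")
plan_after = plan(filter_sql)

# Function-based (expression) index for a filter that wraps the column.
upper_sql = "SELECT * FROM orders WHERE upper(customer_name) = 'ACME'"
conn.execute(
    "CREATE INDEX idx_orders_upper_name ON orders (upper(customer_name))"
)
plan_upper = plan(upper_sql)

for label, details in [("before", plan_before), ("after", plan_after),
                       ("function-based", plan_upper)]:
    print(label, details)
```

The exact wording of the plan text differs between SQLite versions, so look for the index name in the output rather than a fixed string.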
Tuning Informatica Lookup
2.1 Redundant Lookup transformation
- Is there a lookup which is no longer used in the mapping?
- If there are consecutive lookups, can they be replaced by a single lookup override?
- Are all the lookup conditions indexed in the database? (Uncached lookup only)
- An inequality condition should always be listed after an equality condition
- Follow all guidelines from section 1 (Source Query) above
2.5 Cached/Uncached
- Carefully consider whether the lookup should be cached or uncached
- General Guidelines
  - Generally don't use a cached lookup if the lookup table size is > 300 MB
  - Generally don't use a cached lookup if the lookup table row count is > 2,000,000
  - Generally don't use a cached lookup if the driving table (source table) row count is < 1000
- If the same cached lookup is used in different mappings, consider a persistent cache
- Consider "Additional Concurrent Pipeline" in the session properties to build the cache concurrently
- Enable "Prebuild Lookup Cache" only if the lookup is certain to be called in the mapping
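The general guidelines above can be written down as a small decision helper. This is only a sketch of the rules of thumb from this article, not an Informatica API: the function name and parameters are made up for illustration, and the row-count threshold is read as 2,000,000, which is an assumption.

```python
# Thresholds taken from the guidelines above (rules of thumb, not hard limits).
MAX_CACHED_LOOKUP_MB = 300
MAX_CACHED_LOOKUP_ROWS = 2_000_000   # assumed reading of the row-count guideline
MIN_SOURCE_ROWS_FOR_CACHE = 1000

def should_cache_lookup(lookup_size_mb, lookup_rows, source_rows):
    """Return True when the guidelines suggest a cached lookup."""
    if lookup_size_mb > MAX_CACHED_LOOKUP_MB:
        return False   # cache would be too large
    if lookup_rows > MAX_CACHED_LOOKUP_ROWS:
        return False   # too many rows to build the cache cheaply
    if source_rows < MIN_SOURCE_ROWS_FOR_CACHE:
        return False   # few source rows: a handful of direct queries is cheaper
    return True

print(should_cache_lookup(100, 50_000, 500_000))   # small lookup, large source
print(should_cache_lookup(100, 50_000, 500))       # very small source
```

Treat the result as a starting point and confirm it against actual session run times.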
Tuning Informatica Joiner
3.1 Unless unavoidable, join database tables in the database itself (homogeneous join) and don't use a joiner
3.2 If an Informatica joiner is used, always use sorted rows and try to sort them in the SQ query itself using ORDER BY (if a Sorter transformation is used, make sure the Sorter has enough cache to perform a 1-pass sort)
3.3 The smaller of the two joining tables should be the master
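Why the smaller table should be the master is easier to see with a sketch. As far as I understand it, the joiner holds the master rows in its cache and streams the detail rows past them, so a smaller master means a smaller cache. The Python below imitates that behaviour with a dictionary; the function names and sample rows are hypothetical and this is not how Informatica is implemented internally.

```python
def choose_master(rows_a, rows_b):
    """Return (master, detail), using the smaller input as the master."""
    master, detail = sorted([rows_a, rows_b], key=len)
    return master, detail

def join_with_master_cache(master_rows, detail_rows, key):
    """Cache the master rows by join key, then stream the detail rows."""
    cache = {}
    for row in master_rows:
        cache.setdefault(row[key], []).append(row)   # memory grows with master size
    for row in detail_rows:
        for match in cache.get(row[key], []):        # inner join: skip unmatched rows
            yield {**match, **row}

departments = [{"dept_id": 1, "dept": "Sales"}, {"dept_id": 2, "dept": "Ops"}]
employees = [
    {"emp": "a", "dept_id": 1},
    {"emp": "b", "dept_id": 2},
    {"emp": "c", "dept_id": 1},
    {"emp": "d", "dept_id": 9},
]

master, detail = choose_master(employees, departments)
joined = list(join_with_master_cache(master, detail, "dept_id"))
print(len(master), len(joined))
```

Here the two-row department list becomes the master, so only two rows are cached while the larger employee list is streamed.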
Tuning Informatica Aggregator
4.1 When possible, sort the input for the aggregator at the database end (ORDER BY clause)
4.2 If the input is not already sorted, use a Sorter. If possible, use the SQ query to sort the records.
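The benefit of sorted input is that an aggregator can finish a group as soon as the key changes, instead of holding every group in memory until the last row arrives. A minimal Python sketch of that idea, using itertools.groupby on made-up (key, amount) rows:

```python
from itertools import groupby
from operator import itemgetter

def aggregate_sorted(rows):
    """Sum amounts per key; rows must already be sorted by key."""
    for key, group in groupby(rows, key=itemgetter(0)):
        # Only the current group is held; it is emitted when the key changes.
        yield key, sum(amount for _, amount in group)

sorted_rows = [("A", 10), ("A", 5), ("B", 7), ("C", 1), ("C", 2)]
totals = list(aggregate_sorted(sorted_rows))

# The same logic on unsorted input silently produces split groups.
unsorted_rows = [("A", 10), ("B", 7), ("A", 5)]
broken_totals = list(aggregate_sorted(unsorted_rows))

print(totals)
print(broken_totals)
```

The second result shows why the input must really be sorted on the group-by keys before relying on sorted aggregation.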
Tuning Informatica Filter
5.1 Unless unavoidable, filter in the source query in the Source Qualifier
5.2 Place the filter as close to the source as possible
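The difference between filtering at the source and filtering later can be sketched with sqlite3. The sales table and its values are hypothetical. Both approaches give the same result, but the late filter pulls every row out of the database first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (country, amount) VALUES (?, ?)",
    [("IN", 10.0), ("US", 20.0), ("IN", 30.0), ("DE", 40.0)],
)

# Late filter: every row leaves the database, then most are discarded.
all_rows = conn.execute(
    "SELECT id, country, amount FROM sales ORDER BY id"
).fetchall()
late_filtered = [row for row in all_rows if row[1] == "IN"]

# Source filter: the WHERE clause discards rows before they enter the pipeline.
source_filtered = conn.execute(
    "SELECT id, country, amount FROM sales WHERE country = ? ORDER BY id",
    ("IN",),
).fetchall()

print(len(all_rows), len(source_filtered))
```

With four rows the saving is trivial; on a large source table the rows never read are the whole point.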
Tuning Informatica Sequence Generator
6.1 Cache the sequence generator
Setting Correct Informatica Session Level Properties
7.1 Disable "High Precision" if not required (High Precision allows decimals with up to 28 digits of precision)
7.2 Use "Terse" mode for tracing level
7.3 Enable pipeline partitioning (rule of thumb: maximum number of partitions = number of CPUs / 1.2). Remember that increasing the number of partitions multiplies the cache memory requirement accordingly.
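The partitioning rule of thumb and its memory side effect can be worked out in a few lines. The helper names below are made up for illustration, and rounding the result down to a whole number of partitions is my assumption; the rule itself does not say how to round.

```python
def max_partitions(cpu_count):
    """Rule of thumb: partitions = CPUs / 1.2, rounded down (assumed), at least 1."""
    # cpu_count / 1.2 equals cpu_count * 5 / 6; integer math avoids float rounding.
    return max(1, cpu_count * 5 // 6)

def total_cache_mb(cache_per_partition_mb, partitions):
    """Each partition needs its own cache, so the requirement multiplies."""
    return cache_per_partition_mb * partitions

partitions = max_partitions(8)
print(partitions, total_cache_mb(512, partitions))
```

For example, a transformation that needs 512 MB of cache in a single partition needs that amount for each partition once the session is partitioned.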
Tuning Informatica Expression
8.1 Use variables to reduce redundant calculation
8.2 Remove the default value "ERROR('transformation error')" for output columns.
8.3 Try to reduce code complexity, such as nested IFs.
8.4 Try to reduce unnecessary type conversions in calculations
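Point 8.1 can be illustrated with a rough Python analogy. This is not Informatica expression syntax: the functions below are hypothetical and the local variable v_code merely plays the role of a variable port. The counter shows how often the shared sub-expression is evaluated per row.

```python
calls = {"count": 0}

def normalize_code(raw):
    """Stand-in for a costly sub-expression; counts how often it runs."""
    calls["count"] += 1
    return raw.strip().upper()

def without_variable(raw):
    # The same sub-expression is evaluated once per output.
    return normalize_code(raw) + "_ID", normalize_code(raw) + "_NAME"

def with_variable(raw):
    v_code = normalize_code(raw)   # evaluated once, like a variable port
    return v_code + "_ID", v_code + "_NAME"

out_without = without_variable(" ab ")
calls_without = calls["count"]

calls["count"] = 0
out_with = with_variable(" ab ")
calls_with = calls["count"]

print(calls_without, calls_with, out_without == out_with)
```

The outputs are identical, but the version with the variable evaluates the sub-expression half as often, and that saving is repeated for every row that passes through the expression.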