Thursday 25 August 2011

Informatica interview questions and answers part 2


1. What are the different modes of data processing available in Informatica server?
The Informatica server can move and process data in either ASCII or Unicode mode. The data movement mode affects how code page validation is enforced and which code page relationships are required. The Integration Service (IS) can be switched between the two modes by changing its configuration parameters and restarting it. In ASCII mode each character is stored in a single byte, while in Unicode mode the server allots up to 2 bytes per character so that multibyte data such as Japanese or Chinese text can be handled.
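
As a rough illustration of single-byte versus multibyte storage, the sketch below compares how many bytes a character needs in 7-bit ASCII and in a two-byte encoding. Python's "ascii" and "utf-16-le" codecs are used only as stand-ins here; they are an assumption for the example and not the server's internal representation. The function name byte_widths is hypothetical.

```python
def byte_widths(ch):
    """Return (bytes in ASCII or None, bytes in UTF-16) for one character."""
    try:
        ascii_len = len(ch.encode("ascii"))
    except UnicodeEncodeError:
        ascii_len = None  # the character does not exist in 7-bit ASCII
    # UTF-16 without a byte-order mark stores each of these characters in 2 bytes.
    utf16_len = len(ch.encode("utf-16-le"))
    return ascii_len, utf16_len


for ch in ["A", "é", "日"]:
    print(ch, byte_widths(ch))
```

A single-byte mode is enough for plain English data, but characters such as "日" simply cannot be represented in it.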

2. Explain code page compatibility.
Code page compatibility describes whether data can move between two code pages without loss. If two code pages contain the same character set, they are compatible and no data is lost. The target code page can also be a superset of the source code page, which means that it contains every character that is encoded in the source, and data again moves without loss. But if the target is only a subset (some characters in the source code page have no counterpart in the target code page), the pages are not compatible and data will be lost during the transformation.
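
The subset and superset relationship can be tried out with a small check, sketched here with Python codecs standing in for code pages (an assumption for illustration; the helper name is_lossless is hypothetical). The check reports whether every character of a string exists in the target code page.

```python
def is_lossless(text, target_codec):
    """Return True if every character in text exists in the target code page."""
    try:
        text.encode(target_codec)
        return True
    except UnicodeEncodeError:
        return False


source_data = "Zürich café"
print(is_lossless(source_data, "latin-1"))  # Latin-1 contains ü and é
print(is_lossless(source_data, "ascii"))    # ASCII is only a subset of Latin-1
```

Moving this data to a Latin-1 target is safe, while an ASCII target would lose the accented characters.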

3. What are the different threads that are created by the DTM?
The Data Transformation Manager (DTM) creates the following threads:
●        Master thread: a single thread that creates and coordinates all the other threads.
●        Mapping thread: one per session, which handles the session and mapping information.
●        Pre-session and post-session threads: separate threads that carry out the pre-session and post-session operations.
●        Reader thread and writer thread: the reader reads from the source, and the writer writes the output of the source pipeline to the final target.
●        Transformation thread: one for each partition.

4. What are sessions? How can you combine executions using batches?
A session is the set of instructions that has to be executed to transform and move the data from the sources to the specified targets. A session can be started with the pmcmd command or from the Server Manager. Several sessions can be grouped into a batch and executed together. The sessions in a batch run on the IS either sequentially (one after another) or concurrently (in parallel).

5. How is mapping variable different from mapping parameter?
A mapping variable represents a value that can change during the execution of the session. The Informatica server saves the final value of the variable when the session completes and uses it as the starting value the next time the session runs. A mapping parameter, on the other hand, is a value that stays constant throughout the execution. You define the parameters in the mapping and use them in its transformations. Before the session starts, each parameter is initialized with the value you assign.
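
The difference can be sketched in a few lines of Python. Everything here is hypothetical: the repository dictionary stands in for wherever the server keeps the saved value, and run_session, $$LastLoadedId, and region_param are made-up names for the example.

```python
# Hypothetical stand-in for the place where the last variable value is saved.
repository = {"$$LastLoadedId": 0}


def run_session(rows, region_param):
    """region_param behaves like a mapping parameter: constant for the run.
    $$LastLoadedId behaves like a mapping variable: read, updated, saved."""
    last_id = repository["$$LastLoadedId"]
    loaded = [r for r in rows if r["id"] > last_id and r["region"] == region_param]
    if loaded:
        last_id = max(r["id"] for r in loaded)
    repository["$$LastLoadedId"] = last_id  # kept for the next run
    return loaded


rows = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}, {"id": 3, "region": "EU"}]
print(run_session(rows, "EU"))  # first run loads ids 1 and 3
print(run_session(rows, "EU"))  # second run finds nothing new
```

The parameter never changes inside a run, while the variable carries state from one run to the next.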

6. Why is the aggregator cache file required?
The Aggregator transformation performs its calculations group by group, so it has to keep the intermediate values for each group while the session runs. It stores these values locally in buffer memory. If the processing requires more memory than is available, the aggregator creates additional cache files to hold the intermediate values.
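
To see why intermediate values must be kept, consider this minimal sketch of a grouped sum. The dictionary plays the role of the cache; spilling to cache files on disk is not modeled, and aggregate_sales is a hypothetical name.

```python
from collections import defaultdict


def aggregate_sales(rows):
    """Sum the amount per group key, keeping one running total per group."""
    cache = defaultdict(float)  # group key -> intermediate aggregate value
    for key, amount in rows:
        cache[key] += amount    # the total is final only after the last row
    return dict(cache)


print(aggregate_sales([("east", 10.0), ("west", 5.0), ("east", 2.5)]))
```

Because rows of the same group can arrive at any point in the input, no group total can be released until all rows have been read.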

7. What does look up transformation mean?
Lookup transformations are the ones that access RDBMS-based datasets. The Informatica server speeds up these accesses by holding the lookup table together with indexes that point to the specific data in the tables or views of the database. The lookup condition is evaluated for all the lookup ports used in the transformation, and the matching data is returned.
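
The idea of an indexed lookup can be sketched with a dictionary keyed on the lookup column. This is only an illustration under the assumption of an equality condition on a single key; build_lookup_cache and lookup are hypothetical names.

```python
def build_lookup_cache(lookup_rows, key_field):
    """Index the lookup rows by the key used in the lookup condition."""
    return {row[key_field]: row for row in lookup_rows}


def lookup(cache, key, return_field, default=None):
    """Return one field of the matching row, or default when nothing matches."""
    row = cache.get(key)
    return row[return_field] if row is not None else default


customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
cache = build_lookup_cache(customers, "cust_id")
print(lookup(cache, 2, "name"))
print(lookup(cache, 9, "name"))
```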

8. What are stand-alone sessions? How can they be recovered in case of failure?
A session that is not attached to a batch is called a stand-alone session. A stand-alone session can be recovered from a failed run with the pmcmd command or through the Server Manager interface. The Server Manager has a “Server Requests” menu for each session: highlight the session you want to recover, select Server Requests to stop it, and then start it again. The same can be done from the command line with pmcmd.

9. How can you obtain the reports about the repository without having to deal with any SQL or other transformations?
The Metadata Reporter is used to generate reports about the repository. It is a web application, so running the reports does not require any knowledge of SQL queries or transformations.

10. Is the Fact table normalized or denormalized one?
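The fact table is generally regarded as normalized: it holds the measures together with foreign keys that reference the primary keys of the dimension tables, and it does not repeat the descriptive attributes. In a star schema it is the dimension tables that are usually kept denormalized.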

11. What is the difference between filter transformations and router transformations?
A Filter transformation tests a single condition and drops the rows that do not satisfy it. A Router transformation, on the other hand, can test multiple conditions, which is better than using multiple Filter transformations. Each condition defines its own output group, and the rows that do not satisfy any condition are sent to a default group instead of being dropped.
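
The contrast can be sketched as two small functions. The names filter_rows, route_rows, and DEFAULT are hypothetical, and the sketch assumes that a row meeting several conditions is sent to each matching group.

```python
def filter_rows(rows, condition):
    """Filter: one condition; rows that fail it are dropped."""
    return [row for row in rows if condition(row)]


def route_rows(rows, group_conditions):
    """Router: one output group per condition, plus a default group."""
    groups = {name: [] for name in group_conditions}
    groups["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in group_conditions.items():
            if condition(row):
                groups[name].append(row)
                matched = True
        if not matched:
            groups["DEFAULT"].append(row)  # kept, not discarded
    return groups


amounts = [5, 50, 500]
print(filter_rows(amounts, lambda a: a > 10))
print(route_rows(amounts, {"SMALL": lambda a: a < 10, "LARGE": lambda a: a >= 100}))
```

The input is read once by the router, whereas a chain of filters would have to test the same rows once per filter.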

12. What do you mean by Enterprise Data Warehousing?
Enterprise data warehousing (EDW) is the concept of creating a single point of access to the organization's information. A single data warehouse stores all the information and gives a global view of the data that is present on the server. This also makes it possible to repeat the analysis periodically over the same source. A centralized warehouse takes longer to develop, but it gives better results.

13. What do you mean by a domain?
A domain is the collection of all the related nodes and relationships that are managed from a single administrative point. Grouping them this way makes the whole environment more manageable.

14. What is a mystery dimension and a junk dimension?
The mystery dimension is the one that stores sensitive data that has to be hidden from the view of others. The junk dimension is the one that collects the miscellaneous attributes of a record that do not belong anywhere else. Attributes that do not fall under a specific category or column are organized into one manageable dimension, which is called the junk dimension. A fact that is related to the company but not really important for the database operations is an example of what is stored there.

15. What is the difference between the repository server and the powerhouse server?
The powerhouse server is the management entity that controls the execution of the different processes across the components of the Informatica server's database repository. The repository server, on the other hand, is the control center for the entire repository, including its tables and procedures. It maintains the consistency and integrity of the repository.

16. Why do we partition a session?
Partitioning a session splits its pipeline into several independent execution sequences. This improves the efficiency and performance of the server, because the extraction, the transformations, and the related output can be carried out in parallel for each partition that is created.
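
A minimal sketch of the idea, using a thread pool from the Python standard library. The round-robin split, the doubling transformation, and the names run_partitioned and process_partition are assumptions made for the example, not the server's actual partitioning logic.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(row):
    return row * 2  # placeholder for the real transformation logic


def process_partition(partition):
    """Transform the rows of one partition."""
    return [transform(row) for row in partition]


def run_partitioned(rows, partition_count):
    # Round-robin split of the source rows into independent partitions.
    partitions = [rows[i::partition_count] for i in range(partition_count)]
    with ThreadPoolExecutor(max_workers=partition_count) as pool:
        # map() returns the results in the same order as the partitions.
        return list(pool.map(process_partition, partitions))


print(run_partitioned([1, 2, 3, 4, 5], 2))
```

Each partition is handled by its own worker, so no partition waits for another to finish.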

17. What is a rank index?
The rank index is the value that the Rank transformation generates for each row to record its position in the ranking. The transformation stores it in the RANKINDEX port.

18. What is the difference between active and passive transformations?
Active transformations are the ones that can change the number of rows that pass through them. For example, a Filter transformation removes the rows that do not meet its condition. Passive transformations pass every row through and only calculate or change values within each row, so the number of rows stays the same. The Expression transformation is an example of this type.

19. How can you eliminate duplicate rows from a flat file?
Use a Sorter transformation with the Distinct option enabled. With this option the Sorter treats every port as part of the sort key, so rows that have the same values in all ports are recognized as duplicates and only one of them is passed on.
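
The same sort-then-compare approach can be sketched in Python for a comma-separated flat file. The function name distinct_rows and the sample data are hypothetical.

```python
import csv
import io


def distinct_rows(flat_file_text):
    """Sort on all columns, then keep one copy of each identical row."""
    rows = [tuple(r) for r in csv.reader(io.StringIO(flat_file_text))]
    rows.sort()  # every column is part of the sort key
    distinct = []
    for row in rows:
        # After sorting, duplicates are adjacent, so compare with the last kept row.
        if not distinct or row != distinct[-1]:
            distinct.append(row)
    return distinct


print(distinct_rows("2,b\n1,a\n2,b\n1,a\n3,c\n"))
```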

20. How can you create indexes after the load process is completed?
Session-level command tasks can be used. The scripts that create the indexes can be attached to the session’s workflow or to the post-session execution sequence. Index creation after the load process cannot be handled at the transformation level.
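
The load-first, index-afterwards order can be sketched with an in-memory SQLite database. The table, the index name, and the function load_then_index are hypothetical; in a real workflow the CREATE INDEX statement would live in the script that runs as the post-session step.

```python
import sqlite3


def load_then_index(rows):
    """Load the rows first, then build the index as a separate final step."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, region TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)  # the load
    # Post-load step: the index is created once, after all rows are in place.
    conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
    conn.commit()
    return conn


conn = load_then_index([(1, "EU"), (2, "US")])
print([row[1] for row in conn.execute("PRAGMA index_list('sales')")])
```

Creating the index after the load avoids updating it for every inserted row.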
