Friday, May 11, 2007
Look up Transformation FAQs
Lookup Transformation: It is a passive transformation that can be used in connected or unconnected mode. It is used to look up data in a relational table, view, or synonym. The lookup definition can be imported from either source or target tables.
In lookup, which one is better: connected or unconnected lookup?
Performance wise, I am asking. If I am using five connected lookups in a mapping, what happens then?
Ans> It depends.
Definitely unconnected is better performance wise. But it is not wise to use 5 unconnected lookups. In this case it is better to use one connected lookup with the required 5 ports as output ports. If you need only one or two output ports, you can use an unconnected lookup for better performance.
Ashok> How can you say that unconnected gives better performance than a connected lookup? In a connected lookup we use only the lookup table; in an unconnected lookup we have to add an expression along with the lookup table, so the performance of the unconnected lookup is decreased by that extra transformation compared to a connected lookup.
Vinod> It depends on the scenario you are using. If you want to look up multiple times with the same condition in one mapping, it is better to use an unconnected lookup; in a situation where you want multiple return values, you can only use a connected lookup. In a connected lookup we can use both static and dynamic cache, but in an unconnected lookup we can use only a static cache.
=================================
Can anyone explain the new port that is created in the lookup after enabling the dynamic lookup cache? And what is the associated port?
Hi, the associated port is either an input or output port. Data in this port is used by the IS [Infa Server] to insert or update the lookup cache.
Manasa Says> A Lookup transformation using a dynamic cache has the following properties:
♦ NewLookupRow. The Designer adds this port to a Lookup transformation configured to use a dynamic cache. It indicates with a numeric value whether the PowerCenter Server inserts the row in the cache, updates it, or makes no change to the cache. To keep the lookup cache and the target table synchronized, you pass rows to the target when the NewLookupRow value is equal to 1 or 2.
♦ Associated Port. Associate lookup ports with either an input/output port or a sequence ID. The PowerCenter Server uses the data in the associated ports to insert or update rows in the lookup cache. If you associate a sequence ID, the PowerCenter Server generates a primary key for inserted rows in the lookup cache.
♦ Ignore Null Inputs for Updates. The Designer activates this port property for lookup/output ports when you configure the Lookup transformation to use a dynamic cache. Select this property when you do not want the PowerCenter Server to update the column in the cache when the data in this column contains a null value.
♦ Ignore in Comparison. The Designer activates this port property for lookup/output ports not used in the lookup condition when you configure the Lookup transformation to use a dynamic cache. By default, the PowerCenter Server compares the values in all lookup ports with the values in their associated input ports. Select this property if you want the PowerCenter Server to ignore the port when it compares values before updating a row.
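The NewLookupRow semantics can be sketched in Python (an illustrative simulation using an in-memory dict as the cache, not PowerCenter code):

```python
# Minimal simulation of a dynamic lookup cache: for each incoming row,
# the cache either inserts (NewLookupRow = 1), updates (2), or makes
# no change (0).
def process_row(cache, key, value):
    if key not in cache:
        cache[key] = value
        return 1          # insert into cache
    if cache[key] != value:
        cache[key] = value
        return 2          # update cache
    return 0              # no change

cache = {}
flags = [process_row(cache, k, v)
         for k, v in [(10, "A"), (10, "A"), (10, "B"), (20, "C")]]
print(flags)   # [1, 0, 2, 1]
```

Rows flagged 1 or 2 are the ones you would route to the target to keep it synchronized with the cache.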
==============================
Can we look up a table from a Source Qualifier transformation (unconnected lookup)?
nandam> No, we can't look up the data.
premlahari> No, we can't. I will explain why. 1) Unless you assign the output of the Source Qualifier to another transformation or to a target, it will not include the field in the query. 2) The Source Qualifier doesn't have any variable fields to utilize in an expression.
==========
nayana> How do I get data into an unconnected Lookup transformation? I.e., if I want to compare customer_id in the source with customer_id in the unconnected lookup, how can I get the data from the source for customer_id if it is an unconnected lookup?
manichamp123 Says> You need to create an input port, in_customer_id, to take the input. Then from your source flow, i.e. from an Expression transformation, you can call the unconnected lookup with :LKP.lkp_customer(customer_id). In the unconnected lookup condition you can then put a condition like in_customer_id = customer_id.
nayana> I did the same; it is advising invalid mapping.
=========
Q> In which particular situation do we use an unconnected Lookup transformation?
Ans> We use the unconnected Lookup transformation when we need to return only one port; in that case use an unconnected lookup instead of a connected one. We can also use a connected lookup to return one port, but an unconnected Lookup transformation is not connected to the other transformations and is not part of the data flow, which is why performance increases.
Ans2> The major advantage of unconnected lookup is its reusability. We can call an unconnected lookup multiple times in the mapping, unlike a connected lookup.
=======
Q> Why do we use Lookup transformations?
Ans> Lookup transformations can access data from relational tables that are not sources in the mapping. With a Lookup transformation, we can accomplish the following tasks:
- Get a related value, e.g. get the Employee Name from the Employee table based on the Employee ID.
- Perform a calculation.
- Update slowly changing dimension tables: we can use an unconnected Lookup transformation to determine whether the records already exist in the target or not.
=======
Q> How do we look up data in the target and the source to compare them without using a Lookup transformation? Basically, I mean getting lookup functionality without using a Lookup transformation.
Ans> You can use a Joiner transformation. Use the target as a source, join it with the source, set the joiner condition, and do the required tasks.
Sidhu> Hey, you can use a stored procedure to get the functionality of the lookup. Pass the source rows as input to the SP; the SP will compare the input args to those of the target table and, if a value is found, can return a flag. Depending on the outcome of the flag you can set your condition.
Ans3> You could have a SQL override in the Source Qualifier with something like:
select * from source_table s where not exists (select 1 from target_table t where t.key_id = s.key_id)
Ans4> Hi, you can have the source left outer joined to the target table using the key columns to verify whether an incoming row already exists in the target. E.g., consider A as source and B as target, with the key column key_id in both tables. The join would be like:
select A.*, B.key_id from A, B where A.key_id = B.key_id(+)
Now, in the mapping, you can have logic to consider the row as new if B.key_id is NULL and as existing if B.key_id is not NULL.
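The left-outer-join check in Ans4 can be sketched in Python (illustrative only; the key_id column and sample data are taken from or invented around the answer above):

```python
# A row is "new" when its key does not exist in the target; "existing"
# otherwise, mirroring the B.key_id IS NULL test after a left outer join.
source = [{"key_id": 1, "name": "x"},
          {"key_id": 2, "name": "y"},
          {"key_id": 3, "name": "z"}]
target_keys = {1, 3}   # key_id values already present in the target

flags = [("existing" if row["key_id"] in target_keys else "new")
         for row in source]
print(flags)   # ['existing', 'new', 'existing']
```

In the mapping, this flag would drive an insert-versus-update decision downstream.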
==============
Tuesday, April 17, 2007
Informatica transformations
Aggregator Transformation (Transformation type: Connected, Active)
The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. The Aggregator transformation is unlike the Expression transformation, in that you can use the Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only. When using the transformation language to create aggregate expressions, you can use conditional clauses to filter rows, providing more flexibility than standard SQL.
The PowerCenter Server performs aggregate calculations as it reads, and stores the necessary group and row data in an aggregate cache.
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the PowerCenter Server performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally. For details on incremental aggregation, see “Using Incremental Aggregation” in the Workflow Administration Guide.
Ports in the Aggregator Transformation
To configure ports in the Aggregator transformation, complete the following tasks:
♦ Enter an expression in any output port, using conditional clauses or non-aggregate functions in the port.
♦ Create multiple aggregate output ports.
♦ Configure any input, input/output, output, or variable port as a group by port.
♦ Improve performance by connecting only the necessary input/output ports to subsequent transformations, reducing the size of the data cache.
♦ Use variable ports for local variables.
♦ Create connections to other transformations as you enter an expression.
Components of the Aggregator Transformation
The Aggregator is an active transformation, changing the number of rows in the pipeline. The
Aggregator transformation has the following components and options:
♦ Aggregate expression. Entered in an output port. Can include non-aggregate expressions
and conditional clauses.
♦ Group by port. Indicates how to create groups. The port can be any input, input/output,
output, or variable port. When grouping data, the Aggregator transformation outputs the
last row of each group unless otherwise specified.
♦ Sorted input. Use to improve session performance. To use sorted input, you must pass
data to the Aggregator transformation sorted by group by port, in ascending or descending
order.
♦ Aggregate cache. The PowerCenter Server stores data in the aggregate cache until it
completes aggregate calculations. It stores group values in an index cache and row data in
the data cache.
Aggregate Caches
When you run a session that uses an Aggregator transformation, the PowerCenter Server creates index and data caches in memory to process the transformation. If the PowerCenter Server requires more space, it stores overflow values in cache files. You can configure the index and data caches in the Aggregator transformation or in the session properties. For more information, see “Creating an Aggregator Transformation” on page 11.
Note: The PowerCenter Server uses memory to process an Aggregator transformation with
sorted ports. It does not use cache memory. You do not need to configure cache memory for
Aggregator transformations that use sorted ports.
Aggregate Expressions
The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. It can also include one aggregate function nested within another aggregate function, such as:
MAX( COUNT( ITEM ))
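As a rough Python analogue (not PowerCenter code), MAX( COUNT( ITEM )) first counts occurrences per item and then takes the maximum of those counts:

```python
from collections import Counter

# Count occurrences of each ITEM value, then take the largest count,
# mirroring the nested aggregate MAX( COUNT( ITEM ) ).
items = ["battery", "battery", "AAA", "battery", "AAA"]
counts = Counter(items)          # Counter({'battery': 3, 'AAA': 2})
print(max(counts.values()))      # 3
```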
The result of an aggregate expression varies depending on the group by ports used in the transformation. For example, when the PowerCenter Server calculates the following aggregate expression with no group by ports defined, it finds the total quantity of items sold:
SUM( QUANTITY )
However, if you use the same expression, and you group by the ITEM port, the PowerCenter
Server returns the total quantity of items sold, by item.
You can create an aggregate expression in any output port and use multiple aggregate ports in a transformation.
Aggregate Functions
You can use the following aggregate functions within an Aggregator transformation. You can
nest one aggregate function within another aggregate function.
The transformation language includes the following aggregate functions:
♦ AVG
♦ COUNT
♦ FIRST
♦ LAST
♦ MAX
♦ MEDIAN
♦ MIN
♦ PERCENTILE
♦ STDDEV
♦ SUM
♦ VARIANCE
When you use any of these functions, you must use them in an expression within an
Aggregator transformation. For a description of these functions, see “Functions” in the
Transformation Language Reference.
Nested Aggregate Functions
You can include multiple single-level or multiple nested functions in different output ports in
an Aggregator transformation. However, you cannot include both single-level and nested
functions in an Aggregator transformation. Therefore, if an Aggregator transformation
contains a single-level function in any output port, you cannot use a nested function in any
other port in that transformation. When you include single-level and nested functions in the
same Aggregator transformation, the Designer marks the mapping or mapplet invalid. If you
need to create both single-level and nested functions, create separate Aggregator transformations.
Conditional Clauses
You can use conditional clauses in the aggregate expression to reduce the number of rows used
in the aggregation. The conditional clause can be any clause that evaluates to TRUE or
FALSE.
For example, you can use the following expression to calculate the total commissions of
employees who exceeded their quarterly quota:
SUM( COMMISSION, COMMISSION > QUOTA )
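In Python terms (an illustrative sketch with made-up commission and quota figures), the conditional clause acts as a row filter inside the aggregate:

```python
# SUM( COMMISSION, COMMISSION > QUOTA ): only rows where the condition
# evaluates to TRUE contribute to the sum. The figures are hypothetical.
rows = [{"commission": 500, "quota": 400},
        {"commission": 300, "quota": 400},
        {"commission": 700, "quota": 600}]

total = sum(r["commission"] for r in rows if r["commission"] > r["quota"])
print(total)   # 1200  (500 + 700; the 300 row misses its quota)
```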
Non-Aggregate Functions
You can also use non-aggregate functions in the aggregate expression.
The following expression returns the highest number of items sold for each item (grouped by
item). If no items were sold, the expression returns 0.
IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 )
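A hedged Python equivalent (quantities are sample values): the non-aggregate IIF wraps the aggregate result, substituting 0 when the maximum is not positive:

```python
# IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 ) applied to one group
# of quantities.
def max_or_zero(quantities):
    m = max(quantities)
    return m if m > 0 else 0

print(max_or_zero([3, 1, 2]))    # 3
print(max_or_zero([-1, 0]))      # 0
```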
Null Values in Aggregate Functions
When you configure the PowerCenter Server, you can choose how you want the PowerCenter
Server to handle null values in aggregate functions. You can choose to treat null values in
aggregate functions as NULL or zero. By default, the PowerCenter Server treats null values as
NULL in aggregate functions.
For details on changing this default behavior, see “Installing and Configuring the
PowerCenter Server on Windows” and “Installing and Configuring the PowerCenter Server
on UNIX” chapters in the Installation and Configuration Guide.
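The two null-handling modes can be sketched in Python, with None standing in for NULL and hypothetical data; AVG makes the difference visible because the divisor changes:

```python
# Treat NULLs as NULL (skip them) vs. treat NULLs as zero.
values = [10, None, 5, None]

non_null = [v for v in values if v is not None]
avg_null_as_null = sum(non_null) / len(non_null)      # NULLs ignored
zeroed = [0 if v is None else v for v in values]
avg_null_as_zero = sum(zeroed) / len(zeroed)          # NULLs become 0

print(avg_null_as_null, avg_null_as_zero)   # 7.5 3.75
```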
Group By Ports
The Aggregator transformation allows you to define groups for aggregations, rather than performing the aggregation across all input data. For example, rather than finding the total company sales, you can find the total sales grouped by region.
To define a group for the aggregate expression, select the appropriate input, input/output, output, and variable ports in the Aggregator transformation. You can select multiple group by ports, creating a new group for each unique combination of groups. The PowerCenter Server then performs the defined aggregation for each group.
When you group values, the PowerCenter Server produces one row for each group. If you do not group values, the PowerCenter Server returns one row for all input rows. The PowerCenter Server typically returns the last row of each group (or the last row received) with the result of the aggregation. However, if you specify a particular row to be returned (for example, by using the FIRST function), the PowerCenter Server then returns the specified row.
When selecting multiple group by ports in the Aggregator transformation, the PowerCenter
Server uses port order to determine the order by which it groups. Since group order can affect
your results, order group by ports to ensure the appropriate grouping. For example, the results
of grouping by ITEM_ID then QUANTITY can vary from grouping by QUANTITY then
ITEM_ID, because the numeric values for quantity are not necessarily unique.
The following Aggregator transformation groups first by STORE_ID and then by ITEM:
If you send the following data through this Aggregator transformation:
STORE_ID ITEM QTY PRICE
101 ‘battery’ 3 2.99
101 ‘battery’ 1 3.19
101 ‘battery’ 2 2.59
101 ‘AAA’ 2 2.45
201 ‘battery’ 1 1.99
201 ‘battery’ 4 1.59
301 ‘battery’ 1 2.45
The PowerCenter Server performs the aggregate calculation on the following unique groups:
STORE_ID ITEM
101 ‘battery’
101 ‘AAA’
201 ‘battery’
301 ‘battery’
The PowerCenter Server then passes the last row received, along with the results of the aggregation, as follows:
STORE_ID ITEM QTY PRICE SALES_PER_STORE
101 ‘battery’ 2 2.59 17.34
101 ‘AAA’ 2 2.45 4.90
201 ‘battery’ 4 1.59 8.35
301 ‘battery’ 1 2.45 2.45
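The worked example above can be reproduced with a short Python sketch (illustrative only): group by (STORE_ID, ITEM), keep the last row of each group, and compute SALES_PER_STORE as SUM(QTY * PRICE):

```python
# Group rows by (STORE_ID, ITEM); output the last row per group plus
# the aggregate SUM(QTY * PRICE), matching the table above.
rows = [(101, "battery", 3, 2.99), (101, "battery", 1, 3.19),
        (101, "battery", 2, 2.59), (101, "AAA", 2, 2.45),
        (201, "battery", 1, 1.99), (201, "battery", 4, 1.59),
        (301, "battery", 1, 2.45)]

groups = {}
for store, item, qty, price in rows:
    _last, total = groups.get((store, item), (None, 0.0))
    groups[(store, item)] = ((store, item, qty, price),
                             total + qty * price)

for (store, item), (last_row, sales) in groups.items():
    print(store, item, last_row[2], last_row[3], round(sales, 2))
```

The printed rows match the SALES_PER_STORE column: 17.34, 4.90, 8.35, and 2.45.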
Non-Aggregate Expressions
You can use non-aggregate expressions in group by ports to modify or replace groups. For
example, if you want to replace ‘AAA battery’ before grouping, you can create a new group by
output port, named CORRECTED_ITEM, using the following expression:
IIF( ITEM = 'AAA battery', 'battery', ITEM )
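In Python terms (a sketch; the item names follow the example above), the group key is computed from an expression rather than taken directly from the port:

```python
# Replace 'AAA battery' with 'battery' before grouping, as the
# CORRECTED_ITEM expression does.
def corrected_item(item):
    return "battery" if item == "AAA battery" else item

items = ["battery", "AAA battery", "AAA"]
print([corrected_item(i) for i in items])   # ['battery', 'battery', 'AAA']
```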
Default Values
You can use default values in the group by port to replace null input values. For example, if you define a default value of 'Misc' in the ITEM column, the PowerCenter Server replaces null groups with 'Misc'. This allows the PowerCenter Server to include null item groups in the aggregate calculation.
Using Sorted Input
You can improve Aggregator transformation performance by using the sorted input option.
When you use sorted input, the PowerCenter Server assumes all data is sorted by group. As
the PowerCenter Server reads rows for a group, it performs aggregate calculations. When
necessary, it stores group information in memory. To use the Sorted Input option, you must
pass sorted data to the Aggregator transformation. You can gain performance with sorted
ports when you configure the session with multiple partitions.
When you do not use sorted input, the PowerCenter Server performs aggregate calculations as
it reads. However, since data is not sorted, the PowerCenter Server stores data for each group
until it reads the entire source to ensure all aggregate calculations are accurate.
For example, one Aggregator transformation has the STORE_ID and ITEM group by ports,
with the sorted input option selected. When you pass the following data through the
Aggregator, the PowerCenter Server performs an aggregation for the three rows in the
101/battery group as soon as it finds the new group, 201/battery:
STORE_ID ITEM QTY PRICE
101 ‘battery’ 3 2.99
101 ‘battery’ 1 3.19
101 ‘battery’ 2 2.59
201 ‘battery’ 4 1.59
201 ‘battery’ 1 1.99
If you use sorted input and do not presort data correctly, you receive unexpected results.
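The sorted-input behavior described above can be sketched in Python (illustrative): because rows arrive sorted by the group key, a group's aggregate can be emitted as soon as the key changes, so only one group is held in memory at a time:

```python
# Streaming aggregation over input sorted by (STORE_ID, ITEM):
# emit SUM(QTY * PRICE) whenever the group key changes.
rows = [(101, "battery", 3, 2.99), (101, "battery", 1, 3.19),
        (101, "battery", 2, 2.59), (201, "battery", 4, 1.59),
        (201, "battery", 1, 1.99)]

results = []
current_key, total = None, 0.0
for store, item, qty, price in rows:
    key = (store, item)
    if key != current_key:
        if current_key is not None:
            results.append((current_key, round(total, 2)))  # flush group
        current_key, total = key, 0.0
    total += qty * price
results.append((current_key, round(total, 2)))              # last group

print(results)   # [((101, 'battery'), 17.34), ((201, 'battery'), 8.35)]
```

Without sorted input, every group's running total would have to stay cached until the entire source is read.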
Sorted Input Conditions
Do not use sorted input if either of the following conditions are true:
♦ The aggregate expression uses nested aggregate functions.
♦ The session uses incremental aggregation.
If you use sorted input and do not sort data correctly, the session fails.
Pre-Sorting Data
To use sorted input, you pass sorted data through the Aggregator.
Data must be sorted as follows:
♦ By the Aggregator group by ports, in the order they appear in the Aggregator
transformation.
♦ Using the same sort order configured for the session. If data is not in strict ascending or
descending order based on the session sort order, the PowerCenter Server fails the session.
For example, if you configure a session to use a French sort order, data passing into the Aggregator transformation must be sorted using the French sort order.
For relational and file sources, you can use the Sorter transformation to sort data in the mapping before passing it to the Aggregator transformation. You can place the Sorter transformation anywhere in the mapping prior to the Aggregator if no transformation changes the order of the sorted data. Group by columns in the Aggregator transformation must be in the same order as they appear in the Sorter transformation. For details on sorting data using the Sorter transformation, see “Sorter Transformation” on page 283.
If the session uses relational sources, you can also use the Number of Sorted Ports option in the Source Qualifier transformation to sort group by columns in the source database. Group by columns must be in the same order in both the Aggregator and Source Qualifier transformations. For details on sorting data in the Source Qualifier, see “Using Sorted Ports” on page 317.
Figure 1-1 illustrates the mapping with a Sorter transformation configured to sort the source
data in descending order by ITEM_NAME:

The Sorter transformation sorts the data as follows:
ITEM_NAME QTY PRICE
Soup 4 2.95
Soup 1 2.95
Soup 2 3.25
Cereal 1 4.49
Cereal 2 5.25
With sorted input, the Aggregator transformation returns the following results:
ITEM_NAME QTY PRICE INCOME_PER_ITEM
Cereal 2 5.25 14.99
Soup 2 3.25 21.25
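The sorted example above can be checked with a short Python sketch (illustrative; the item names and prices come from the tables above, and INCOME_PER_ITEM is assumed to be SUM(QTY * PRICE)):

```python
# Sort by ITEM_NAME descending, then aggregate SUM(QTY * PRICE)
# per item, matching the INCOME_PER_ITEM results above.
rows = [("Cereal", 1, 4.49), ("Soup", 4, 2.95), ("Cereal", 2, 5.25),
        ("Soup", 1, 2.95), ("Soup", 2, 3.25)]

rows.sort(key=lambda r: r[0], reverse=True)   # Soup before Cereal

income = {}
for name, qty, price in rows:
    income[name] = round(income.get(name, 0.0) + qty * price, 2)
print(income)   # {'Soup': 21.25, 'Cereal': 14.99}
```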
Creating an Aggregator Transformation
To use an Aggregator transformation in a mapping, you add the Aggregator transformation to the mapping, then configure the transformation with an aggregate expression and group by ports, if desired.
To create an Aggregator transformation:
1. In the Mapping Designer, choose Transformation-Create. Select the Aggregator
transformation.
2. Enter a name for the Aggregator, click Create. Then click Done.
The Designer creates the Aggregator transformation.
3. Drag the desired ports to the Aggregator transformation.
The Designer creates input/output ports for each port you include.
4. Double-click the title bar of the transformation to open the Edit Transformations dialog
box.
5. Select the Ports tab.
6. Click the group by option for each column you want the Aggregator to use in creating
groups.
You can optionally enter a default value to replace null groups.
If you want to use a non-aggregate expression to modify groups, click the Add button and enter a name and data type for the port. Make the port an output port by clearing Input (I). Click in the right corner of the Expression field, enter the non-aggregate expression using one of the input ports, then click OK. Select Group By.
7. Click Add and enter a name and data type for the aggregate expression port. Make the port an output port by clearing Input (I). Click in the right corner of the Expression field to open the Expression Editor. Enter the aggregate expression, click Validate, then click
OK.
Make sure the expression validates before closing the Expression Editor.
8. Add default values for specific ports as necessary.
If certain ports are likely to contain null values, you might specify a default value if the
target database does not handle null values.

Aggregator Setting Description
Cache Directory -- Local directory where the PowerCenter Server creates the index and data cache files.
By default, the PowerCenter Server uses the directory entered in the Workflow Manager
for the server variable $PMCacheDir. If you enter a new directory, make sure the
directory exists and contains enough disk space for the aggregate caches.
Tracing Level -- Amount of detail displayed in the session log for this transformation.
Sorted Input -- Indicates input data is presorted by groups. Select this option only if the mapping
passes sorted data to the Aggregator transformation.
Aggregator Data Cache Size -- Data cache size for the transformation. Default cache size is 2,000,000 bytes (2 MB). If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must
run the session on a 64-bit PowerCenter Server.
Aggregator Index Cache Size -- Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must
run the session on a 64-bit PowerCenter Server.
Transformation Scope - Specifies how the PowerCenter Server applies the transformation logic to incoming
data:
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
- All Input. Applies the transformation logic to all incoming data. When you choose All Input, the PowerCenter Server drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
For more information about transformation scope, see “Understanding Commit Points” in the Workflow Administration Guide.
10. Click OK.
11. Choose Repository-Save to save changes to the mapping.
Tips
You can use the following guidelines to optimize the performance of an Aggregator transformation.
Use sorted input to decrease the use of aggregate caches.
Sorted input reduces the amount of data cached during the session and improves session
performance. Use this option with the Sorter transformation to pass sorted data to the
Aggregator transformation.
Limit connected input/output or output ports.
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Filter before aggregating.
If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.
Troubleshooting
I selected sorted input but the workflow takes the same amount of time as before.
You cannot use sorted input if any of the following conditions are true:
♦ The aggregate expression contains nested aggregate functions.
♦ The session uses incremental aggregation.
♦ Source data is data driven.
When any of these conditions are true, the PowerCenter Server processes the transformation
as if you do not use sorted input.
A session using an Aggregator transformation causes slow performance.
The PowerCenter Server may be paging to disk during the workflow. You can increase session
performance by increasing the index and data cache sizes in the transformation properties. For more information about caching, see “Session Caches” in the Workflow Administration Guide.
I entered an override cache directory in the Aggregator transformation, but the
PowerCenter Server saves the session incremental aggregation files somewhere else.
You can override the transformation cache directory on a session level. The PowerCenter
Server notes the cache directory in the session log. You can also check the session properties
for an override cache directory.
===========================================================
Custom Transformation ( Transformation type: Active/Passive , Connected)
Custom transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter functionality. You can create a Custom transformation and bind it to a procedure that you develop using the functions described in “Custom Transformation Functions” on page 51.
You can use the Custom transformation to create transformation applications, such as sorting and aggregation, which require all input rows to be processed before outputting any output rows. To support this process, the input and output functions occur separately in Custom transformations compared to External Procedure transformations.
The PowerCenter Server passes the input data to the procedure using an input function. The
output function is a separate function that you must enter in the procedure code to pass
output data to the PowerCenter Server. In contrast, in the External Procedure transformation,
an external procedure function does both input and output, and its parameters consist of all
the ports of the transformation.
You can also use the Custom transformation to create a transformation that requires multiple
input groups, multiple output groups, or both. A group is the representation of a row of data entering or leaving a transformation. For example, you might create a Custom transformation with one input group and multiple output groups that parses XML data. Or, you can create a Custom transformation with two input groups and one output group that merges two streams of input data into one stream of output data.
Code Page Compatibility
The Custom transformation procedure code page is the code page of the data the Custom transformation procedure processes. The following factors determine the Custom transformation procedure code page:
♦ PowerCenter Server data movement mode
♦ The INFA_CTChangeStringMode() function
♦ The INFA_CTSetDataCodePageID() function
The Custom transformation procedure code page must be two-way compatible with the
PowerCenter Server code page. The PowerCenter Server passes data to the procedure in the
Custom transformation procedure code page. Also, the data the procedure passes to the
PowerCenter Server must be valid characters in the Custom transformation procedure code
page.
By default, when the PowerCenter Server runs in ASCII mode, the Custom transformation procedure code page is ASCII. Also, when the PowerCenter Server runs in Unicode mode, the
Custom transformation procedure code page is UCS-2, but the PowerCenter Server only
passes characters that are valid in the PowerCenter Server code page.
However, you can use the INFA_CTChangeStringMode() functions in the procedure code to
request the data in a different format. In addition, when the PowerCenter Server runs in
Unicode mode, you can request the data in a different code page using the
INFA_CTSetDataCodePageID() function.
Changing the format or requesting the data in a different code page changes the Custom
transformation procedure code page to the code page the procedure requests:
♦ ASCII mode. You can write the external procedure code to request the data in UCS-2
format using the INFA_CTChangeStringMode() function. When you use this function,
the procedure must pass only ASCII characters in UCS-2 format to the PowerCenter
Server. Do not use the INFA_CTSetDataCodePageID() function when the PowerCenter
Server runs in ASCII mode.
♦ Unicode mode. You can write the external procedure code to request the data in MBCS
using the INFA_CTChangeStringMode() function. When the external procedure requests
the data in MBCS, the PowerCenter Server passes the data in the PowerCenter Server code
page. When you use the INFA_CTChangeStringMode() function, you can write the
external procedure code to request the data in a different code page from the PowerCenter
Server code page using the INFA_CTSetDataCodePageID() function. The code page you
specify in the INFA_CTSetDataCodePageID() function must be two-way compatible with
the PowerCenter Server code page.
Note: You can also use the INFA_CTRebindInputDataType() function to change the format
for a specific port in the Custom transformation.
Distributing Custom Transformation Procedures
You can copy a Custom transformation from one repository to another. When you copy a
Custom transformation between repositories, you must verify that the PowerCenter Server
machine the target repository uses contains the Custom transformation procedure.
Creating Custom Transformations
You can create reusable Custom transformations in the Transformation Developer, and add instances of the transformation to mappings. You can create non-reusable Custom transformations in the Mapping Designer or Mapplet Designer.
Each Custom transformation specifies a module and a procedure name. You can create a Custom transformation based on an existing shared library or DLL containing the procedure, or you can create a Custom transformation as the basis for creating the procedure. When you
create a Custom transformation to use with an existing shared library or DLL, make sure you define the correct module and procedure name.
When you create a Custom transformation as the basis for creating the procedure, select the transformation and generate the code. The Designer uses the transformation properties when it generates the procedure code. It generates code in a single directory for all transformations sharing a common module name.
The Designer generates the following files:
♦ m_<module_name>.c. Defines the module. This file includes an initialization function, m_<module_name>_moduleInit(), that you can use to write code you want the PowerCenter Server to run when it loads the module. Similarly, this file includes a deinitialization function, m_<module_name>_moduleDeinit(), that you can use to write code you want the PowerCenter Server to run before it unloads the module.
♦ p_<procedure_name>.c. Defines the procedure in the module that implements the procedure logic, such as data cleansing or merging data.
♦ makefile.aix, makefile.aix64, makefile.hp, makefile.hp64, makefile.linux, makefile.sol. Make files for the UNIX platforms. Use makefile.aix64 for 64-bit AIX platforms and makefile.hp64 for 64-bit HP-UX (Itanium) platforms.
Rules and Guidelines
Use the following rules and guidelines when you create a Custom transformation:
♦ Custom transformations are connected transformations. You cannot reference a Custom
transformation in an expression.
♦ You can include multiple procedures in one module. For example, you can include an
XML writer procedure and an XML parser procedure in the same module.
♦ You can bind one shared library or DLL to multiple Custom transformation instances if you write the procedure code to handle multiple Custom transformation instances.
♦ When you write the procedure code, you must make sure it does not violate basic mapping rules. For more information about mappings and mapping validation, see “Mappings” in the Transformation Guide.
♦ The Custom transformation sends and receives high precision decimals as high precision
decimals.
♦ You can use multi-threaded code in Custom transformation procedures.
---------------------------------------------------------------------------------------------------------
Working with Groups and Ports
A Custom transformation has both input and output groups. It also can have input ports,
output ports, and input/output ports. You create and edit groups and ports on the Ports tab of
the Custom transformation. You can also define the relationship between input and output
ports on the Ports tab.
Figure 2-1 shows the Custom transformation Ports tab:
Figure 2-1. Custom Transformation Ports Tab
Creating Groups and Ports
You can create multiple input groups and multiple output groups in a Custom transformation. You must create at least one input group and one output group. To create an input group, click the Create Input Group icon. To create an output group, click the Create Output Group icon. When you create a group, the Designer adds it as the last group. When you create a passive Custom transformation, you can only create one input group and one output group.
To create a port, click the Add button. When you create a port, the Designer adds it below the currently selected row or group. Each port contains attributes defined on the Port Attribute Definitions tab. You can edit the attributes for each port. For more information about creating and editing user-defined port attributes, see “Working with Port Attributes” on page 25.
Editing Groups and Ports
Use the following rules and guidelines when you edit ports and groups in a Custom transformation:
♦ You can change group names by typing in the group header.
♦ You can only enter ASCII characters for port and group names.
♦ Once you create a group, you cannot change the group type. If you need to change the
group type, delete the group and add a new group.
♦ When you delete a group, the Designer deletes all ports of the same type in that group.
However, all input/output ports remain in the transformation, belong to the group above them, and change to input ports or output ports, depending on the type of group you delete. For example, an output group contains output ports and input/output ports. You delete the output group. The Designer deletes the output ports. It changes the input/output ports to input ports. Those input ports belong to the input group with the header directly above them.
♦ To move a group up or down, select the group header and click the Move Port Up or Move Port Down button. The ports above and below the group header remain the same, but the groups to which they belong might change.
Defining Port Relationships
By default, an output port in a Custom transformation depends on all input ports. However, you can define the relationship between input and output ports in a Custom transformation.
When you do this, you can view link paths in a mapping containing a Custom transformation and you can see which input ports an output port depends on. You can also view source column dependencies for target ports in a mapping containing a Custom transformation.
To define the relationship between ports in a Custom transformation, create a port dependency. A port dependency is the relationship between an output or input/output port and one or more input or input/output ports. When you create a port dependency, base it on the procedure logic in the code.
To create a port dependency, click Custom Transformation on the Ports tab and choose Port
Dependencies.
Tuesday, April 10, 2007
Informatica DW Interview Questions
When should you use a STAR and when a SNOW-FLAKE schema?
The snowflake and star schema are methods of storing data which are multidimensional in nature (i.e. which can be analysed by any or all of a number of independent factors) in a relational database. The snowflake schema (sometimes called snowflake join schema) is a more complex schema than the star schema because the tables which describe the dimensions are normalized. In a snowflake schema, one dimension table connects to another dimension table, and so on.
Snowflake
• If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snow flaking may be appropriate.
• A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will hence be easier to implement.
• A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.
• Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snow flaking would typically be required to permit simple query tools such as Cognos Power play to form such a query, especially if provision for these forms of query weren't anticipated when the data warehouse was first designed.
Star Schema -
The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema, consisting of a single "fact table" with a compound primary key, with one segment for each "dimension" and with additional columns of additive, numeric facts.The star schema makes multi-dimensional database (MDDB) functionality possible using a traditional relational database. Because relational databases are the most common data management system in organizations today, implementing multi-dimensional views of data using a relational database is very appealing. Even if you are using a specific MDDB solution, its sources likely are relational databases. Another reason for using star schema is its ease of understanding. Fact tables in star schema are mostly in third normal form (3NF), but dimensional tables are in de-normalized second normal form (2NF). If you want to normalize dimensional tables, they look like snowflakes (see snowflake schema) and the same problems of relational databases arise - you need complex queries and business users cannot easily understand the meaning of data. Although query performance may be improved by advanced DBMS technology and hardware, highly normalized tables make reporting difficult and applications complex.
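The single-fact-table layout described above can be sketched with a tiny in-memory SQLite database. The table and column names here are hypothetical, chosen only to show the shape of a star join:

```python
# A toy star schema in SQLite: a single fact table with foreign keys to two
# dimension tables, queried with a simple star join. Names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales  (product_key INTEGER, date_key INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date    VALUES (10, 2007);
INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 10, 50.0), (2, 10, 75.0);
""")

# The classic star join: the fact table in the middle, one join per dimension.
totals = con.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date    d ON d.date_key    = f.date_key
    WHERE d.year = 2007
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

Because every dimension joins straight to the fact table, the query stays a flat list of joins; snowflaking would add further joins from dimension to dimension.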
Q. When should one use an MD-database (multi-dimensional
database) and not a relational one?
Ans -
1. More than one dimension can be shared with other departments.
2. The physical load is lower.
3. The fact structure is less complex.
Q. What is the difference between an ODS and a W/H?
An ODS is an environment that pulls together, validates, cleanses and integrates data from disparate source
application systems. This becomes the foundation for providing the end-user community with an integrated view of enterprise data to enable users anywhere in the organisation to access information for strategic and/or tactical decision support, day-to-day operations and management reporting.
The definition of a Data Warehouse is as follows.
• Subject-oriented, meaning that the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
• Time-variant, meaning that the changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
• Non-volatile, meaning that data in the database is never over-written or deleted, but retained for future
reporting;
• Integrated, meaning that the database contains data from most or all of an organization's operational applications, and that this data is made consistent.
Difference -
------------
Ods - Transactions similar to those of an Online Transaction Processing System
Data Warehouse - Queries process larger volumes of data
----------------------------------------------------------
Ods - Contains current and near current data
Data Warehouse - Contains historical data
-----------------------------------------------------------
Ods - Typically detailed data only, often resulting in very large data volumes
Data Warehouse - Contains summarised and detailed data, generally smaller in size than an ODS
-----------------------------------------------------------
Ods - Real-time and near real-time data loads
Data Warehouse - Typically batch data loads
----------------------------------------------------------------
Ods - Generally modeled to support rapid data update
Data Warehouse - Generally dimensionally modeled and tuned to optimise query performance
-----------------------------------------------------------------
Ods - Updated at the data field level
Data Warehouse - Data is appended, not updated
-------------------------------------------------
Ods - Used for detailed decision making and operational reporting
Data Warehouse - Used for long-term decision making and management reporting
---------------------------------------------------------------
Ods - Knowledge workers (customer service representatives, line managers)
Data Warehouse - Strategic audience (executives, business unit management)
===============================================================
Q. What is the difference between OLAP, ROLAP, MOLAP and HOLAP?
OLAP: Online Analytical Processing. Used for analysis.
ROLAP: Relational Online Analytical Processing.
MOLAP: Multidimensional Online Analytical Processing.
HOLAP: Hybrid Online Analytical Processing.
---------------------------------------------------
Q. What is the difference between a W/H and an OLTP application
A warehouse is used for high-level data analysis. It is used for predictions, time-series analysis, financial analysis, what-if simulations, etc. Basically it is used for better decision making.
OLTP is NOT used for analysis. It is used for transaction and data processing, basically for storing the day-to-day transactions that take place in an organisation. The main focus of OLTP is easy and fast input of data,
while the main focus in a data warehouse is easy retrieval of data.
OLTP doesn't store historical data (this is the reason why it can't be used for analysis).
DW stores historical data.
------------------------------------------------------
Q. What is the difference between a data warehouse and a data mart?
Data warehouse: It is a collection of data marts. Represents historical data.
Data mart: It is a subset of a data warehouse. It provides the data for query, reporting
& analysis.
------------------------------------------------------------
Q. What is a Data Warehouse?
Collection of data marts.
-----------------------------------------------------------
Q. what are the types of dimension tables
1. Dimension 2. Detail 3. Measure
--------------------------------------------------
Q. what are the types of dimensions
1. Conformed Dimension. 2. Junk Dimension. 3. Degenerate Dimension. 4. Slowly Changing Dimensions.
------------------------------------------------
Q. what is the difference between etl tool and olap tools
An ETL tool extracts, transforms and loads the data into the warehouse;
based on this data, we can generate reports using an OLAP tool.
--------------------------------------
Q. What are the different types of schemas?
Two types of schemas are there: star schema and snowflake schema. In a snowflake schema normalization is promoted, whereas in a star schema denormalization is promoted. In the case of a snowflake schema, database size is saved but the number of joins increases, giving poorer performance compared with a star schema.
Three types of schemas are available: star schema, starflake schema & snowflake schema.
Star schema: It is highly denormalised, so we can retrieve the data very fast.
Starflake schema: Only one dimension contains one level of hierarchy key.
Snowflake schema: It is highly normalised, so retrieval of data is slow.
----------------------------------------
Q. What is a conformed dimension and fact?
A dimension which links with more than one fact table is called a conformed dimension.
-----------------------------------------
Q. What are slowly changing dimensions ( SCD) ?
There are three types of SCD:
Type 1: overwrite the original record with the new record.
Type 2: create a new record (a new version of the row).
Type 3: create a new attribute (column) to hold the previous value.
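The three behaviours can be contrasted in plain Python (this is not Informatica mapping logic; the column names and the surrogate-key/version/current-flag scheme are assumptions for illustration):

```python
# Sketch of SCD Types 1-3 for a single changed attribute ("city").
def scd_type1(dim_row, new_city):
    # Type 1: overwrite the existing value; no history is kept.
    dim_row["city"] = new_city
    return dim_row

def scd_type2(dim_table, dim_row, new_city, next_key):
    # Type 2: expire the current row and insert a new version of it.
    dim_row["current"] = False
    new_row = {**dim_row, "sk": next_key, "city": new_city,
               "current": True, "version": dim_row["version"] + 1}
    dim_table.append(new_row)
    return new_row

def scd_type3(dim_row, new_city):
    # Type 3: keep limited history in an extra "previous value" column.
    dim_row["prev_city"] = dim_row["city"]
    dim_row["city"] = new_city
    return dim_row
```

Type 2 is the only variant that grows the dimension table, which is why it is the one that needs surrogate keys and version/current-flag bookkeeping.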
-----------------------------------------------------------
Question - What are the different tasks that can be created in Workflow Manager?
There are 8 types of tasks in total:
Assignment, Control, Command, Decision, Email, Event-Wait, Event-Raise, Timer. Generally in the Workflow Manager we create Command, Session and Email notification tasks.
---------------------------------------------------
Question . What are the new features of informatica 7.1?
Lookup on flat file is possible.
Union Transformation is present.
Version Control.
LDAP Authentication.
Support for 64-bit architecture.
------------------------------------------
Question - What is a common data source for the central enterprise data warehouse?
Answer - operational data stores (ODS)
---------------------------------------------
Question - What are two examples of an OLTP environment?
Answer - On Line Banking
On Line Reservation(Transportation like Rail, Air etc.)
-----------------------------------------------
Question - Explain the different stages in creating a report in cognos report net?
Open Cognos Connection and select Report Studio or Query Studio. It displays the packages you have (e.g., GO Sales and GO Retailers, the defaults). Click on one and Report Studio opens; in the Object pane select the required query subject, then click Run. The report is displayed on the report viewer screen.
-------------------------------------------------
Question - How you create IQD In ReportNet FrameWork?
Answer - Open Framework Manager, click on any query subject, go to the properties dialog box, find the EXTERNALIZE METHOD property and change it to iqd.
-----------------------------------------
Question - What is IQD?What is contained in IQD?
Answer - IQD stands for Impromptu Query Definition. The report is saved with the extension .iqd, and it is used for creating a cube in PowerPlay Transformer.
Answer - IQD is Impromptu Query Definition. It contains the SQL query to display the data from the database; the SQL query is automatically created based on the columns selected from the database.
------------------------------------------
Question - What is a loop in Framework Manager?
Answer - A loop is a closed path; in ReportNet it is called an ambiguous relationship. It means a query subject contains multiple paths to retrieve the data. To resolve it, create a shortcut on that
query subject; otherwise it displays wrong results in reports and performance degrades.
-----------------------------------------------
Question - what is cube size?
Answer -2.0 GB
Answer - That depends on your project requirements, so it can be 1 GB. But the maximum size of a cube is 2 GB.
--------------------------------------------------
Question -what is exact catalog size?
Answer - There is no fixed limit for catalog size. It may be 3 MB or 4 MB.
Answer - 3.5 MB
------------------------------------------------
Question - What is the difference between a DB config and a CFG file?
Answer - A .db file contains the information the GDE uses to connect to the database;
a .cfg file contains the information the Co>Operating System uses to connect to the database.
---------------------------------------------------
Question - Have you ever encountered an error called "depth not equal"?
Answer - It occurs when two components connected with a straight flow do not have the same depth or layout.
------------------------------------------------
Question - What are Cartesian joins?
Answer - Joining 2 tables without a condition.
select * from emp, dept; if emp has 10 records and dept has 5 records, the output = 50 records.
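The 10 × 5 = 50 arithmetic above is just the size of a cross product, which a short Python sketch makes concrete (the emp/dept row contents are made up):

```python
# A Cartesian (cross) join: every row of emp pairs with every row of dept.
from itertools import product

emp = [{"empno": i} for i in range(10)]    # 10 employee rows
dept = [{"deptno": j} for j in range(5)]   # 5 department rows

# Equivalent of SELECT * FROM emp, dept with no join condition.
cartesian = [{**e, **d} for e, d in product(emp, dept)]
```

The result always has len(emp) * len(dept) rows, which is why an accidental missing join condition blows up so quickly on real tables.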
--------------------------------------------
Question - What is the difference between partitioning with key and round robin?
Answer - 1) Partition by key needs a key, whereas round robin does not.
2) Round robin always tries to distribute the records equally, whereas partition by key does not.
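The difference is easy to see in a small sketch (ordinary Python, not any tool's partitioning code; the partition count and key name are arbitrary):

```python
# Round robin deals rows out evenly; key-based partitioning routes all rows
# with the same key value to the same partition (here via hashing).
def round_robin(rows, n):
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def by_key(rows, n, key):
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts
```

Round robin balances load but scatters related rows; partition by key keeps related rows together (needed for joins and aggregations) at the cost of possible skew when one key value dominates.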
-----------------------------------------
Question - What is a ramp limit?
Answer - When the Reject Threshold is set to ramp/limit, we need to give ramp and limit values.
Ramp - A real number defining the rate of rejected records.
Limit - The number of records that can be rejected.
Answer - Ramp - Rate of toleration of reject events in the number of records processed.
Limit - A number representing reject events.
-----------------------------------------------
Question - What are the basic needs to join two sources in a source qualifier?
Answer - 1) Both sources should be from the same database. 2) They should have a common field which can be used for a join.
----------------------------------------------
Question - What is the difference between Normal load and Bulk load?
Answer - If you enable bulk loading, the PowerCenter Server bypasses the database log. This improves session performance. The disadvantage is that the target database cannot perform a rollback, as there is no database log.
In normal load the database log is not bypassed, and therefore the target database can recover from an incomplete session. Session performance is not as high as in the case of bulk load.
Answer - In normal load, performance recovery is possible, whereas in bulk mode it is not possible because there is no database log to perform a rollback.
-----------------------------------------
Question - what is a junk dimension ?
Answer - A junk dimension is a convenient grouping of flags and indicators. It's helpful, but not absolutely
required, if there's a positive correlation among the values. The benefits of a junk dimension
include: · Provide a recognizable, user-intuitive location for related codes, indicators and their
descriptors in a dimensional framework.
· Clean up a cluttered design that already has too many dimensions. There might be five or more indicators that could be collapsed into a single 4-byte integer surrogate key in the fact table.
· Provide a smaller, quicker point of entry for queries compared to performance from constraining directly on these attributes in the fact table. If your database supports bitmapped indices, this potential benefit may be irrelevant, although the others are still valid.
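The "collapse five indicators into a single surrogate key" idea can be sketched as follows; the flag names and values are hypothetical:

```python
# A junk dimension: pre-build every combination of the low-cardinality flags,
# assign each combination one surrogate key, and store only that key in the
# fact table instead of the individual flag columns.
from itertools import product

flags = {
    "payment_type": ["cash", "credit"],
    "gift_wrap": ["Y", "N"],
    "expedited": ["Y", "N"],
}

# 2 * 2 * 2 = 8 rows in the junk dimension, each with one surrogate key.
junk_dim = {combo: sk for sk, combo in enumerate(product(*flags.values()), start=1)}

def junk_key(payment_type, gift_wrap, expedited):
    # Look up the surrogate key for a fact row's combination of flags.
    return junk_dim[(payment_type, gift_wrap, expedited)]
```

Because the value domains are tiny, the full cross product stays small, so the dimension can be populated once up front rather than as flag combinations arrive.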
------------------------------------------------
Question - What is difference between maplet and reusable transformation?
Answer - Mapplet consists of set of transformations that is reusable. A reusable transformation is a
single transformation that can be reusable.
----------------------------------------
Question - What are the types of metadata stored in the repository?
Answer - The following are the types of metadata stored in the repository:
Database connections
Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Short cuts
Source definitions
Target definitions
Transformations
------------------------------------------------------
Question - What is the difference between a dimension table and a fact table?
Answer - A dimension table stores the textual description of the data, whereas a fact table stores numerical values (measures).
-------------------------------------------------------
Question - What is partitioning? Where can we use partitions?
Answer - The Partitioning Option increases PowerCenter's performance through parallel data processing; this option provides a thread-based architecture and automatic data partitioning that optimizes parallel processing on multiprocessor and grid-based hardware environments. Partitions are used to optimize session performance. We can select partitions in the session properties. Types: default pass-through partition, key-range partition, round-robin partition, hash partition.
------------------------------------------------------
Question - Can we take report on Excel in Business Objects?
Answer - Yes. Business Objects provides the option to get report data from a universe or from other sources (the personal data option) in the report wizard. Select the "others" option and it lets you choose files of different extensions; here select a .xls file and follow the normal report style.
----------------------------------------------------
Question - How does data mining and data warehousing work together?
Answer - Data warehousing is used to store historical data; using the DWH, business users can analyze their business. Data mining is used to predict the future. The DWH acts as the source for data mining.
----------------------------------------------
Question - Why do we use DSS database for OLAP tools?
Answer - DSS means Decision Support System; e.g., BO, Cognos, etc.
-----------------------------------------------------------------
Question - What is staging area?
Answer - It is an intermediate stage where data is put before it goes into the data warehouse database. It is also called an Operational Data Store (ODS). If a report is needed very urgently, even though it is not a nice one to have, we can generate it from the staging area or ODS. Ex: if I am so hungry that I cannot wait until my food is
cooked, I can eat it even though it does not taste good; because I am too hungry I would not mind the taste. Later, once it is cooked, I can enjoy it much better. That was just to give an understanding of the concept.
Anyone please correct me if I am wrong.
---------------------------------------------------------------
Question - Which kind of index is preferred in DWH?
Answer - A bitmap index is the best one, because a B-tree is suited for unique values (e.g., empid) and a bitmap is best for repeated values (e.g., gender M/F).
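The intuition for why bitmaps suit repeated values can be sketched in a few lines (illustrative only; real databases store compressed bitmaps and combine them with bitwise operations):

```python
# Bitmap index sketch for a low-cardinality column such as gender:
# one bitmap per distinct value, one bit per row.
rows = ["M", "F", "F", "M", "F"]

bitmaps = {}
for i, value in enumerate(rows):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << i)

def matching_rows(bitmap):
    # Decode a bitmap back into matching row numbers.
    return [i for i in range(len(rows)) if bitmap >> i & 1]
```

With only two distinct values, two bitmaps cover the whole column, and predicates on several such columns reduce to cheap bitwise ANDs of their bitmaps, which is exactly the DWH query pattern.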
---------------------------------------------------------------
Question - What is IQD file?
Answer - IQD means Impromptu Query Definition. It is a data source used to create a cube in PowerPlay for multidimensional viewing.
------------------------------------------------------------
Question - What are cursors?
Answer - They are used to process the rows of a table an effective number at a time and can display more than one record at a time.
Answer - A cursor is a handle or name for an SQL area. There are two types of cursors: an explicit cursor, created by the user, and an implicit cursor, created by Oracle automatically whenever DML is performed.
------------------------------------------------------------
Question - How do you test reports in Business Objects?
Answer - If we are using Oracle as a data source, we can use TOAD (a front end for Oracle) to test the report. Take the SQL statement from the SQL editor of the Query Panel and run the query in TOAD.
Answer - In BO, initially to check the report we go for an integrity check. There is a button to check the integrity, parse errors, joins, loops and so on. The second stage of testing involves testing the SQL query
of the report if it uses free-hand SQL; otherwise we need to test the universe which the report is using.
----------------------------------------------------
Question - What is "sas macros"
Answer - It's nothing but a tool.
Answer - It is used to customize the SAS environment; it can be used to do a repetitive task.
Answer - A macro is stored text that can be used anywhere in the SAS system. The stored code can be used for repetitive work.
-------------------------------------------------------------------
Question - SAS System ?
Answer - The SAS System, originally Statistical Analysis System, is an integrated system of software products provided by the SAS Institute that enables the programmer to perform:
1. data entry, retrieval, management, and mining
2. report writing and graphics
3. statistical and mathematical analysis
4. business planning, forecasting, and decision support
5. operations research and project management
6. quality improvement
7. applications development
8. warehousing (extract, transform, load)
Answer - The SAS System is a Comprehensive and Flexible Information Delivery System that supports Data Access, Data Management,Data Analysis and Data Representation.
----------------------------------------------
Question - A question on ReportNet: when we save a report in Report Studio, with what extension does it save?
Answer - It saves as .XML
Answer - It will save as .CRR
Answer - When we save the report, it saves our specifications. You can choose (in Report Studio, from the Run menu by clicking the down icon) HTML, XML, CSV or PDF format. If you save the report as HTML, you can also run the report in a different format or a different language.
=============================================
Question - What are the main functional differences between BO and Cognos?
Answer - BO : Business objects, objects in object-oriented programming meant to represent business entities.
Business Objects, a company that creates and markets business intelligence applications analyzing internal and external data.
COGNOS : a company which makes business intelligence (BI) and performance planning software
Answer -
Business Objects
1. BO -- ROLAP reporting tool.
2. It does not support MOLAP cubes.
Cognos :
1. MOLAP reporting tool.
2. You can create a cube based on business requirements.
3. Develops reports on both ROLAP and MOLAP.
===================================================
Question - Can BO generate reports without cubical data in the repository?
Answer - In BO, the cube structure cannot be built; building a cubic structure is not supported, whereas Cognos supports it.
----------------------------------------------------------------
Question - what are surrogate keys?
Answer# 1 - A surrogate key in a data warehouse is more than just a substitute for a natural key.
(OR)
A surrogate key is frequently a sequential number (e.g. a Sybase "identity column") but doesn't have to be. Having the key independent of all other columns insulates the database relationships from changes in data values or database design and guarantees uniqueness.
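The "sequential number independent of all other columns" point can be sketched like this (a plain-Python stand-in for a database identity column or sequence; the class and key names are invented):

```python
# Surrogate-key assignment: a running sequence, decoupled from the natural
# key, so natural-key or design changes never ripple through relationships.
import itertools

class SurrogateKeyGenerator:
    def __init__(self, start=1):
        self._seq = itertools.count(start)
        self._keys = {}  # natural key -> surrogate key

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._seq)
        return self._keys[natural_key]
```

Because foreign keys in the fact table reference only the surrogate value, the warehouse stays stable even if the source system recodes its natural keys.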
------------------------------------------------------------------
Question - What is the RANKINDEX in the Rank transformation?
Answer# 1 - The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the Rank Index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank
index numbers the salespeople from 1 to 5.
------------------------------------------------------------------
Question - What is the target load order?
Answer# 1 - You specify the target load order based on source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica Server loads data into the targets.
-----------------------------------------------------------
Question - Why use the lookup transformation ?
Answer# 1- To perform the following tasks.
Get a related value. For example, your source table includes the employee ID, but you want to include the employee name in your target table to make your summary data easier to read. Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales). Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.
-------------------------------------------------------------
Question - What is update strategy and what are the options for update strategy?
Answer# 1 - We can use update strategy at two different levels
1) within a session :- When you are configuring a session you can give instructions to treat
a)all rows as insert b)all rows as update c)data driven (use instructions coded into the session mapping to flag rows for different database operations.)
2) within a mapping :- You can flag rows for insert, update, delete or reject.
Don't forget to set "Treat source rows as" to Data Driven in the session properties if you are flagging rows within the mapping
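The data-driven option can be pictured with a small sketch (ordinary Python, not Informatica expression syntax, though the DD_* constant values match the ones Informatica documents):

```python
# Data-driven flagging: tag each row with the database operation it needs,
# the way an Update Strategy transformation flags rows.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag_rows(rows, existing_keys):
    # Rows whose key already exists in the target become updates;
    # everything else becomes an insert.
    flagged = []
    for row in rows:
        op = DD_UPDATE if row["id"] in existing_keys else DD_INSERT
        flagged.append((op, row))
    return flagged
```

The per-row flag is exactly what "Treat source rows as: Data Driven" tells the session to honour instead of applying one blanket operation to every row.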
-------------------------------------------------------------
Question - What is factless fact schema?
Answer# 1 - A factless fact table does not contain any facts or measures. Ex: a fact table used to record whether a student came to school or not stores only the event (attendance),
not a numeric measure.
OR - A factless fact schema does not contain any measures, only keys to the dimension tables.
-----------------------------------------------------
Question - What are the differences between the Joiner transformation and the Source Qualifier transformation?
Answer# 1 - You can join heterogeneous data sources in a Joiner transformation, which we cannot achieve in a Source Qualifier transformation. You need matching keys to join two relational sources in a Source Qualifier transformation, whereas you do not need matching keys in a Joiner. Two relational sources should come from the same data source in a Source Qualifier; in a Joiner you can join relational sources which are
coming from different data sources as well.
-----------------------------------------------------
Question - What are the circumstances under which the Informatica Server results in an unrecoverable session?
Answer# 1 - The Source Qualifier transformation does not use sorted ports.
You change the partition information after the initial session fails. Perform Recovery is disabled in the Informatica Server configuration. The sources or targets change after the initial session fails.
The mapping contains a Sequence Generator or Normalizer transformation.
A concurrent batch contains multiple failed sessions.
----------------------------------------------------
Question - After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can
you map these three ports directly to the target?
Answer# 1 - No. Unless and until you join those three ports in the Source Qualifier, you cannot map them directly.
---------------------------------------------------
Question - Can you start a batch within a batch?
Answer# 1 - You cannot. If you want to start a batch that resides within a batch, create a new independent batch and copy the necessary sessions into the new batch.
-------------------------------------------------------
Question - Explain about Recovering sessions?
Answer# 1 - If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica Server configuration.
Use one of the following methods to complete the session:
Run the session again if the Informatica Server has not issued a commit.
Truncate the target tables and run the session again if the session is not recoverable.
Consider performing recovery if the Informatica Server has issued at least one commit.
--------------------------------------------------------------------------------------
Question - How can you access a remote source in your session?
Answer# 1 -
Relational source: To access a relational source situated in a remote place, you need to configure a database connection to the data source.
File source: To access a remote source file, you must configure the FTP connection to the host machine before you create the session.
Heterogeneous: When your mapping contains more than one source type, the Server Manager creates a heterogeneous session that displays source options for all types.
---------------------------------------------------------
Question - What are reusable transformations?
Answer# 1 - Reusable transformations can be used in multiple mappings. When you need to incorporate such a transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes. Since each instance of a reusable transformation is a pointer to that transformation, you can change the transformation in the Transformation Developer and its instances automatically reflect those changes. This feature can save you a great deal of work.
Answer# 2 - If you want to perform a similar task in different mappings, and the logic is also similar across those requirements, then instead of creating multiple copies you create the transformation once and mark it as reusable, so it can be reused in other mappings. You cannot modify a reusable transformation from within a mapping; you must edit the original transformation, and the change is reflected in all its instances.
---------------------------------------------------------------------------------
Question - What is the difference between a mapplet and a reusable transformation?
Answer - A mapplet is a set of transformations that is reusable as a unit. A reusable transformation is a single transformation that can be reused.
----------------------------------------
Question - What are the mappings that we use for slowly changing dimension tables?
Answer# 1 - Type1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.
Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions to the target. Rows containing changes to existing dimensions are updated in the target.
When updating an existing dimension, the Informatica Server saves existing data in different columns of the same row and replaces the existing data with the updates.
Answer# 2 - Slowly changing dimensions (SCD) can be classified into three types.
1) Type 1: This dimension stores only current data; inserts are treated as inserts and updates are treated as updates.
2) Type 2: This dimension stores complete historic data, and it can be implemented in three ways:
a) Flag
b) Version
c) Date range
In Type 2, inserts are treated as inserts, but updates are also treated as inserts.
3) Type 3: This dimension stores one level of historic data alongside current data. Here inserts are treated as inserts and updates are treated as updates.
Which type to use depends upon the granularity requirements of the organization.
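To make the Type 2 "version" flavour concrete, here is a minimal Python sketch, assuming an in-memory dimension keyed by the natural key. It is illustrative only; PowerCenter's wizard generates the equivalent mapping logic for you, and the key and column names are made up.

```python
# Sketch of SCD Type 2 version handling: a changed row is inserted as a new
# version; unchanged rows leave the dimension untouched, so full history is kept.

def apply_scd2(dimension, natural_key, row):
    """Insert a new version for a new or changed row; return the current version."""
    versions = dimension.setdefault(natural_key, [])
    if not versions or versions[-1]["data"] != row:
        versions.append({"version": len(versions) + 1, "data": row})
    return versions[-1]["version"]

dim = {}
apply_scd2(dim, "CUST-1", {"city": "Pune"})    # new dimension -> version 1
apply_scd2(dim, "CUST-1", {"city": "Mumbai"})  # change -> inserted as version 2
print(len(dim["CUST-1"]))  # 2 versions kept: the full history
```

Note how an "update" in the source becomes an insert in the target, which is exactly the Type 2 behaviour described above.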
----------------------------------------------------------------------------
Question - How many ways can you create ports?
Answer# 1 - Two ways:
1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.
------------------------------------------------------------
Question - Why do we use partitioning of the session in Informatica?
Answer# 1 - Partitioning improves session performance by reducing the time taken to read the source and load the data into the target.
---------------------------------------------------------
Question - what is Partitioning ? where we can use Partition?
Answer# 1 - The Partitioning Option increases Power Center’s performance through parallel data processing, and this option provides a thread-based architecture and automatic data partitioning that optimizes parallel processing on multiprocessor and grid-based hardware environments.
Partitions are used to optimize the session performance. We can select in session properties for partitions
Types: pass-through (the default), key range, round robin, and hash partitioning.
----------------------------------------------------
Question - Can you start a session inside a batch individually?
Answer# 1 - We can start a required session individually only in a sequential batch; in a concurrent batch we cannot do this.
-------------------------------------------------------------------------------
Question - What are the session parameters?
Answer# 1 - Session parameters are like mapping parameters: they represent values you might want to change between sessions, such as database connections or source files. The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters:
Database connections.
Source file name: use this parameter when you want to change the name or location of a session source file between session runs.
Target file name: use this parameter when you want to change the name or location of a session target file between session runs.
Reject file name: use this parameter when you want to change the name or location of session reject files between session runs.
-------------------------------------------------------------------------------
Question - Why do you use shortcuts in Informatica?
Answer# 1 - A shortcut references an object kept in a shared folder, and the shortcut inherits any changes made to that object. Both must be in the same repository.
---------------------------------------------------------------------------
Question - Which transformation do you need while using COBOL sources as source definitions?
Answer# 1 - The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.
--------------------------------------------------------------------------
Question - What are the different types of Type 2 dimension mappings?
Answer# 1 - Type 2 Dimension/Version Data mapping: a changed dimension from the source gets inserted into the target along with a new version number, and a newly added dimension in the source is inserted into the target with a new primary key.
Type 2 Dimension/Flag Current mapping: this mapping is also used for slowly changing dimensions; in addition, it creates a flag value for each changed or new dimension. The flag indicates whether the dimension is current: recent dimensions are saved with the current flag value 1, and updated (historical) dimensions are saved with the value 0.
Type 2 Dimension/Effective Date Range mapping: this is another flavour of Type 2 mapping used for slowly changing dimensions. It also inserts both new and changed dimensions into the target, and changes are tracked by an effective date range for each version of each dimension.
---------------------------------------------------------------------------
Question - What is Performance tuning in Informatica?
Answer# 1 - The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica Server. You can increase session performance in the following ways.
Network: the performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid unnecessary network hops.
Flat files: if your flat files are stored on a machine other than the Informatica Server, move those files to the machine that runs the Informatica Server.
Relational data sources: minimize the connections to sources, targets, and the Informatica Server to improve session performance. Moving the target database onto the server system may improve session performance.
Staging areas: if you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance.
You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.
Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII mode stores a character value in one byte while Unicode mode takes two bytes per character.
If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Single-table SELECT statements with an ORDER BY or GROUP BY clause may also benefit from optimizations such as adding indexes.
You can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure Database Connections.
If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
Running parallel sessions using concurrent batches also reduces loading time, so concurrent batches may increase session performance. Partitioning the session improves performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
If a session contains an Aggregator transformation, you may be able to use incremental aggregation to improve session performance. Avoid transformation errors to improve session performance.
If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache. If your session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier. Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it; to improve performance in that case, use the sorted ports option.
--------------------------------------------------------------------
Question - What are the different threads in DTM process?
Answer# 1 -
Master thread: creates and manages all other threads.
Mapping thread: one mapping thread is created for each session; it fetches session and mapping information.
Pre- and post-session threads: created to perform pre- and post-session operations.
Reader thread: one thread is created for each partition of a source; it reads data from the source.
Writer thread: created to load data to the target.
Transformation thread: created to transform data.
--------------------------------------------------------------
Question - What is incremental aggregation?
Answer# 1 - When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture changes, you can configure the session to process only those changes. This allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session.
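A minimal Python sketch of the idea, assuming the server persists the previous aggregates (here, a dict of running SUMs) between session runs and applies only the newly captured source rows to them. The keys and amounts are made up for illustration.

```python
# Incremental aggregation sketch: later runs pass only the changed rows,
# which are applied to the saved aggregates instead of recomputing everything.

def incremental_sum(saved_aggregates, changed_rows):
    for key, amount in changed_rows:
        saved_aggregates[key] = saved_aggregates.get(key, 0) + amount
    return saved_aggregates

# First run processes the full source; the second run passes only one change.
agg = incremental_sum({}, [("east", 100), ("west", 50)])
agg = incremental_sum(agg, [("east", 25)])
print(agg["east"])  # 125, obtained without rereading the full source
```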
-------------------------------------------
Question - What is the command used to run a batch?
Answer# 1 - pmcmd is used to start a batch.
------------------------------------------------
Question - How can you recognise whether or not newly added rows in the source get inserted into the target?
Answer# 1 - In the Type 2 mapping we have three options to recognise the newly added rows:
Version number
Flag value
Effective date range
-------------------------------------------------
Question - What is the Router transformation?
Answer# 1 - A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. However, a Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. A Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group. If you need to test the same input data based on multiple conditions, use a Router Transformation in a mapping instead of creating multiple Filter transformations to perform the same task.
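The Router behaviour can be sketched in a few lines of Python. The group names and conditions below are made up; the point is that every row is tested against every group condition, and rows matching none go to the default group.

```python
# Router transformation sketch: unlike a Filter, a row can match more than one
# output group, and non-matching rows are routed to a default group, not dropped.

def route(rows, groups, default="DEFAULT"):
    out = {name: [] for name in groups}
    out[default] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                out[name].append(row)
                matched = True
        if not matched:
            out[default].append(row)
    return out

result = route([10, 60, -5],
               {"SMALL": lambda x: 0 <= x < 50, "BIG": lambda x: x >= 50})
print(result)  # {'SMALL': [10], 'BIG': [60], 'DEFAULT': [-5]}
```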
------------------------------------------------
Question - How can you work with a remote database in Informatica? Did you work directly using remote connections?
Answer# 1 - To work with a remote data source, you need to connect to it with a remote connection. But it is not preferable to work with that remote source directly over a remote connection; instead, bring the source onto the local machine where the Informatica Server resides. If you work directly with the remote source, session performance decreases because only a limited amount of data can pass across the network in a given time.
-----------------------------------------------
Question - How to recover the standalone session?
Answer# 1 - A standalone session is a session that is not nested in a batch. If a standalone session fails, you can run recovery using a menu command or pmcmd. These options are not available for batched sessions.
To recover sessions using the menu:
1. In the Server Manager, highlight the session you want to recover.
2. Select Server Requests-Stop from the menu.
3. With the failed session highlighted, select Server Requests-Start Session in Recovery Mode from the menu.
To recover sessions using pmcmd:
1.From the command line, stop the session.
2. From the command line, start recovery.
---------------------------------------------------------
Question - Connected and unconnected lookups?
Answer# 1 - A connected Lookup can return multiple columns for the same row, whereas an unconnected Lookup has only one return port and returns a single value for each call.
--------------------------------------------------
Question - What are the types of data that passes between informatica server and stored procedure?
Answer# 1 - 3 types of data
Input/Output parameters
Return Values
Status code.
--------------------------------------------------
Question - Can you copy the batches?
Ans - No
----------------------------------------------
Question - What are the different options used to configure the sequential batches?
Answer# 1 - Two options:
1. Run the session only if the previous session completes successfully.
2. Always run the session.
---------------------------------------
Question - What is the DTM process?
Answer# 1 - After the Load Manager performs validations for the session, it creates the DTM process. The DTM's job is to create and manage the threads that carry out the session tasks. It creates the master thread, which in turn creates and manages all the other threads.
---------------------------------------
Question - How can you improve session performance in the Aggregator transformation?
Answer# 1 - Use sorted input. The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica Server creates index and data caches in memory to process the transformation. If the Informatica Server requires more space, it stores overflow values in cache files.
Answer# 2 - Use the sorted input option to decrease the use of the aggregator cache. Use a Filter transformation before the Aggregator transformation to reduce unnecessary aggregation. Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
---------------------------------------------------------
Question - What are the rank caches?
Answer# 1 - During the session, the Informatica Server compares an input row with the rows in the data cache. If the input row out-ranks a stored row, the Informatica Server replaces the stored row with the input row. The Informatica Server stores group information in an index cache and row data in a data cache.
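A small Python sketch of that replacement behaviour, assuming a top-N rank on a single numeric value. A min-heap stands in for the data cache: an incoming row replaces the weakest stored row only when it out-ranks it.

```python
# Rank cache sketch: keep only the current top-N rows, replacing a stored row
# whenever an incoming row out-ranks the weakest row in the cache.
import heapq

def top_n(rows, n):
    cache = []                      # min-heap holding the best n rows seen so far
    for value in rows:
        if len(cache) < n:
            heapq.heappush(cache, value)
        elif value > cache[0]:      # input row out-ranks the weakest stored row
            heapq.heapreplace(cache, value)
    return sorted(cache, reverse=True)

print(top_n([5, 40, 12, 99, 7], 3))  # [99, 40, 12]
```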
--------------------------------------------------------
Question - What are the types of lookup caches?
Answer# 1 - Persistent cache: you can save the lookup cache files and reuse them the next time the Informatica Server processes a Lookup transformation configured to use the cache.
Recache from database: if the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.
Static cache: you can configure a static, or read-only, cache for any lookup table. By default the Informatica Server creates a static cache; it caches the lookup table and lookup values for each row that comes into the transformation. When the lookup condition is true, the Informatica Server does not update the cache while it processes the Lookup transformation.
Dynamic cache: if you want to cache the target table and insert new rows into both the cache and the target, you can create a Lookup transformation that uses a dynamic cache. The Informatica Server dynamically inserts data into the target table.
Shared cache: you can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.
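The dynamic cache can be sketched with the NewLookupRow semantics described elsewhere in this post (1 = inserted into the cache, 2 = updated, 0 = no change). This is a simplified, assumption-laden model, not PowerCenter's actual cache implementation.

```python
# Dynamic lookup cache sketch: pass rows with NewLookupRow 1 or 2 on to the
# target so the cache and the target table stay synchronized.

def dynamic_lookup(cache, key, row):
    if key not in cache:
        cache[key] = row
        return 1            # insert: new row cached, send to target
    if cache[key] != row:
        cache[key] = row
        return 2            # update: changed row, send to target
    return 0                # no change: skip the target

cache = {}
print(dynamic_lookup(cache, 7, "a"))  # 1
print(dynamic_lookup(cache, 7, "a"))  # 0
print(dynamic_lookup(cache, 7, "b"))  # 2
```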
---------------------------------------------------------------
Question - How can you create or import a flat file definition into the Warehouse Designer?
Answer# 1 - You cannot create or import a flat file definition into the Warehouse Designer directly. Instead, you must analyze the file in the Source Analyzer, then drag it into the Warehouse Designer. When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica Server runs the session, it creates and loads the flat file.
-----------------------------------------------------------
Question - If a session fails after loading 10,000 records into the target, how can you load the records from the 10,001st record when you run the session the next time?
Answer# 1 - As explained above, the Informatica Server has three methods for recovering sessions. Use "perform recovery" to load the records from the point where the session failed.
-------------------------------------------------------
=====================================
Differences between connected and unconnected lookup?
Connected lookup:-
1> Receives input values directly from the pipeline.
2> You can use a dynamic or static cache.
3> The cache includes all lookup columns used in the mapping.
4> Supports user-defined default values.
Unconnected lookup:-
1> Receives input values from the result of a :LKP expression in another transformation.
2> You can use only a static cache.
3> The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
4> Does not support user-defined default values.
----------------------------------------------------------------------------------
What is meant by lookup caches?
The Informatica Server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica Server stores condition values in the index cache and output values in the data cache.
-------------------------------------------
What are the rank caches?
During the session, the Informatica Server compares an input row with the rows in the data cache. If the input row out-ranks a stored row, the Informatica Server replaces the stored row with the input row. The Informatica Server stores group information in an index cache and row data in a data cache.
----------------------------------------------------------
Question - What is the difference between the IIF and DECODE functions?
You can use nested IIF statements to test multiple conditions. The following example tests for various conditions and returns 0 if sales is zero or negative (the SALARY tier names are illustrative):
IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2, IIF( SALES < 200, SALARY3, BONUS))), 0 )
You can use DECODE instead of IIF in many cases. DECODE may improve readability. The following shows how you can use DECODE instead of IIF:
DECODE( TRUE,
SALES > 0 and SALES < 50, SALARY1,
SALES > 49 AND SALES < 100, SALARY2,
SALES > 99 AND SALES < 200, SALARY3,
SALES > 199, BONUS)
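The same tiered logic can be sketched in Python, where a nested conditional mirrors IIF and a (condition, result) table mirrors DECODE. The SALARY1..SALARY3 tier names are assumptions for illustration; only the thresholds and BONUS come from the example above.

```python
# IIF-style: nested conditionals, evaluated from the outside in.
def bonus_iif(sales):
    return ("SALARY1" if sales < 50 else
            "SALARY2" if sales < 100 else
            "SALARY3" if sales < 200 else
            "BONUS") if sales > 0 else 0

# DECODE-style: a flat table of conditions; first match wins, default is 0.
def bonus_decode(sales):
    table = [(lambda s: 0 < s < 50,   "SALARY1"),
             (lambda s: 49 < s < 100, "SALARY2"),
             (lambda s: 99 < s < 200, "SALARY3"),
             (lambda s: s > 199,      "BONUS")]
    return next((result for cond, result in table if cond(sales)), 0)

print(bonus_iif(150), bonus_decode(150))  # SALARY3 SALARY3
```

The DECODE style tends to read better because all the conditions sit at the same level instead of being nested.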
------------------------------------------------------------------------------------
What are the differences between the Joiner transformation and the Source Qualifier transformation?
You can join heterogeneous data sources with a Joiner transformation, which you cannot achieve with a Source Qualifier transformation.
You need matching keys to join two relational sources in a Source Qualifier transformation, whereas you do not need a pre-defined key relationship to join two sources in a Joiner.
Two relational sources must come from the same data source to be joined in a Source Qualifier; with a Joiner you can join relational sources that come from different data sources.
---------------------------------------------------------
What is aggregate cache in aggregator transforamtion?
The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica Server creates index and data caches in memory to process the transformation. If the Informatica Server requires more space, it stores overflow values in cache files.
-----------------------------------------------------
What is the default join that source qualifier provides?
Inner equi join.
-------------------------------------------
Can we use an Aggregator (or another active transformation) after an Update Strategy transformation?
You can use an Aggregator after an Update Strategy. The problem is that once you perform the update strategy, say you have flagged some rows to be deleted, and you then aggregate over all rows with, say, the SUM function, the deleted rows are subtracted in the Aggregator transformation.
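A short Python sketch of that pitfall, assuming the usual update-strategy flag constants (the flag values 0 and 2 follow the DD_INSERT/DD_DELETE convention; the row data is made up). Rows flagged for deletion still reach the downstream aggregate and contribute negatively to a SUM.

```python
# Aggregator-after-Update-Strategy sketch: deleted rows are subtracted.
DD_INSERT, DD_DELETE = 0, 2

def flagged_sum(rows):
    total = 0
    for flag, amount in rows:
        # rows flagged DD_DELETE are subtracted from the aggregate
        total += -amount if flag == DD_DELETE else amount
    return total

rows = [(DD_INSERT, 100), (DD_INSERT, 40), (DD_DELETE, 40)]
print(flagged_sum(rows))  # 100: the deleted row cancels its earlier insert
```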
-------------------------------------------------------------------
Can we run a group of sessions without using the Workflow Manager?
Yes, it is possible to run a group of sessions with the pmcmd command, without using the Workflow Manager.
--------------------------------------------------
How do you get two targets, T1 containing distinct values and T2 containing duplicate values, from one source S1?
Use a transformation that filters out duplicates to load the target with no duplicates, and load the other target directly from the source.
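As a hedged sketch of the split, here is an in-memory Python version: T1 receives each value once (distinct), and T2 receives the values that occurred more than once. This models the intended outputs, not the actual Informatica transformations.

```python
# Split one source into a distinct target and a duplicates target.
from collections import Counter

def split_targets(source):
    counts = Counter(source)
    t1 = list(counts)                             # each value once (distinct)
    t2 = [v for v, n in counts.items() if n > 1]  # values that were duplicated
    return t1, t2

t1, t2 = split_targets([1, 2, 2, 3, 3, 3])
print(t1, t2)  # [1, 2, 3] [2, 3]
```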
----------------------------------------------
What is the difference between a view and a materialized view?
View: a view is a combination of one or more tables. A view does not store data; it just stores the query. When you execute the query, it fetches the data from the underlying tables and presents it as the view.
Materialized view: it stores the query result that is used most often; in other words, it stores data that was previously computed.
----------------------------------------------------
What are various types of Aggregation?
Various types of aggregation are SUM, AVG, COUNT, MAX, MIN, FIRST, LAST, MEDIAN, PERCENTILE, STDDEV, and VARIANCE.
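Most of these aggregate functions have direct Python equivalents, which can help when checking an Aggregator's output by hand; FIRST and LAST are just positional picks on the input rows. The sample values are made up.

```python
# Python equivalents of the common aggregate functions listed above.
import statistics

values = [4, 8, 15, 16, 23, 42]
print(sum(values))               # SUM    -> 108
print(statistics.mean(values))   # AVG    -> 18
print(statistics.median(values)) # MEDIAN -> 15.5
print(min(values), max(values))  # MIN, MAX -> 4 42
print(values[0], values[-1])     # FIRST, LAST -> 4 42
```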
---------------------------------------
What is the Router transformation?
The Router transformation allows you to use conditions to test data. It is similar to the Filter transformation, but it allows testing against one or more conditions and routing rows to multiple output groups.
------------------------------------
How many ways can you update a relational source definition, and what are they?
Two ways:- 1. Edit the definition. 2. Reimport the definition.
------------------------------------
What are the Designer tools for creating transformations?
Mapping Designer
Transformation Developer
Mapplet Designer
------------------------------
Can you use the mapping parameters or variables created in one mapping inside any other reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
-----------------------------
How can you improve session performance in aggregator transformation?
Use sorted input.
------------------------------
What are the joiner caches?
When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the master source and builds index and data caches based on the master rows. After building the caches, the Joiner transformation reads records from the detail source and performs the joins.
------------------------------
Why use the lookup transformation?
To perform the following tasks:-
Get a related value. For example, if your source table includes an employee ID but you want to include the employee name in your target table to make the summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.
---------------------------------
How does the Informatica Server sort string values in the Rank transformation?
When the Informatica Server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica Server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.
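A binary (byte-value) sort order can be sketched in Python by comparing the encoded bytes of each string instead of applying locale rules; the sample strings are made up.

```python
# Binary sort order sketch: compare raw byte values, not dictionary order.
def binary_sort(strings, encoding="ascii"):
    return sorted(strings, key=lambda s: s.encode(encoding))

# 'Z' (0x5A) sorts before 'a' (0x61) under a binary order.
print(binary_sort(["apple", "Zebra", "banana"]))  # ['Zebra', 'apple', 'banana']
```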
--------------------------------------
What is the target load order?
You specify the target load order based on the Source Qualifiers in a mapping. If you have multiple Source Qualifiers connected to multiple targets, you can designate the order in which the Informatica Server loads data into the targets.
--------------------------------
What are the basic needs to join two sources in a source qualifier?
The two sources should have a primary key and foreign key relationship.
The two sources should have matching data types.
-------------------------
What is update strategy transformation?
This transformation is used to maintain either the full history data or just the most recent changes in the target table.
---------------------
Describe the two levels at which the Update Strategy transformation is set?
Within a session: When you configure a session, you can instruct the Informatica Server to either treat all records in the same way (for example, treat all records as inserts), or use instructions coded into the session mapping to flag records for different database operations.
Within a mapping: Within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.
-----------------------------
What are the types of mapping wizards provided in Informatica?
The Designer provides two mapping wizards to help you create mappings quickly and easily. Both wizards are designed to create mappings for loading and maintaining star
schemas, a series of dimensions related to a central fact table.
Getting Started Wizard. Creates mappings to load static fact and dimension tables, as well as slowly growing dimension tables.
Slowly Changing Dimensions Wizard. Creates mappings to load slowly changing dimension tables based on the amount of historical dimension data you want to keep and the method you choose to handle historical dimension data.
----------------------------------
What are the types of mappings in the Getting Started Wizard?
Simple Pass Through mapping:
Loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all existing data from your table before loading new data.
Slowly Growing Target:
Loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data when existing data does not require updates.
--------------------------------
Can you generate reports in Informatica?
Yes. By using the Metadata Reporter we can generate reports in Informatica.
--------------------------
What are the output files that the Informatica server creates during a session run?
Informatica server log: The Informatica server (on Unix) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages.
These files are created in the Informatica home directory:-
Session log file: The Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in the mapping. Session details include information such as table name and number of rows written or rejected. You can view this file by double clicking the session in the monitor window.
Performance detail file: This file contains session performance details, which help you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: The Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as data format and loading instructions for the external loader.
Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.
Indicator file: If you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
Output file: If a session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.
Cache files: When the Informatica server creates a memory cache, it also creates cache files.
The Informatica server creates index and data cache files for the following transformations:-
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
===================================================
What is source qualifier transformation?
When you add a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the records that the Informatica server reads when it runs a session.
-----------------------------------------------
What are the tasks that the Source Qualifier performs?
Join data originating from the same source database.
Filter records when the Informatica server reads source data.
Specify an outer join rather than the default inner join.
Specify sorted records.
Select only distinct values from the source.
Create a custom query to issue a special SELECT statement for the Informatica server to read source data.
--------------------------------------------
Why do you use repository connectivity?
Each time you edit or schedule a session, the Informatica server communicates directly with the repository to check whether or not the session and users are valid. All the metadata of sessions and mappings is stored in the repository.
----------------------------------------
In which circumstances does the Informatica server create reject files?
When it encounters DD_Reject in an Update Strategy transformation.
When a row violates a database constraint.
When a field in the row was truncated or overflowed.
--------------------------------------------------------
What is polling?
It displays the updated information about the session in the monitor window. The monitor window displays the status of each session when you poll the informatica server.
-------------------------------------------------
What are the different options used to configure sequential batches?
Two options:-
Run the session only if the previous session completes successfully.
Always runs the session.
-----------------------------------------------------
In a sequential batch, can you run a session if the previous session fails?
Yes. By setting the option "Always runs the session".
----------------------------------------------------
Can you start a batch within a batch?
You cannot. If you want to start a batch that resides in a batch, create a new independent batch and copy the necessary sessions into the new batch.
-------------------------------------
Can you start a session inside a batch individually?
We can start a required session individually only in the case of a sequential batch. In the case of a concurrent batch we cannot do this.
----------------------------------
Performance tuning in Informatica?
The goal of performance tuning is to optimize session performance so that sessions run during the available load window for the Informatica server. Increase session performance by the following:-
The performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid unnecessary network connections.
Flat files: If your flat files are stored on a machine other than the Informatica server, move those files to the machine on which the Informatica server runs.
Relational data sources: Minimize the connections to sources, targets and the Informatica server to improve session performance. Moving the target database onto the server machine may improve session performance.
Staging areas: If you use staging areas, you force the Informatica server to perform multiple data passes. Removing staging areas may improve session performance.
You can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers may improve session performance.
Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte; Unicode mode takes two bytes to store a character.
If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure Database Connections.
If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop constraints and indexes before you run the session and rebuild them after the session completes.
Running parallel sessions by using concurrent batches will also reduce the time needed to load the data, so concurrent batches may also increase session performance.
Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
Avoid transformation errors to improve session performance.
If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
If your session contains a Filter transformation, create that Filter transformation nearer to the sources, or use a filter condition in the Source Qualifier.
Aggregator, Rank and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option.
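Why the sorted ports option helps can be sketched in Python: when the input already arrives sorted by the group key, an aggregator can emit each group as soon as the key changes instead of caching every group until end of input. This is an illustrative sketch of the principle, not PowerCenter code (column names are made up):

```python
from itertools import groupby

def sorted_aggregate(rows, key, value):
    """Streaming aggregation over input already sorted by the group key:
    each group is summed and released as soon as the key changes, so
    only one group needs to be held in memory at a time."""
    for k, grp in groupby(rows, key=lambda r: r[key]):
        yield k, sum(r[value] for r in grp)

rows = [{"dept": "A", "sal": 10},
        {"dept": "A", "sal": 5},
        {"dept": "B", "sal": 7}]   # already sorted by dept
totals = dict(sorted_aggregate(rows, "dept", "sal"))
# totals -> {"A": 15, "B": 7}
```

With unsorted input, the same aggregation would need a cache entry per distinct key for the entire run, which is exactly the index/data cache cost the sorted ports option avoids.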
---------------------------------------------
==============================
===========================
Less Important Questions -
------
How can you stop a batch?
By using the Server Manager or pmcmd.
----------------------------------
What are the session parameters?
Session parameters are like mapping parameters; they represent values you might want to change between sessions, such as database connections or source files.
The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters:-
Database connections
Source file name: Use this parameter when you want to change the name or location of a session source file between session runs.
Target file name: Use this parameter when you want to change the name or location of a session target file between session runs.
Reject file name: Use this parameter when you want to change the name or location of session reject files between session runs.
-----------------------------
What is a parameter file?
A parameter file defines the values for parameters and variables used in a session. A parameter file is a text file created in a text editor such as WordPad or Notepad.
You can define the following values in a parameter file:-
Mapping parameters
Mapping variables
Session parameters.
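A parameter file is plain text: a bracketed heading per session followed by name=value lines. The sketch below parses that general shape in Python; the folder, session and parameter names in the sample are invented for illustration, and real PowerCenter parameter files have more heading variants than this sketch handles:

```python
def parse_parameter_file(text):
    """Parse a simple parameter file into {section: {name: value}}.
    Sections look like [Folder.SessionName]; entries are name=value."""
    params, section = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            params[section] = {}
        elif "=" in line and section is not None:
            name, value = line.split("=", 1)
            params[section][name.strip()] = value.strip()
    return params

sample = """[Sales.s_load_orders]
$DBConnection_Src=ORA_SRC
$InputFile1=/data/in/orders.dat
"""
params = parse_parameter_file(sample)
# params["Sales.s_load_orders"]["$DBConnection_Src"] -> "ORA_SRC"
```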
------------------------------
How can you access a remote source in your session?
Relational source: To access a relational source situated in a remote place, you need to configure a database connection to the data source.
File source: To access a remote source file, you must configure an FTP connection to the host machine before you create the session.
Heterogeneous: When your mapping contains more than one source type, the Server Manager creates a heterogeneous session that displays source options for all types.
--------------------------------
What is the difference between partitioning of relational targets and partitioning of file targets?
If you partition a session with a relational target, the Informatica server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica server creates one target file for each partition. You can configure session properties to merge these target files.
----------------------------
What are the transformations that restrict the partitioning of sessions?
Advanced External Procedure transformation and External Procedure transformation: these transformations contain a check box on the Properties tab to allow partitioning.
Aggregator transformation: If you use sorted ports, you cannot partition the associated source.
Joiner transformation: You cannot partition the master source for a Joiner transformation.
Normalizer transformation
XML targets.
-------------------------------
Define mapping and session?
Mapping: It is a set of source and target definitions linked by transformation objects that define the rules for data transformation.
Session: It is a set of instructions that describe how and when to move data from sources to targets.
------------------------
Which tool you use to create and manage sessions and batches and to monitor and stop the informatica server?
Informatica Server Manager.
------------------------------------------
Why we use partitioning the session in informatica?
Partitioning improves session performance by reducing the time needed to read the source and load the data into the target.
-------------------------------
To achieve the session partition what are the necessary tasks you have to do?
Configure the session to partition source data.
Install the Informatica server on a machine with multiple CPUs.
-------------------------------------
How does the Informatica server increase session performance through partitioning the source?
For relational sources, the Informatica server creates multiple connections, one for each partition of a single source, and extracts a separate range of data for each connection. The Informatica server reads multiple partitions of a single source concurrently. Similarly, for loading, the Informatica server creates multiple connections to the target and loads partitions of data concurrently.
For XML and file sources, the Informatica server reads multiple files concurrently. For loading the data, the Informatica server creates a separate file for each partition (of a source file). You can choose to merge the targets.
-----------------------------------------------
What are the tasks that the Load Manager process performs?
Manages session and batch scheduling: When you start the Informatica server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica server. When you configure a session, the Load Manager maintains a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications prior to starting the DTM process.
Locking and reading the session: When the Informatica server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the session again while it is running.
Reading the parameter file: If the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.
Verifying permissions and privileges: When the session starts, the Load Manager checks whether or not the user has the privileges to run the session.
Creating log files: The Load Manager creates a log file containing the status of the session.
-----------------------------------------------
What are the data movement modes in Informatica?
The data movement mode determines how the Informatica server handles character data. You choose the data movement mode in the Informatica server configuration settings.
Two data movement modes are available in Informatica:-
ASCII mode
Unicode mode.
----------------------------------------
Can you copy the session to a different folder or repository?
Yes. By using the Copy Session wizard you can copy a session to a different folder or repository, but that target folder or repository should contain the mapping of that session.
If the target folder or repository does not have the mapping of the session being copied,
you have to copy that mapping first before you copy the session.
-------------------------------
What is batch and describe about types of batches?
A grouping of sessions is known as a batch. Batches are of two types:-
Sequential: Runs sessions one after the other.
Concurrent: Runs sessions at the same time.
If you have sessions with source-target dependencies, you have to use a sequential batch to start the sessions one after another. If you have several independent sessions, you can use concurrent batches, which run all the sessions at the same time.
-----------------------------------
How many sessions can you create in a batch?
Any number of sessions.
--------------------------------
When does the Informatica server mark a batch as failed?
If one of the sessions is configured to "run if previous completes" and that previous session fails.
---------------------------------------
What is the command used to run a batch?
pmcmd is used to start a batch.
-------------------------------------
What is a status code?
Status codes provide error handling for the Informatica server during a session. A stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. This value cannot be seen by the user; it is only used by the Informatica server to determine whether to continue running the session or to stop.
---------------------------------
What is the default source option for the Update Strategy transformation?
Data driven.
-------------------------------
What is Data Driven?
The Informatica server follows instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete or reject. If you do not choose the Data Driven option setting, the Informatica server ignores all Update Strategy transformations in the mapping.
-----------------------
What are the mappings that we use for slowly changing dimension tables?
Type 1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 Dimension mapping, all rows contain
Type1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 Dimension mapping, all rows contain
current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.
Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary
key and creating a version number for each dimension in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions to the target. Rows
containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica Server saves existing data in different columns
of the same row and replaces the existing data with the updates.
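The Type 2 versioning behaviour can be sketched in Python. This is an illustrative model of the idea only; the column names (`id`, `name`) and the single-attribute change test are made up, and the real wizard-generated mapping also versions the primary key:

```python
def scd2_apply(dim, src_row):
    """Type 2 sketch: keep every version of a dimension row.
    `dim` maps natural key -> list of versions (newest last)."""
    key = src_row["id"]
    versions = dim.setdefault(key, [])
    if not versions:
        # brand new dimension: insert as version 1
        versions.append({**src_row, "version": 1})
    elif versions[-1]["name"] != src_row["name"]:
        # attribute changed: insert a new version instead of updating
        versions.append({**src_row, "version": versions[-1]["version"] + 1})
    # unchanged rows leave the dimension untouched
    return dim

dim = {}
scd2_apply(dim, {"id": 7, "name": "Acme"})
scd2_apply(dim, {"id": 7, "name": "Acme Corp"})  # change -> new version
# dim[7] now holds two versions, preserving full history
```

A Type 1 mapping would instead overwrite `versions[-1]` in place, and a Type 3 mapping would copy the old value into a separate "previous" column of the same row.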
----------------------------------------
How can you recognise whether or not newly added rows in the source get inserted into the target?
In the Type 2 mapping we have three options to recognise newly added rows:-
Version number
Flag value
Effective date range.
---------------------------------------
What are the two types of processes by which the Informatica server runs a session?
Load Manager process: Starts the session, creates the DTM process, and sends post-session email when the session completes.
DTM process: Creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.
----------------------------------------
What is the Metadata Reporter?
It is a web based application that enables you to run reports against repository metadata.
With the Metadata Reporter, you can access information about your repository without having knowledge of SQL, the transformation language, or the underlying tables in the repository.
--------------------------------------
In which conditions can we not use a Joiner transformation (limitations of the Joiner transformation)?
Both pipelines begin with the same original data source.
Both input pipelines originate from the same Source Qualifier transformation.
Both input pipelines originate from the same Normalizer transformation.
Both input pipelines originate from the same Joiner transformation.
Either input pipeline contains an Update Strategy transformation.
Either input pipeline contains a connected or unconnected Sequence Generator transformation.
------------------------------------------------------------------
What are the settings that you use to configure the Joiner transformation?
Master and detail source
Type of join
Condition of the join.
---------------------------------------------------
What are the join types in the Joiner transformation?
Normal (Default)
Master outer
Detail outer
Full outer.
---------------------------------------------------
What is the Lookup transformation?
Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym.
The Informatica server queries the lookup table based on the lookup ports in the transformation. It compares the Lookup transformation port values to lookup table column values based on the lookup condition.
------------------------------------------------------
What are the types of lookup?
Connected and unconnected lookups
-------------------------------
Differences between connected and unconnected lookup?
Connected lookup:-
1> Receives input values directly from the pipeline.
2> You can use a dynamic or static cache.
3> Cache includes all lookup columns used in the mapping.
4> Supports user-defined default values.
Unconnected lookup:-
1> Receives input values from the result of a :LKP expression in another transformation.
2> You can use a static cache only.
3> Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
4> Does not support user-defined default values.
------------------------------------------
What is meant by lookup caches?
The Informatica server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica server stores condition values in the index cache and output values in the data cache.
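The index-cache/data-cache split can be modelled with a Python dict: the lookup condition columns form the key (the index side) and the output columns form the value (the data side). The table and column names below are invented for illustration:

```python
def build_lookup_cache(lookup_rows, condition_cols, output_cols):
    """Mimic a static lookup cache: condition-column values act as
    the index; output-column values are the data returned on a hit."""
    cache = {}
    for row in lookup_rows:
        index_key = tuple(row[c] for c in condition_cols)
        cache[index_key] = {c: row[c] for c in output_cols}
    return cache

lkp_table = [{"emp_id": 1, "emp_name": "Ann"},
             {"emp_id": 2, "emp_name": "Bob"}]
cache = build_lookup_cache(lkp_table, ["emp_id"], ["emp_name"])

hit = cache.get((2,))    # condition matched -> cached output values
miss = cache.get((99,))  # no match -> None (NULL in PowerCenter)
```

Once built, every subsequent input row probes this in-memory structure instead of issuing a query against the lookup table, which is why caching helps.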
---------------------------------------
How does the Informatica server sort string values in the Rank transformation?
When the Informatica server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.
--------------------------------------------
What are the rank caches?
During the session ,the informatica server compares an input row with rows in the datacache.If the input row out-ranks a stored row,the informatica server replaces the
stored row with the input row.The informatica server stores group information in an index cache and row data in a data cache.
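The replace-on-out-rank behaviour is essentially a bounded top-N cache per group, which a min-heap sketches neatly in Python (the `q`/`sal` column names are made up for illustration):

```python
import heapq

def top_n_per_group(rows, group_col, rank_col, n):
    """Keep only the n highest-ranked values per group, the way a
    rank cache replaces a stored row when an input row out-ranks it."""
    caches = {}
    for row in rows:
        heap = caches.setdefault(row[group_col], [])
        heapq.heappush(heap, row[rank_col])
        if len(heap) > n:
            heapq.heappop(heap)  # evict the lowest-ranked entry
    return {g: sorted(h, reverse=True) for g, h in caches.items()}

rows = [{"q": 1, "sal": 50}, {"q": 1, "sal": 90}, {"q": 1, "sal": 70}]
top2 = top_n_per_group(rows, "q", "sal", 2)
# top2 -> {1: [90, 70]}; the 50 was evicted when out-ranked
```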
-------------------------------------------
What is the Rankindex in Ranktransformation?
The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the Rank Index port to store the ranking position for
each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespersons from 1 to 5.
---------------------------------
What is the Router transformation?
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. However, a Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. A Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.
If you need to test the same input data based on multiple conditions, use a Router Transformation in a mapping instead of creating multiple Filter transformations to perform the same task.
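As a sketch of the routing behaviour described above (group names and the condition functions are invented for illustration):

```python
def route(rows, conditions):
    """Send each row to every user-defined group whose condition it
    meets; rows that match no condition go to the default group."""
    groups = {name: [] for name in conditions}
    groups["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, cond in conditions.items():
            if cond(row):
                groups[name].append(row)
                matched = True
        if not matched:
            groups["DEFAULT"].append(row)
    return groups

rows = [{"amt": 5}, {"amt": 50}, {"amt": 500}]
out = route(rows, {"SMALL": lambda r: r["amt"] < 10,
                   "LARGE": lambda r: r["amt"] > 100})
# out["SMALL"], out["LARGE"] and out["DEFAULT"] get one row each
```

A Filter transformation corresponds to a single condition with the non-matching rows silently dropped; the Router's advantage is that one pass over the data feeds all groups.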
--------------------------------------------
What are the types of groups in the Router transformation?
Input group and output groups.
The Designer copies property information from the input ports of the input group to create a set of output ports for each output group.
Two types of output groups:-
User-defined groups
Default group.
You cannot modify or delete the default group.
---------------------------------------------
Why we use stored procedure transformation?
For populating and maintaining databases.
---------------------------------------
What are the types of data that passes between informatica server and stored procedure?
Three types of data:-
Input/Out put parameters
Return Values
Status code.
-------------------------------------------
Why do we use Lookup transformations?
Lookup transformations can access data from relational tables that are not sources in the mapping. With a Lookup transformation, we can accomplish the following tasks:-
Get a related value: get the Employee Name from the Employee table based on the Employee ID.
Perform a calculation.
Update slowly changing dimension tables: we can use an unconnected Lookup transformation to determine whether a record already exists in the target or not.
---------------------------------------------
While importing a relational source definition from a database, what metadata of the source do you import?
Source name
Database location
Column names
Datatypes
Key constraints.
-------------------------------------------
Where should you place a flat file to import its definition into the Designer?
Place it in a local folder.
-------------------------------
What is a transformation?
It is a repository object that generates, modifies, or passes data.
------------------------