Tuesday, April 17, 2007

Informatica transformations

Informatica Transformations





Aggregator Transformation (Transformation type: Connected, Active)


The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. The Aggregator transformation is unlike the Expression transformation, in that you can use the Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only. When using the transformation language to create aggregate expressions, you can use conditional clauses to filter rows, providing more flexibility than SQL language ( ? ).
The PowerCenter Server performs aggregate calculations as it reads, and stores necessary data group and row data in an aggregate cache.
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the PowerCenter Server performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally. For details on incremental aggregation, see “Using Incremental Aggregation” in the Workflow Administration Guide.



Ports in the Aggregator Transformation
To configure ports in the Aggregator transformation, complete the following tasks:
♦ Enter an expression in any output port, using conditional clauses or non-aggregate functions in the port.
♦ Create multiple aggregate output ports.
♦ Configure any input, input/output, output, or variable port as a group by port.
♦ Improve performance by connecting only the necessary input/output ports to subsequent transformations, reducing the size of the data cache.
♦ Use variable ports for local variables.
♦ Create connections to other transformations as you enter an expression.


Components of the Aggregator Transformation
The Aggregator is an active transformation, changing the number of rows in the pipeline. The
Aggregator transformation has the following components and options:
♦ Aggregate expression. Entered in an output port. Can include non-aggregate expressions
and conditional clauses.
♦Group by port. Indicates how to create groups. The port can be any input, input/output,
output, or variable port. When grouping data, the Aggregator transformation outputs the
last row of each group unless otherwise specified.
Sorted input. Use to improve session performance. To use sorted input, you must pass
data to the Aggregator transformation sorted by group by port, in ascending or descending
order.
Aggregate cache. The PowerCenter Server stores data in the aggregate cache until it
completes aggregate calculations
. It stores group values in an index cache and row data in
the data cache
.


Aggregate Caches
When you run a session that uses an Aggregator transformation, the PowerCenter Server creates index and data caches in memory to process the transformation. If the PowerCenter Server requires more space, it stores overflow values in cache files. You can configure the index and data caches in the Aggregator transformation or in the session properties. For more information, see “Creating an Aggregator Transformation” on page 11.
Note: The PowerCenter Server uses memory to process an Aggregator transformation with
sorted ports. It does not use cache memory. You do not need to configure cache memory for
Aggregator transformations that use sorted ports.


Aggregate Expressions
The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. It can also include one aggregate function nested within another aggregate function, such as:
MAX( COUNT( ITEM ))
The result of an aggregate expression varies depending on the group by ports used in the transformation. For example, when the PowerCenter Server calculates the following aggregate expression with no group by ports defined, it finds the total quantity of items sold:
SUM( QUANTITY )
However, if you use the same expression, and you group by the ITEM port, the PowerCenter
Server returns the total quantity of items sold, by item
.
You can create an aggregate expression in any output port and use multiple aggregate ports in a transformation.







Aggregate Functions
You can use the following aggregate functions within an Aggregator transformation. You can
nest one aggregate function within another aggregate function
.
The transformation language includes the following aggregate functions:
♦ AVG
♦ COUNT
♦ FIRST
♦ LAST
♦ MAX
♦ MEDIAN
♦ MIN
♦ PERCENTILE
♦ STDDEV
♦ SUM
♦ VARIANCE
When you use any of these functions, you must use them in an expression within an
Aggregator transformation. For a description of these functions, see “Functions” in the
Transformation Language Reference.












Nested Aggregate Functions
You can include multiple single-level or multiple nested functions in different output ports in
an Aggregator transformation
. However, you cannot include both single-level and nested
functions in an Aggregator transformation
. Therefore, if an Aggregator transformation
contains a single-level function in any output port, you cannot use a nested function in any
other port in that transformation.
When you include single-level and nested functions in the
same Aggregator transformation, the Designer marks the mapping or mapplet invalid
. If you
need to create both single-level and nested functions, create separate Aggregator transformations.







Conditional Clauses
You can use conditional clauses in the aggregate expression to reduce the number of rows used
in the aggregation. The conditional clause can be any clause that evaluates to TRUE or
FALSE.
For example, you can use the following expression to calculate the total commissions of
employees who exceeded their quarterly quota:
SUM( COMMISSION, COMMISSION > QUOTA )







Non-Aggregate Functions
You can also use non-aggregate functions in the aggregate expression.
The following expression returns the highest number of items sold for each item (grouped by
item). If no items were sold, the expression returns 0.
IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0))






Null Values in Aggregate Functions






When you configure the PowerCenter Server, you can choose how you want the PowerCenter
Server to handle null values in aggregate functions. You can choose to treat null values in
aggregate functions as NULL or zero. By default, the PowerCenter Server treats null values as
NULL in aggregate functions.
For details on changing this default behavior, see “Installing and Configuring the
PowerCenter Server on Windows” and “Installing and Configuring the PowerCenter Server
on UNIX” chapters in the Installation and Configuration Guide.







Group By Ports
The Aggregator transformation allows you to define groups for aggregations, rather than performing the aggregation across all input data. For example, rather than finding the total company sales, you can find the total sales grouped by region.
To define a group for the aggregate expression, select the appropriate input, input/output, output, and variable ports in the Aggregator transformation. You can select multiple group by ports, creating a new group for each unique combination of groups. The PowerCenter Server then performs the defined aggregation for each group.
When you group values, the PowerCenter Server produces one row for each group. If you do not group values, the PowerCenter Server returns one row for all input rows. The PowerCenter Server typically returns the last row of each group (or the last row received) with the result of the aggregation. However, if you specify a particular row to be returned (for example, by using the FIRST function), the PowerCenter Server then returns the specified row.
When selecting multiple group by ports in the Aggregator transformation, the PowerCenter
Server uses port order to determine the order by which it groups. Since group order can affect
your results, order group by ports to ensure the appropriate grouping. For example, the results
of grouping by ITEM_ID then QUANTITY can vary from grouping by QUANTITY then
ITEM_ID, because the numeric values for quantity are not necessarily unique.
The following Aggregator transformation groups first by STORE_ID and then by ITEM:












If you send the following data through this Aggregator transformation:
STORE_ID ITEM QTY PRICE
101 ‘battery’ 3 2.99
101 ‘battery’ 1 3.19
101 ‘battery’ 2 2.59
101 ‘AAA’ 2 2.45
201 ‘battery’ 1 1.99
201 ‘battery’ 4 1.59
301 ‘battery’ 1 2.45












The PowerCenter Server performs the aggregate calculation on the following unique groups:

STORE_ID ITEM
101 ‘battery’
101 ‘AAA’
201 ‘battery’
301 ‘battery’






The PowerCenter Server then passes the last row received, along with the results of theaggregation, as follows:







STORE_ID ITEM QTY PRICE SALES_PER_STORE
101 ‘battery’ 2 2.59 17.34
101 ‘AAA’ 2 2.45 4.90
201 ‘battery’ 4 1.59 8.35
301 ‘battery’ 1 2.45 2.45







Non-Aggregate Expressions
You can use non-aggregate expressions in group by ports to modify or replace groups. For
example, if you want to replace ‘AAA battery’ before grouping, you can create a new group by
output port, named CORRECTED_ITEM, using the following expression:
IIF( ITEM = ‘AAA battery’, battery, ITEM )
Default Values
You can use default values in the group by port to replace null input values. For example, if
you define a default value of ‘Misc’ in the ITEM column below, the PowerCenter Server
replaces null groups with ‘Misc’. This allows the PowerCenter Server to include null item
STORE_ID ITEM
101 ‘battery’
101 ‘AAA’
201 ‘battery’
301 ‘battery’
STORE_ID ITEM QTY PRICE SALES_PER_STORE
101 ‘battery’ 2 2.59 17.34
101 ‘AAA’ 2 2.45 4.90
201 ‘battery’ 4 1.59 8.35
301 ‘battery’ 1 2.45 2.45












Using Sorted Input
You can improve Aggregator transformation performance by using the sorted input option.
When you use sorted input, the PowerCenter Server assumes all data is sorted by group. As
the PowerCenter Server reads rows for a group, it performs aggregate calculations. When
necessary, it stores group information in memory. To use the Sorted Input option, you must
pass sorted data to the Aggregator transformation. You can gain performance with sorted
ports when you configure the session with multiple partitions.
When you do not use sorted input, the PowerCenter Server performs aggregate calculations as
it reads. However, since data is not sorted, the PowerCenter Server stores data for each group
until it reads the entire source to ensure all aggregate calculations are accurate.
For example, one Aggregator transformation has the STORE_ID and ITEM group by ports,
with the sorted input option selected. When you pass the following data through the
Aggregator, the PowerCenter Server performs an aggregation for the three rows in the
101/battery group as soon as it finds the new group, 201/battery:







STORE_ID ITEM QTY PRICE
101 ‘battery’ 3 2.99
101 ‘battery’ 1 3.19
101 ‘battery’ 2 2.59
201 ‘battery’ 4 1.59
201 ‘battery’ 1 1.99







If you use sorted input and do not presort data correctly, you receive unexpected results.







Sorted Input Conditions
Do not use sorted input if either of the following conditions are true:
♦ The aggregate expression uses nested aggregate functions.
♦ The session uses incremental aggregation.
If you use sorted input and do not sort data correctly, the session fails.







Pre-Sorting Data
To use sorted input, you pass sorted data through the Aggregator.
Data must be sorted as follows:
♦ By the Aggregator group by ports, in the order they appear in the Aggregator
transformation.
♦ Using the same sort order configured for the session. If data is not in strict ascending or
descending order based on the session sort order, the PowerCenter Server fails the session.












For example, if you configure a session to use a French sort order, data passing into the Aggregator transformation must be sorted using the French sort order.
For relational and file sources, you can use the Sorter transformation to sort data in the mapping before passing it to the Aggregator transformation. You can place the Sorter transformation anywhere in the mapping prior to the Aggregator if no transformation changes the order of the sorted data. Group by columns in the Aggregator transformation must be in the same order as they appear in the Sorter transformation. For details on sorting data using the Sorter transformation, see “Sorter Transformation” on page 283.
If the session uses relational sources, you can also use the Number of Sorted Ports option in the Source Qualifier transformation to sort group by columns in the source database. Group by columns must be in the same order in both the Aggregator and Source Qualifier transformations. For details on sorting data in the Source Qualifier, see “Using Sorted Ports” on page 317.
Figure 1-1 illustrates the mapping with a Sorter transformation configured to sort the source
data in descending order by ITEM_NAME:

























The Sorter transformation sorts the data as follows:

ITEM_NAME QTY PRICE
Soup 4 2.95
Soup 1 2.95
Soup 2 3.25
Cereal 1 4.49
Cereal 2 5.25





With sorted input, the Aggregator transformation returns the following results:






ITEM_NAME QTY PRICE INCOME_PER_ITEM
Cereal 2 5.25 14.99
Soup 2 3.25 21.25










Creating an Aggregator Transformation





To use an Aggregator transformation in a mapping, you add the Aggregator transformation to the mapping, then configure the transformation with an aggregate expression and group by ports, if desired.






















To create an Aggregator transformation:
1. In the Mapping Designer, choose Transformation-Create. Select the Aggregator
transformation.
2. Enter a name for the Aggregator, click Create. Then click Done.
The Designer creates the Aggregator transformation.
3. Drag the desired ports to the Aggregator transformation.
The Designer creates input/output ports for each port you include.
4. Double-click the title bar of the transformation to open the Edit Transformations dialog
box.
5. Select the Ports tab.
6. Click the group by option for each column you want the Aggregator to use in creating
groups.
You can optionally enter a default value to replace null groups.
If you want to use a non-aggregate expression to modify groups, click the Add button and enter a name and data type for the port. Make the port an output port by clearing Input (I). Click in the right corner of the Expression field, enter the non-aggregate expression using one of the input ports, then click OK. Select Group By.
7. Click Add and enter a name and data type for the aggregate expression port. Make the port an output port by clearing Input (I). Click in the right corner of the Expression field to open the Expression Editor. Enter the aggregate expression, click Validate, then click
OK.
Make sure the expression validates before closing the Expression Editor.
8. Add default values for specific ports as necessary.
If certain ports are likely to contain null values, you might specify a default value if the
target database does not handle null values.






















Aggregator Setting Description
Cache Directory -- Local directory where the PowerCenter Server creates the index and data cache files.
By default, the PowerCenter Server uses the directory entered in the Workflow Manager
for the server variable $PMCacheDir. If you enter a new directory, make sure the
directory exists and contains enough disk space for the aggregate caches.



Tracing Level -- Amount of detail displayed in the session log for this transformation.
Sorted Input -- Indicates input data is presorted by groups. Select this option only if the mapping
passes sorted data to the Aggregator transformation.



Aggregator Data Cache Size -- Data cache size for the transformation. Default cache size is 2,000,000 bytes ( 2 MB). If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must
run the session on a 64-bit PowerCenter Server.



Aggregator Index Cache Size -- Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must
run the session on a 64-bit PowerCenter Server.




Transformation Scope - Specifies how the PowerCenter Server applies the transformation logic to incoming
data:
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
- All Input. Applies the transformation logic on all incoming data. When you choose All Input, the PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
For more information about transformation scope, see .Understanding Commit Points.
in the Workflow Administration Guide.


0. Click OK.
11. Choose Repository-Save to save changes to the mapping.



Tips
You can use the following guidelines to optimize the performance of an Aggregator transformation.
Use sorted input to decrease the use of aggregate caches.
Sorted input reduces the amount of data cached during the session and improves session
performance
. Use this option with the Sorter transformation to pass sorted data to the
Aggregator transformation.
Limit connected input/output or output ports.
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Filter before aggregating.
If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.



Troubleshooting
I selected sorted input but the workflow takes the same amount of time as before.
You cannot use sorted input if any of the following conditions are true:
♦ The aggregate expression contains nested aggregate functions.
♦ The session uses incremental aggregation.
♦ Source data is data driven.

When any of these conditions are true, the PowerCenter Server processes the transformation
as if you do not use sorted input.
A session using an Aggregator transformation causes slow performance.
The PowerCenter Server may be paging to disk during the workflow. You can increase session
performance by increasing the index and data cache sizes in the transformation properties
. For more information about caching, see “Session Caches” in the Workflow Administration Guide.
I entered an override cache directory in the Aggregator transformation, but the
PowerCenter Server saves the session incremental aggregation files somewhere else.
You can override the transformation cache directory on a session level. The PowerCenter
Server notes the cache directory in the session log. You can also check the session properties
for an override cache directory.


===========================================================




Custom Transformation ( Transformation type: Active/Passive , Connected)




Custom transformations operate in conjunction with procedures you create outside of the Designer interface to extend Power Center functionality. You can create a Custom transformation and bind it to a procedure that you develop using the functions described in “Custom Transformation Functions” on page 51.
You can use the Custom transformation to create transformation applications, such as sorting and aggregation, which require all input rows to be processed before outputting any output rows. To support this process, the input and output functions occur separately in Custom transformations compared to External Procedure transformations.
The PowerCenter Server passes the input data to the procedure using an input function. The
output function is a separate function that you must enter in the procedure code to pass
output data to the PowerCenter Server
. In contrast, in the External Procedure transformation,
an external procedure function does both input and output, and its parameters consist of all
the ports of the transformation.
You can also use the Custom transformation to create a transformation that requires multiple
input groups, multiple output groups, or both. A group is the representation of a row of data entering or leaving a transformation. For example, you might create a Custom transformation with one input group and multiple output groups that parses XML data. Or, you can create a Custom transformation with two input groups and one output group that merges two streams of input data into one stream of output data.










Code Page Compatibility
The Custom transformation procedure code page is the code page of the data the Custom transformation procedure processes. The following factors determine the Custom transformation procedure code page:




♦ PowerCenter Server data movement mode
♦ The INFA_CTChangeStringMode() function
♦ The INFA_CTSetDataCodePageID() function
The Custom transformation procedure code page must be two-way compatible with the
PowerCenter Server code page. The PowerCenter Server passes data to the procedure in the
Custom transformation procedure code page. Also, the data the procedure passes to the
PowerCenter Server must be valid characters in the Custom transformation procedure code
page.
By default, when the PowerCenter Server runs in ASCII mode, the Custom transformation procedure code page is ASCII. Also, when the PowerCenter Server runs in Unicode mode, the
Custom transformation procedure code page is UCS-2, but the PowerCenter Server only
passes characters that are valid in the PowerCenter Server code page.
However, you can use the INFA_CTChangeStringMode() functions in the procedure code to
request the data in a different format. In addition, when the PowerCenter Server runs in
Unicode mode, you can request the data in a different code page using the
INFA_CTSetDataCodePageID() function.
Changing the format or requesting the data in a different code page changes the Custom
transformation procedure code page to the code page the procedure requests:
♦ ASCII mode. You can write the external procedure code to request the data in UCS-2
format using the INFA_CTChangeStringMode() function. When you use this function,
the procedure must pass only ASCII characters in UCS-2 format to the PowerCenter
Server. Do not use the INFA_CTSetDataCodePageID() function when the PowerCenter
Server runs in ASCII mode.
♦ Unicode mode. You can write the external procedure code to request the data in MBCS
using the INFA_CTChangeStringMode() function. When the external procedure requests
the data in MBCS, the PowerCenter Server passes the data in the PowerCenter Server code
page. When you use the INFA_CTChangeStringMode() function, you can write the
external procedure code to request the data in a different code page from the PowerCenter
Server code page using the INFA_CTSetDataCodePageID() function. The code page you
specify in the INFA_CTSetDataCodePageID() function must be two-way compatible with
the PowerCenter Server code page.



Note: You can also use the INFA_CTRebindInputDataType() function to change the format
for a specific port in the Custom transformation.






Distributing Custom Transformation Procedures
You can copy a Custom transformation from one repository to another. When you copy a
Custom transformation between repositories, you must verify that the PowerCenter Server
machine the target repository uses contains the Custom transformation procedure.






Creating Custom Transformations
You can create reusable Custom transformations in the Transformation Developer, and add instances of the transformation to mappings. You can create non-reusable Custom transformations in the Mapping Designer or Mapplet Designer.
Each Custom transformation specifies a module and a procedure name. You can create a Custom transformation based on an existing shared library or DLL containing the procedure, or you can create a Custom transformation as the basis for creating the procedure. When you
create a Custom transformation to use with an existing shared library or DLL, make sure you define the correct module and procedure name.
When you create a Custom transformation as the basis for creating the procedure, select the transformation and generate the code. The Designer uses the transformation properties when it generates the procedure code. It generates code in a single directory for all transformations sharing a common module name.




The Designer generates the following files:
♦ m_.c. Defines the module. This file includes an initialization function,
m__moduleInit() that allows you to write code you want the
PowerCenter Server to run when it loads the module. Similarly, this file includes a
deinitialization function, m__moduleDeinit(), that allows you to write
code you want the PowerCenter Server to run before it unloads the module.
♦ p_.c. Defines the procedure in the module. This file contains the code
that implements the procedure logic, such as data cleansing or merging data.
♦ makefile.aix, makefile.aix64,makefile.hp, makefile.hp64, makefile.linux, makefile.sol.
Make files for the UNIX platforms. Use makefile.aix64 for 64-bit AIX platforms and
makefile.hp64 for 64-bit HP-UX (Itanium) platforms.






Rules and Guidelines
Use the following rules and guidelines when you create a Custom transformation:
Custom transformations are connected transformations. You cannot reference a Custom
transformation in an expression
.
You can include multiple procedures in one module. For example, you can include an
XML writer procedure and an XML parser procedure in the same module
.
You can bind one shared library or DLL to multiple Custom transformation instances if you write the procedure code to handle multiple Custom transformation instances.
When you write the procedure code, you must make sure it does not violate basic mapping rules. For more information about mappings and mapping validation, see “Mappings” in the Transformation Guide.
The Custom transformation sends and receives high precision decimals as high precision
decimals.
You can use multi-threaded code in Custom transformation procedures.



---------------------------------------------------------------------------------------------------------




Working with Groups and Ports




A Custom transformation has both input and output groups. It also can have input ports,
output ports, and input/output ports. You create and edit groups and ports on the Ports tab of
the Custom transformation. You can also define the relationship between input and output
ports on the Ports tab.




Figure 2-1 shows the Custom transformation Ports tab:




Figure 2-1. Custom Transformation Ports Tab






Creating Groups and Ports
You can create multiple input groups and multiple output groups in a Custom transformation. You must create at least one input group and one output group. To create an input group, click the Create Input Group icon. To create an output group, click the Create Output Group icon. When you create a group, the Designer adds it as the last group. When you create a passive Custom transformation, you can only create one input group and one output group.
To create a port, click the Add button. When you create a port, the Designer adds it below the currently selected row or group. Each port contains attributes defined on the Port Attribute Definitions tab. You can edit the attributes for each port. For more information about creating and editing user-defined port attributes, see “Working with Port Attributes” on page 25.


Editing Groups and Ports
Use the following rules and guidelines when you edit ports and groups in a Custom transformation:
♦ You can change group names by typing in the group header.
♦ You can only enter ASCII characters for port and group names.
♦ Once you create a group, you cannot change the group type. If you need to change the
group type, delete the group and add a new group.
♦ When you delete a group, the Designer deletes all ports of the same type in that group.
However, all input/output ports remain in the transformation, belong to the group above them, and change to input ports or output ports, depending on the type of group you delete. For example, an output group contains output ports and input/output ports. You delete the output group. The Designer deletes the output ports. It changes the input/output ports to input ports. Those input ports belong to the input group with the header directly above them.
♦ To move a group up or down, select the group header and click the Move Port Up or Move Port Down button. The ports above and below the group header remain the same, but the groups to which they belong might change.
Defining Port Relationships
By default, an output port in a Custom transformation depends on all input ports. However, you can define the relationship between input and output ports in a Custom transformation.
When you do this, you can view link paths in a mapping containing a Custom transformation and you can see which input ports an output port depends on. You can also view source column dependencies for target ports in a mapping containing a Custom transformation.
To define the relationship between ports in a Custom transformation, create a portdependency. A port dependency is the relationship between an output or input/output port and one or more input or input/output ports. When you create a port dependency, base it on the procedure logic in the code.
To create a port dependency, click Custom Transformation on the Ports tab and choose Port
Dependencies.

No comments: