Datastage: 5 Tips for better DataStage Design

Friday, October 30, 2015

5 Tips for better DataStage Design

01. Always complete the requirement first. Don't spend time developing an optimized job that is practically useless until the requirement is met.

02. To rearrange an existing job design, or to insert new stage types into an existing job flow, first disconnect the links from the stage to be changed. The links then retain any metadata associated with them.

03. A Lookup stage can only have one input stream, one output stream, and, optionally, one reject stream. Depending on the type of lookup, it can have several reference links. To change the use of particular Lookup links in an existing job flow, disconnect the links from the Lookup stage and then right-click to change the link type, for example, Stream to Reference.
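
The link roles described above can be pictured with a small Python sketch. This is not DataStage code, only a conceptual model: the function name `lookup`, the sample rows, and the column names are hypothetical, and it assumes a single reference link with lookup failures routed to the reject link.

```python
def lookup(stream_rows, reference_rows, key):
    """Conceptual model of a Lookup stage: one stream input, one reference
    link, one output link, and one reject link (all names hypothetical)."""
    # The reference link is loaded into an in-memory index on the lookup key.
    index = {ref[key]: ref for ref in reference_rows}

    output, rejects = [], []
    for row in stream_rows:
        match = index.get(row[key])
        if match is None:
            # No reference row found: the stream row goes to the reject link.
            rejects.append(row)
        else:
            # Reference columns are added to the stream row on the output link.
            output.append({**match, **row})
    return output, rejects


# Hypothetical sample data: orders are the stream, customers the reference.
orders = [{"cust_id": 1, "amount": 50}, {"cust_id": 3, "amount": 20}]
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]

matched, rejected = lookup(orders, customers, "cust_id")
print(matched)
print(rejected)
```

The sketch shows why the link type matters: swapping which input is the stream and which is the reference changes which rows drive the output and which are only consulted.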

04. The Copy stage is a good placeholder between stages if you anticipate that new stages or logic will be needed in the future, because they can then be added without damaging existing properties and derivations. When inserting a new stage, simply drag the input and output links from the Copy placeholder to the new stage. Unless the Force property is set in the Copy stage, InfoSphere DataStage optimizes the actual copy out at runtime.

05. A DataStage job takes longer to start up because it validates the entire environment, the nodes, and database connectivity before the ETL job starts. This adds overhead up front, but it ensures robust and reliable data loading and integration. Parallel jobs are not recommended for small volumes of data or for serial ETL designs, as there is an overhead in starting PX processes.
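
The trade-off between startup overhead and throughput can be sketched as a rough break-even calculation. The model below is a simplification (run time = fixed startup cost + rows / throughput), and every number in it is a made-up placeholder, not a DataStage measurement; substitute timings observed in your own environment.

```python
def run_time(rows, startup_s, rows_per_s):
    """Estimated run time in seconds: fixed startup cost plus processing time."""
    return startup_s + rows / rows_per_s


def break_even_rows(par_startup_s, par_rate, ser_startup_s, ser_rate):
    """Row count at which the parallel and serial designs take the same time.

    Solves par_startup + n / par_rate == ser_startup + n / ser_rate for n.
    Assumes the parallel design has the higher startup cost and throughput.
    """
    return (par_startup_s - ser_startup_s) / (1 / ser_rate - 1 / par_rate)


# Hypothetical figures, for illustration only.
PAR_STARTUP_S, PAR_RATE = 20.0, 100_000.0   # parallel job: slow start, fast rows
SER_STARTUP_S, SER_RATE = 2.0, 20_000.0     # serial design: quick start, slower rows

n = break_even_rows(PAR_STARTUP_S, PAR_RATE, SER_STARTUP_S, SER_RATE)
print(f"break-even at roughly {n:,.0f} rows")

for rows in (10_000, 5_000_000):
    par = run_time(rows, PAR_STARTUP_S, PAR_RATE)
    ser = run_time(rows, SER_STARTUP_S, SER_RATE)
    print(f"{rows:>9,} rows: parallel {par:.1f}s, serial {ser:.1f}s")
```

Under this model, below the break-even row count the startup cost dominates and the serial design finishes first; above it, the parallel job's throughput pays for its startup.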
