Sunday, October 25, 2015

Datastage Scenarios and solutions

Datastage Scenarios and solutions


Field mapping using Transformer stage:

Requirement:
field will be right justified zero filled, Take last 18 characters

Solution:
Right("0000000000":Trim(Lnk_Xfm_Trans.link),18)



Scenario 1:

We have two  datasets with 4 cols each with different names. We should create a  dataset with 4 cols 3 from one dataset and one col with the record count of one dataset.

We can use aggregator with a dummy column and get the count from one dataset and do a look up from other dataset and map it to the 3 rd dataset
Something similar to the below design:
Scenario 2:
Following is the existing job design. But requirement got changed to: Head and trailer datasets should populate even if detail records is not present in the source file. Below job don't do that job.

Hence changed the above job to this following requirement:
Used row generator with a copy stage. Given default value( zero) for col( count) coming in from row generator. If no detail records it will pick the record count from row generator.


We have a source which is a sequential file with header and footer. How to remove the header and footer while reading this file using sequential file stage of Datastage?
Sol:Type command in putty: sed '1d;$d' file_name>new_file_name (type this in job before job subroutine then use new file in seq stage)

IF I HAVE SOURCE LIKE COL1 A A B AND TARGET LIKE COL1 COL2 A 1 A 2 B1. HOW TO ACHIEVE THIS OUTPUT USING STAGE VARIABLE IN TRANSFORMER STAGE?

If keyChange =1 Then 1 Else stagevaraible+1



Suppose that 4 job control by the sequencer like (job 1, job 2, job 3, job 4 )if job 1 have 10,000 row ,after run the job only 5000 data has been loaded in target table remaining are not loaded and your job going to be aborted then.. How can short out the problem.Suppose job sequencer synchronies or control 4 job but job 1 have problem, in this condition should go director and check it what type of problem showing either data type problem, warning massage, job fail or job aborted, If job fail means data type problem or missing column action .So u should go Run window ->Click-> Tracing->Performance or In your target table ->general -> action-> select this option here two option
(i) On Fail -- commit , Continue
(ii) On Skip -- Commit, Continue.
First u check how many data already load after then select on skip option then continue and what remaining position data not loaded then select On Fail , Continue ...... Again Run the job defiantly u get successful massage
Question: I want to process 3 files in sequentially one by one how can i do that. while processing the files it should fetch files automatically .
Ans:If the metadata for all the files r same then create a job having file name as parameter then use same job in routine and call the job with different file name...or u can create sequencer to use the job..

Parameterize the file name.
Build the job using that parameter
Build job sequencer which will call this job and will accept the parameter for file name.
Write a UNIX shell script which will call the job sequencer three times by passing different file each time.
RE: What Happens if RCP is disable ?

In such case Osh has to perform Import and export every time when the job runs and the processing time job is also increased...
Runtime column propagation (RCP): If RCP is enabled for any job and specifically for those stages whose output connects to the shared container input then meta data will be propagated at run time so there is no need to map it at design time.
If RCP is disabled for the job in such case OSH has to perform Import and export every time when the job runs and the processing time job is also increased.
Then you have to manually enter all the column description in each stage.RCP- Runtime column propagation

Question:
Source:                                                   Target

Eno        Ename                                        Eno           Ename
1                     a,b                                                  1              a
2                     c,d                                                  2              b
3                     e,f                                                  3              c



source has 2 fields like

COMPANY             LOCATION
IBM                  HYD
TCS                  BAN
IBM                  CHE
HCL                  HYD
TCS                  CHE
IBM                  BAN
HCL                  BAN
HCL                  CHE

LIKE THIS.......

THEN THE OUTPUT LOOKS LIKE THIS....

Company loc count

TCS  HYD  3
         BAN
         CHE
IBM  HYD  3
         BAN
         CHE
HCL  HYD  3
         BAN
         CHE
2)input is like this:
no,char
1,a
2,b
3,a
4,b
5,a
6,a
7,b
8,a



But the output is in this form  with row numbering of Duplicate occurence


output:

no,char,Count
"1","a","1"
"6","a","2"
"5","a","3"
"8","a","4"
"3","a","5"
"2","b","1"
"7","b","2"
"4","b","3"
3)Input is like this:
file1
10
20
10
10
20
30

Output is like:
file2               file3(duplicates)
10                  10
20                  10
30                  20

4)Input is like:
file1
10
20
10
10
20
30

Output is like Multiple occurrences in one file and single occurrences in one file:
file2      file3
10         30
10
10
20
20
5)Input is like this:
file1
10
20
10
10
20
30

Output is like:
file2               file3
10                  30
20
6)Input is like this:
file1
1
2
3
4
5
6
7
8
9
10

Output is like:
file2(odd)         file3(even)
1                      2
3                      4
5                      6
7                      8
9                      10
7)How to calculate Sum(sal), Avg(sal), Min(sal), Max(sal) with out using Aggregator stage..

8)How to find out First sal, Last sal in each dept with out using aggregator stage

9)How many ways are there to perform remove duplicates function with out using Remove duplicate stage..

Scenario: 

source has 2 fields like

COMPANY             LOCATION
IBM                  HYD
TCS                  BAN
IBM                  CHE
HCL                  HYD
TCS                  CHE
IBM                  BAN
HCL                  BAN
HCL                  CHE

LIKE THIS.......

THEN THE OUTPUT LOOKS LIKE THIS....

Company loc count

TCS  HYD  3
         BAN
         CHE
IBM  HYD  3
         BAN
         CHE
HCL  HYD  3
         BAN
         CHE

Solution:

Seqfile......>Sort......>Trans......>RemoveDuplicates..........Dataset 

Sort                                             Trans: 
Key=Company                                create stage variable as Company1 
Sort order=Asc                  Company1=If(in.keychange=1) then in.Location Else Company1:',':in.Location 
create keychange=True                          Drag and Drop in derivation 
                                                            Company     ....................Company 
                                                           Company1........................Location 
RemoveDup: 
Key=Company 
Duplicates To Retain=Last 


11)The input is 
Shirt|red|blue|green 
Pant|pink|red|blue 


Output should be, 

Shirt:red 
Shirt:blue 
Shirt:green 
pant:pink 
pant:red 
pant:blue 

Solution:
it is reverse to pivote stage 
use 
     seq------sort------tr----rd-----tr----tg 
in the sort stage use create key change column is true 
in trans create stage variable=if colu=1 then key c.value else key v::colum 
rd stage use duplicates retain last 
tran stage use field function superate columns 

similar Scenario: :
source 
col1 col3 
1 samsung 
1 nokia 
1 ercisson 
2 iphone 
2 motrolla 
3 lava 
3 blackberry 
3 reliance 

Expected Output 
col 1 col2       col3   col4 
1     samsung nokia ercission 
2      iphone   motrolla 
3       lava     blackberry reliance

You can get it  by using    Sort stage --- Transformer stage --- RemoveDuplicates --- Transformer --tgt 


Ok 

First  Read and Load the data  into your  source file( For Example  Sequential File ) 

And  in Sort stage    select key change column = True  ( To Generate  Group ids) 

Go to  Transformer stage   

Create one stage variable. 

You can do this by  right click  in stage variable  go to  properties  and name it as  your wish ( For example temp) 

and  in expression  write as below 

if keychange column =1   then column name   else  temp:',':column name 

This column name is  the one  you want  in the required  column with delimited  commas. 

On remove duplicates stage key is col1 and set option duplicates retain to--> Last. 
in transformer drop col3 and define 3 columns like col2,col3,col4 
in col1 derivation give Field(InputColumn,",",1) and 
in col1 derivation give Field(InputColumn,",",2) and 
in col1 derivation give Field(InputColumn,",",3) 

Scenario: 
12)Consider the following employees data as source? 
employee_id, salary 
------------------- 
10,          1000 
20,          2000 
30,          3000 
40,          5000 

 Create a job to find the sum of salaries of all employees and this sum should repeat for all the rows. 

The output should look like as 

employee_id, salary, salary_sum 
------------------------------- 
10,          1000,   11000 
20,          2000,   11000 
30,          3000,   11000 
40,          5000,   11000

Scenario: 

I have two source tables/files numbered 1 and 2. 
In the the target, there are three output tables/files, numbered 3,4 and 5. 

The scenario is that, 

to the out put 4 -> the records which are common to both 1 and 2 should go. 

to the output 3 -> the records which are only in 1 but not in 2 should go 

to the output 5 -> the records which are only in 2 but not in 1 should go.
sltn:src1----->copy1------>----------------------------------->output_1(only left table) 
                                     Join(inner type)----> ouput_1 
src2----->copy2------>----------------------------------->output_3(only right table)


Consider the following employees data as source? 
employee_id, salary 
------------------- 
10,          1000 
20,          2000 
30,          3000 
40,          5000 

Scenario: 

 Create a job to find the sum of salaries of all employees and this sum should repeat for all the rows. 

The output should look like as 

employee_id, salary, salary_sum 
------------------------------- 
10,          1000,   11000 
20,          2000,   11000 
30,          3000,   11000 
40,          5000,   11000 
sltn:

Take Source --->Transformer(Add new Column on both the output links and assign a value  as 1 )------------------------>                           1) Aggregator  (Do group by  using that new column) 
                                              2)lookup/join( join on that  new column)-------->tgt. 

Scenario: 
sno,sname,mark1,mark2,mark3 
1,rajesh,70,68,79 
2,mamatha,39,45,78 
3,anjali,67,39,78 
4,pavani,89,56,45 
5,indu,56,67,78 

out put is 
sno,snmae,mark1,mark2,mark3,delimetercount 
1,rajesh,70,68,79,4 
2,mamatha,39,45,78,4 
3,anjali,67,39,78,4 
4,pavani,89,56,45,4 
5,indu,56,67,78,4 

seq--->trans--->seq 

create one stage variable as delimiter.. 
and put derivation on stage as DSLink4.sno : "," : DSLink4.sname : "," : DSLink4.mark1 :  "," :DSLink4.mark2 :  "," : DSLink4.mark3 


and do mapping and create one more column count as integer type. 

and put derivation on count column as  Count(delimter, ",") 


scenario: 
       sname         total_vowels_count 
       Allen                          2 
       Scott                          1 
       Ward                         1 
Under Transformer Stage Description: 

total_Vowels_Count=Count(DSLink3.last_name,"a")+Count(DSLink3.last_name,"e")+Count(DSLink3.last_name,"i")+Count(DSLink3.last_name,"o")+Count(DSLink3.last_name,"u").

Scenario: 

1)On daily we r getting some huge files data so all files metadata is same we have to load in to target table how we can load? 
Use File Pattern in sequential file 

2) One column having 10 records at run time we have to send 5th and 6th record to target at run time how we can send? 
Can get through,by using UNIX command in sequential file  filter option

How can we get 18 months date data in transformer stage? 
Use transformer  stage after input seq file  and  try  this  one  as constraint in transformer stage : 

DaysSinceFromDate(CurrentDate(), DSLink3.date_18)<=548 OR DaysSinceFromDate(CurrentDate(), DSLink3.date_18)<=546 

where  date_18  column is the column having that  date  which  needs to be less or equal to 18 months and 548 is  no. of days  for 18 months  and  for leap year it is 546(these  numbers  you need to check). 

What is differences between Force Compile and Compile ?


Diff b/w Compile and Validate?
Compile option only checks for all mandatory requirements like link requirements, stage options and all. But it will not check if the database connections are valid. 
Validate is equivalent to Running a job except for extraction/loading of data. That is, validate option will test database connectivity by making connections to databases. 



How to FInd Out Duplicate Values Using Transformer?


You can capture the duplicate records based on keys using Transformer stage variables.

1. Sort and partition the input data of the transformer on the key(s) which defines the duplicate.
2. Define two stage variables, let's say StgVarPrevKeyCol(data type same as KeyCol) and StgVarCntr as Integer with default value 0
where KeyCol is your input column which defines the duplicate.

Expression for StgVarCntr(1st stg var-- maintain order):

If DSLinknn.KeyCol = StgVarPrevKeyCol Then StgVarCntr + 1 Else 1

Expression for StgVarPrevKeyCol(2nd stg var):

DSLinknn.KeyCol

3. Now in constrain, if you filter rows where StgVarCntr = 1 will give you the unique records and if you filter StgVarCntr > 1 will give you duplicate records.

My source is Like
Sr_no, Name
10,a
10,b
20,c
30,d
30,e
40,f

My target Should Like:

Target 1:(Only unique means which records r only once)
20,c
40,f

Target 2:(Records which r having more than 1 time)
10,a
10,b
30,d
30,e

How to do this in DataStage....

**************

use aggregator and transformer stages

source-->aggregator-->transformat-->target

perform count in aggregator, and take two op links in trasformer, filter data count>1 for one llink and put count=1 for second link.
Scenario: 
in my i/p source i have N no.of records

In output i have 3 targets

i want o/p like 1st rec goes to 1st targt and

2nd rec goes to 2nd target and

3rd rec goes to 3rd target again

4th rec goes to 1st taget ............ like this

do this ""without using partition techniques "" remember it.

*****************
source--->trans---->target
in trans use conditions on constraints
mod(empno,3)=1
mod(empno,3)=2
mod(empno,3)=0
Scenario:
im having i/p as
col A
a_b_c
x_F_I
DE_GH_IF

we hav to mak it as

col1 col 2 col3
a b c
x f i
de gh if


*********************

Transformer
create 3 columns with derivation
col1 Field(colA,'_',1)
col2 Field(colA,'_',2)
col3 Field(colA,'_',3)

**************
Field function divides the column based on the delimeter,
if the data in the col is like A,B,C
then
Field(col,',',1) gives A
Field(col,',',2) gives B
Field(col,',',3) gives C

Scenario:

Scenario:
Scenario:

No comments:

Post a Comment

tMap vs tJoin -Talend

  tMap is frequently used component for joins and lookup purpose, it is also use for verity of operations and transformations, whereas tJoin...