Monday, February 13, 2023

Creating SQL Hierarchical Query

 




Using the SQL PIVOT function to transform data from rows to columns

Given a table of name/value pairs:

   -------------------------------
   | Id | Value  | ColumnName    |
   -------------------------------
   | 1  | John   | FirstName     |
   | 2  | 2.4    | Amount        |
   | 3  | ZH1E4A | PostalCode    |
   | 4  | Fork   | LastName      |
   | 5  | 857685 | AccountNumber |
   -------------------------------

This is the desired result:

---------------------------------------------------------------------
| FirstName  |Amount|   PostalCode   |   LastName  |  AccountNumber |
---------------------------------------------------------------------
| John       | 2.4  |   ZH1E4A       |   Fork      |  857685        |
---------------------------------------------------------------------
Query using PIVOT:
select FirstName, Amount, PostalCode, LastName, AccountNumber
from
(
  select value, columnname
  from yourtable
) d
pivot
(
  max(value)
  for columnname in (FirstName, Amount, PostalCode, LastName, AccountNumber)
) piv;
The same result using CASE expressions:
select
  max(case when columnname = 'FirstName' then value end) FirstName,
  max(case when columnname = 'Amount' then value end) Amount,
  max(case when columnname = 'PostalCode' then value end) PostalCode,
  max(case when columnname = 'LastName' then value end) LastName,
  max(case when columnname = 'AccountNumber' then value end) AccountNumber
from yourtable

Friday, March 11, 2022

Split line issue

When a delimited record gets split across multiple physical lines, the following awk command keeps appending the next line to the current record until it contains all 9 semicolon-separated fields, then prints the reassembled record:

 awk -F';' '{while (NF < 9 && (getline nextline) >0) $0=$0 nextline; print}' newline.txt
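
A rough Python equivalent of the same logic, for anyone less comfortable with awk. This is only a sketch and is not part of the original one-liner; it assumes the same newline.txt input and 9 expected fields per record:

EXPECTED_FIELDS = 9  # a record is complete once it has 9 ';'-separated fields

with open('newline.txt') as f:
    record = ''
    for line in f:
        record += line.rstrip('\n')
        # keep appending physical lines until the logical record is complete
        if record.count(';') >= EXPECTED_FIELDS - 1:
            print(record)
            record = ''
    if record:
        # flush a trailing partial record, just as the awk print does at end of file
        print(record)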




Tuesday, June 8, 2021

DataStage: Hierarchical Data Stage Transforming JSON Data

 



Hierarchical Data Stage can parse, compose and transform hierarchical data such as JSON and XML. In this example, we use a JSON file obtained from the Google Geocoding API, which turns addresses (for example, 1600 Amphitheatre Parkway, Mountain View, CA) into geographic coordinates (latitude: 37.422, longitude: -122.085, and so on). The result can be returned in JSON or XML format. After obtaining the JSON file, we will flatten it into a table structure with DataStage.

JSON File to transform

Transformed Table

Python Code to call API

import requests
import json

# Call the Google Geocoding API for the sample address (replace <yourAPIkey> with a real key)
url = 'https://maps.googleapis.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&key=<yourAPIkey>'
req = requests.get(url)
res = req.json()

# Pretty-print the JSON response and save it for the DataStage job to parse
pretty = json.dumps(res, indent=4)
with open('./demo_google_geo.json', 'w') as f:
    f.write(pretty)
print('Successfully Exported the data')
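
For reference, the fields that the DataStage job flattens live under results[0] in the Geocoding response. A quick way to inspect them in Python (a sketch that assumes the request above succeeded and returned at least one result):

location = res['results'][0]['geometry']['location']
print(location['lat'], location['lng'])
for component in res['results'][0]['address_components']:
    print(component['long_name'], component['types'])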

Steps

(1) Drag Hierarchical Data Stage onto the canvas. Open it and click ‘Edit assembly’.

(2) DataStage needs to know the schema of the JSON file that you are transforming. To add JSON schema, go to the Libraries tab in the Assembly Editor, click New Library and import the JSON file.

(3) Go back to the Assembly Editor tab. Click ‘Palette’ and add ‘JSON Parser Step’ between Input Step and Output Step.

(4) In JSON_Parser Step, select the file path for the JSON file in the Configuration tab.

(5) In the ‘Document Root’ section, you need to choose the ‘root’ from the JSON library entry you just created in step (2) (in this example, google_geo_json).

(6) Go to ‘Output Step’. Add column name and data type.

(7) Go to ‘Mapping’. As the parent node, map address_components_anon_choice_0 to lnk_db_load as below.

(8) Map all the values except ‘type’

(9) To map ‘type’, go back to the mapping of the parent node (see step 7) and remap types_anon_choice_0 into lnk_db_load. Without remapping the parent node, you cannot map the value for ‘type’ and you will get the error below.

CDIUI2820E The mapping is not applicable due to an invalid type conversion or a difference in the source and target list dimensions.

(10) Choose stringValue as shown below for ‘type’.

(11) Once mapping is completed, press OK.

(12) Connect to the DB Load stage and Load the data.

Monday, June 7, 2021

What is Apache Parquet and why you should use it

 This post covers the basics of Apache Parquet, which is an important building block in big data architecture. To learn more about managing files on object storage, check out our guide to Partitioning Data on Amazon S3.

In last year’s Amazon re:Invent conference (when real-life conferences were still a thing), AWS announced data lake export – the ability to unload the result of a Redshift query to Amazon S3 in Apache Parquet format. In the announcement, AWS described Parquet as “2x faster to unload and consumes up to 6x less storage in Amazon S3, compared to text formats”. Converting data to columnar formats such as Parquet or ORC is also recommended as a means to improve the performance of Amazon Athena.

It’s clear that Apache Parquet plays an important role in system performance when working with data lakes. Let’s take a closer look at what Parquet actually is, and why it matters for big data storage and analytics.

The Basics: What is Apache Parquet?

Apache Parquet is a file format designed to support fast data processing for complex data, with several notable characteristics:

1. Columnar: Unlike row-based formats such as CSV or Avro, Apache Parquet is column-oriented – meaning the values of each table column are stored next to each other, rather than those of each record:

2. Open-source: Parquet is free to use and open source under the Apache license, and is compatible with most Hadoop data processing frameworks.

3. Self-describing: In Parquet, metadata including schema and structure is embedded within each file, making it a self-describing file format.
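
Because the schema and structure travel with the file, they can be inspected directly from the file itself. Here is a short sketch using the pyarrow library (not mentioned in the original post; the file name is just an example):

import pyarrow.parquet as pq

# Both the schema and the row-group layout are read from the file's own footer
print(pq.read_schema('example.parquet'))
metadata = pq.read_metadata('example.parquet')
print(metadata.num_rows, metadata.num_row_groups)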

Advantages of Parquet Columnar Storage

The above characteristics of the Apache Parquet file format create several distinct benefits when it comes to storing and analyzing large volumes of data. Let’s look at some of them in more depth.

Compression

File compression is the act of taking a file and making it smaller. In Parquet, compression is performed column by column and it is built to support flexible compression options and extendable encoding schemas per data type – e.g., different encoding can be used for compressing integer and string data.

Parquet data can be compressed using these encoding methods:

  • Dictionary encoding: this is enabled automatically and dynamically for data with a small number of unique values.
  • Bit packing: Integers are usually stored using a full 32 or 64 bits each. Bit packing stores small integers with only as many bits as they actually need, making their storage more efficient.
  • Run length encoding (RLE): when the same value occurs multiple times, a single value is stored once along with the number of occurrences. Parquet implements a combined version of bit packing and RLE, in which the encoding switches based on which produces the best compression results.
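
To make the column-by-column compression concrete, here is a minimal sketch using the pyarrow library (not part of the original post; the table contents and file name are invented for the example). The codec can be chosen per column, so the integer and string columns below are compressed differently:

import pyarrow as pa
import pyarrow.parquet as pq

# A tiny example table: an integer column and a string column
table = pa.table({
    'account_number': [857685, 857686, 857687],
    'comment': ['ok', 'late payment', 'ok'],
})

# compression accepts either a single codec for the whole file or a per-column dict
pq.write_table(
    table,
    'accounts.parquet',
    compression={'account_number': 'snappy', 'comment': 'gzip'},
)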

Performance

As opposed to row-based file formats like CSV, Parquet is optimized for performance. When running queries on your Parquet-based file-system, you can focus only on the relevant data very quickly. Moreover, the amount of data scanned will be way smaller and will result in less I/O usage. To understand this, let’s look a bit deeper into how Parquet files are structured.

As we mentioned above, Parquet is a self-describing format, so each file contains both data and metadata. Parquet files are composed of row groups, a header and a footer. Each row group holds a chunk of rows, and within a row group the values of each column are stored together:

This structure is well-optimized both for fast query performance and for low I/O (minimizing the amount of data scanned). For example, imagine a table with 1000 columns that you usually query using only a small subset of them. Using Parquet files enables you to fetch only the required columns and their values, load those into memory and answer the query. If a row-based file format like CSV were used, the entire table would have to be loaded into memory, resulting in increased I/O and worse performance.
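
A small illustration of this column pruning with pandas (a sketch only; the file and column names mirror the Athena example later in this post and are not real files):

import pandas as pd

# Only the two columns needed for the query are read from the file;
# the remaining columns are never scanned
df = pd.read_parquet('server_usage.parquet',
                     columns=['tags_host', 'fields_usage_active'])
print(df.groupby('tags_host')['fields_usage_active'].mean().head(10))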

Schema evolution

When using columnar file formats like Parquet, users can start with a simple schema, and gradually add more columns to the schema as needed. In this way, users may end up with multiple Parquet files with different but mutually compatible schemas. In these cases, Parquet supports automatic schema merging among these files.
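
A minimal sketch of what schema merging looks like with pyarrow (the file names and columns are invented for the example; engines such as Spark expose the same idea through a merge-schema option):

import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# Two files written at different times with compatible but different schemas
pq.write_table(pa.table({'user_id': [1, 2]}), 'events_day1.parquet')
pq.write_table(pa.table({'user_id': [3], 'country': ['DE']}), 'events_day2.parquet')

# Unify the two schemas and read both files as one dataset;
# the column missing from the older file comes back as nulls
merged_schema = pa.unify_schemas([
    pq.read_schema('events_day1.parquet'),
    pq.read_schema('events_day2.parquet'),
])
table = ds.dataset(['events_day1.parquet', 'events_day2.parquet'],
                   schema=merged_schema).to_table()
print(table.to_pandas())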

Open-source support

Apache Parquet, as mentioned above, is part of the Apache Hadoop ecosystem, which is open source and is constantly being improved and backed by a strong community of users and developers. Storing your data in open formats means you avoid vendor lock-in and increase your flexibility, compared to proprietary file formats used by many modern high-performance databases. It also means you can use various query engines such as Amazon Athena, Qubole, and Amazon Redshift Spectrum within the same data lake architecture.

Column-oriented vs. row-based storage for analytic querying

Data is often generated and more easily conceptualized in rows. We are used to thinking in terms of Excel spreadsheets, where we can see all the data relevant to a specific record in one neat and organized row. However, for large-scale analytical querying, columnar storage comes with significant advantages with regards to cost and performance.

Complex data such as logs and event streams would need to be represented as a table with hundreds or thousands of columns, and many millions of rows. Storing this table in a row based format such as CSV would mean:

  • Queries will take longer to run since the entire row must be scanned, rather than only the subset of columns needed to answer the query (analytic queries typically aggregate based on a dimension or category)
  • Storage will be more costly since CSVs are not compressed as efficiently as Parquet

Columnar formats provide better compression and improved performance out-of-the-box, and enable you to query data vertically – column by column.

Example: Parquet, CSV and Amazon Athena


To demonstrate the impact of columnar Parquet storage compared to row-based alternatives, let’s look at what happens when you use Amazon Athena to query data stored on Amazon S3 in both cases.

Using Upsolver, we ingested a CSV dataset of server logs to S3. In a common AWS data lake architecture, Athena would be used to query the data directly from S3. These queries can then be visualized using interactive data visualization tools such as Tableau or Looker.

We tested Athena against the same dataset stored as compressed CSV, and as Apache Parquet.

This is the query we ran in Athena:

SELECT tags_host AS host_id, AVG(fields_usage_active) as avg_usage
FROM server_usage
GROUP BY tags_host
HAVING AVG(fields_usage_active) > 0
LIMIT 10

 

And the results:

   ----------------------------------------------------
   |                      | CSV   | Parquet | Columns |
   ----------------------------------------------------
   | Query time (seconds) | 735   | 211     | 18      |
   | Data scanned (GB)    | 372.2 | 10.29   | 18      |
   ----------------------------------------------------

 

  1. Compressed CSVs: The compressed CSV has 18 columns and weighs 27 GB on S3. Athena has to scan the entire CSV file to answer the query, so we would be paying for 27 GB of data scanned. At higher scales, this would also negatively impact performance.
  2. Parquet: Converting our compressed CSV files to Apache Parquet (a local sketch of this conversion follows below), you end up with a similar amount of data in S3. However, because Parquet is columnar, Athena needs to read only the columns that are relevant for the query being run – a small subset of the data. In this case, Athena had to scan 0.22 GB of data, so instead of paying for 27 GB of data scanned we pay only for 0.22 GB.
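
The conversion step itself can be reproduced locally with pandas. This is only an illustrative sketch (the file names are made up, and the original pipeline used Upsolver rather than pandas):

import pandas as pd

# Read the gzip-compressed CSV and rewrite it as snappy-compressed Parquet
df = pd.read_csv('server_usage.csv.gz')
df.to_parquet('server_usage.parquet', engine='pyarrow', compression='snappy')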

Is using Parquet enough?

Using Parquet is a good start; however, optimizing data lake queries doesn’t end there. You often need to clean, enrich and transform the data, perform high-cardinality joins and implement a host of best practices in order to ensure queries are consistently answered quickly and cost-effectively.

You can use Upsolver to simplify your data lake ETL pipeline, automatically ingest data as optimized Parquet and transform streaming data with SQL or Excel-like functions. To learn more, schedule a demo right here.

Wednesday, February 24, 2016

Configuring sparse lookup operations


You configure the Oracle connector to perform a sparse lookup on an Oracle database.

Before you begin

  • To specify the format of the data records that the Oracle connector reads from an Oracle database, set up column definitions on a link.
  • Configure the Oracle connector as a source for the reference data.

About this task

In a sparse lookup, the connector runs the specified SELECT statement or PL/SQL block one time for each parameter set that arrives in the form of a record on the input link to the Lookup stage. The specified input parameters in the statement must have corresponding columns defined on the reference link. Each input record includes a set of parameter values that are represented by key columns. The Oracle connector sets the parameter values on the bind variables in the SELECT statement or PL/SQL block, and then the Oracle connector runs the statement or block. The result of the lookup is routed as one or more records through the reference link from the Oracle connector back to the Lookup stage and from the Lookup stage to the output link of the Lookup stage. A sparse lookup is also known as a direct lookup because the lookup is performed directly on the database.
Typically, you use a sparse lookup when the target table is too large to fit in memory. You can also use the sparse lookup method for real-time jobs.
You can use the sparse lookup method only in parallel jobs.

Procedure

  1. Add a Lookup stage to the job design canvas, and then create a reference link from the Oracle Connector stage to the Lookup stage.
  2. Double-click the Oracle Connector stage.
  3. From the Lookup Type list, select Sparse.
  4. Specify the key columns:
    1. If you set Generate SQL to Yes when you configured the connector as a source, specify the table name, and then specify the key columns on the Columns page.
    2. If you set Generate SQL to No when you configured the connector as a source, specify a value for the Select statement property. In the select part of the SELECT statement, list the columns to return to the Lookup stage. Ensure that this list matches the columns on the Columns page.
  5. On the Properties page, specify a table name, and then specify a WHERE clause for the lookup. Key columns that follow the WHERE clause must have the word ORCHESTRATE and a period added to the beginning of the column name. ORCHESTRATE can be all uppercase or all lowercase letters, such as ORCHESTRATE.Field001. The following SELECT statement is an example of the correct syntax of the WHERE clause: select Field002,Field003 from MY_TABLE where Field001 = ORCHESTRATE.Field001. The column names that follow the word ORCHESTRATE must match the column names on the Columns page.
  6. To save the changes, click OK.
  7. Double-click the Lookup stage.
  8. Map the input link and reference link columns to the output link columns and specify conditions for a lookup failure:
    1. Drag or copy the columns from the input link and reference link to your output link.
    2. To define conditions for a lookup failure, click the Constraints icon in the menu.
    3. In the Lookup Failure column, select a value, and then click OK. If you select Reject, you must have a reject link from the Lookup stage and a target stage in your job configuration to capture the rejected records.
    4. Click OK.
  9. Save, compile, and run the job.

Difference Between Normal Lookup and Sparse Lookup



Normal Lookup :- 
  • The reference data must fit in memory.
  • A normal lookup might perform poorly if the reference data is huge, because all of it has to be loaded into memory.
  • A normal lookup can have more than one reference link.
  • A normal lookup can be used with any database.
Sparse Lookup :- 
  • A sparse lookup hits the database directly.
  • If the input stream is small and the reference data is much larger (say 1:100 or more), a sparse lookup is the better choice.
  • A sparse lookup can have only one reference link.
  • Sparse lookups can be used only with Oracle and DB2.
  • A sparse lookup sends an individual SQL statement for every incoming row, which becomes expensive as the input volume grows.
The Lookup type option can be found in the Oracle and DB2 stages. The default is Normal.

Wednesday, February 10, 2016

Datastage - Admin

Introduction:
In this section, we will see different clients available as part of DataStage installation and how to connect to them and explore various options and features available.
Connecting to DataStage Clients:
First let us see how to connect to DataStage and see various parameters required for logging in.
If you do not already have a connection to DataStage from your PC, then you will need to connect your DataStage client to the DataStage server. Find the client that you wish to connect to, perhaps from a desktop shortcut or from the Start menu. This invokes the Attach to Project dialog, illustrated here.
Within that dialog you must supply the host name or IP address of the machine on which the DataStage server is running (the Metadata/App Server hostname), the login ID and password that authenticate you to that machine, and then choose the project in which you wish to work from the drop-down list. The Project name is a combination of the DataStage Server name and the actual DataStage Project name. In this illustration, the DataStage Server name and Metadata Server name are the same because both are installed on the same machine. They will differ if the App Server and DataStage engine are split between different servers, or if there are additional DataStage engines hosted on other servers and you wish to connect to one of those.
If you have previously successfully connected to a DataStage project, the host system, user name and Project fields are pre-populated with values from that connection.  You must, of course, re-enter your password.
If you already have a connection to DataStage from your PC using Designer, Director, then you can open other client tools from that client’s Tools menu.
DataStage Developer Clients:
DataStage Designer:
The Designer client is where you create and compile jobs. It is also possible to run jobs from Designer, though mainframe jobs cannot be run from Designer. Server jobs may be run in debug mode in Designer. It acts as a browser into the Repository, the database associated with a project.
Designer Client can be used to import table definitions from various sources as well as DataStage components exported from any project (provided that it’s not at a higher version number).  You can also create some kinds of components, such as Data Elements and Routines, and edit the properties of an extant object.
Designer Client is the graphical user interface in which you draw the pictures, using stages and links, of the data flow that you wish to implement, and then fill in the details by editing the properties of the job, the stages and the links to make the design absolutely specific about what you intend to happen.
It is in the Designer, too, that you compile the job design to produce executable code; in the case of parallel jobs this will primarily be OSH script, but may also include some generated C++ code. In the case of job sequences the code generated will be DataStage BASIC.  In both cases, you are able to inspect the generated code. This will be covered in more detail later.
Here we see the Designer client illustrated. In its title bar are shown the server identity and the name of the project to which the Designer is connected. On the left hand side there is the Palette, from which components can be added to the job design, and a browser into the Repository. There is a standard menu and standard toolbar across the top. In the right hand pane there are at least two job designs open; the one on top is a parallel job called Warehouse_03 (from its title bar). An asterisk (*) in a job's title bar indicates that the job has unsaved edits.
The job design itself consists of a number of stages (icons) connected to each other by links (arrows) that indicate the flow of data in the design. Links painted as long dashed lines in this design handle rows that are rejected within the job design, either because of metadata mismatch or because of a failed condition.  Because the stages and links follow a standard naming convention, it is even easier to understand what is intended.
Designer Toolbars: There are, indeed, more toolbars in Designer than were visible in the previous illustration.  There are five in all, listed here – Standard, Palette, Repository, Property Browser, and Debug Bar.
All but the last toolbar are visible for all job types; the Debug bar is only enabled for server jobs.
These toolbars are all illustrated on the above image, which shows a server job rather than a parallel job (so that the Debug bar can be rendered visible).
Every toolbar in Designer floats, as can be seen with the Property Browser in this illustration.
Each toolbar can be anchored against any side of the window.  In this illustration the standard and Debug bars are anchored at the top, while the Palette and Repository toolbars are anchored against the left hand side of the Designer window. Every toolbar can be hidden (closed), and rendered visible again via the View menu.
DataStage Repository:
As mentioned earlier, the Repository window can be made visible in DataStage Designer. This Repository toolbar is a browser into the Repository of the selected project. It has a Windows Explorer look and feel. One pane is the repository tree view, showing the major branches in the Repository. The second pane contains a detailed view, but only if a leaf node of the tree view is selected. In detail view, short descriptions and the date/time modified are displayed. Used wisely, these can aid selection of the correct component.
You can use the Repository tree to browse and manage objects in the repository.
The Designer enables you to store the following types of object in the repository (listed in alphabetical order):
·        Data connections
·        Data elements
·        IMS™ Database (DBD)
·        IMS Viewset (PSB/PCB)
·        Jobs
·        Machine profiles
·        Match specifications
·        Parameter sets
·        Routines
·        Shared containers
·        Stage types
·        Table definitions
·        Transforms
When you first open IBM InfoSphere DataStage you will see the repository tree organized with preconfigured folders for these object types, but you can arrange your folders as you like and rename the preconfigured ones if required. You can store any type of object within any folder in the repository tree. You can also add your own folders to the tree. This allows you, for example, to keep all objects relating to a particular job in one folder. You can also nest folders one within another.
Note: Certain built-in objects cannot be moved in the repository tree
Click the blue bar at the side of the repository tree to open the detailed view of the repository. The detailed view gives more details about the objects in the repository tree and you can configure it in the same way that you can configure a Windows Explorer view. Click the blue bar again to close the detailed view.
You can view a sketch of job designs in the repository tree by hovering the mouse over the job icon in the tree. A tooltip opens displaying a thumbnail of the job design. (You can disable this option in the Designer Options.)
When you open an object in the repository tree, you have a lock on that object and no other users can change it while you have that lock (although they can look at read-only copies).
In this illustration of Repository, there are a few points to note.
The name of the host machine and the name of the project are displayed in the title bar.  You always know in which project you are working. The tree view is expanded on the Stage Types branch.  Because the selected branch is as fully expanded as it can be (that is, a leaf node is selected), the right-hand pane contains a list of the elements that belong in that particular category. In this illustration, the category is Parallel\Database.
Currently detail view is selected, which means that the individual descriptions of the stage types are displayed. This makes choosing the appropriate component easier, as you have more information available. It is possible to view more information about each object by customizing the detailed view. To do this, right-click on a column header (say, Name) and select More; a pop-up appears showing the properties that you can select. Note that the properties you select will be visible in this list view and in the export window when you export jobs from the Repository.
Major Categories:
Not all the categories shown in the previous illustration will be available.  For example, the IMS Databases and IMS Viewsets categories are only available if IMS is licensed, and the Machine Profiles category is only available if the mainframe edition is licensed.  No matter which edition is licensed, the seven major categories listed here are always available.  These will be covered in greater detail later; for now here is a quick summary.
·        Data Elements are like user-defined data types that more precisely describe data classes (for example CustomerID, TelephoneNumber).
·        Jobs are executable units. Properties of existing jobs can be accessed here, but jobs must be edited in Designer.
·        Routines are custom transform BASIC functions, before/after subroutines, interfaces to external parallel routines, or references to mainframe routines.
·        Shared Containers are a mechanism for making subsets of job designs reusable in more than one job.
·        Stage Types are the classes from which stages in job designs are instantiated.
·        Table Definitions are collections of column definitions and any other associated information about persistent data (for example format of sequential file, owner of database table).
·        Transforms are pre-stored, and therefore re-usable, BASIC expressions. They can only be used in server jobs and parallel job BASIC Transformer stages.
Connecting to another Client
If you have one of the two clients (Designer or Director) open, you can connect to one of the other clients by opening the Tools menu.  On this menu you will find options to Run the other clients.  For example, from Director, the option is Run Designer. From Designer, it is Run Director.
Taking this path means that your current authentication can be re-used so that you do not need to enter user ID or password, or choose a project to which to connect.
DataStage Director Client
The Director client can be used to run jobs straight away or to schedule them to run at some future time. The Director client is also used to determine the status of jobs, and to inspect the logs that jobs generated. Mainframe jobs are run only on mainframe machines, so there is no Director interaction with mainframe jobs. The Director client can also be used to override project-default job log purge parameters and to set run-time default values for job parameters.
The Director client has four different views.  Inside Director, one can see only Jobs in the Repository and not other components.
·        Status view shows the most recently recorded status of all jobs in the currently selected category. It is possible, through the View menu, to disable the use of categories and thus to show all jobs in the project but, as the project grows, populating this window will take longer and longer.
·        Log view shows the job log for the currently selected job. The most recent run is shown in black, the previous run in dark blue, and any earlier runs in a lighter blue.
·        Schedule view shows jobs in the currently selected category and whether they have been scheduled (from DataStage) to run at some time in the future. It is also possible to schedule jobs to run using third-party scheduling tools, but these are not reported in the Schedule view in Director.
·        Monitor view allows row counts for active stages and their connected links to be displayed along with rows/second figures. CPU consumption can also be reported.
Director – Status View
This is an illustration of Director’s status view.
In the title bar are the identity (here, IP address) of the DataStage server and the project to which Director client is connected.  In the status bar (at the bottom) are the count of jobs in the currently selected category and, on the bottom right, the server time.  All times reported are the server time.  This is very important for offshore developers, for whom local time might be substantially different from the server time.
Each job has its most recent status displayed in the Status column.  Also displayed are the most recent start time, the most recent finished time and the corresponding elapsed time (rounded to the nearest whole second) plus the description of the job (one of the job’s properties).
 Director – Log View
In Log view the name of the currently selected job is reported in the status bar, along with the number of entries visible (and the fact, here, that a filter is being used) and the server time.
Entries for the most recent run are in black text; the previous run in dark blue. Green icons indicate informational messages, yellow icons indicate warning messages (non-fatal), and red icons indicate fatal errors, which caused the job to abort. You should aim to create jobs that never produce anything but informational messages; thus, if a warning occurs, something unexpected has occurred (and you can deal with it). Any logged event that has an ellipsis (…) at the end of its message contains more text than is viewable in summary view. Double clicking an event (or choosing Detail from the View menu) opens the Event Detail dialog, in which the remainder of the message can be seen.
However, there is sufficient information in this summary view to determine the cause of the problem (unable to create file, indicating either a permissions problem or a missing directory).  This is the log of a server job; logs for parallel jobs tend to be more verbose, as they potentially report from multiple processing nodes.
There is a short cut from Designer to the job log; under the Debug menu there is an option to View Job Log (for Server jobs).
Director – Schedule View
Schedule view (the clock icon fourth from the left on the toolbar) shows any jobs that have been scheduled from the Director client, along with the parameter values and/or descriptions associated therewith.  Jobs scheduled from the Director use the operating system scheduler (AT on Windows, cron or at on UNIX).
Note: Most projects use third party schedulers like Ctrl-M, AutoSys for scheduling DataStage jobs.
Director – Job Monitor
Use the monitor window to view summary information about relevant stages in a job that is being run or validated.
The monitor window displays stages in a job and associated links in a tree structure. For server jobs active stages are shown (active stages are those that perform processing rather than ones reading or writing a data source). For parallel jobs, all stages are shown.
The window has the following controls and fields:
Stage name – For parallel jobs, this column displays the names of all stages in the job. For server jobs, this column displays the names of stages that perform processing (for example, Transformer stages).
Link Type – When you have selected a link in the tree, displays the type of link as follows:
<<Pri – stream input link
<Ref – reference input link
>Out – output link
>Rej – reject output link
Status – The status of the stage. The possible states are:
·        Aborted – The process finished abnormally at this stage.
·        Finished – All data has been processed by the stage.
·        Ready – The stage is ready to process the data.
·        Running – Data is being processed by the stage.
·        Starting – The processing is starting.
·        Stopped – The processing was stopped manually at this stage.
·        Waiting – The stage is waiting to start processing.
·        Num rows – This column displays the number of rows of data processed so far by each stage on its primary input.
·        Started at – This column shows the time that processing started on the engine.
·        Elapsed time – This column shows the elapsed time since processing of the stage began.
·        Rows/sec – This column displays the number of rows that are processed per second.
·        %CP – The percentage of CPU the stage is using (you can turn the display of this column on and off from the shortcut menu)
·        Interval – This option sets how often the window is updated with information from the engine. Click the arrow buttons to increase or decrease the value, or enter the value directly. The default setting is 10, the minimum is 5, and the maximum is 65.
·        Job name – The name of the job that is being monitored.
·        Status – The status of the job
·        Project – The name of the project and the computer that is hosting the engine tier.
·        Server time – The current time on the computer that is hosting the engine tier.
If you are monitoring a parallel job, and have not chosen to view instance information, the monitor displays additional information for Parallel jobs as follows:
·        If a stage is running in parallel, then x N is appended to the stage name, where N gives how many instances are running.
·        If a stage is running in parallel then the Num Rows column shows the total number of rows processed by all instances. The Rows/sec is derived from this value and shows the total throughput of all instances.
·        If a stage is running in parallel then the %CP might be more than 100 if there are multiple CPUs. For example, on a machine with four CPUs, %CP could be as high as 400% when a stage is occupying 100% of each of the four processors; alternatively, if the stage is occupying only 25% of each processor, the %CP would be 100%.
Filters in Director
Every view in Director has a filter. Status and Schedule view allow you to filter on job name (you can use wildcards) and on job status (for example you can select non-compiled jobs only). Log view allows you to filter on event date and event severity (for example omit informational messages). By default, Log view has a filter that shows only the most recent 100 entries; if there are more entries in the log than this, then a warning message box pops up when Log view is opened. Monitor view filter allows %CPU to be selected or de-selected. In all views (except Monitor) the filter dialog can be opened by pressing Ctrl-T, or by right-clicking in the unused background area. In Monitor view only right clicking in the background area works.
DataStage Administrator Client
The Administrator client is used primarily to create and delete projects and, mainly for newly-created projects, to set project-wide defaults, for example when job logs will automatically be purged of old entries.  These defaults apply only to objects created subsequently in the project; changing a default value does not affect any existing objects in the project.
The things that can be set in the Administrator that relate specifically to parallel jobs are enabling runtime propagation, setting environment variables and enabling visibility of generated Orchestrate shell (OSH) scripts. It is also possible to specify advanced command line options for OSH and default formats for date, time, timestamp and decimal data types.
Connecting to Administrator
Invoke the Administrator client from the Windows Start menu, or from a desktop shortcut. This brings up the Attach to DataStage dialog, where you fill in the host name or IP address of the Domain/Metadata Server (Services tier), the login ID and password of an administrative user that has permission to configure projects, and the host name of the Information Server engine.
Once authentication details are supplied, clicking OK invokes the “InfoSphere DataStage Administration – <ServerEngine>” dialog.
On the General tab information about the server version is displayed; if NLS is enabled it can be configured, and a client/server inactivity timeout can be configured.
“Suite Admin” takes you to the Information Server Web Console.
Grid support is available if at least one project is configured to run parallel jobs on a Grid.
Projects Tab:
The Projects tab displays a list of projects currently defined on the server, and provides means for adding and deleting projects, executing a command in a project, and a button called Properties that invokes the Project Properties dialog, shown in next slide.
The controls and buttons on the Projects page are as follows:
·        Project list: The main window lists all the projects that exist on the Information Server engine.
·        Project pathname: The path name of the project currently selected in the Project list.
·        Add: Click this to add a new project; the Add Project dialog box opens. This button is enabled only if you have administrator status.
·        Delete: Click this to delete the project currently selected in the Project list. This button is enabled only if you have administrator status.
·        Properties: Click this to view the properties of the project currently selected in the Project list. The Project Properties dialog box opens.
·        NLS: Click this to open the Project NLS Settings dialog box.
·        Command: Click this to open the Command Interface dialog box which you can use to issue commands to the InfoSphere Information Server engine that you are administering.
·        Suite Admin: Click this to open the Suite Administrator tool in a browser window. Use the Suite Administrator tool to perform common Administrator tasks associated with the suite server backbone.
Project Properties
When we choose a Project and click on the Properties button shown in the previous slide, Project Properties dialog gets invoked as shown here.
·        Enable job administration in Director: Click this to enable the Cleanup Job Resources and Clear Status File commands in the Job menu in the InfoSphere™ DataStage® Director. Setting this option also enables the Cleanup Job Resources command in the Monitor window shortcut menu in the Director.
·        Enable Runtime Column Propagation for Parallel Jobs: If you enable this feature, parallel jobs can handle undefined columns that they encounter when the job is run, and propagate these columns through to the rest of the job. Be warned, however, that enabling runtime column propagation can result in very large data sets.
Selecting this option makes the following subproperty available:
·        Enable Runtime Column Propagation for new links: Select this to have runtime column propagation enabled by default when you add a new link in an InfoSphere DataStage job. If you do not select this, you will need to enable it for individual links in stage editors when you are designing the job.
·        Enable editing of internal references in jobs: Select this to enable the editing of the Table definition reference and Column definition reference fields in the column definitions of stage editors. You can only edit the fields in the Edit Column Metadata dialog box, or by column propagation. The Table definition reference and Column definition reference fields identify the table definition, and the individual columns within that definition, that columns have been loaded from. These fields are enabled on the stage editor’s Columns tab via the Grid Properties dialog box.
·        Share metadata when importing from connectors: This option is selected by default. When metadata is imported using a connector, then the metadata appears in both the shared repository and the project repository.
·        Protect project: Click this to protect the current project. This button is only enabled if you have production manager status. If the project is already protected the button says Unprotect project and you can click it to convert the project back to an ordinary one (again, you must have production manager status to do this). Note that, on UNIX systems, only root or the administrative user can currently protect and unprotect projects.
·        Environment: Click this to open the Environment Variables dialog box. This enables you to set project-wide default values for environment variables.
·        Generate operational metadata: Select this option to have parallel and server jobs in the project generate operational metadata by default. You can override this setting in individual jobs if required.
Environment Variables
Click on Environment to open Environment Variables dialog box. Here, we can set project-wide defaults for general environment variables or ones specific to parallel jobs. You can also specify new variables. All of these are then available to be used in jobs.
Parallel jobs – or, more precisely, the parallel engine – make extensive use of environment variables.  Names beginning with “APT_” are related to the parallel engine or to operators that it invokes.  This dialog also allows a developer to create user defined environment variables that might, for example, subsequently be used to provide values to job parameters.
Don’t change any environment variables without seeking advice on the likely impact of doing so. If you need to define additional environment variables this will generally be OK, but still seek advice until you have a little more experience.
Permissions Page
Before any user can access InfoSphere DataStage they must be defined in the Suite Administrator tool as a DataStage Administrator or a DataStage User. As a DataStage administrator you can define whether a DataStage user can access a project, and if so, what category of access they have. If you want to add DataStage Developers, DataStage Operators, or DataStage Super Operators to a project, these users must be assigned the DataStage User role in the Suite Administrator tool. If they have the DataStage Administrator role they will automatically have administrator privileges and you cannot restrict their access in the Permissions page.
Using the Suite Administrator tool you can also add groups and assign users to groups. These groups are in turn allocated the role of DataStage Administrator or DataStage User. Any user belonging to an administrator group will be able to administer InfoSphere DataStage. As a DataStage Administrator you can give a DataStage user group access to a project and assign a role to the group.
When setting up users and groups, these still have to have the correct permissions at the operating system level to access the folders in which the projects reside.
You can also change the default view of job log entries for those who have the DataStage Operator or DataStage Super Operator role.
The Permissions page contains the following controls:
Roles: This window lists all the users and groups who currently have access to this project and lists their roles. Note that this window will always include users who have been defined as DataStage Administrators in the Suite Administrator tool, and you cannot remove such users from the list or alter their user role.
User Role: This list contains the four categories of InfoSphere DataStage user you can assign. Choose one from the list to assign it to the user currently selected in the roles window.
Add User or Group: Click this to open the Add Users/Groups dialog box in order to add a new user or group to the ones listed in the roles window.
Remove: Click this to remove the selected user or group from those listed in the roles window.
DataStage Operator can view full log: By default this check box is selected, letting an InfoSphere DataStage operator view both the error message and the data associated with an entry in a job log file. To hide the data part of the log file entry from operators, clear this check box. Access to the data is then restricted to users with a developer role or better.
In the above dialog, choose “Add User or Group”. This displays available users/groups on the left hand side and selected users on the right hand side. Similarly, it is possible to remove a user or a group. 
Tracing Page
Use the Tracing page to trace activity on the InfoSphere™ Information Server engine. This can help to diagnose project problems.
The page contains the following controls:
Enabled: Enables tracing on the engine. Information that might help to identify the cause of a client problem is stored in trace files. To interpret this information you will require in-depth knowledge of the system software.
Trace files: Lists the trace files to which information about engine activity is sent.
View: Click this to display the trace file selected in the Trace files list.
Delete: Click this to delete the trace files selected in the Trace files list.
Tunable Page
Use the Tunables page to set cache parameters for Hashed File stages and to enable row buffering in server jobs. When a Hashed File stage writes records to a hashed file, there is an option for the write to be cached rather than written to the hashed file immediately. Similarly, when a Hashed File stage is reading a hashed file there is an option to pre-load the file to memory, which makes subsequent access very much faster and is used when the file is providing a reference link to a Transformer stage. (UniData stages also offer the option of pre-loading files to memory, in which case the same cache size is used). The Hashed file stage area of the Tunables page lets you configure the size of the read and write caches.
The page contains the following controls:
Read cache size (MB): Specify the size of the read cache in Megabytes. The default is 128.
Write cache size (MB): Specify the size of the write cache in Megabytes. The default is 128.
Enable row buffer: The use of row buffering can greatly enhance performance in server jobs. Select this check box to enable this feature for the whole project. The following option buttons are then enabled:
In-process: In-process row buffering allows connected active stages to pass data via data buffers rather than row by row. Choosing this type of row buffering will improve the performance of most server jobs. Note that you cannot use in-process row buffering if your jobs currently use COMMON blocks to pass data between stages. This is not recommended practice and it is advisable to redesign your job to use row buffering rather than COMMON blocks.
Inter-process: Use this if you are running server jobs on an SMP parallel system. This enables the job to run using a separate process for each active stage, which can then run simultaneously on separate processors. Note that you cannot use inter-process row buffering if your jobs currently use COMMON blocks to pass data between stages. This is not recommended practice and it is advisable to redesign your job to use row buffering rather than COMMON blocks.
Buffer size: Specifies the size of the buffer used by in-process or inter-process row buffering. Defaults to 128 Kilobytes.
Timeout: Only applies when inter-process row buffering is used. Specifies the time one process will wait to communicate with another via the buffer before timing out. Defaults to 10 seconds.
Parallel Page
On the Parallel tab are three main areas.
The check box allows generated Orchestrate shell (OSH) to be visible. This can be useful to have when troubleshooting/debugging.
The advanced runtime options allows additional command options to be specified that govern the way in which parallel jobs execute at run time.
The format defaults will generally be left at system default. If ever you have problems with formats, you will probably need to set up some kind of job-specific or stage-specific override, cast or transformation to re-format that particular item of data into the expected format.
Use the Parallel page to set certain parallel job properties and to set defaults for date/time and number formats.
The page contains the following controls:
Enable grid support for parallel jobs: Select this option to run the jobs in this project in a grid system, under the control of a resource manager. If you select this option on non-grid systems, you can design jobs suitable for deployment on a grid system.
Grid: Click Grid to open the Static resource configuration for grid window. Use this window to specify which static resources are available to be allocated for job runs on a grid system. Static resources include fixed-name servers such as database servers, SAN servers, SAS servers, and remote storage disks. The grid is populated from a master configuration file, and you can use this window to select which of the resources defined in the master configuration file are available to the jobs in your project.
Generated OSH visible for Parallel jobs in ALL projects: If you enable this, you will be able to view the code that is generated by parallel jobs at various points in the Designer and Director:
·        In the Job Properties dialog box for parallel jobs.
·        In the job run log message.
·        When you use the View Data facility in the Designer.
·        In the Table Definition dialog box.
Note that selecting this option enables this feature for all projects, not just the one currently selected.
Advanced runtime options for Parallel Jobs: This field allows experienced Orchestrate users to enter parameters that are added to the OSH command line. Under normal circumstances this should be left blank. You can use this field to specify the -nosortinsertion and the -nopartinsertion options. These prevent the automatic insertion of sort and partition operations where InfoSphere™ DataStage® considers they are required. This applies to all jobs in the project.
Message Handler for Parallel Jobs: Allows you to specify a message handler for all the parallel jobs in this project. You define message handlers in the InfoSphere DataStage Designer or Director. They allow you to specify how certain warning or information messages generated by parallel jobs are handled. Choose one of the predefined handlers from the list.
Format defaults: This area allows you to override the system default formats for dates, times, timestamps, and decimal separators. To change a default, clear the corresponding System default check box, then either select a new format from the drop down list or type in a new format.
Sequence Page
Use this page to set compilation defaults for job sequences. You can optionally have InfoSphere™ DataStage® add checkpoints to a job sequence so that, if part of the sequence fails, you do not necessarily have to start again from the beginning. You can fix the problem and rerun the sequence from the point at which it failed. You can also specify that InfoSphere DataStage automatically handle failing jobs within a sequence (this means that you do not have to have a specific trigger for job failure).
The page contains the following controls:
Add checkpoints so sequence is restartable on failure: Select this to enable restarting for job sequence in this project.
Automatically handle activities that fail: Select this to have InfoSphere DataStage automatically handle failing jobs within a sequence (this means that you do not have to have a specific trigger for job failure). When you select this option, the following happens during job sequence compilation:
For each job activity that does not have specific trigger for error handling, code is inserted that branches to an error handling point. (If an activity has either a specific failure trigger, or if it has an OK trigger and an otherwise trigger, it is judged to be handling its own aborts, so no code is inserted.)
If the compiler has inserted error-handling code the following happens if a job within the sequence fails:
·        A warning is logged in the sequence log about the job not finishing OK.
·        If the job sequence has an exception handler defined, the code will go to it.
·        If there is no exception handler, the sequence aborts with a suitable message.
Log warnings after activities that finish with status other than OK: Select this to specify that job sequences, by default, log a message in the sequence log if they run a job that finishes with warnings or fatal errors or a command or routine that finishes with an error.
Log report messages after each job run: Select this to have a status report for a job logged immediately after the job run finishes.
Remote Page
This page allows you to specify whether you are:
Deploying parallel jobs to run on a USS system OR
Deploying parallel jobs to run on a deployment platform (which could, for example, be a system in a grid).
Logs Page
Use the Logs page to control how the jobs in your project log information when they run. The Logs page of the Properties window contains the following controls:
Auto-purge of job log – Enables automatic purging of job log files. Auto-purging automatically prevents the job logs becoming too large. If you do not enable auto-purging, you must delete the job logs manually. Use the Autopurge action options to control how auto-purging works:
Up to previous – Specify the number of job runs to retain in the log. For example, if you specify 10 job runs, entries for the last 10 job runs are retained.
Over – Purge the logs of jobs which are over the specified number of days old.
Enable Operational Repository logging – Writes job logs to the operational repository so that information is available to other components in the IBM® InfoSphere™ Information Server suite. Use the Filter log messages options to filter the information that is written.
·        To limit the number of informational messages written to the operational repository for each job run, select Maximum number of ‘Informational’ messages that will be written to the Operational Repository for a job run, and set a value for this option. The default value is 10.
·        To limit the number of warning messages written to the operational repository for each job run, select Maximum number of ‘Warning’ messages that will be written to the Operational Repository for a job run, and set a value for this option. The default value is 10.
Advance: Grid Page
For a project, set “Enable grid support for parallel jobs” on the Parallel page. Go back to the General tab of the InfoSphere DataStage Administration dialog; you can see that the Grid button is now enabled. Click Grid to define the master configuration file “master_config.apt”.
Voila! That’s a very long post but I hope it is informative.



