What are the different types of schemas in Talend?
Repository schemas
The benefits of using Repository schemas are:
1. They can be re-used across multiple jobs, thus reducing the amount of re-keying.
2. Talend will ensure that changes made to a Repository schema are cascaded to all jobs
that use the schema, thus avoiding the need to scan jobs manually for Built-In schemas
that need to be changed.
3. Impact analysis reports can be generated showing where a Repository schema is
being used within a project. This enables the impact of changes to be more
Generic schemas
Generic schemas aren’t tied to a particular source, so they can be used as a shared
resource across multiple types of data source or they can be used to define data sources
that are generated, such as the output from custom SQL queries.
Shared schemas
Schemas captured from a particular type of data source are stored in the metadata
repository in a folder for that data type (for example, CSV file schemas are stored in the
directory for delimited files).
There are, however, instances where a schema is shared across multiple source types. For
example, a CSV file and an Excel file could both be used to load the same database table directly.
If you import the metadata from one of the sources, it will be stored in the folder for that
source, which could make it hard to find.
Storing the schema as a Generic schema makes it more obvious that the schema isn’t used
for just a single source.
Generated data sources
A Talend job often needs to run a query against a database and work with the result set,
and the same query is frequently used many times across many jobs.
Storing the schema for the result set as a Generic schema removes the tedious
process of creating the same schema manually every time the
query is used.
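
The idea of defining a result-set schema once and sharing it can be sketched in plain Java. This is only an illustration of the reuse principle, not Talend's API or the code Talend generates: the class name, the column names, and the two "jobs" are all hypothetical.

```java
import java.util.List;

// Hypothetical illustration only: this is NOT Talend's API or generated code.
final class GenericSchemaSketch {

    // One column of a schema: a column name plus a Java type name.
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // The shared schema for an assumed "customer totals" query result.
    // It is defined once, in one place.
    static final List<Column> CUSTOMER_TOTALS = List.of(
        new Column("customerId", "Integer"),
        new Column("customerName", "String"),
        new Column("orderTotal", "Double"));

    // Two hypothetical jobs that run the same query both refer to the
    // single shared definition instead of re-keying the columns.
    static List<Column> schemaForReportJob() {
        return CUSTOMER_TOTALS;
    }

    static List<Column> schemaForExportJob() {
        return CUSTOMER_TOTALS;
    }

    public static void main(String[] args) {
        for (Column c : schemaForReportJob()) {
            System.out.println(c.name + " : " + c.type);
        }
    }
}
```

Because both jobs point at the same definition, a change to the column list is made in one place only, which is the same benefit the Generic schema gives inside the repository.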
Fixed schemas and columns
Some components, such as tLogCatcher, have predefined schemas that are read-only.
These are easy to recognize because the whole schema is shown in gray.
You may also find that certain flows, such as reject flows, have fixed columns
added to the original schema. Talend adds the errorCode and
errorMessage fields to the schema to store the error information. These additional fields are
shown in green to distinguish them as Talend fields.
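
As a rough sketch of what a reject flow's schema looks like, the Java below appends the two fixed fields to an original column list. Only the field names errorCode and errorMessage come from the text above; the class, the method, and the sample columns are hypothetical, and this is not how Talend builds the schema internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT Talend's internal implementation.
final class RejectSchemaSketch {

    // Returns the original columns followed by the two fixed error fields
    // that a reject flow carries in addition to the original schema.
    static List<String> rejectSchema(List<String> originalColumns) {
        List<String> columns = new ArrayList<>(originalColumns);
        columns.add("errorCode");
        columns.add("errorMessage");
        return columns;
    }

    public static void main(String[] args) {
        // Assumed sample schema with two columns.
        List<String> original = List.of("id", "name");
        System.out.println(rejectSchema(original));
    }
}
```

The original list is copied rather than modified, mirroring the way the main flow keeps its schema unchanged while the reject flow gains the extra fields.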