Skip to main content

Modeling functions

Functions allow you to create complex calculations, filters or aggregations.

Aggregate functions

Average

Outputs the average of the values in the Input Column, grouped by the categories in the Partition Columns provided.

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing input values. The average will be computed for the values in the Input Column, grouped by the categories in the Partition Columns provided.

Order By - (ORDER BY clause)

Order by column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Count

Counts the number of non-NULL values in the Input Column, grouped by the Partition Column(s) provided.

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Counts the number of items in the Input Column, grouped by the categories in the Grouping Columns provided.

Group By - (PARTITION BY clause)

Grouping column

Pipeline Column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Maximum

Outputs the maximum value in the Input Column, grouped by the categories in the Partition Columns provided (optional).

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select a column to be evaluated using the Maximum function.

Order By - (ORDER BY clause)

Order by column

Pipeline column (Select from the drop-down list.

Optional

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Minimum

Outputs the minimum value in the Input Column, grouped by the categories in the Partition Columns. provided (optional).

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select a column to be evaluated using the Minimum function.

Order By - (ORDER BY clause)

Order by column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Standard deviation

Outputs the standard deviation of values in the Input Column, treated as a sample (n = 'number of items' - 1) and grouped by the categories in the Partition Columns provided.

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select a column to be evaluated.

Order By - (ORDER BY clause)

Order by column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Standard deviation (Population)

Outputs the standard deviation of values in the Input Column, treated as a population (n = 'number of items') and grouped by the categories in the Partition Columns provided.

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select a column to be evaluated.

Order By - (ORDER BY clause)

Order by column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list).

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Sum

Outputs the sum of values in the Input Column, grouped by the categories in the Partition Columns provided.

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select the column to be evaluated.

Order By - (ORDER BY clause)

Order by column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Variance

Outputs the variance of values in the Input Column, treated as a sample (n = 'number of items' - 1) and grouped by the categories in the Partition Columns provided.

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select the column to be evaluated.

Order By - (ORDER BY clause)

Order by column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY BY clause)

Grouping column

Pipeline column (Select from the drop-down list)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Variance (Population)

Outputs the variance of values in the Input Column, treated as a population (n = 'number of items') and grouped by the categories in the Partition Columns provided.

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select the column to be evaluated.

Order By - (ORDER BY clause)

Order by column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order Direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY BY clause)

Grouping column

Pipeline Column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Functions explorer

The FUNCTIONS explorer provides access to functions in Data Hub.

ref functions 1 interface-functions-explorer.png

The following additional features are available with the FUNCTIONS Explorer:

  • Refresh the list, to verify that you are viewing the most recent version, using the Refresh button.

  • Filter the list using the Search text box. This feature allows you to search below the highlighted node or through all nodes if one is not selected.

Note

Functions can also be viewed in the Public directory in RESOURCE EXPLORER. All relative time functions (for example, Current Month) are also available under the Date hierarchies in the using the DIMENSION TREE.

Analytic functions

Cumulative distribution

Returns the relative position of rows based on a cumulative distribution for the column(s) selected in the Order By step.

Output values range between 0.0 and 1.0 (inclusive).

Field

Input Type

Required?

Description

Order By - (ORDER BYclause)

Order by column

Pipeline Column (Select from the drop-down list.)

Required

Select the column to be evaluated.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

First value

This function will return the first value in the Input Column after sorting (Order By) and grouping (Partitioning).

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.)

Required

Select the Input Column for evaluation.

Order By - (ORDER BY BY clause)

Order by column

Pipeline Column (Select from the drop-down list.)

Required

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Last value

This function will return the last value in the Input Column after sorting (Order By) and grouping (Partitioning).

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select the Input Column for evaluation.

Order By - (ORDER BYclause)

Order by column

Pipeline Column (Select from the drop-down list.)

Required

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Lag (Offset)

Returns data from preceding rows in the input column selected.

For example:

ID

ID (Lag=3)

1

NULL

2

NULL

3

NULL

4

1

5

2

6

3

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select the column to be evaluated.

Offset

Expression

Required

Enter the number of values to offset the rows in the Input Column. If the offset is 1, then the output will be the value from the preceding row in the input column.

Default

Expression

Required

Provide a default value to be used when the row offset lies outside the table. For example: if the offset is set to 10, then the first 10 rows of the calculated column will use the default value. (Default is NULL).

Order By - (ORDER BY clause)

Order by column

Pipeline column (Select from the drop-down list.)

Required

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order Direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline Column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

The lead (Offset)

Returns data from subsequent rows in the input column selected.

For example:

ID

ID (Lead=3)

1

4

2

5

3

6

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing input values.

Offset

Expression

Required

Enter the number of values to offset the rows in the Input Column. If the offset is 1, then the output will be the value from the next row in the input column.

Default

Expression

Required

Provide a default value to be used when the row offset lies outside the table. For example: if the offset is set to 10, then the first 10 rows of the calculated column will use the default value. (Default is NULL).

Order By - (ORDER BY clause)

Order by column

Pipeline column (Select from the drop-down list.)

Required

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order Direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline Column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Percentile (Continuous Distribution)

Returns values representing the Percentile for the Order By column(s) within each partition (optional). This function uses a continuous distribution to determine percentiles and returns a computed value that may not exist in the input column.

Note

A percentile of 0.5 represents the Median value of the input column within a group of values (partition).

Field

Input Type

Required?

Description

Percentile

A value is greater than 0.0 and less than or equal 1.0.

Required

Input the percentile to be computed. Input the percentile as a value between 0.0 and 1.0.

Order By - (ORDER BYclause)

Order by column

Pipeline Column (Select from the drop-down list.)

Required

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Percentile (Discrete Distribution)

Returns values representing the Percentile for the Order By column(s) within each partition (optional). This function uses a discrete distribution to determine percentiles and returns a value from the input column.

Note

A percentile of 0.5 represents the Median value of the input column within a group of values (partition).

Field

Input Type

Required?

Description

Percentile

A value is greater than 0.0 and less than or equal 1.0.

Required

Input the percentile to be computed. Input the percentile as a value between 0.0 and 1.0.

Order By - (ORDER BYclause)

Order by column

Pipeline Column (Select from the drop-down list.)

Required

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

Percentile rank

Returns the rank of rows as a percentile for the column(s) selected in the Order By step.

Output values range between 0.0 and 1.0 (inclusive).

Field

Input Type

Required?

Description

Order By - (ORDER BYclause)

Order by column

Pipeline Column (Select from the drop-down list.)

Required

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

JSON functions

JSON String Validity Test

This function determines if there is valid JSON strings in the selected column.

0 - indicates the input string is not valid JSON.

1 - indicates the input string is a valid JSON string.

Note

This function requires SQL Server 2016 or later.

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select the Input column (or columns) for evaluation.

Modify JSON Value

This function returns the input JSON column, with the New Value at the JSON Path specified.

Note

This function requires SQL Server 2016 or later.

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing JSON values to be updated.

JSON Path

Expression (valid JSON path)

Required

Provide the path to the value to be updated.

New Value

Expression

Required

Provide a new value at the provided JSON path.

Query JSON (Object)

This function returns the result of the Input JSON Query Expression.

Note

This function requires SQL Server 2016 or later.

Field

Input Type

Required?

Description

Input JSON Query Expression

Expression

Required

Input the JSON query expression to be evaluated.

JSON Path

Expression (valid JSON path)

Optional

Provide the path to the JSON object or array.

Query JSON (Value)

Returns the value at the specified JSON Path for the Input Column.

Note

This function requires SQL Server 2016 or later.

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.)

Required

Select the input column containing JSON to be queried for values.

JSON Path

Expression (valid JSON path)

Required

Provide the path to the value.

Ranking functions

Provides several ranking functions:

  • Rank: Returns a numerical rank, using duplicate numbers where there are duplicate entries in the Input Column (such as 1, 2, 2, 2, 5, 6).

  • Dense Rank: Similar to Rank but values are not skipped after duplicates (such as 1, 2, 2, 2, 3, 4).

  • Row Number: Similar to Rank but numbers all rows sequentially, regardless of duplicates (such as 1, 2, 3, 4, 5, 6).

Field

Input Type

Required?

Description

Rank function

Select from:

  • Rank

  • Dense Rank

  • Row Number

Required

Select the column containing input values. The average will be computed for the values in the Input Column, grouped by the categories in the Partition Columns provided.

Order By - (ORDER BYclause)

Order by column

Pipeline Column (Select from the drop-down list.)

Required

Select a column (or columns) for sorting (ordering) the input column prior to applying the current function.

Order direction

Select from the drop-down list.

Options:

  • Ascending Order (default)

  • Descending Order

Group By - (PARTITION BY clause)

Grouping column

Pipeline column (Select from the drop-down list.)

Optional

Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function.

String functions

Concatenate

Concatenates (joins) the Input Columns, separated by the Delimiter provided.

Field

Input Type

Required?

Description

Delimiter /Separator

Expression

Optional

Provide a separator for the concatenated columns. This can be a single character (such as ','), a string (such as 'and'), or no separator ('').

Input Column

Pipeline Column (Select from the drop-down list.)

Minimum 2 columns required

Select the columns to be concatenated in the order they are to be concatenated.

Left (Truncate)

This function is used for truncating text, retaining the number of characters Length specified, and will retain characters from the left of the string.

Example

if an initial string in the input column was Expenses, a Length of 3 would return Exp.

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.

Required

Select the column containing the strings to be truncated.

Length

Integer

Required

Enter the length of the input column strings to be retained.

Replace string

This function is used to find and replace content in the Input Column. When the Search Pattern is discovered in the Input Column, it will be replaced with the Replacement String

Example

Input String: qwerty

Search Pattern: 'ert'

Replacement: 'zzz'

Result: qwzzzy

This function will not replace the values in the original column but will create a duplicate version of the original column then perform the replacements.

Field

Input Type

Required?

Description

Input column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing the data strings to be modified by replacement.

Search pattern

Expression

Required

Provide a search pattern. When this string is discovered in the Input Column, it will be replaced with the Replacement String.

Replacement String

Expression

Required

When the Search Pattern is discovered in the Input column, it will be replaced with the Replacement String.

Right (Truncate)

This function is used for truncating text, retaining the number of characters Length specified, and will retain characters from the right of the string.

Example

If an initial string in the input column was Expenses, a Length of 5 would return enses.

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing the strings to be truncated.

Length

Integer

Required

Enter the length of the input column strings to be retained.

Trim after pattern

This function truncates the input column based on the search pattern provided. All content after (and including) the first match for the input search pattern will be removed for each cell.

Example

Cell contents: ABC,DEF,GHI,JKL

Search Pattern: ','

Result: ABC

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing characters (such as text) to be truncated.

Search pattern

Expression

Required

Enter a search term to indicate breakpoints in the cells of the input column. This may be a delimiter (such as ',' , '_' , ':' , ' ' , etc), character, or string of characters. For example Cell Contents: ABC.z.DEF.z.GHI.z.JKL Search Pattern: '.z.' Result: ABC

Trim before and after pattern

This function truncates the input column based on the search pattern provided. All content before the first match and after the last match for the input search pattern will be removed for each cell (including the matched content).

Example

Cell contents: ABC,DEF,GHI,JKL

Search Pattern: ','

Result: DEF,GHI,

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing characters (such as text) to be truncated.

Search pattern

Expression

Required

Enter a search term to indicate breakpoints in the cells of the input column. This may be a delimiter (such as ',' , '_' , ':' , ' ' , etc), character or string of characters. For example Cell Contents: ABC.z.DEF.z.GHI.z.JKL Search Pattern: '.z.' Result: DEF.z.GHI

Trim to pattern

This function truncates the input column based on the search pattern provided. All content before (and including) the first match for the input search pattern will be removed for each cell.

Example

Cell contents: ABC,DEF,GHI,JKL

Search Pattern: ','

Result: DEF,GHI,JKL

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing characters (such as text) to be truncated.

Search pattern

Expression

Required

Enter a search term to indicate breakpoints in the cells of the input column. This may be a delimiter (such as ',' , '_' , ':' , ' ' , etc), character or string of characters. For example Cell Contents: ABC.z.DEF.z.GHI.z.JKL Search Pattern: '.z.' Result: DEF.z.GHI.z.JKL

Trim whitespace

Trims leading and trailing whitespace from data in the Input Column.

Note

This function requires SQL Server 2017 or later.

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.)

Required

Select the column to be evaluated.

Time functions

Create date

Joins the input Year, Month, and Day columns provided to produce a date.

Field

Input Type

Required?

Description

Input year column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing the year (YYYY) value for the output date.

Search month column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing the month (MM) value for the output date.

Input day column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing the day (DD) value for the output date.

Date offset (Date Add)

The date in the Input Date Column will be offset based on the Amount of Offset and the Measure of Time (or Units of Time).

For example:

Input Date: 2000-01-01

Units: days

Amount of Offset: 5

Result: 2000-01-06

Field

Input Type

Required?

Description

Measure of Time (Units)

Select from:

  • Year

  • Quarter

  • Month

  • Day

  • Day Of Year

  • Week

  • Weekday

  • Hour

  • Minute

  • Second

  • Millisecond

  • Microsecond

  • Nanosecond

Required

Select the element of the Input Date Column to be offset.

Amount of Offset

Integer or Float

Required

Enter the size of the offset.

Input Date Column

Pipeline Column (Select from the drop-down list.)

Required

Select an input date column for the function.

Day from date column

Returns the day (DD) value from the Input Date Column.

The Input Date Column can be:

  • A Date type column.

  • A DateTime type column.

  • A Text type column containing a recognized date or date-time format.

Field

Input Type

Required?

Description

Input Date Column

Pipeline Column (Select from the drop-down list.)

Required

Select an input date column for the function.

Difference between dates

Calculates the difference between two date, time, or date-time columns. The result is returned in the Measure of Time (Units) selected.

For example:

Start Date: 2000-01-01

End Date: 2000-02-01

Units: Day

Results: 31

Field

Input Type

Required?

Description

Measure of Time (Units)

Select from:

  • Year

  • Quarter

  • Month

  • Day

  • Day Of Year

  • Week

  • Weekday

  • Hour

  • Minute

  • Second

  • Millisecond

  • Microsecond

  • Nanosecond

Required

Select the unit of time results will be expressed in.

Start Date/Time Colum

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing the start date.

End Date/Time Colum

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing the end date.

Extract date component

This function returns the Date Component selected of the input Date, Time, or DateTime data.

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.)

Required

Select the column containing Date, Time, or Datetime data.

Date Component

Select from:

  • Year

  • Quarter

  • Month

  • Day

  • Day Of Year

  • Week

  • Weekday

  • Hour

  • Minute

  • Second

  • Millisecond

  • Microsecond Nanosecond

  • Time Zone Offset

Required

Select the Component of the Date, Time, or DateTime string to be extracted from the Input Column.

Month from date column

Returns the month (MM) value from the Input Date Column.

The Input Date Column can be:

  • A Date type column.

  • A DateTime type column.

  • A Text type column containing a recognized date or date-time format.

Field

Input Type

Required?

Description

Input Date Column

Pipeline Column (Select from the drop-down list.)

Required

Select an input date column for the function.

Time zone conversion

Converts the source date-time column to another time zone.

Example

Source date-time: 01 Jan 2000 00:00:00

Source Time Zone: UTC (Coordinated Universal Time)

Target Time Zone: (UTC+10:00) Brisbane

Result: 01 Jan 2000 10:00:00

Field

Input Type

Required?

Description

Source column

Pipeline Column (Select from the drop-down list.)

Required

Select an input date time column for the function.

Source Time Zone - Select from the following options.

Source Time Zone

Radio selection

Required

Select one of the following methods to indicate the time zone of the Source Column.

Column

Uses the time zone of the Source Column's Date Time data type.

Parameter

Overrides the Source Column's time zone with the value from the selected parameter.

Time Zone

Overrides the Source Column's time zone with the selected time zone (selected from the drop-down list).

Target Time Zone

Target Time Zone

Select from the drop-down list.

Required

Select the output time zone.

Year from date column

Returns the year (YYYY) value from the Input Date Column.

The Input Date Column can be:

  • A Date type column.

  • A DateTime type column.

  • A Text type column containing a recognized date or date-time format.

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.)

Required

Select the column for the function.

Other functions

First Non-NULL value

Returns the first non-NULL value in each row for the selected Input Columns.

Field

Input Type

Required?

Description

Input Column

Pipeline Column (Select from the drop-down list.)

Required (minimum 2 columns)

Select the column tp be evaluated. Columns will be evaluated in the order they are added.

If (Complex)

The If (Complex) function will evaluate the When, Then, and Else expressions provided.

Example

When [Water_Temp] <= 0 Then 'Solid'

When [Water_Temp] >= 100 Then 'Gas'

When [Water_Temp] > 0 AND [Water_Temp] < 100 Then 'Liquid'

For simple equality statements with a single input column, consider using If (Simple).

Field

Input Type

Required?

Description

Condition - add a condition pair

When

Expression

Required

Enter a boolean expression to evaluate the corresponding Then expression when true.

Then

Expression

Required

Enter an expression to evaluate when the corresponding When expression is met.

The Else Clause

Else

Expression

Required

Enter an expression to evaluate when no When expressions are met.

If (Simple)

The If (Simple) function will evaluate the Input Expression based on the When, Then, and Else expressions provided.

The IF (Simple) only accepts equality statements and is best suited for categorical inputs.

Example

Input Expression: [Motor_Vehicle]

When 'T' Then 'Truck'

When 'C' Then 'Car'

When 'B' Then 'Motorcycle'

Else 'Other MV'

For more complex if conditions, use If (Complex).

Field

Input Type

Required?

Description

Input

Expression

Required

Provide an input expression or column to be evaluated.

Condition - add a condition pair

When

Expression

Required

Enter a boolean expression to evaluate the corresponding Then expression when true.

Then

Expression

Required

Enter an expression to evaluate when the corresponding When expression is met.

The Else Clause

Else

Expression

Required

Enter an expression to evaluate when no When expressions are met.

Random number between

Returns random integers between Upper and Lower limits provided. The Upper and Lower limits are included in the range of values generated.

For example:

Upper Limit: 10

Lower Limit: 5

Result: Each row is randomly assigned one of the following numbers: 5 , 6 , 7 , 8 , 9 , 10.

Field

Input Type

Description

Lower Limit

Expression

Required

Enter the Integer (whole number) to be used as the lower limit for the random numbers to be generated. Only Integers are supported for this function. The lower limit is included in the range of values generated.

Upper Limit

Expression

Required

Enter the Integer (whole number) to be used as the upper limit for the random numbers to be generated. Only Integers are supported for this function. The upper limit is included in the range of values generated.

SQL expression

You can enter a scalar (single value) SQL expression to define a new column using the +Calculation button.

Example

If you want to calculate a column called "Discount Amount", the following SQL expression could be used: [Sales Amount] * [Discount Pct]

  1. Click the + Calculation button.

    A new calculated column, called New Calculated Column, is added to the end of the pipeline schema.

  2. On the right side of the screen, alter the column's settings in the Properties panel.

    • In the Properties panel, enter a meaningful name for the calculated column.

    • Enter a valid SQL expression for the column’s calculation.

      As you type, simple SQL syntax and column reference checking is dynamically performed. This feature saves time as typing errors are found more quickly. The validation also verifies that columns in the current pipeline referred to in the SQL expression are correct and that the SQL has correct syntax.

      Note

      If you change a column's physical name, any calculated columns in the current pipeline that refer to the column are automatically updated. Calculations in other pipelines that refer to the column will display an error indication. Clicking in the SQL pane for each offending calculation will force the column name to be updated.

       

    • Use defined parameters in the expression.

    • Use the code completion options to view lists of available code and syntax options as you type an entry.

    • Drag-and-drop columns from the Out column on the COLUMNS pipeline tab's column list, as shown below.

      FUNCTION_1_mod_column_calc_column_drag_drop.png

      Note

      If you need to enter a conditional expression, use the SQL CASE expression.

  3. Click Apply to test run the SQL code.

    This action locates any run-time errors, such as divide by zero, that are not found by the dynamic syntax checking (which was performed as you typed the code). The associated column is automatically profiled and the information is immediately displayed in the Profile area on the Properties pane.

    The color of the triangle icon on the Apply button represents the status of the SQL expression:

  • Green. The expression has been successfully validated and may be executed.

  • Yellow. The expression is invalid or an error was encountered when executing the SQL expression.

  • Dark gray. The expression is currently being validated.

  • Light gray. The expression has already been executed.

Note

For examples of these colors, hover over the Apply button in the Data Hub interface and view the corresponding info tip.

The calculation’s result appears in the Preview pane, at the bottom of the tab. The example below illustrates the calculation of sick leave days from sick leave hours in the employee pipeline (assuming an 8 hour workday).

FUNCTION 2 mod_pipeline_calculations.png

Note

Lookups are always done before calculations in each step. You can define a lookup and a calculation in the same step, which allows you to define a calculation on a lookup column.

If you wish to perform a lookup on a calculated column, you must configure the calculation in one step, and then add a lookup on the calculated column in a subsequent step.

Defining a calculated column with a subquery

You can define a calculated column by entering SQL to perform a correlated subquery. A correlated subquery is a nested query that uses values from the outer query. The correlated subquery is evaluated once for each row returned by the outer query.

In the context of calculated columns in a data model, the "outer" query is performed transparently within Data Hub, and the subquery is executed within it, as part of the SELECT clause. You need only enter the subquery text in the calculated column.

The @IN and @INALIAS tokens are provided to support this functionality. The @IN token may be used to refer to the input table view to the current step within the pipeline. The @INALIAS token may be used to refer to the table in the otherwise-unseen "outer" section of the query.

In the following example, a complete SQL query to return information, including the amount (for each detail line on an invoice) together with the average value of the detail lines for that invoice, would appear as follows:

Note

The subquery text is highlighted in blue.

Example

Select SalesOrderId, SalesOrderDetailId, LineTotal,

(select avg(LineTotal)

from Sales.SalesOrderDetail as [INNER]

where [INNER].SalesOrderID = [OUTER].SalesOrderID)

As AvgLineForSalesOrder

from Sales.SalesOrderDetail as [OUTER]

In a data model, the "outer" query would be implicit. To enter the subquery as a calculated column, use the @IN and @INALIAS tokens to replace the table names, as shown below:

Example

(select avg(LineTotal)

from @IN as [INNER]

where [INNER].SalesOrderID = @INALIAS.SalesOrderID)

Reference: Common fields

The ORDER BY clause and the PARTITION BY clause are common to most functions in the following groups:

  • Aggregate Functions

  • Analytic Functions

  • Ranking Functions

Order By - (ORDER BY clause)

Field

Input Type

Description

Order By Column

Pipeline Column (Select from the drop-down list.)

Select an input date column for the function.

Order Direction

Select from the drop-down list.

Options:

  • Ascending Order (default

  • Descending Order

Group By - (PARTITION BY clause)

Field

Input Type

Description

Grouping Column

Pipeline Column (Select from the drop-down list.)

Select a column (or columns) for sorting the input column prior to applying the current function.