Long Format

Long format is typically used for transaction data containing many variables and combinations. In long format, categories and their corresponding items are represented vertically and grouped and ordered based on a unique transaction ID.

An example of long format is displayed below.

In the above example, two transactions are shown. Each transaction is identified using a unique transaction_id :101001 and 101002. Transaction 101001 is associated with the purchase of 2 items. Transaction 101002 is associated with the purchase of 3 items. These transactions must be contiguous.

Long format does not support user-defined columns.

Using Long Format for Non-Transaction Data

In addition to transaction data, long format can be used for any data containing unique identifiers and repeating vertical rows.

An example of network data is displayed below.

In the above example, each network event is tracked by a unique identifier. This identifier is used to associate the event with different information in the item_id column.

Long Format Details

Headers are required for each column in long format. The minimum columns required are transaction_id and item_id.  The acceptable columns are transaction_id, transaction_time, item_id, item_name, item_category, transaction_volume, transaction_price, item_volume, item_price.

The first column must be transaction_id.

While the data is allowed to contain any UTF-8 characters the header must be in lower ASCII.

Long format does not support user-defined columns. The required and optional columns are described in the table below.

Column Header Required/Optional Format Description
transaction_id Required String

A unique identifier for the transaction. This identifier can appear in multiple rows but should be different for each transaction. NOTE: Transactions must be contiguous. If you use the same transaction_id in two places in the file, then it will be interpreted as two transactions with the same ID.

Rows sharing the same transaction_id must be adjacent.

Field values must be wrapped in double quotes (") and cannot be longer than 255 characters.

All double quotes inside any string should be escaped with another double quote. For example:

"String 3" Roll" should be changed to "String 3"" Roll"

item_id Required String

A unique identifier for the item. Typically, this column is used for SKU numbers and product numbers.

Field values must be wrapped in double quotes (") and cannot be longer than 255 characters.

All double quotes inside any string should be escaped with another double quote. For example:

"String 3" Roll" should be changed to "String 3"" Roll"

transaction_date
OR
transaction_time
OR transaction_time_unix
Optional Date/ Time

The date and time when the transaction occurred. Field values can also be formatted using ISO or AUS standards. (See "Date and Time" section below)

Field values can contain date only, or date and time. The column heading indicates the date and time format used.

item_category Optional String

A text identifier used to categorize the item. Typically, this column is used for product category.

Field values must be wrapped in double quotes (") and cannot be longer than 255 characters.

All double quotes inside any string should be escaped with another double quote. For example:

"String 3" Roll" should be changed to "String 3"" Roll"

item_name Optional String

A descriptive label of the item used in the item_id column.

For example, an item_id of RTJ00345 has the associated item_name of Green Striped Lawn Chair.

Field values must be wrapped in double quotes (") and cannot be longer than 255 characters.

All double quotes inside any string should be escaped with another double quote. For example:

"String 3" Roll" should be changed to "String 3"" Roll"

item_volume Optional Float

The volume of item associated with the transaction. Typically, this volume is the quantity of a specific item purchased.

Field values must be in the standard float format: a string of digits with a decimal point optionally among them. Precision beyond 4 decimal places is not used.

If the field value is NULL or blank, a default value of 1 is used. Negative values and 0's are not allowed.

item_price Optional Price

The individual price of the item associated with the transaction. Typically, this price is the manufacturer's suggested retail price (MSRP) or other undiscounted price.

Field values must be in dollars, and may begin with an optional $ sign. NULL or blank values are supported. Negative prices are not supported.

transaction_volume Optional Integer

The total volume of the entire transaction.

Note: This column should not be confused with the item_volume column, which is used for the volume associated with specific items.

Field values must be non-negative values consisting only of digits.

If the field value is NULL or blank, a default value of 1 is used.

transaction_price Optional Price

The total price of the entire transaction.

Note: This column should not be confused with the item_price column, which is used for the price associated with specific items.

Field values must be in dollars, and may begin with an optional $ sign. NULL or blank values are supported. Negative prices are not supported.

Date and Time

When using the transaction_date column, dates should be recorded using in the MM-DD-YYYY or MM/DD/YYYY format. Month and day values can be padded with zeros.

To use the ISO data format (YYYY-MM-DD or YYYY/MM/DD), add _ISO or -ISO to the end of the transaction_date column header.

To use the AUS data format (D-M-YY or D/M/YY), add _AUS or -AUS to the end of the transaction_date column header.

When using the transaction_time, or transaction_time_unix column, dates should be recorded using the ISO-8601 or UNIX time formats.

The ISO-8601 format is YYYY-MM-DDTHH:MM:SSZ
T and the Z are literal characters and can be upper or lower case.

The UNIX time format is the number of seconds since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970.

Naming Data File

EmcienPatterns uses a file naming convention to identify the format used. The filename is limited to 200 characters or less.  Files should be named using the following structure:

<filename>.<file type>.<extension>

 filename Filenames should use only the following characters:
  • ASCII alphanumeric characters in upper (A-Z), lower (a-z) case, 0-9
  • Whitespace
  • Periods .
  • Underscores _
  • Hyphens -
  • Parentheses ( )

The following characters are not supported:

  • Backslash (\)
  • Single quote (')
  • Double quotes (")
 file type long
 extension

.csv or .csv.gz

File Name Examples

 area2-sales.long.csv 

 Uncompressed sales data from area 2 in the receipt format 

 clinical.all.long.csv.gz

 Compressed clinical test data in the long format