PostgreSQL offers a wide variety of field types, and you can also create custom types known as "Composite Types." In this discussion, we will focus on the most commonly used field types, especially for those transitioning from other SQL servers.
Characters
In PostgreSQL, there are several text field types designed to handle varying amounts of text data. Here are the main types:
1. text:
- This type stores variable-length strings with no specific limit on size.
- Ideal for storing large amounts of text, such as articles or descriptions.
- The maximum size is 1 GB. This limit is in byte not characters. UTF8 has a variable number of bytes per characters, 1 to 4 bytes.
2. varchar(n): Equivalent to character varying.
- A variable-length character type with a specified maximum length `n`. `n` may not exceed 10,485,760 characters.
- Useful when you want to enforce a limit on the length of the text.
- For example, `VARCHAR(255)` can store up to 255 characters.
3. char(n): Equivalent to character.
- A fixed-length character type that always reserves space for `n` characters. `n` may not exceed 10,485,760 characters.
- If the input string is shorter than `n`, it is padded with spaces.
- Best used for storing data that has a consistent length, such as country codes.
Considerations:
- Errors: If you attempt to store a string that exceeds the defined length in CHAR(n) or VARCHAR(n), PostgreSQL will raise an error, and NOT store a trimmed value.
- Performance: Generally, `TEXT` and `VARCHAR` are treated the same in terms of performance, but using `VARCHAR(n)` can enforce data integrity by limiting the length of entries.
- Usage: Choose `TEXT` for flexibility when dealing with potentially long text, `VARCHAR(n)` when you need constraints, and `CHAR(n)` for fixed-length requirements.
Binary data
- They are used to store any bytes, not related to character encoding.
- Not be be confused with "bit" types, which are the equivalent of flags.
In PostgreSQL, the primary field types that can store binary data are:
1. bytea:
- This data type is specifically designed for storing binary data.
- It allows you to store raw byte sequences and is suitable for data such as images, files, or any other binary formats.
- The maximum size for a `BYTEA` field is 1 GB.
2. Large Objects (LOB):
- For larger binary data, you can use the Large Object feature.
- Large Objects are stored in a separate system table and can hold up to 2 GB per object.
- You interact with Large Objects using special functions (like `lo_*` functions) to read, write, and manage the data.
- These are global to a server, not individual database.
Since the system table used to store large object is not partitioned by default and is then limited to 32 TB, I would recommend using BYTEA fields instead of Large Objects and implement your own splitting of content at 1 GB.
Numeric Types
PostgreSQL provides several numeric field types to accommodate a variety of numerical data needs. Here’s an overview of the different numeric types available:
1. Integer Types:
- smallint: 2 bytes, range from -32,768 to 32,767.
- integer: 4 bytes, range from -2,147,483,648 to 2,147,483,647.
- bigint: 8 bytes, range from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
2. Floating-Point Types:
- real: 4 bytes, single-precision floating-point, with an approximate range of -3.14E-37 to 3.14E+37.
- double precision: 8 bytes, double-precision floating-point, with an approximate range of -1.70E-308 to 1.70E+308.
- numeric aka decimal: This type can store numbers with a user-defined precision and scale. For example, `numeric(10, 2)` can store up to 10 digits, with 2 digits after the decimal point. This type is ideal for exact numerical calculations, such as monetary values.
Only the numeric type will not truncate/round values.
3. Serial Types:
- serial: An auto-incrementing integer (4 bytes) that is essentially an integer with a sequence behind it.
- bigserial: An auto-incrementing bigint (8 bytes) for larger integer values.
Serial types are essentially integer types that are automatically associated with sequences.
- Use integer types for counting and whole numbers.
- Use floating-point types when you need approximate values and can tolerate some precision loss.
- Use numeric types when you require exact precision, such as in financial calculations.
Date and Time
PostgreSQL offers several date and time field types to handle various temporal data needs. Here’s a summary of the different date and time types:
1. Date Types:
- date: Stores the calendar date (year, month, day) without time. The range is from 4713 BC to 5874897 AD.
2. Time Types:
- time: Stores time of day (hour, minute, second) without time zone information. It can be defined with optional precision (e.g., `time(3)` for milliseconds). The range is from 00:00:00 to 24:00:00.
- time with time zone: Similar to `time`, but includes time zone information.
3. Timestamp Types:
- timestamp: Stores both date and time without time zone information. Like `time`, it can also have optional precision (e.g., `timestamp(3)` for milliseconds). The range is from 4713 BC to 5874897 AD.
- timestamp with time zone: Stores both date and time along with time zone information. It automatically converts to UTC for storage and converts back to the specified time zone when queried.
4. Interval Type:
- interval: Represents a span of time, which can be expressed in terms of years, months, days, hours, minutes, and seconds. This type is useful for calculating durations between dates and times.
Use the timestamp with time zone when you need to record a date and time. You can also specify to record up to 6 decimal point of a second. This is the safest way to not loose temporal information.
For example to get as a string the elapsed time since the value in a timestamp with time zone field use the following:
to_char(now()-EnumValue.sysm,'DD "Days" HH24 "Hours" MI "Minutes" SS "Seconds"')
It is always possible to retrieve the value of a timestamp with time zone based on a particular time zone:
SELECT timezone('US/Pacific',"TableName"."DatetimeWithTimezonFieldName") AS "TimeInMyTimeZone", ....
To get the list of all supported time zone names, you can use the following:
SELECT name as "Name",
abbrev as "Abbreviation",
(EXTRACT(EPOCH FROM utc_offset)/60)::int as "UTCOffset",
is_dst as "DaylightSavingTime"
FROM pg_timezone_names
Boolean
1. boolean :
- A boolean value takes 1 byte of storage.
- The boolean type can hold three distinct values: TRUE, FALSE or NULL if the field is NULLABLE.
- You can insert boolean values using: `TRUE` or `FALSE`, `'t'` or `'f'`, `1` (for true) and `0` (for false)
- PostgreSQL provides various functions and operators to work with boolean values, such as logical operators (`AND`, `OR`, `NOT`).
If you are using ODBC to connect to a PostgreSQL database and want to retrieve boolean fields not as numeric 1 and 0, add "BoolsAsChar=0;" in the connection string.
UUID Universally Unique Identifier
The UUID field type in PostgreSQL is a powerful option for generating globally unique identifiers, making it useful for distributed systems and applications where uniqueness across multiple tables or databases is essential.
- A UUID occupies 16 bytes of storage.
- UUIDs are typically represented as 32 hexadecimal characters, displayed in five groups separated by hyphens, like this: `550e8400-e29b-41d4-a716-446655440000`.
- UUIDs are designed to be globally unique. The chance of generating the same UUID is extremely low, making them suitable for use as primary keys in distributed systems.
- PostgreSQL provides several built-in functions to generate UUIDs, such as:
-- uuid_generate_v1(): Timestamp and MAC address-based; predictable, but offers global uniqueness across space and time.
-- uuid_generate_v4(): Randomly generated; highly unique and unpredictable, suitable for privacy-sensitive applications.
-- uuid_generate_v7(): New to PostgreSQL 17. It provides a modern approach that combines timestamp-based generation with randomness, making it ideal for various applications needing unique and sortable identifiers.
- While UUIDs provide uniqueness, they are larger than traditional integer types, which may affect performance in some cases, especially for indexing.
JSON
PostgreSQL provides powerful JSON and JSONB data types to store and manipulate JSON data, with JSONB being the more efficient option for most use cases due to its binary storage format and support for indexing. When deciding between them, consider your application's specific needs for performance and functionality.
There are two primary JSON-related data types:
1. json
- The `JSON` data type stores JSON data as text.
- It validates the input to ensure that it is well-formed JSON before storing it.
- When you query JSON data stored in this format, it can be less efficient for certain operations compared to the `JSONB` type because it requires parsing each time you access it.
2. jsonb
- The `JSONB` data type stores JSON data in a binary format.
- It is optimized for storage and query performance. JSONB supports indexing, making it faster for certain types of queries.
- Allows for efficient querying using operators and functions.
- Supports indexing, enabling faster lookups.
- Automatically eliminates duplicate keys and orders the keys, which can save space.
- JSONB is typically preferred over JSON when you need to perform complex queries or work with large datasets.
Arrays
In PostgreSQL, the array data type allows you to store multiple values of the same data type within a single column. Additionally, all PostgreSQL data types can be defined as arrays.
For example: uuid[] would allow to store a list of uuids.
This feature is not available in other SQL backend and makes it easier to break the 3rd normalization rule. I would highly recommend not using arrays.
PostgreSQL has several functions to deal with arrays: array_length,unnest,any,array_append,array_remove ...
Indexing can not be done on an array, only to one of its element. There is no sorting inside arrays. AVOID this feature if used to replace an entire table or a many to many relationship.
Enumerations
In PostgreSQL, enumerations (enums) are a special data type that allows you to define a column with a fixed set of possible values. This can be particularly useful for situations where a column should only contain a specific set of predefined values, enhancing data integrity and readability.
- An enumeration type is created using the `CREATE TYPE` statement, where you specify the name of the enum and its possible values.
- Once an enum type is defined, the values it can take are fixed. This means that any column of this type can only contain one of the specified values, ensuring consistency.
- Using enums helps prevent invalid data from being inserted into the database, as only the predefined values are allowed.
- Enums improve the readability of your database schema and queries, as the defined values often convey meaning more clearly than generic strings or integers.
- Removing values from an enum is not directly supported, but you can work around it by creating a new enum type and migrating the data.
- If you intend to create a multi-language application you may want to avoid enumeration and create your own association of numbers or code to text/description.