ParquetDst
class polars_cloud.ParquetDst(
    uri: str | Path | PartitioningScheme,
    *,
    compression: ParquetCompression = 'zstd',
    compression_level: int | None = None,
    statistics: bool | str | dict[str, bool] = True,
    row_group_size: int | None = None,
    data_page_size: int | None = None,
    maintain_order: bool = True,
    storage_options: dict[str, Any] | None = None,
    credential_provider: CredentialProviderFunction | Literal['auto'] | None = 'auto',
    metadata: ParquetMetadata | None = None,
    field_overwrites: ParquetFieldOverwrites | Sequence[ParquetFieldOverwrites] | Mapping[str, ParquetFieldOverwrites] | None = None,
)
Parquet destination arguments.
Parameters:
- uri
Path to which the output should be written. Must be a URI to an accessible object store location. If set to "local", the query is executed locally. If None, the result will be written to a temporary location; this is useful for intermediate query results.
- compression : {'lz4', 'uncompressed', 'snappy', 'gzip', 'lzo', 'brotli', 'zstd'}
Choose "zstd" for good compression performance. Choose "lz4" for fast compression/decompression. Choose "snappy" for more backwards-compatibility guarantees when dealing with older Parquet readers.
- compression_level
The level of compression to use. Higher compression means smaller files on disk.
“gzip” : min-level: 0, max-level: 10.
“brotli” : min-level: 0, max-level: 11.
“zstd” : min-level: 1, max-level: 22.
- statistics
Write statistics to the Parquet headers. This is the default behavior.
Possible values:
  - True: enable the default set of statistics (default). Some statistics may be disabled.
  - False: disable all statistics.
  - "full": calculate and write all available statistics. Cannot be combined with use_pyarrow.
  - { "statistic-key": True / False, ... }: enable or disable individual statistics. Cannot be combined with use_pyarrow (this form appears in the sketch after this parameter list). Available keys:
    - "min": column minimum value (default: True)
    - "max": column maximum value (default: True)
    - "distinct_count": number of unique column values (default: False)
    - "null_count": number of null values in column (default: True)
- row_group_size
Size of the row groups in number of rows. Defaults to 512^2 rows.
- data_page_size
Size of the data page in bytes. Defaults to 1024^2 bytes.
- maintain_order
Maintain the order in which data is processed. Setting this to False can be much faster.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- storage_options
Options that indicate how to connect to a cloud provider.
The cloud providers currently supported are AWS, GCP, and Azure. See supported keys here:
  - Hugging Face (hf://): accepts an API key under the token parameter, {'token': '...'}, or via the HF_TOKEN environment variable.
If storage_options is not provided, Polars will try to infer the information from environment variables.
- credential_provider
Provide a function that can be called to provide cloud storage credentials. The function is expected to return a dictionary of credential keys along with an optional credential expiry time (a sketch is shown after this parameter list).
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- metadata
A dictionary or callback to add key-values to the file-level Parquet metadata (the dictionary form appears in the sketch after this parameter list).
Warning
This functionality is considered experimental. It may be removed or changed at any point without it being considered a breaking change.
- field_overwrites
Property overwrites for individual Parquet fields.
This allows more control over the writing process to the granularity of a Parquet field.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
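
As a rough illustration of these parameters, the sketch below constructs a ParquetDst with an explicit compression level, a per-statistic dictionary, and file-level metadata. This is a minimal sketch, not a definitive recipe: the S3 URI is a placeholder, and the plain string key-value shape of the metadata dictionary is an assumption based on the description above; the parameter names and defaults come from the signature at the top of this page.

```python
from polars_cloud import ParquetDst

dst = ParquetDst(
    "s3://my-bucket/output/",    # hypothetical object store location
    compression="zstd",          # good compression performance
    compression_level=3,         # zstd accepts levels 1-22
    statistics={                 # per-statistic toggles; keys as listed above
        "min": True,
        "max": True,
        "distinct_count": False,
        "null_count": True,
    },
    row_group_size=512**2,       # the documented default, made explicit
    metadata={"pipeline": "nightly-etl"},  # hypothetical file-level key-value
)
```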
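And a sketch of a custom credential_provider, assuming the Polars convention that the callable returns a tuple of a credential dictionary and an optional expiry as a Unix timestamp; the AWS key names and the secret-store lookup are hypothetical.

```python
import time

from polars_cloud import ParquetDst

def fetch_credentials():
    # Hypothetical: obtain short-lived credentials from your own secret store.
    creds = {
        "aws_access_key_id": "...",      # placeholder
        "aws_secret_access_key": "...",  # placeholder
        "aws_session_token": "...",      # placeholder
    }
    # Optional expiry (Unix timestamp); None would mean "no expiry".
    expiry = int(time.time()) + 3600
    return creds, expiry

dst = ParquetDst(
    "s3://my-bucket/output/",  # placeholder URI
    credential_provider=fetch_credentials,
)
```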
Attributes:
- compression: Compression algorithm
- compression_level: Compression level
- credential_provider: Credential provider
- data_page_size: Data page size
- row_group_size: Size of the row groups
- compression: ParquetCompression
Compression algorithm
- credential_provider: CredentialProviderFunction | Literal['auto'] | None
Credential provider
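
Since the configured values are exposed back as attributes, a quick sanity check on the hypothetical dst from the first sketch above might look like:

```python
assert dst.compression == "zstd"
assert dst.compression_level == 3
assert dst.row_group_size == 512**2
```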