ParquetDst

class polars_cloud.ParquetDst(
uri: str | Path,
*,
compression: ParquetCompression = 'zstd',
compression_level: int | None = None,
statistics: bool | str | dict[str, bool] = True,
row_group_size: int | None = None,
data_page_size: int | None = None,
storage_options: dict[str, Any] | None = None,
credential_provider: CredentialProviderFunction | Literal['auto'] | None = 'auto',
)

Parquet destination arguments.
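A minimal sketch of constructing a destination with keyword arguments taken from the signature above. The bucket path is a placeholder, and the import is guarded only so that the snippet also runs where polars_cloud is not installed.

```python
# Keyword arguments mirror the documented signature; the URI is a placeholder.
dst_kwargs = {
    "uri": "s3://my-bucket/results/",
    "compression": "zstd",
    "compression_level": 3,
    "statistics": True,
}

try:
    import polars_cloud as pc
except ImportError:
    pc = None  # allows the snippet to run without the package installed

if pc is not None:
    # All arguments after `uri` are keyword-only in the signature above.
    dst = pc.ParquetDst(**dst_kwargs)
```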

Parameters:
uri

Path to which the output should be written. Must be a URI to an accessible object store location. If set to "local", the query is executed locally. If None, the result will be written to a temporary location. This is useful for intermediate query results.

compression : {‘lz4’, ‘uncompressed’, ‘snappy’, ‘gzip’, ‘lzo’, ‘brotli’, ‘zstd’}

Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “snappy” for better backwards compatibility with older Parquet readers.

compression_level

The level of compression to use. Higher compression means smaller files on disk.

  • “gzip” : min-level: 0, max-level: 10.

  • “brotli” : min-level: 0, max-level: 11.

  • “zstd” : min-level: 1, max-level: 22.
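The ranges above can be checked before a destination is built. The helper below is a hypothetical sketch (check_compression_level is not part of polars_cloud); it encodes only the ranges listed here and assumes that a level should be rejected for codecs without a documented range.

```python
from typing import Optional

# Level ranges exactly as listed in the parameter description.
LEVEL_RANGES = {
    "gzip": (0, 10),
    "brotli": (0, 11),
    "zstd": (1, 22),
}


def check_compression_level(compression: str, level: Optional[int]) -> Optional[int]:
    """Return `level` if it fits the documented range, else raise ValueError."""
    if level is None:
        # None leaves the choice of level to the writer.
        return None
    if compression not in LEVEL_RANGES:
        # Assumption: no documented range means no level should be passed.
        raise ValueError(f"no level range is documented for {compression!r}")
    low, high = LEVEL_RANGES[compression]
    if not low <= level <= high:
        raise ValueError(f"{compression!r} level must be in [{low}, {high}], got {level}")
    return level
```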

statistics

Write statistics to the Parquet headers. Enabled by default.

Possible values:

  • True: enable default set of statistics (default). Some statistics may be disabled.

  • False: disable all statistics

  • “full”: calculate and write all available statistics.

  • { "statistic-key": True / False, ... }: enable or disable individual statistics. Available keys:

    • “min”: column minimum value (default: True)

    • “max”: column maximum value (default: True)

    • “distinct_count”: number of unique column values (default: False)

    • “null_count”: number of null values in column (default: True)
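The accepted forms can be read as a mapping onto the four keys listed above. The helper below is purely illustrative (resolve_statistics is a hypothetical name, not part of polars_cloud). It assumes that keys omitted from a dictionary keep their documented defaults and that “full” switches on every listed key.

```python
from typing import Dict, Union

# Per-key defaults as listed in the parameter description.
DEFAULT_STATISTICS = {
    "min": True,
    "max": True,
    "distinct_count": False,
    "null_count": True,
}


def resolve_statistics(statistics: Union[bool, str, Dict[str, bool]]) -> Dict[str, bool]:
    """Expand a `statistics` argument into an explicit per-key dictionary."""
    if statistics is True:
        return dict(DEFAULT_STATISTICS)
    if statistics is False:
        return {key: False for key in DEFAULT_STATISTICS}
    if statistics == "full":
        return {key: True for key in DEFAULT_STATISTICS}
    if isinstance(statistics, dict):
        unknown = set(statistics) - set(DEFAULT_STATISTICS)
        if unknown:
            raise ValueError(f"unknown statistic keys: {sorted(unknown)}")
        # Assumption: unspecified keys fall back to their defaults.
        return {**DEFAULT_STATISTICS, **statistics}
    raise ValueError(f"unsupported statistics value: {statistics!r}")
```

For example, resolve_statistics({"distinct_count": True}) yields a dictionary with all four keys enabled, because the other three default to True.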

row_group_size

Size of the row groups in number of rows. Defaults to 512^2 (262,144) rows.

data_page_size

Size of the data page in bytes. Defaults to 1024^2 (1,048,576) bytes, i.e. 1 MiB.
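The two defaults are written as powers, not Python expressions (in Python, `^` is XOR). The sketch below spells them out and estimates how many row groups a given row count would need. row_group_count is a hypothetical helper doing plain arithmetic; the writer itself may split rows differently.

```python
import math

# The documented defaults, written with Python's power operator.
DEFAULT_ROW_GROUP_SIZE = 512**2   # 262_144 rows
DEFAULT_DATA_PAGE_SIZE = 1024**2  # 1_048_576 bytes (1 MiB)


def row_group_count(n_rows: int, row_group_size: int = DEFAULT_ROW_GROUP_SIZE) -> int:
    """Number of row groups needed if each holds at most `row_group_size` rows."""
    return math.ceil(n_rows / row_group_size)


print(row_group_count(1_000_000))  # 4
```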

storage_options

Options that indicate how to connect to a cloud provider.

The cloud providers currently supported are AWS, GCP, and Azure. See supported keys here:

  • aws

  • gcp

  • azure

  • Hugging Face (hf://): Accepts an API key under the token parameter: {'token': '...'}, or by setting the HF_TOKEN environment variable.

If storage_options is not provided, Polars will try to infer the information from environment variables.
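As a rough illustration, a storage_options dictionary can be assembled from environment variables by hand. The AWS key names below (aws_access_key_id, aws_secret_access_key, aws_region) are assumptions based on commonly used object-store configuration keys, so check them against the provider key list referenced above. aws_storage_options is a hypothetical helper.

```python
import os
from typing import Dict, Mapping

# Assumed option key -> environment variable it is read from.
AWS_ENV_VARS = {
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "aws_region": "AWS_REGION",
}


def aws_storage_options(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the AWS settings present in `env` into a storage_options dict."""
    return {key: env[var] for key, var in AWS_ENV_VARS.items() if var in env}


# Hugging Face takes the API key under "token", as described above.
# The "..." fallback is a placeholder, not a usable token.
hf_storage_options = {"token": os.environ.get("HF_TOKEN", "...")}
```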

credential_provider

Provide a function that can be called to provide cloud storage credentials. The function is expected to return a dictionary of credential keys along with an optional credential expiry time.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
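A minimal sketch of a credential provider function. The return shape (a dictionary plus an optional expiry), the use of a Unix timestamp in seconds for the expiry, and the AWS key names are all assumptions made for illustration; static_credential_provider is a hypothetical name and the values are placeholders.

```python
import time
from typing import Dict, Optional, Tuple


def static_credential_provider() -> Tuple[Dict[str, str], Optional[int]]:
    """Return placeholder credentials and an expiry one hour from now."""
    credentials = {
        "aws_access_key_id": "<access key id>",
        "aws_secret_access_key": "<secret access key>",
        "aws_session_token": "<session token>",
    }
    # Assumption: expiry is expressed as seconds since the Unix epoch.
    # Return None instead if the credentials do not expire.
    expiry = int(time.time()) + 3600
    return credentials, expiry
```

In practice the body would fetch short-lived credentials from a secrets manager or identity service instead of returning fixed strings.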

Attributes:

compression: ParquetCompression

Compression algorithm

compression_level: int | None

Compression level

credential_provider: CredentialProviderFunction | Literal['auto'] | None

Credential provider

data_page_size: int | None

Data page size in bytes

row_group_size: int | None

Size of the row groups in number of rows

uri: str | Path | None

Path to which the output should be written