ParquetDst
- class polars_cloud.ParquetDst(
- uri: str | Path,
- *,
- compression: ParquetCompression = 'zstd',
- compression_level: int | None = None,
- statistics: bool | str | dict[str, bool] = True,
- row_group_size: int | None = None,
- data_page_size: int | None = None,
- storage_options: dict[str, Any] | None = None,
- credential_provider: CredentialProviderFunction | Literal['auto'] | None = 'auto',
- )
Parquet destination arguments.
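A minimal construction sketch, assuming `polars_cloud` is installed. The keyword names come from the signature above; the chosen values and the helper name `make_destination` are illustrative and not part of the library.

```python
from typing import Any

# Keyword arguments named in the signature above; the values are illustrative.
DST_KWARGS: dict[str, Any] = {
    "compression": "zstd",      # documented default codec
    "compression_level": 3,     # inside the documented zstd range 1..22
    "statistics": True,         # default set of statistics
    "row_group_size": 512**2,   # documented default, spelled out
    "data_page_size": 1024**2,  # documented default, spelled out
}


def make_destination(uri: str):
    """Build a ParquetDst for `uri` (hypothetical helper, needs polars_cloud)."""
    # Imported lazily so this module loads even when polars_cloud is absent.
    from polars_cloud import ParquetDst

    return ParquetDst(uri, **DST_KWARGS)
```

Calling `make_destination("s3://my-bucket/output/")` (a placeholder bucket) would return a destination object to hand to a query.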
- Parameters:
- uri
Path to which the output should be written. Must be a URI to an accessible object store location. If set to "local", the query is executed locally. If None, the result will be written to a temporary location. This is useful for intermediate query results.
- compression {‘lz4’, ‘uncompressed’, ‘snappy’, ‘gzip’, ‘lzo’, ‘brotli’, ‘zstd’}
Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “snappy” for better backwards compatibility with older Parquet readers.
- compression_level
The level of compression to use. Higher compression means smaller files on disk.
“gzip” : min-level: 0, max-level: 10.
“brotli” : min-level: 0, max-level: 11.
“zstd” : min-level: 1, max-level: 22.
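The level ranges listed above can be checked before a destination is built. The following is a hypothetical helper, not part of `polars_cloud`; it assumes that codecs without a listed range take no explicit level.

```python
from typing import Optional

# (min, max) levels exactly as listed above.
LEVEL_RANGES = {
    "gzip": (0, 10),
    "brotli": (0, 11),
    "zstd": (1, 22),
}


def check_compression_level(compression: str, level: Optional[int]) -> None:
    """Raise ValueError if `level` is outside the listed range for `compression`."""
    if level is None:
        return  # None leaves the choice of level to the writer
    bounds = LEVEL_RANGES.get(compression)
    if bounds is None:
        # Assumption: codecs without a listed range accept no explicit level.
        raise ValueError(f"no level range is listed for {compression!r}")
    low, high = bounds
    if not low <= level <= high:
        raise ValueError(
            f"{compression!r} level must be in [{low}, {high}], got {level}"
        )
```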
- statistics
Write statistics to the parquet headers. This is the default behavior.
Possible values:
  - True: enable default set of statistics (default). Some statistics may be disabled.
  - False: disable all statistics.
  - “full”: calculate and write all available statistics. Cannot be combined with use_pyarrow.
  - { "statistic-key": True / False, ... }. Cannot be combined with use_pyarrow. Available keys:
    - “min”: column minimum value (default: True)
    - “max”: column maximum value (default: True)
    - “distinct_count”: number of unique column values (default: False)
    - “null_count”: number of null values in column (default: True)
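One way to read the accepted values is as shorthand for a per-key mapping. The sketch below is a hypothetical helper, not library code. It assumes that a partial dictionary is merged over the documented per-key defaults and that “full” switches on the four keys listed above; the library may handle either case differently.

```python
# Per-key defaults as documented above.
DEFAULT_STATISTICS = {
    "min": True,
    "max": True,
    "distinct_count": False,
    "null_count": True,
}


def resolve_statistics(statistics):
    """Expand a `statistics` argument into an explicit {key: bool} mapping."""
    if statistics is True:
        return dict(DEFAULT_STATISTICS)
    if statistics is False:
        return {key: False for key in DEFAULT_STATISTICS}
    if statistics == "full":
        # Assumption: "full" enables every key listed in the documentation.
        return {key: True for key in DEFAULT_STATISTICS}
    if isinstance(statistics, dict):
        unknown = set(statistics) - set(DEFAULT_STATISTICS)
        if unknown:
            raise ValueError(f"unknown statistic keys: {sorted(unknown)}")
        # Assumption: keys that are not given keep their documented default.
        return {**DEFAULT_STATISTICS, **statistics}
    raise TypeError(f"unsupported statistics value: {statistics!r}")
```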
- row_group_size
Size of the row groups in number of rows. Defaults to 512^2 rows.
- data_page_size
Size of the data page in bytes. Defaults to 1024^2 bytes.
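The defaults above are written with `^` meaning exponentiation. In Python that operator is bitwise XOR, so the values have to be spelled with `**`. The sketch below spells them out and adds a rough row-group estimate; the constant and function names are hypothetical, and the estimate assumes rows are packed into full groups, which the writer is not guaranteed to do.

```python
# Documented defaults, written with ** because ^ is XOR in Python.
DEFAULT_ROW_GROUP_SIZE = 512**2   # rows per row group
DEFAULT_DATA_PAGE_SIZE = 1024**2  # bytes per data page (1 MiB)


def estimate_row_groups(n_rows: int, row_group_size: int = DEFAULT_ROW_GROUP_SIZE) -> int:
    """Estimate the number of row groups for n_rows using ceiling division."""
    return -(-n_rows // row_group_size)


print(DEFAULT_ROW_GROUP_SIZE)          # 262144
print(DEFAULT_DATA_PAGE_SIZE)          # 1048576
print(estimate_row_groups(1_000_000))  # 4
```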
- storage_options
Options that indicate how to connect to a cloud provider.
The cloud providers currently supported are AWS, GCP, and Azure. See supported keys here:
Hugging Face (hf://): Accepts an API key under the token parameter: {'token': '...'}, or by setting the HF_TOKEN environment variable.
If storage_options is not provided, Polars will try to infer the information from environment variables.
- credential_provider
Provide a function that can be called to provide cloud storage credentials. The function is expected to return a dictionary of credential keys along with an optional credential expiry time.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
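A credential provider could look roughly like the sketch below. The function name, the `(credentials, expiry)` tuple shape, and the AWS-style key names are assumptions made for illustration; the description above only states that a dictionary of credential keys and an optional expiry time are returned.

```python
import os


def env_credential_provider():
    """Return (credentials, expiry) read from environment variables."""
    # Assumed key names, modelled on common AWS credential fields.
    credentials = {
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    }
    session_token = os.environ.get("AWS_SESSION_TOKEN")
    if session_token is not None:
        credentials["aws_session_token"] = session_token
    # None stands for "no known expiry" in this sketch.
    return credentials, None
```

Under these assumptions the function would be passed as `credential_provider=env_credential_provider` in place of the default `'auto'`.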
Attributes:
compression            Compression algorithm
compression_level      Compression level
credential_provider    Credential provider
data_page_size         Data page size
row_group_size         Size of the row groups
uri                    Path to which the output should be written
- compression: ParquetCompression
Compression algorithm
- credential_provider: CredentialProviderFunction | Literal['auto'] | None
Credential provider