ParquetDst

class polars_cloud.ParquetDst(
uri: str | Path,
*,
compression: ParquetCompression = 'zstd',
compression_level: int | None = None,
statistics: bool = True,
row_group_size: int | None = None,
data_page_size: int | None = None,
)

Parquet destination arguments.

Parameters:
uri

Path to which the output should be written. Must be a URI to an accessible object store location. If set to "local", the query is executed locally. If None, the result is written to a temporary location, which is useful for intermediate query results.

compression : {‘lz4’, ‘uncompressed’, ‘snappy’, ‘gzip’, ‘lzo’, ‘brotli’, ‘zstd’}

Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “snappy” for backwards compatibility with older parquet readers.

compression_level

The level of compression to use. Higher compression means smaller files on disk.

  • “gzip” : min-level: 0, max-level: 10.

  • “brotli” : min-level: 0, max-level: 11.

  • “zstd” : min-level: 1, max-level: 22.

statistics

Write statistics to the parquet headers. This is the default behavior.

row_group_size

Size of the row groups in number of rows. Defaults to 512^2 rows.

data_page_size

Size of the data page in bytes. Defaults to 1024^2 bytes.
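
The following sketch constructs a destination with the parameters above. The bucket path is hypothetical; as described under uri, "local" can be passed instead to execute the query locally.

import polars_cloud as pc

# Write to an object store location (bucket and key are hypothetical).
dst = pc.ParquetDst(
    "s3://my-bucket/output/result.parquet",
    compression="zstd",        # good compression performance
    compression_level=10,      # zstd accepts levels 1-22
    statistics=True,           # write statistics to the parquet headers (the default)
    row_group_size=512**2,     # rows per row group (the documented default)
    data_page_size=1024**2,    # data page size in bytes (the documented default)
)

# Alternatively, execute the query locally rather than writing to an object store.
local_dst = pc.ParquetDst("local")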

Attributes:

compression: ParquetCompression

Compression algorithm

compression_level: int | None

Compression level

data_page_size: int | None

Data page size

row_group_size: int | None

Size of the row groups

statistics: bool

Write statistics to parquet headers

uri: str | Path | None

Path to which the output should be written
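
The constructor arguments are exposed as the attributes above. A brief sketch, reusing the hypothetical destination from the parameter example:

import polars_cloud as pc

dst = pc.ParquetDst("s3://my-bucket/output/result.parquet", compression_level=10)

dst.compression         # 'zstd' (the default)
dst.compression_level   # 10
dst.statistics          # True (the default)
dst.row_group_size      # None unless set explicitly
dst.data_page_size      # None unless set explicitly
dst.uri                 # the destination path as passed in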