polars_cloud.spawn_many#

polars_cloud.spawn_many(
lf: list[LazyFrame],
*,
dst: Path | str | Dst,
context: ComputeContext | None = None,
distributed: None | bool = None,
engine: Engine = 'auto',
plan_type: PlanTypePreference = 'dot',
labels: None | list[str] = None,
shuffle_compression: bool = False,
n_retries: int = 0,
**optimizations: bool,
) → list[BatchQuery] | list[InteractiveQuery]#

Spawn multiple remote queries and await them asynchronously.
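A minimal usage sketch, assuming `polars` and `polars_cloud` are installed and a workspace is configured. The bucket paths, the workspace name, the `ComputeContext` settings, and the `await_result` call are illustrative assumptions, not taken from this page:

```python
import polars as pl
import polars_cloud as pc

# Build several independent lazy queries (placeholder S3 paths).
lfs = [
    pl.scan_parquet(f"s3://my-bucket/input/part_{i}.parquet")
    .filter(pl.col("value") > 0)
    for i in range(3)
]

# Describe the compute cluster that should run the queries
# (hypothetical workspace name and sizing).
ctx = pc.ComputeContext(workspace="my-workspace", cpus=8, memory=32)

# Spawn all queries at once; one query object is returned per LazyFrame.
queries = pc.spawn_many(
    lfs,
    dst="s3://my-bucket/output/",
    context=ctx,
    n_retries=2,
)

# Block until every query has finished (method name assumed from the
# query objects' interactive API).
for q in queries:
    q.await_result()
```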

Parameters:
lf

A list of Polars LazyFrames that should be executed on the compute cluster.

dst

Destination to which the output should be written. If a URI is passed, it must be an accessible object store location. If set to "local", the query is executed locally.

context

The context describing the compute cluster that should execute the query. If set to None (default), attempts to load a valid compute context from the following locations in order:

  1. The compute context cache, which contains the last compute context created.

  2. The default compute context stored in the user profile.

distributed

Run as a distributed query. Execution may be only partially distributed, depending on the operation, optimizer statistics, and available machines.

engine: {‘auto’, ‘streaming’, ‘in-memory’, ‘gpu’}

The engine that will execute the query. GPU mode is not yet supported; consider opening an issue if you need it. Setting the engine to ‘gpu’ requires the compute cluster to have access to GPUs. If it does not, the query will fail.

plan_type: {“dot”, “plain”}

Which output format is preferred.

labels

Labels to add to the query (labels will be created implicitly if they do not exist).

shuffle_compression

Compress files before shuffling them. This reduces disk and network IO, but disables memory mapping.

n_retries

How many times failed tasks should be retried.

**optimizations

Optimizations to enable or disable in the query optimizer, e.g. projection_pushdown=False.

Raises:
grpc.RpcError

If the LazyFrame size is too large. See note below.