polars_cloud.spawn_many#

polars_cloud.spawn_many(
lf: list[LazyFrame],
*,
dst: Path | str | Dst,
context: ComputeContext | None = None,
distributed: None | bool = None,
engine: Engine = 'auto',
plan_type: PlanTypePreference = 'dot',
labels: None | list[str] = None,
shuffle_compression: bool = False,
n_retries: int = 0,
**optimizations: bool,
) → list[BatchQuery] | list[InteractiveQuery]#

Spawn multiple remote queries and await them asynchronously.
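A minimal usage sketch, assuming `polars` and `polars_cloud` are installed and a workspace is configured. The bucket paths, the workspace name, the `ComputeContext` settings, and the `await_result` call are illustrative assumptions, not taken from this page:

```python
import polars as pl
import polars_cloud as pc

# Build several independent lazy queries (placeholder S3 paths).
lfs = [
    pl.scan_parquet(f"s3://my-bucket/input/part_{i}.parquet")
    .filter(pl.col("value") > 0)
    for i in range(3)
]

# Describe the compute cluster that should run the queries
# (hypothetical workspace name and sizing).
ctx = pc.ComputeContext(workspace="my-workspace", cpus=8, memory=32)

# Spawn all queries at once; one query object is returned per LazyFrame.
queries = pc.spawn_many(
    lfs,
    dst="s3://my-bucket/output/",
    context=ctx,
    n_retries=2,
)

# Block until every query has finished (method name assumed from the
# query objects' interactive API).
for q in queries:
    q.await_result()
```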

Parameters:
lf

A list of Polars LazyFrames that should be executed on the compute cluster.

dst

Destination to which the output should be written. If a URI is passed, it must be an accessible object store location. If set to "local", the query is executed locally.

context

The context describing the compute cluster that should execute the query. If set to None (default), attempts to load a valid compute context from the following locations in order:

  1. The compute context cache, which contains the last compute context created.

  2. The default compute context stored in the user profile.

distributed

Run as a distributed query. Execution may be only partially distributed, depending on the operation, optimizer statistics, and available machines.

engine: {‘auto’, ‘streaming’, ‘in-memory’, ‘gpu’}

The engine that will execute the query. GPU mode is not yet supported; consider opening an issue if you need it. Setting the engine to ‘gpu’ requires the compute cluster to have access to GPUs. If it does not, the query will fail.

plan_type: {“dot”, “plain”}

Which output format is preferred.

labels

Labels to add to the query (labels will be created implicitly if they do not exist).

shuffle_compression

Compress files before shuffling them. This reduces disk and network IO, but disables memory mapping.

n_retries

How many times failed tasks should be retried.

**optimizations

Optimizations to enable or disable in the query optimizer, e.g. projection_pushdown=False.

Raises:
grpc.RpcError

If the LazyFrame size is too large. See note below.