polars_cloud.spawn_blocking

polars_cloud.spawn_blocking(
lf: LazyFrame,
*,
dst: Path | str | Dst,
context: ClientContext | None = None,
partitioned_by: None | str | list[str] = None,
broadcast_over: None | list[list[list[Path]]] = None,
engine: Engine = 'auto',
plan_type: PlanTypePreference = 'dot',
labels: None | list[str] = None,
shuffle_compression: ShuffleCompression = 'auto',
distributed: DistributionSettings | None | bool = None,
n_retries: int = 0,
sink_to_single_file: bool | None = None,
optimizations: QueryOptFlags = <polars.lazyframe.opt_flags.QueryOptFlags object>,
) -> QueryResult

Spawn a remote query and block the thread until the result is ready.

Parameters:
lf

The Polars LazyFrame which should be executed on the compute cluster.

dst

Destination to which the output should be written. If a URI is passed, it must be an accessible object store location. If set to "local", the query is executed locally.

context

The context describing the compute cluster that should execute the query. If set to None (default), attempts to load a valid compute context from the following locations in order:

  1. The compute context cache. This contains the most recently created compute context.

  2. The default compute context stored in the user profile.
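
The lookup order above can be pictured with a small sketch. This is not the library's implementation: `resolve_context`, `cached`, and `profile_default` are hypothetical names used only to illustrate the fallback order, and the error raised when nothing is found is an assumption.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve_context(
    explicit: Optional[T],
    cached: Optional[T],
    profile_default: Optional[T],
) -> T:
    """Illustrative fallback order only; all names here are hypothetical."""
    # An explicitly passed context always wins.
    if explicit is not None:
        return explicit
    # 1. The compute context cache (the most recently created context).
    if cached is not None:
        return cached
    # 2. The default compute context stored in the user profile.
    if profile_default is not None:
        return profile_default
    # Assumption: with nothing to fall back on, the call cannot proceed.
    raise ValueError("no compute context available")
```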

partitioned_by

Partition the query by a key (a single column name or a list of column names).

broadcast_over

Run this query in parallel over the given source paths.

engine{‘auto’, ‘streaming’, ‘in-memory’, ‘gpu’}

The engine that will execute the query. GPU mode is not yet supported; consider opening an issue. Setting the engine to GPU requires the compute cluster to have access to GPUs. If it does not, the query will fail.

plan_type{‘dot’, ‘plain’}

Preferred output format for the query plan.

labels

Labels to add to the query. Labels that do not exist yet are created implicitly.

shuffle_compression{‘auto’, ‘lz4’, ‘zstd’, ‘uncompressed’}

Compress files before shuffling them. Compression reduces disk and network IO, but disables memory mapping. Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “uncompressed” for memory mapped access at the expense of file size.

distributed

Run as a distributed query with these settings. This may run partially distributed, depending on the operation, optimizer statistics and available machines.

n_retries

How many times failed tasks should be retried.

sink_to_single_file

Perform the sink into a single file.

Setting this to True can reduce the amount of work that can be done in a distributed manner and therefore be more memory intensive and slower.

optimizations

The optimization passes done during query optimization.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Raises:
grpc.RpcError

If the LazyFrame size is too large. See note below.

See also

spawn

Spawn a remote query and await it asynchronously.
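
A minimal usage sketch, built only from the signature above. The helper names `build_spawn_kwargs` and `run_blocking` are hypothetical, and the check on `n_retries` is an assumption rather than documented behaviour. `polars_cloud` is imported lazily so the argument-building part can be exercised without a cluster.

```python
from typing import Any, Dict, List, Optional, Union

# Values listed for shuffle_compression in the parameter description above.
SHUFFLE_COMPRESSION = {"auto", "lz4", "zstd", "uncompressed"}


def build_spawn_kwargs(
    dst: str,
    *,
    partitioned_by: Union[None, str, List[str]] = None,
    shuffle_compression: str = "auto",
    n_retries: int = 0,
    sink_to_single_file: Optional[bool] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for spawn_blocking (hypothetical helper)."""
    if shuffle_compression not in SHUFFLE_COMPRESSION:
        raise ValueError(f"unknown shuffle_compression: {shuffle_compression!r}")
    # Assumption: a negative retry count is meaningless, so reject it early.
    if n_retries < 0:
        raise ValueError("n_retries must be >= 0")
    return {
        "dst": dst,
        "partitioned_by": partitioned_by,
        "shuffle_compression": shuffle_compression,
        "n_retries": n_retries,
        "sink_to_single_file": sink_to_single_file,
    }


def run_blocking(lf: Any, dst: str, **options: Any) -> Any:
    """Run lf remotely and block until the QueryResult is ready."""
    # Imported here so build_spawn_kwargs stays usable without the package.
    import polars_cloud as pc

    # context=None (the default) falls back to the cached or profile context.
    return pc.spawn_blocking(lf, **build_spawn_kwargs(dst, **options))
```

With a LazyFrame `lf`, a call such as `run_blocking(lf, "s3://my-bucket/output/", n_retries=2)` would block the calling thread until the query finishes; the bucket name is a placeholder.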