Sail is LakeSail's computation framework, with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.
The mission of Sail is to unify stream processing, batch processing, and compute-intensive (AI) workloads. Currently, Sail features a drop-in replacement for Spark SQL and the Spark DataFrame API in both single-host and distributed settings.
✨Please check out our MCP server that brings data analytics in Spark to both LLM agents and humans!✨
Sail is available as a Python package on PyPI. You can install it along with PySpark in your Python environment.
```bash
pip install pysail
pip install "pyspark[connect]"
```
Alternatively, since Spark 4.0, you can install the lightweight client package `pyspark-client`. The `pyspark-connect` package, which is equivalent to `pyspark[connect]`, is also available since Spark 4.0.
The Installation guide contains more information about installing Sail from source for better performance on your hardware architecture.
Option 1: Command Line Interface

You can start the local Sail server using the `sail` command.

```bash
sail spark server --port 50051
```
Option 2: Python API

You can start the local Sail server using the Python API.

```python
from pysail.spark import SparkConnectServer

server = SparkConnectServer(port=50051)
server.start(background=False)
```
Option 3: Kubernetes

You can deploy Sail on Kubernetes and run Sail in cluster mode for distributed processing. Please refer to the Kubernetes Deployment Guide for instructions on building the Docker image and writing the Kubernetes manifest YAML file.

```bash
kubectl apply -f sail.yaml
kubectl -n sail port-forward service/sail-spark-server 50051:50051
```
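For orientation, the manifest below is a minimal, hypothetical sketch of what `sail.yaml` might look like; the image name and labels are assumptions, and the real manifest should follow the Kubernetes Deployment Guide. It only mirrors the namespace (`sail`), service name (`sail-spark-server`), and port (50051) used by the commands above.

```yaml
# Hypothetical sail.yaml sketch; see the Kubernetes Deployment Guide for the actual manifest.
apiVersion: v1
kind: Namespace
metadata:
  name: sail
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sail-spark-server
  namespace: sail
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sail-spark-server
  template:
    metadata:
      labels:
        app: sail-spark-server
    spec:
      containers:
        - name: server
          image: sail:latest  # assumption: image built per the deployment guide
          ports:
            - containerPort: 50051
---
apiVersion: v1
kind: Service
metadata:
  name: sail-spark-server
  namespace: sail
spec:
  selector:
    app: sail-spark-server
  ports:
    - port: 50051
      targetPort: 50051
```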
Once you have a running Sail server, you can connect to it in PySpark. No changes are needed in your PySpark code!
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()
spark.sql("SELECT 1 + 1").show()
```
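As a quick sanity check of the connection string format, the snippet below builds the `sc://host:port` URL that `SparkSession.builder.remote` expects. The helper function is purely illustrative and is not part of PySpark or Sail.

```python
# Illustrative helper (not part of PySpark or Sail): build the Spark Connect
# URL that SparkSession.builder.remote() expects for a running Sail server.
def spark_connect_url(host: str, port: int) -> str:
    return f"sc://{host}:{port}"

# The local Sail server started earlier listens on port 50051.
url = spark_connect_url("localhost", 50051)
print(url)  # sc://localhost:50051
```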
Please refer to the Getting Started guide for further details.
The documentation of the latest Sail version can be found here.
Contributions are more than welcome!
Please submit GitHub issues for bug reports and feature requests. You are also welcome to ask questions in GitHub discussions.
Feel free to create a pull request if you would like to make a code change. You can refer to the development guide to get started.
LakeSail offers flexible enterprise support options for Sail. Please contact us to learn more.