I was recently tasked with connecting a Python Flask application to a Databricks SQL Warehouse using SQLAlchemy. As good practice, I wrote a simple threading wrapper around the query execution so that long-running queries would time out instead of wasting resources.
I assumed that setting the timeout like this would be enough, but I noticed that queries were left hanging on the cluster and never killed.
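For context, my first attempt looked roughly like this (a minimal sketch; run_with_timeout and its internals are my own naming, not from any library):

import threading

from sqlalchemy import text

def run_with_timeout(engine, query, timeout_seconds):
    # Run the query on a worker thread and give up waiting after the timeout
    result = {}

    def _run():
        with engine.connect() as conn:
            result["rows"] = conn.execute(text(query)).fetchall()

    worker = threading.Thread(target=_run, daemon=True)
    worker.start()
    worker.join(timeout_seconds)
    if worker.is_alive():
        # We stop waiting here, but nothing tells the warehouse to stop:
        # the statement keeps running on the cluster
        raise TimeoutError(f"Query still running after {timeout_seconds}s")
    return result.get("rows")

The join() only stops the client from waiting; the statement itself keeps executing on the warehouse, which is exactly the hanging behavior I was seeing.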
The thing is that the Databricks SQL Connector does not make it clear how to set a timeout! Even their documentation here only says:
You can set this parameter at the session level using the SET statement and at the global level using Configure SQL parameters or SQL Warehouses API.
Not helpful at all.
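For completeness, the session-level route the docs describe would look something like this from Python (a sketch, assuming the STATEMENT_TIMEOUT parameter name from the Databricks SQL docs), but it has to be re-issued on every connection the pool hands out, which is easy to get wrong:

from sqlalchemy import text

# Session-level timeout via the SET statement; it only applies to this connection
with engine.connect() as conn:
    conn.execute(text("SET STATEMENT_TIMEOUT = 1"))
    conn.execute(text("SELECT 1"))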
After a few hours of struggling, I found out that the timeout should be set in a slightly different way:
from sqlalchemy import create_engine

engine = create_engine(
    url="<your-url>",
    # Timeout in seconds; here any statement is killed after 1 second
    connect_args={"session_configuration": {"statement_timeout": 1}},
)
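To sanity-check it, you can run a deliberately slow query that should now be cancelled by the warehouse instead of hanging (the range() row count below is arbitrary; pick whatever is slow enough on your warehouse):

from sqlalchemy import text

with engine.connect() as conn:
    try:
        # Deliberately slow query that should exceed the 1-second timeout
        conn.execute(text("SELECT count(*) FROM range(10000000000)"))
    except Exception as exc:
        # The connector surfaces the server-side cancellation as an error
        print(f"Query cancelled by the warehouse: {exc}")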
Be aware that although you can check which parameters are set using:
import pandas as pd

# Check the parameters set on the connection (reusing the engine from above)
with engine.connect() as conn:
    df = pd.read_sql("SET -v", conn)
    print(df)
you won’t be able to see the value of the statement_timeout parameter that you set before.
Fortunately, in my case it worked fine, and the hanging queries on the cluster were terminated as expected.
Hope this can be useful for you!