Overview

The Sphinx Library (sphinxai) is a Python library that provides your notebook code with direct access to AI capabilities and secure resources. When Sphinx generates code in your notebooks, it can use this library to:
  • Call LLMs for text generation and analysis
  • Generate text embeddings for similarity search and clustering
  • Process images with vision-capable models
  • Retrieve connection credentials for databases like Snowflake and Databricks
  • Access user secrets stored securely in Sphinx
The Sphinx Library is automatically available when running code in Sphinx-managed notebooks. No installation required.

Why Use the Sphinx Library?

The library solves several key problems for data scientists:
  1. Model Abstraction: Use size tiers (S, M, L) instead of specific model names, so your code stays independent of model versions
  2. Simplified Authentication: Access LLMs and embeddings without managing API keys in your code
  3. Batch Processing: Built-in concurrent processing with rate limiting for batch operations
  4. Secure Credentials: Retrieve database credentials and secrets without hardcoding sensitive values
  5. Provider Flexibility: Switch between providers (OpenAI, Anthropic, Google) or bring your own API keys

Model Size Tiers

Instead of specifying exact model names, the library uses abstract size tiers:
| Tier | Chat Models | Embedding Models | Best For |
|------|-------------|------------------|----------|
| S (Small) | Fast, cost-effective | Smaller dimensions | Simple tasks, high throughput |
| M (Medium) | Balanced performance | Not available | General-purpose tasks |
| L (Large) | Highest quality | Larger dimensions | Complex reasoning, nuanced analysis |
This abstraction means your code continues working when models are upgraded—Sphinx handles the mapping to specific models.
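Since only the model_size argument changes between tiers, it's easy to compare tiers on a representative prompt before committing to one. A quick sketch using the llm() function documented below:

import sphinxai

prompt = "Summarize the bias-variance tradeoff in two sentences."

# Run the same prompt at every tier; only model_size changes.
for tier in ["S", "M", "L"]:
    answer = await sphinxai.llm(prompt=prompt, model_size=tier)
    print(f"--- {tier} ---\n{answer}\n")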

Functions Reference

Chat Completion

llm()

Call an LLM with a text prompt.
import sphinxai

response = await sphinxai.llm(
    prompt="Explain the concept of gradient descent",
    model_size="M",  # "S", "M", or "L"
    timeout=30.0     # seconds
)
print(response)
  • prompt (string, required): The text prompt to send to the LLM.
  • model_size (string, default "S"): Model size tier: "S" (small/fast), "M" (medium), or "L" (large/capable).
  • timeout (float, default 30.0): Timeout in seconds for the request.
  • Returns (string): The LLM's response text.

batch_llm()

Process multiple prompts concurrently with automatic rate limiting.
import sphinxai

prompts = [
    "Summarize: Machine learning is...",
    "Summarize: Deep learning is...",
    "Summarize: Neural networks are..."
]

responses = await sphinxai.batch_llm(
    prompts=prompts,
    model_size="S",
    max_concurrent=5,  # parallel requests
    timeout=30.0
)

for prompt, response in zip(prompts, responses):
    print(f"Input: {prompt[:30]}...")
    print(f"Output: {response}\n")
  • prompts (List[str], required): List of prompts to process.
  • model_size (string, default "S"): Model size tier for all requests.
  • max_concurrent (int, default 5): Maximum number of concurrent requests (rate limiting).
  • timeout (float, default 30.0): Timeout in seconds for each individual request.
  • Returns (List[str]): List of responses in the same order as the input prompts. Failed requests return error messages.
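For very large workloads you may prefer to submit prompts in fixed-size chunks rather than all at once, so a single slow chunk doesn't hold everything up. A minimal sketch (the chunk size of 100 is an arbitrary choice, not a documented library limit):

import sphinxai

# Placeholder workload; substitute your real prompt list
all_prompts = [f"Summarize item {i} in one sentence." for i in range(250)]

async def batch_llm_chunked(prompts, chunk_size=100, **kwargs):
    """Run batch_llm over fixed-size chunks, preserving input order."""
    results = []
    for start in range(0, len(prompts), chunk_size):
        chunk = prompts[start:start + chunk_size]
        results.extend(await sphinxai.batch_llm(prompts=chunk, **kwargs))
    return results

responses = await batch_llm_chunked(all_prompts, model_size="S", max_concurrent=5)
print(len(responses))  # same length and order as all_prompts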

Vision

batch_vision_llm()

Ask questions about images using vision-capable models.
import base64
import sphinxai

# Load images as base64 strings
with open("chart1.png", "rb") as f:
    image1 = base64.b64encode(f.read()).decode()
with open("chart2.png", "rb") as f:
    image2 = base64.b64encode(f.read()).decode()

responses = await sphinxai.batch_vision_llm(
    images=[image1, image2],
    questions=[
        "What trends do you see in this chart?",
        "Describe the key insights from this visualization."
    ],
    model_size="L",
    mime_type="image/png",
    image_detail="auto"  # "low", "high", or "auto"
)
  • images (List[str], required): List of base64-encoded image strings (without the data: URL prefix).
  • questions (List[str], required): List of questions, one per image. Must match the length of images.
  • model_size (string, default "S"): Model size tier.
  • max_concurrent (int, default 5): Maximum concurrent requests.
  • timeout (float, default 30.0): Timeout per request in seconds.
  • mime_type (string, default "image/png"): MIME type of the images (e.g., "image/png", "image/jpeg").
  • image_detail (string, default "auto"): Image detail level: "low", "high", or "auto".
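As a usage sketch, the same question can be fanned out across a directory of images; the glob pattern and question below are placeholders:

import base64
import glob

import sphinxai

# Placeholder path pattern; point this at your own images
paths = sorted(glob.glob("figures/*.png"))

images = []
for path in paths:
    with open(path, "rb") as f:
        images.append(base64.b64encode(f.read()).decode())

# batch_vision_llm requires exactly one question per image
questions = ["Summarize the main takeaway of this figure."] * len(images)

responses = await sphinxai.batch_vision_llm(
    images=images,
    questions=questions,
    model_size="M",
    mime_type="image/png",
)

for path, response in zip(paths, responses):
    print(f"{path}: {response}")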

Text Embeddings

embed_text()

Generate a vector embedding for a single text.
import sphinxai

embedding = await sphinxai.embed_text(
    text="Machine learning is a subset of artificial intelligence",
    model_size="S",
    timeout=30.0
)

print(f"Embedding dimensions: {len(embedding)}")
# Use for similarity search, clustering, etc.
  • text (string, required): The text to embed.
  • model_size (string, default "S"): Model size tier: "S" (small/fast) or "L" (large/high-quality). Note: "M" is not available for embeddings.
  • timeout (float, default 30.0): Timeout in seconds.
  • Returns (List[float]): The embedding vector as a list of floats.
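The returned vector is a plain list of floats, so it drops straight into NumPy. One caveat worth encoding in a helper: a raw dot product equals cosine similarity only for unit-length vectors, so the sketch below divides out the norms explicitly:

import numpy as np
import sphinxai

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

e1 = await sphinxai.embed_text("machine learning")
e2 = await sphinxai.embed_text("statistical learning")
print(f"{cosine_similarity(e1, e2):.4f}")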

batch_embed_text()

Generate embeddings for multiple texts concurrently.
import sphinxai

texts = [
    "The quick brown fox",
    "A lazy dog sleeps",
    "Machine learning models"
]

embeddings = await sphinxai.batch_embed_text(
    texts=texts,
    model_size="S",
    max_concurrent=5
)

# Calculate cosine similarity between the first two texts
import numpy as np

v1, v2 = np.asarray(embeddings[0]), np.asarray(embeddings[1])
similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"Cosine similarity: {similarity:.4f}")
  • texts (List[str], required): List of texts to embed.
  • model_size (string, default "S"): Model size tier: "S" or "L".
  • max_concurrent (int, default 5): Maximum concurrent requests.
  • timeout (float, default 30.0): Timeout per request in seconds.
  • Returns (List[List[float]]): List of embedding vectors in the same order as the input texts. Failed requests return empty lists.
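Because failed requests come back as empty lists, it's worth filtering them out (together with their source texts) before stacking results into a matrix. A minimal sketch:

import numpy as np
import sphinxai

texts = ["The quick brown fox", "A lazy dog sleeps", "Machine learning models"]
embeddings = await sphinxai.batch_embed_text(texts=texts, model_size="S")

# Failed requests return empty lists, so drop them (and their texts)
# before stacking the survivors into a single matrix
ok = [(t, e) for t, e in zip(texts, embeddings) if e]
if ok:
    kept_texts, kept_vectors = zip(*ok)
    matrix = np.array(kept_vectors)  # shape: (n_successful, embedding_dim)
    print(matrix.shape, list(kept_texts))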

Connection Credentials

get_connection_credentials()

Retrieve credentials for configured data integrations.
import sphinxai

creds = await sphinxai.get_connection_credentials("snowflake")

# Returns:
# {
#     "username": "your_username",
#     "account_identifier": "your_account",
#     "access_token": "..."
# }

import snowflake.connector

conn = snowflake.connector.connect(
    user=creds["username"],
    account=creds["account_identifier"],
    token=creds["access_token"],
    authenticator="oauth"
)
  • integration_name (string, required): The name of the integration: "snowflake" or "databricks".
  • timeout (float, default 5.0): Timeout in seconds.
Connection credentials are configured in the Sphinx Dashboard under Integrations. See Integrations for setup instructions.
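For Databricks, this page doesn't show the shape of the returned dictionary, so the key names in the sketch below (server_hostname, http_path, access_token) are assumptions; inspect the credentials returned in your own environment and adjust. The databricks-sql-connector call itself is standard:

import sphinxai
from databricks import sql  # databricks-sql-connector package

creds = await sphinxai.get_connection_credentials("databricks")

# NOTE: the key names below are assumptions, not documented above;
# print creds in your own environment and adjust accordingly.
conn = sql.connect(
    server_hostname=creds["server_hostname"],
    http_path=creds["http_path"],
    access_token=creds["access_token"],
)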

Secrets

get_user_secret_value()

Retrieve a secret value from the Sphinx secrets store.
import sphinxai

api_key = await sphinxai.get_user_secret_value("MY_API_KEY")

# Use the secret
client = SomeExternalAPI(api_key=api_key)
  • secret_name (string, required): The name of the secret to retrieve.
  • timeout (float, default 5.0): Timeout in seconds.
  • Returns (string): The secret value as a string.
Secrets are configured in the Sphinx Dashboard under Secrets. See Secrets for setup instructions.
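A common pattern is passing the retrieved secret as a bearer token to an external HTTP API. In the sketch below, the secret name and endpoint URL are placeholders:

import requests
import sphinxai

# "MY_API_KEY" and the endpoint URL are placeholders
token = await sphinxai.get_user_secret_value("MY_API_KEY")

resp = requests.get(
    "https://api.example.com/v1/data",
    headers={"Authorization": f"Bearer {token}"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())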

Configuration

These functions allow you to bring your own API keys or switch providers dynamically.

set_llm_config()

Configure the LLM provider programmatically.
import sphinxai

# Use OpenAI directly with your own key
sphinxai.set_llm_config(
    provider="openai",
    api_key="sk-...",
    models={
        "S": "gpt-4.1-nano",
        "M": "gpt-4.1-mini",
        "L": "gpt-4.1"
    }
)

# Or use a custom OpenAI-compatible endpoint
sphinxai.set_llm_config(
    provider="openai",
    api_key="your-key",
    base_url="https://your-api.com/v1"
)
  • provider (string, required): Provider name: "sphinx", "openai", "anthropic", or "google".
  • api_key (string, required): API key for the provider.
  • base_url (string, optional): Custom base URL for the provider's API.
  • models (Dict[str, str], optional): Mapping of size tiers to model names. Unspecified sizes use provider defaults.
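Rather than pasting a key into a notebook cell, the key can itself come from the Sphinx secrets store via get_user_secret_value() (documented above). The secret name here is a placeholder:

import sphinxai

# The secret name is a placeholder; store your provider key under any name
openai_key = await sphinxai.get_user_secret_value("OPENAI_API_KEY")

sphinxai.set_llm_config(provider="openai", api_key=openai_key)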

set_embedding_config()

Configure the embedding provider programmatically.
import sphinxai

# Use OpenAI for embeddings
sphinxai.set_embedding_config(
    provider="openai",
    api_key="sk-...",
    models={
        "S": "text-embedding-3-small",
        "L": "text-embedding-3-large"
    }
)
  • provider (string, required): Provider name: "sphinx", "openai", or "google". Note: Anthropic does not support embeddings.
  • api_key (string, required): API key for the provider.
  • base_url (string, optional): Custom base URL.
  • models (Dict[str, str], optional): Mapping of size tiers to model names.

get_llm_config() / get_embedding_config()

Inspect the current configuration.
import sphinxai

llm_config = sphinxai.get_llm_config()
print(llm_config)
# {
#     "provider": "openai",
#     "base_url": "https://api.openai.com/v1",
#     "models": {"S": "gpt-4.1-nano", "M": "gpt-4.1-mini", "L": "gpt-4.1"},
#     "has_api_key": True,
#     "config_source": "programmatic"  # or "environment"
# }

embedding_config = sphinxai.get_embedding_config()

reset_config() / reset_llm_config() / reset_embedding_config()

Reset configuration to environment variable defaults.
import sphinxai

# Reset both LLM and embedding config
sphinxai.reset_config()

# Or reset individually
sphinxai.reset_llm_config()
sphinxai.reset_embedding_config()

Supported Providers

| Provider | Chat | Embeddings | Default Models |
|----------|------|------------|----------------|
| sphinx (default) | Yes | Yes | GPT-4.1 family, text-embedding-3 |
| openai | Yes | Yes | GPT-4.1 family, text-embedding-3 |
| anthropic | Yes | No | Claude Haiku/Sonnet |
| google | Yes | Yes | Gemini 2.5 family |
When using Anthropic, you must use a different provider for embeddings since Anthropic doesn’t offer embedding models.
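A split configuration might look like the following sketch (both keys are placeholders); see also the Mixed Provider Configuration example below:

import sphinxai

# Claude for chat...
sphinxai.set_llm_config(provider="anthropic", api_key="your-anthropic-key")

# ...and a provider that does offer embeddings for embed_text()
sphinxai.set_embedding_config(provider="openai", api_key="your-openai-key")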

Examples

Sentiment Analysis with Batch Processing

import sphinxai

reviews = [
    "This product exceeded my expectations!",
    "Terrible quality, broke after one day.",
    "It's okay, nothing special.",
    "Best purchase I've ever made!"
]

prompts = [
    f"Classify the sentiment as positive, negative, or neutral: '{review}'"
    for review in reviews
]

sentiments = await sphinxai.batch_llm(prompts, model_size="S")

for review, sentiment in zip(reviews, sentiments):
    print(f"Review: {review[:40]}...")
    print(f"Sentiment: {sentiment}\n")

Semantic Search with Embeddings

import sphinxai
import numpy as np

# Your document corpus
documents = [
    "Python is a programming language",
    "Machine learning uses algorithms to learn from data",
    "Neural networks are inspired by the brain",
    "Data science combines statistics and programming"
]

# Generate embeddings for all documents
doc_embeddings = await sphinxai.batch_embed_text(documents, model_size="S")

# Search query
query = "How do computers learn?"
query_embedding = await sphinxai.embed_text(query, model_size="S")

# Find most similar document
similarities = [
    np.dot(query_embedding, d) / (np.linalg.norm(query_embedding) * np.linalg.norm(d))
    for d in doc_embeddings
]
best_match_idx = np.argmax(similarities)

print(f"Query: {query}")
print(f"Best match: {documents[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.4f}")

Mixed Provider Configuration

import sphinxai

# Use OpenAI for chat, Google for embeddings
sphinxai.set_llm_config(provider="openai", api_key="sk-...")
sphinxai.set_embedding_config(provider="google", api_key="your-google-key")

# Now llm() uses OpenAI, embed_text() uses Google
response = await sphinxai.llm("Explain transformers", model_size="L")
embedding = await sphinxai.embed_text("Transformers are...", model_size="S")