Data Streaming
Access and train on sensitive data without downloading it. Data streaming keeps data under its owner's governance controls while remaining fully usable for analysis and training.
The problem with data transfers
Traditionally, AI teams need to download data to use it. This creates immediate problems:
- Data leakage risk: once downloaded, controls are lost
- Regulatory issues: data crossing borders triggers compliance problems
- Governance gaps: no way to enforce policies after transfer
- Legal exposure: a clear chain of liability back to the AI team
Xase enables data use without downloads through governed streaming.
How data streaming works
1. Create Access Session
Start with policy-approved access:
```python
import xase

client = xase.Client(api_key="sk_...")

# Get governed access to data
session = client.access(
    dataset="patient-records-2025",
    purpose="model-training",
    duration="30d"
)
```

2. Stream Data
Stream batches for training with full governance:
```python
# Stream batches for training
for batch in session.stream(batch_size=32):
    # Train on data without downloading
    model.train_on_batch(batch)

# All usage is automatically tracked
# All policy constraints are enforced
# All evidence is automatically generated
```

3. Access Specific Records
Access specific entries when needed:
```python
# Get specific patient record
patient = session.get("patient_45678")

# Apply transformation with tracking
processed = session.transform(
    data=patient,
    function=anonymize_fields,
    metadata={"purpose": "privacy protection"}
)
```

4. Filtering and Queries
Apply filters without downloading all data:
```python
# Stream with filters
filtered_data = session.stream(
    filter={
        "age": {"$gte": 18},
        "diagnosis": {"$in": ["diabetes", "hypertension"]}
    },
    batch_size=32
)

for batch in filtered_data:
    model.train_on_batch(batch)
```

Key features
No Data Downloads
Data never leaves the governed environment. All processing happens through the streaming interface.
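The batching side of a streaming interface can be sketched in plain Python. The generator below illustrates the mechanics only (it is not the Xase SDK): batches are yielded lazily, so the consumer never holds a full copy of the dataset.

```python
from typing import Iterable, Iterator, List

def stream_batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield fixed-size batches lazily; only one batch is ever
    materialized on the consumer side at a time."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

# Consumers iterate batch by batch instead of downloading everything:
batches = list(stream_batches(({"id": i} for i in range(70)), batch_size=32))
print([len(b) for b in batches])  # [32, 32, 6]
```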
Runtime Policy Enforcement
Policies are continuously enforced during streaming. Revoked access stops streams immediately.
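One way a revocation might surface to a consumer is as an exception raised mid-stream. The stand-in class and `AccessRevokedError` below are hypothetical, for illustration only; the point is that the policy is re-checked before every batch, so a revoked session delivers nothing further.

```python
class AccessRevokedError(Exception):
    """Hypothetical error: raised when the data owner revokes the session."""

class GovernedStream:
    """Toy stand-in for a governed session, not the Xase SDK: the policy
    flag is re-checked before each batch is released."""
    def __init__(self, batches):
        self.batches = batches
        self.policy_active = True

    def stream(self):
        for batch in self.batches:
            if not self.policy_active:
                raise AccessRevokedError("session revoked by data owner")
            yield batch

stream = GovernedStream([[1, 2], [3, 4], [5, 6]])
consumed = []
try:
    for batch in stream.stream():
        consumed.append(batch)
        stream.policy_active = False  # simulate a mid-stream revocation
except AccessRevokedError:
    pass  # training stops; nothing after the revocation is delivered

print(consumed)  # [[1, 2]]
```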
Automatic Tracking
Every record access and operation is logged with identity, timestamp, and purpose.
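A minimal sketch of what such a log entry could contain. The `AccessEvent` shape is an assumption for illustration, not the actual Xase audit format:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AccessEvent:
    """Illustrative audit record: who touched what, when, and why."""
    identity: str
    record_id: str
    operation: str
    purpose: str
    timestamp: str

def log_access(identity: str, record_id: str, operation: str, purpose: str) -> AccessEvent:
    # Timestamps are captured in UTC at the moment of access
    return AccessEvent(
        identity=identity,
        record_id=record_id,
        operation=operation,
        purpose=purpose,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

event = log_access("ml-team@example.com", "patient_45678", "read", "model-training")
print(asdict(event))
```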
Server-Side Filtering
Apply filters server-side to reduce bandwidth and process only relevant data.
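The `$gte`/`$in` filter syntax used above can be read as a predicate over each record. The toy evaluator below shows those semantics client-side for illustration only; in practice Xase evaluates filters server-side, which is the whole point.

```python
def matches(record: dict, filter_: dict) -> bool:
    """Evaluate a Mongo-style filter (supporting $gte and $in) against one record."""
    for field, conditions in filter_.items():
        value = record.get(field)
        for op, operand in conditions.items():
            if op == "$gte" and not (value is not None and value >= operand):
                return False
            if op == "$in" and value not in operand:
                return False
    return True

records = [
    {"age": 45, "diagnosis": "diabetes"},
    {"age": 17, "diagnosis": "diabetes"},
    {"age": 60, "diagnosis": "asthma"},
]
f = {"age": {"$gte": 18}, "diagnosis": {"$in": ["diabetes", "hypertension"]}}
print([r for r in records if matches(r, f)])  # [{'age': 45, 'diagnosis': 'diabetes'}]
```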
Advanced usage
Streaming Aggregations
Compute aggregations without downloading raw data:
```python
# Get aggregated statistics
stats = session.aggregate(
    pipeline=[
        {"$match": {"age": {"$gte": 30}}},
        {"$group": {
            "_id": "$diagnosis",
            "count": {"$sum": 1},
            "avg_age": {"$avg": "$age"}
        }}
    ]
)

for result in stats:
    print(f"Diagnosis: {result['_id']}")
    print(f"Count: {result['count']}")
    print(f"Avg age: {result['avg_age']}")
```

Streaming with Transforms
Apply transformations during streaming:
```python
# Stream with transformation
for batch in session.stream(
    batch_size=32,
    transform=lambda data: normalize(data)
):
    model.train_on_batch(batch)
```