# Using Data Pools in Managed Services
In this hands-on tutorial, you'll build a complete text analysis service that processes documents using our Data Pools feature. You'll learn how to upload datasets, create a service that reads from Data Pools, test it locally, deploy it, and consume it via the SDK.
## What You'll Build
By the end of this tutorial, you'll have created:
- A text analysis service that counts words in documents stored in Data Pools
- A working local development environment
- A deployed service on the platform
- A Python client that consumes your service
The full code of this tutorial is available in the examples repository.
## Prerequisites
- Node.js 18+ installed on your system
- Python 3.9+ installed
- A platform account with a personal access token
Note: Replace `<your-token>`, `<your-consumer-key>`, `<your-consumer-secret>`, and other placeholder values with your actual credentials throughout this tutorial.
## Step 1: Set Up Your Development Environment

### 1.1 Install and Configure the CLI

First, let's install the CLI and verify it's working. If you already have the CLI installed, run these commands anyway to make sure you are on the latest version:

```bash
# Install the current CLI
npm install -g @planqk/planqk-cli
# Verify installation
planqk --version
```

You should see a version number. If you get an error, ensure Node.js 18+ is installed.
### 1.2 Install uv Package Manager

We'll use uv, a fast Python package manager, for managing our Python dependencies:

```bash
# Install uv (if not already installed)
# On macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows:
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Verify uv installation
uv --version
```

### 1.3 Authenticate with the Platform
Get your personal access token from the platform (Profile → Access Tokens) and authenticate:
```bash
planqk login -t <your-personal-access-token>
```

You should see a success message confirming you're logged in.
### 1.4 Create Your Service Project
Let's create a new service project for our text analyzer:
```bash
planqk init --name text-analyzer
cd text-analyzer
```

This creates a project structure with:

- `src/program.py` - Your main service logic
- `input/` - Local test data directory
- `planqk.json` - Service configuration
- Other configuration files
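Roughly, the generated project looks like this (a sketch based on the files used in this tutorial; additional configuration files and exact contents may vary with the CLI version):

```text
text-analyzer/
├── planqk.json        # service configuration
├── input/             # local test data
└── src/
    ├── __main__.py    # entry point for local test runs (updated in Step 4.2)
    └── program.py     # your main service logic
```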
### 1.5 Set Up Python Environment

Now set up the Python environment within our service project:

```bash
# Create the virtual environment and install the project's dependencies
uv sync -U
# Activate the environment (optional, uv will handle this automatically)
source .venv/bin/activate  # On Windows: .venv\Scripts\activate.[ps1|bat]
```

## Step 2: Prepare Sample Data
### 2.1 Create Sample Text Files

Let's create some sample documents to analyze. We'll place them in input/documents/; this one set of files is uploaded to the Data Pool and reused later for local testing:

```bash
# Create the directory for the sample documents
mkdir -p input/documents
```

Create `input/documents/document1.txt`:
```bash
cat > input/documents/document1.txt << 'EOF'
Quantum computing is a revolutionary technology that harnesses the principles of quantum mechanics.
It promises to solve complex problems that are intractable for classical computers.
Quantum algorithms like Shor's algorithm and Grover's algorithm demonstrate significant speedups.
EOF
```

Create `input/documents/document2.txt`:
```bash
cat > input/documents/document2.txt << 'EOF'
Machine learning and artificial intelligence are transforming industries worldwide.
Deep learning models can process vast amounts of data to identify patterns.
Natural language processing enables computers to understand human language.
EOF
```

Create `input/documents/summary.json` with metadata:
```bash
cat > input/documents/summary.json << 'EOF'
{
  "collection": "Sample Documents",
  "total_files": 2,
  "description": "Demo text files for analysis",
  "created": "2025-08-04"
}
EOF
```

### 2.2 Upload Data to a Data Pool
Now upload the files from input/documents/ to a Data Pool:
```bash
planqk datapool upload -f ./input/documents/document1.txt -f ./input/documents/document2.txt -f ./input/documents/summary.json
```

The CLI will prompt you to create a new Data Pool. Choose "Yes" and give it a name like text-analysis-demo. Save the Data Pool ID that's returned - you'll need it later.
## Step 3: Implement the Text Analysis Service

The full code of the text analysis service is available in the examples repository.

### 3.1 Update the Service Logic
Replace the contents of src/program.py with our text analyzer:
```python
from planqk.commons.datapool import DataPool
from pydantic import BaseModel
from typing import Dict, List


class AnalysisRequest(BaseModel):
    files_to_analyze: List[str]
    min_word_length: int = 3


class AnalysisResult(BaseModel):
    total_files: int
    word_counts: Dict[str, int]
    total_words: int
    summary: str


def run(data: AnalysisRequest, documents: DataPool) -> AnalysisResult:
    """Analyze text files from a Data Pool and return word statistics."""
    word_counts = {}
    files_processed = 0

    for filename in data.files_to_analyze:
        try:
            # Read the text file from the Data Pool
            with documents.open(filename, 'r') as f:
                content = f.read()

            # Simple word counting
            words = content.lower().split()
            for word in words:
                # Clean word and filter by length
                clean_word = ''.join(char for char in word if char.isalnum())
                if len(clean_word) >= data.min_word_length:
                    word_counts[clean_word] = word_counts.get(clean_word, 0) + 1

            files_processed += 1
        except FileNotFoundError:
            print(f"Warning: File {filename} not found in Data Pool")
            continue

    total_words = sum(word_counts.values())

    # Find most common words
    top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:5]
    summary = f"Analyzed {files_processed} files. Top words: {dict(top_words)}"

    return AnalysisResult(
        total_files=files_processed,
        word_counts=word_counts,
        total_words=total_words,
        summary=summary
    )
```

### 3.2 Make Your Initial Commit to Track Your Changes [Optional]
To track your changes, initialize a Git repository and commit your code:
```bash
git init
git add .
git commit -m "Initial commit: Implement text analysis service"
```

## Step 4: Test Locally
### 4.1 Set Up Local Test Environment
Create test input in input/data.json:
```bash
cat > input/data.json << 'EOF'
{
  "files_to_analyze": ["document1.txt", "document2.txt"],
  "min_word_length": 4
}
EOF
```

### 4.2 Update Local Test Runner
Replace `src/__main__.py` to test the service with a locally simulated Data Pool:

```python
import json
import os

from planqk.commons.constants import OUTPUT_DIRECTORY_ENV
from planqk.commons.datapool import DataPool
from planqk.commons.json import any_to_json
from planqk.commons.logging import init_logging

from .program import AnalysisRequest, run

init_logging()

# Set up output directory for local testing
directory = "./out"
os.makedirs(directory, exist_ok=True)
os.environ[OUTPUT_DIRECTORY_ENV] = directory

# Load test data
with open("./input/data.json") as file:
    data = AnalysisRequest.model_validate(json.load(file))

# Simulate DataPool injection using a local directory
result = run(data, documents=DataPool("./input/documents"))

print("Analysis Results:")
print(any_to_json(result))
```

### 4.3 Run Local Test
Test your service locally:
```bash
python -m src
```

You should see output showing the word analysis results from your sample documents.
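The exact numbers depend on the sample texts, but the printed JSON follows the AnalysisResult model from Step 3, schematically (the ... marks abridged, illustrative values):

```text
Analysis Results:
{"total_files": 2, "word_counts": {"quantum": 3, "learning": 2, ...}, "total_words": ..., "summary": "Analyzed 2 files. Top words: {...}"}
```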
## Step 5: Deploy Your Service

### 5.1 Generate OpenAPI Specification

```bash
planqk openapi
```

### 5.2 Deploy Your Service to the Platform
You have two options for deployment: using the CLI or the web UI.
#### 5.2.1 Deploy via CLI

To deploy your service using the CLI, run:

```bash
planqk up
```

#### 5.2.2 Deploy via Web UI
Alternatively, you can deploy via the platform web interface. To do so, compress your service files into a ZIP archive:

```bash
planqk compress
```

- Go to the platform web interface and navigate to services: https://dashboard.hub.kipu-quantum.com/services
- Click on `Create Service`
- Select your ZIP file at `Source` > `File`
- Configure the service:
  - Set service name: "Text Analyzer with Data Pools"
  - Add a Data Pool parameter named `documents`
- Publish the service
Save your service ID - you'll need it for the next steps.
## Step 6: Test Your Deployed Service

### 6.1 Create a Request Body
Create a file called service-request.json with the Data Pool reference:
```bash
cat > service-request.json << 'EOF'
{
  "data": {
    "files_to_analyze": ["document1.txt", "document2.txt"],
    "min_word_length": 3
  },
  "documents": {
    "id": "<your-datapool-id>",
    "ref": "DATAPOOL"
  }
}
EOF
```

Replace `<your-datapool-id>` with the Data Pool ID from Step 2.2.
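The `data` object in this request body is what arrives as the `data: AnalysisRequest` parameter of `run()`, while the `documents` entry is resolved by the platform into the mounted `documents: DataPool` parameter. If you want to sanity-check the `data` part before invoking the service, you can validate it locally against the Pydantic model from Step 3 (an optional convenience, run from the text-analyzer project directory and assuming `src` is importable as a package, as its relative imports suggest):

```python
import json

from src.program import AnalysisRequest

# Load the request body and validate its "data" part against the service's input model
with open("service-request.json") as f:
    body = json.load(f)

print(AnalysisRequest.model_validate(body["data"]))
```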
### 6.2 Test the Execution Using the UI

Currently, Job executions that use Data Pools as input are not available. Therefore, you need to publish your service first and invoke it via an Application. Follow these steps:

- Go to the services page in the platform app: https://dashboard.hub.kipu-quantum.com/services and navigate to your service.
- Click on `Publish Service` and `Publish internally`.
- Go to the Applications page: https://dashboard.hub.kipu-quantum.com/applications and create a new Application (or reuse an existing one).
- Navigate to the Application you want to use.
- Click on `Subscribe Internally` and select your new service.
- After subscribing, you can test your service by clicking on `Try it out`.
- Open the `POST` element in the OpenAPI specification.
- Click again on `Try it out` and paste the content of `service-request.json` into the request body.
- Click the `Execute` button under the body to run the service.
- Navigate to the Application again and open `Activity Logs` on the subscription of your service.
- Select the latest execution and click on `Show Logs`.
- You should see the execution logs, including analysis results similar to the local execution.
## Step 7: Build a Python Client
The full code of the client is available in the examples repository.
### 7.1 Set Up Client Environment
Create a separate directory for your client:
```bash
cd ..
mkdir text-analyzer-client
cd text-analyzer-client

# Set up Python environment
uv init && uv sync -U
source .venv/bin/activate  # On Windows: .venv\Scripts\activate.ps1
uv add planqk-service-sdk python-dotenv
```

### 7.2 Configure Client Credentials
Create a `.env` file with your client credentials. The CONSUMER_KEY and CONSUMER_SECRET can be found on the settings page of the application you created in the previous steps, the SERVICE_ENDPOINT can be copied from the subscription of your service inside the application details, and the DATAPOOL_ID is the Data Pool ID you saved in Step 2.2.
```bash
cat > .env << 'EOF'
SERVICE_ENDPOINT=<your-service-endpoint>
CONSUMER_KEY=<your-consumer-key>
CONSUMER_SECRET=<your-consumer-secret>
DATAPOOL_ID=<your-datapool-id>
EOF
```

### 7.3 Create the Client Script
Create analyze_client.py:
```python
import os

from dotenv import load_dotenv
from planqk.service.client import PlanqkServiceClient
from planqk.service.datapool import DataPoolReference

# Load environment variables
load_dotenv()

# Initialize the client
client = PlanqkServiceClient(
    service_endpoint=os.getenv("SERVICE_ENDPOINT"),
    consumer_key=os.getenv("CONSUMER_KEY"),
    consumer_secret=os.getenv("CONSUMER_SECRET")
)


def analyze_documents(files_to_analyze, min_word_length=3):
    """Run text analysis on documents in the Data Pool."""
    # Create Data Pool reference
    documents_ref = DataPoolReference(id=os.getenv("DATAPOOL_ID"))

    # Prepare request
    request_body = {
        "data": {
            "files_to_analyze": files_to_analyze,
            "min_word_length": min_word_length
        },
        "documents": documents_ref
    }

    print("Starting analysis...")

    # Execute the service
    execution = client.run(request=request_body)
    print(f"Execution started with ID: {execution.id}")
    print("Waiting for completion...")

    # Wait for completion
    execution.wait_for_final_state(timeout=300)

    if execution.status == "SUCCEEDED":
        result = execution.result()

        print("\n=== Analysis Results ===")
        print(f"Status: {execution.status}")
        print(f"Files processed: {result.total_files}")
        print(f"Total words found: {result.total_words}")
        print(f"Summary: {result.summary}")

        # Show top 10 most common words
        word_counts = result.word_counts
        top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:10]
        print("\nTop 10 most common words:")
        for word, count in top_words:
            print(f"  {word}: {count}")
    else:
        print(f"Execution failed with status: {execution.status}")
        logs = execution.logs()
        print("Error logs:")
        for log in logs[-5:]:  # Show last 5 log entries
            print(f"  {log}")


if __name__ == "__main__":
    # Analyze our sample documents
    analyze_documents(
        files_to_analyze=["document1.txt", "document2.txt"],
        min_word_length=4
    )
```

### 7.4 Run the Client
```bash
python analyze_client.py
```

You should see the text analysis results from your deployed service!
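The exact numbers depend on your documents, but based on the print statements in analyze_client.py the output looks roughly like this (illustrative, abridged values):

```text
Starting analysis...
Execution started with ID: <execution-id>
Waiting for completion...

=== Analysis Results ===
Status: SUCCEEDED
Files processed: 2
Total words found: ...
Summary: Analyzed 2 files. Top words: {'quantum': 3, ...}

Top 10 most common words:
  quantum: 3
  ...
```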
## Step 8: Advanced Usage

### 8.1 Add More Documents

Upload additional documents to your Data Pool:

```bash
cd ../text-analyzer

# Create a new document
cat > input/documents/document3.txt << 'EOF'
Cloud computing provides scalable infrastructure for modern applications.
Microservices architecture enables independent deployment and scaling.
Container orchestration platforms manage distributed systems efficiently.
EOF

# Upload to existing Data Pool
planqk datapool upload -f ./input/documents/document3.txt --datapool-id <your-datapool-id>
```

### 8.2 Analyze New Documents
Update your client to analyze the new document:
```python
# In analyze_client.py, change the files list:
analyze_documents(
    files_to_analyze=["document1.txt", "document2.txt", "document3.txt"],
    min_word_length=5
)
```

### 8.3 Monitor Execution Progress
Add progress monitoring to your client:
```python
import time


def analyze_with_monitoring(files_to_analyze, min_word_length=3):
    """Run analysis with real-time status monitoring."""
    documents_ref = DataPoolReference(id=os.getenv("DATAPOOL_ID"))

    request_body = {
        "data": {
            "files_to_analyze": files_to_analyze,
            "min_word_length": min_word_length
        },
        "documents": documents_ref
    }

    execution = client.run(request=request_body)
    print(f"Started execution: {execution.id}")

    # Monitor progress
    while not execution.has_finished:
        print(f"Status: {execution.status}")
        time.sleep(2)  # Check every 2 seconds

    print(f"Final status: {execution.status}")

    if execution.status == "SUCCEEDED":
        return execution.result()
    else:
        print("Execution failed")
        return None
```

Then update the main block to use this function:
```python
if __name__ == "__main__":
    # Analyze all three documents with live status monitoring
    result = analyze_with_monitoring(
        files_to_analyze=["document1.txt", "document2.txt", "document3.txt"],
        min_word_length=5
    )
    if result:
        print(result)
    else:
        print("No results returned.")
```

And run it again:

```bash
python analyze_client.py
```

You should see real-time status updates as your service processes the documents.
## What You've Accomplished
🎉 Congratulations! You've successfully:
- ✅ Set up the CLI and authenticated
- ✅ Created sample data and uploaded it to a Data Pool
- ✅ Built a text analysis service that reads from Data Pools
- ✅ Tested your service locally with simulated Data Pools
- ✅ Deployed your service to the platform
- ✅ Created a Python client that consumes your service
- ✅ Learned how to monitor executions and handle results
## Key Concepts Learned
- Data Pools: Managed file collections that can be mounted into services
- Local Testing: Simulating Data Pools with local directories
- Service Parameters: How Data Pool parameters are injected into your service
- SDK Integration: Using DataPoolReference to pass Data Pool IDs to services
- Error Handling: Managing file not found errors and execution failures
## Next Steps
- Try uploading larger datasets (remember the 500 MB per file limit)
- Experiment with different analysis algorithms
- Build services that write results back to output Data Pools (see the sketch after this list)
- Explore the workflow orchestration features for multi-step data processing
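For the "write results back" idea above, one possible starting point is the output directory referenced by OUTPUT_DIRECTORY_ENV, which the local test runner in Step 4.2 already sets up. This is only a rough sketch under the assumption that files written to that directory are collected as the execution's output; check the Data Pool documentation for the exact mechanism:

```python
import json
import os

from planqk.commons.constants import OUTPUT_DIRECTORY_ENV


def write_report(result) -> None:
    """Write an AnalysisResult as a JSON file into the service's output directory."""
    # Assumption: files placed in this directory are picked up as the execution's output
    output_dir = os.environ.get(OUTPUT_DIRECTORY_ENV, "./out")
    os.makedirs(output_dir, exist_ok=True)

    report_path = os.path.join(output_dir, "analysis_report.json")
    with open(report_path, "w") as f:
        json.dump(result.model_dump(), f, indent=2)  # result is the Pydantic AnalysisResult from run()
```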
## References

- [CLI] CLI Reference | Docs
- [DataPool] Using Data Pools in Services | Docs

