Action Table - Document

Summarize news with JamAI Base

1. Introduction

This tutorial will guide you through using the JamAI Base Python SDK to create a simple news summarization system.

What We'll Build

Provide a news article and let AI tell you its main message!

In this tutorial, we'll create a news summarization system that:

  1. Takes a news document as input. The supported formats include: .csv, .tsv, .txt, .md, .doc, .docx, .pdf, .ppt, .pptx, .xls, .xlsx, .xml, .html, .json, .jsonl.

  2. Uploads the document to a JamAI Base action table.

  3. Extracts key information such as:

    • Summary

    • Tags

Prerequisites

Before starting, you'll need:

  • Python 3.11 or higher installed

  • Project ID and Personal Access Token (PAT)

2. Installation and Setup

Installing the Python SDK

pip install jamaibase

Basic Configuration

Get your Personal Access Token (PAT) by following the guide: How to generate PAT

Get your Project ID by following the guide: How to get Project ID

from jamaibase import JamAI, types as t

PROJECT_ID = "your_project_id"
PAT = "your_PAT"

client = JamAI(
    project_id=PROJECT_ID,
    token=PAT
)

You can use a .env file to manage your PROJECT_ID and PAT.
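A minimal sketch of that setup, using only the standard library. The variable names JAMAI_PROJECT_ID and JAMAI_PAT are assumptions (pick your own), and load_env_file is a simplified, hypothetical stand-in for the load_dotenv() function from the python-dotenv package, which handles more of the .env syntax.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Simplified stand-in for python-dotenv: skips blank lines and comments,
    strips surrounding quotes, and never overrides variables already set.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return  # no .env file: rely on the real environment
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_credentials() -> tuple[str, str]:
    """Return (project_id, pat); the variable names are assumptions."""
    load_env_file()
    project_id = os.environ.get("JAMAI_PROJECT_ID", "")
    pat = os.environ.get("JAMAI_PAT", "")
    if not project_id or not pat:
        raise RuntimeError("Set JAMAI_PROJECT_ID and JAMAI_PAT in the environment or in .env")
    return project_id, pat
```

With the values in place, project_id, pat = get_credentials() followed by JamAI(project_id=project_id, token=pat) keeps the credentials out of your source code. Remember to exclude .env from version control.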

3. Creating Your Action Table

For simplicity, you can set up your action table in the JamAI Base platform:

  1. Navigate to your JamAI Base action table tab.

  2. Create a new action table named "news_summarization".

  3. Create the following columns:

Name       Column Type   Data Type
document   Input         Document
summary    LLM Output    Text
tag        LLM Output    Text

  4. Set the prompts of the LLM Output columns as follows:

summary column prompt:

Table name: "news_summarization"

document: ${document}

Summarize the article in no more than three sentences and no more than 50 words.

Provide the summary as a numbered list.

Be factual and do not hallucinate. Remember to act as a cell in a spreadsheet and provide concise, relevant information without explanations unless specifically requested.

tag column prompt:

Table name: "news_summarization"

document: ${document}

Provide at most three tags that best represent the document. Each tag should have no more than three words.

Provide the tags as a numbered list.

Be factual and do not hallucinate. Remember to act as a cell in a spreadsheet and provide concise, relevant information without explanations unless specifically requested.
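Both prompts pull the uploaded file in through the ${document} placeholder, which refers to the document column by name. A typo in a placeholder is easy to miss when editing prompts in the UI, so the sketch below checks locally that every ${...} reference matches a column from the column table above. It only mimics the placeholder syntax with a regular expression; it is not how JamAI Base itself resolves references, and the helper names are hypothetical.

```python
import re

# Column names from the column table above
TABLE_COLUMNS = {"document", "summary", "tag"}

# Abbreviated copy of the summary prompt; only the ${...} references matter here
SUMMARY_PROMPT = 'Table name: "news_summarization"\n\ndocument: ${document}\n\nSummarize the article ...'


def referenced_columns(prompt: str) -> set[str]:
    """Return the column names referenced with ${name} placeholders."""
    return set(re.findall(r"\$\{([^}]+)\}", prompt))


def check_prompt(prompt: str, columns: set[str]) -> None:
    """Raise ValueError if the prompt refers to a column the table does not have."""
    unknown = referenced_columns(prompt) - columns
    if unknown:
        raise ValueError(f"Prompt references unknown column(s): {sorted(unknown)}")


check_prompt(SUMMARY_PROMPT, TABLE_COLUMNS)  # passes: only ${document} is used
```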

4. Basic Implementation

4.1 Simple Document Processor

def process_single_news(document_path):
    # Upload document file
    file_response = client.file.upload_file(document_path)

    # Process in action table
    response = client.table.add_table_rows(
        table_type=t.TableType.ACTION,
        request=t.MultiRowAddRequest(
            table_id="news_summarization",
            data=[{"document": file_response.uri}],
            stream=False,
        ),
    )

    # Extract results
    return {
        "summary": response.rows[0].columns["summary"].text,
        "tag": response.rows[0].columns["tag"].text
    }
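process_single_news reads the summary and tag columns by hard-coded name. If you later add more LLM Output columns, a small helper like the hypothetical extract_outputs below can read any list of column names and fail with a clear message when one is missing. It assumes only the response shape already used above (response.rows[0].columns[name].text, with columns supporting dict-style lookup); the stand-in object built with SimpleNamespace lets you exercise it offline without calling JamAI Base.

```python
from types import SimpleNamespace


def extract_outputs(response, column_names):
    """Collect the text of each requested output column from the first returned row.

    Assumes the response shape used in this tutorial:
    response.rows[0].columns[<name>].text
    """
    if not response.rows:
        raise ValueError("The table returned no rows")
    row_columns = response.rows[0].columns
    missing = [name for name in column_names if name not in row_columns]
    if missing:
        raise KeyError(f"Missing output column(s): {missing}")
    return {name: row_columns[name].text for name in column_names}


# Offline check with a stand-in object shaped like the tutorial's response
fake_response = SimpleNamespace(
    rows=[SimpleNamespace(columns={
        "summary": SimpleNamespace(text="1. Example summary."),
        "tag": SimpleNamespace(text="1. Example tag"),
    })]
)
print(extract_outputs(fake_response, ["summary", "tag"]))
```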

4.2 Complete Implementation with Error Handling

import os
from typing import Dict, Optional

from jamaibase import JamAI, types as t

class DocumentProcessor:
    def __init__(self, project_id: str, pat: str):
        self.client = JamAI(
            project_id=project_id,
            token=pat
        )

    def validate_document(self, document_path: str) -> bool:
        """Validate if file exists and has correct extension"""
        if not os.path.exists(document_path):
            raise FileNotFoundError(f"Document not found: {document_path}")

        valid_extensions = [
            '.csv', '.tsv', '.txt', '.md', '.doc', '.docx', '.pdf', 
            '.ppt', '.pptx', '.xls', '.xlsx', '.xml', '.html', 
            '.json', '.jsonl'
        ]
        
        file_ext = os.path.splitext(document_path)[1].lower()
        if file_ext not in valid_extensions:
            raise ValueError(f"Unsupported file format. Use: {valid_extensions}")

        return True

    def process_document(self, document_path: str) -> Optional[Dict[str, str]]:
        """Process a single document file"""
        try:
            # Validate document
            self.validate_document(document_path)

            # Upload file
            print("Uploading document...")
            file_response = self.client.file.upload_file(document_path)
            print(f"Upload successful: {file_response.uri}")

            # Process in action table
            print("Processing document...")
            response = self.client.table.add_table_rows(
                table_type=t.TableType.ACTION,
                request=t.MultiRowAddRequest(
                    table_id="news_summarization",
                    data=[{"document": file_response.uri}],
                    stream=False,
                ),
            )

            # Extract and return results
            results = {
                "summary": response.rows[0].columns["summary"].text,
                "tag": response.rows[0].columns["tag"].text
            }
            print("Processing complete!")
            return results

        except Exception as e:
            print(f"Error processing document: {str(e)}")
            return None
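The list of valid extensions is repeated in validate_document here and again in the batch and folder functions below. One option is to define it once as a module-level constant. SUPPORTED_EXTENSIONS and is_supported are illustrative names, not part of jamaibase; the set itself is the format list from the introduction.

```python
from pathlib import Path

# Supported formats as listed in the introduction; defined once so every
# function checks against the same set
SUPPORTED_EXTENSIONS = frozenset({
    ".csv", ".tsv", ".txt", ".md", ".doc", ".docx", ".pdf",
    ".ppt", ".pptx", ".xls", ".xlsx", ".xml", ".html",
    ".json", ".jsonl",
})


def is_supported(path: str) -> bool:
    """Return True if the file extension (case-insensitive) is supported."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

validate_document could then raise ValueError when is_supported(document_path) is False, and the folder loops could call is_supported(filename) instead of rebuilding the list.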

5. Usage Examples

5.1 Basic Usage

# Initialize processor
processor = DocumentProcessor(PROJECT_ID, PAT)

# Process a single news article
result = processor.process_document("path/to/news.txt")

if result:
    print(f"Summary: {result['summary']}")
    print(f"Tag: {result['tag']}")
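Because both prompts ask for numbered output, result['summary'] and result['tag'] are single strings such as "1. Trump Asia Trip\n2. US China Relations". If you need the individual items (for example, to store tags separately), a small parser like the hypothetical parse_numbered_list below can split them. It assumes the model follows the requested "1." numbering, which is not guaranteed, so unnumbered lines are kept and joined to the previous item rather than dropped.

```python
import re


def parse_numbered_list(text: str) -> list[str]:
    """Split LLM output such as "1. First\\n2. Second" into ["First", "Second"].

    A line counts as a new item when it starts with a number, then "." or ")",
    then whitespace. Other lines are appended to the previous item.
    """
    items: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = re.match(r"^\d+[.)]\s+(.*)$", line)
        if match:
            items.append(match.group(1).strip())
        elif items:
            items[-1] = f"{items[-1]} {line}"
        else:
            items.append(line)
    return items


tags = parse_numbered_list("1. Trump Asia Trip  \n2. US China Relations  \n3. Presidential Visit")
print(tags)  # ['Trump Asia Trip', 'US China Relations', 'Presidential Visit']
```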

5.2 Batch Processing

def process_document_batch(document_folder: str):
    processor = DocumentProcessor(PROJECT_ID, PAT)
    results = []

    # Supported document extensions (defined once, outside the loop)
    valid_extensions = [
        '.csv', '.tsv', '.txt', '.md', '.doc', '.docx', '.pdf',
        '.ppt', '.pptx', '.xls', '.xlsx', '.xml', '.html',
        '.json', '.jsonl'
    ]

    for filename in os.listdir(document_folder):
        # Check if file has a supported document extension
        if any(filename.lower().endswith(ext) for ext in valid_extensions):
            document_path = os.path.join(document_folder, filename)
            result = processor.process_document(document_path)
            if result:
                results.append({
                    "filename": filename,
                    **result
                })

    return results

# Usage
results = process_document_batch("path/to/documents/folder")
for result in results:
    print(f"File: {result['filename']}")
    print(f"Summary: {result['summary']}")
    print(f"Tag: {result['tag']}")
    print("---")
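process_document_batch sends one add_table_rows request per file. Since MultiRowAddRequest takes a list in data, you could instead upload the files first and send their URIs several rows at a time. The hypothetical helper below only builds those row lists; passing each batch as data= in one request, reading every entry of response.rows afterwards, and the batch size itself are assumptions to verify against the JamAI Base documentation and your rate limits.

```python
from typing import Iterator


def build_row_batches(uris: list[str], batch_size: int = 5) -> Iterator[list[dict[str, str]]]:
    """Yield lists of row dicts ({"document": uri}) with at most batch_size rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(uris), batch_size):
        yield [{"document": uri} for uri in uris[start:start + batch_size]]


uris = ["uri-1", "uri-2", "uri-3"]  # placeholders for file_response.uri values
for batch in build_row_batches(uris, batch_size=2):
    print(batch)
# [{'document': 'uri-1'}, {'document': 'uri-2'}]
# [{'document': 'uri-3'}]
```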

6. Best Practices

  1. Error Handling

    • Always validate input files

    • Handle network errors gracefully

  2. Performance

    • Reuse the client instance

    • Consider batch processing for multiple files

    • Implement rate limiting for large batches

  3. Security

    • Use environment variables for credentials
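The error-handling and rate-limiting items above can be sketched with the standard library alone. call_with_retries and MinIntervalLimiter are hypothetical helpers, not part of jamaibase: the first retries a call with exponential backoff plus a little random jitter, the second spaces calls at least min_interval seconds apart. Which exception types the jamaibase client raises on network failures is not covered in this tutorial, so ConnectionError and TimeoutError are placeholders; check what your SDK version raises and pass those through retry_on.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ... by default
            time.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")  # the loop always returns or raises


class MinIntervalLimiter:
    """Simple rate limiter: wait() blocks so calls are at least min_interval seconds apart."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_call = float("-inf")

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()
```

In process_document you might wrap the upload as call_with_retries(lambda: self.client.file.upload_file(document_path)), and in a batch loop call limiter.wait() before each file.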

Complete Standalone Example

import os
import argparse
from jamaibase import JamAI, types as t
from typing import Dict, Optional

class DocumentProcessor:
    def __init__(self, project_id: str, pat: str):
        self.client = JamAI(
            project_id=project_id,
            token=pat
        )

    def validate_document(self, document_path: str) -> bool:
        if not os.path.exists(document_path):
            raise FileNotFoundError(f"Document not found: {document_path}")

        valid_extensions = [
            '.csv', '.tsv', '.txt', '.md', '.doc', '.docx', '.pdf', 
            '.ppt', '.pptx', '.xls', '.xlsx', '.xml', '.html', 
            '.json', '.jsonl'
        ]
        
        file_ext = os.path.splitext(document_path)[1].lower()
        if file_ext not in valid_extensions:
            raise ValueError(f"Unsupported file format. Use: {valid_extensions}")

        return True

    def process_document(self, document_path: str) -> Optional[Dict[str, str]]:
        try:
            self.validate_document(document_path)

            print(f"Processing document: {document_path}")
            print("Uploading document...")
            file_response = self.client.file.upload_file(document_path)
            print("Upload successful!")

            print("Extracting information...")
            response = self.client.table.add_table_rows(
                table_type=t.TableType.ACTION,
                request=t.MultiRowAddRequest(
                    table_id="news_summarization",
                    data=[{"document": file_response.uri}],
                    stream=False,
                ),
            )

            results = {
                "summary": response.rows[0].columns["summary"].text,
                "tag": response.rows[0].columns["tag"].text
            }
            return results

        except Exception as e:
            print(f"Error: {str(e)}")
            return None


def process_folder(folder_path: str, processor: DocumentProcessor) -> None:
    """Process all documents in a folder"""
    if not os.path.exists(folder_path):
        print(f"Folder not found: {folder_path}")
        return

    valid_extensions = [
        '.csv', '.tsv', '.txt', '.md', '.doc', '.docx', '.pdf', 
        '.ppt', '.pptx', '.xls', '.xlsx', '.xml', '.html', 
        '.json', '.jsonl'
    ]
    
    results = []
    for filename in os.listdir(folder_path):
        if any(filename.lower().endswith(ext) for ext in valid_extensions):
            document_path = os.path.join(folder_path, filename)
            result = processor.process_document(document_path)
            if result:
                results.append({
                    "filename": filename,
                    **result
                })

    # Print results in a formatted way
    print("\nProcessing Results:")
    print("-" * 50)
    for result in results:
        print(f"File: {result['filename']}")
        print(f"Summary: {result['summary']}")
        print(f"Tag: {result['tag']}")
        print("-" * 50)


def main():
    # Set up argument parser
    parser = argparse.ArgumentParser(description='Process documents using JamAI Base')
    parser.add_argument('--project-id', required=True, help='Your JamAI Base project ID')
    parser.add_argument('--pat', required=True, help='Your Personal Access Token')
    parser.add_argument('--input', required=True, help='Path to document file or folder')

    args = parser.parse_args()

    # Initialize processor
    processor = DocumentProcessor(args.project_id, args.pat)

    # Process input
    if os.path.isfile(args.input):
        # Single file processing
        result = processor.process_document(args.input)
        if result:
            print("\nResults:")
            print("-" * 50)
            print(f"Summary: {result['summary']}")
            print(f"Tag: {result['tag']}")
            print("-" * 50)
    else:
        # Folder processing
        process_folder(args.input, processor)


if __name__ == "__main__":
    main()

How to Run

  1. Save the code as news_processor.py

  2. Install the required package:

pip install jamaibase

  3. Run for a single news article:

python news_processor.py --project-id "your-project-id" --pat "your-pat" --input "path/to/news.txt"

  4. Run for a folder of news articles:

python news_processor.py --project-id "your-project-id" --pat "your-pat" --input "path/to/news/folder"
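Passing --pat on the command line can leave the token in your shell history and expose it to other users through the process list, which sits awkwardly with the security tip in Best Practices. A variant of the argument parser that falls back to environment variables might look like the sketch below. The variable names JAMAI_PROJECT_ID and JAMAI_PAT and the helper names are assumptions; you would replace the parser setup in main() with args = parse_args().

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """CLI parser whose credential flags default to environment variables.

    JAMAI_PROJECT_ID and JAMAI_PAT are assumed variable names.
    """
    parser = argparse.ArgumentParser(description="Process documents using JamAI Base")
    parser.add_argument("--project-id", default=os.environ.get("JAMAI_PROJECT_ID"),
                        help="JamAI Base project ID (default: $JAMAI_PROJECT_ID)")
    parser.add_argument("--pat", default=os.environ.get("JAMAI_PAT"),
                        help="Personal Access Token (default: $JAMAI_PAT)")
    parser.add_argument("--input", required=True, help="Path to document file or folder")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Credentials are still mandatory; they may just come from the environment
    if not args.project_id or not args.pat:
        parser.error("provide --project-id/--pat or set JAMAI_PROJECT_ID/JAMAI_PAT")
    return args
```

With both variables exported, the command shortens to python news_processor.py --input "path/to/news.txt".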

Example Output

Processing document: ./jamai_test/test_doc.docx
Uploading document...
Upload successful!
Extracting information...

Results:
--------------------------------------------------
Summary: 1. President Trump announced plans to visit Malaysia, Japan, and South Korea, aiming for a fair deal with China and a meeting with President Xi Jinping during an Asia-Pacific summit.  
2. The trip marks his first to the region since his second term began, with Japan expected from Oct 27 after a regional conference in Malaysia.  
3. Trump emphasized positive relations with Xi and expressed optimism despite ongoing trade tensions.
Tag: 1. Trump Asia Trip  
2. US China Relations  
3. Presidential Visit
--------------------------------------------------
The news summarization table with the uploaded documents.

This standalone example provides a complete, working implementation that you can use as a starting point for your own projects or modify according to your needs.
