# Action Table - Document

## 1. Introduction

This tutorial will guide you through using the JamAI Base Python SDK to create a simple news summarization system.

### What We'll Build

Give it a news article and let AI tell you the main message!

In this tutorial, we'll create a news summarization system that:

1. Takes a news document as input. Supported formats: .csv, .tsv, .txt, .md, .doc, .docx, .pdf, .ppt, .pptx, .xls, .xlsx, .xml, .html, .json, .jsonl.
2. Uploads the document to a JamAI Base action table.
3. Extracts key information such as:
   * Summary
   * Tag

### Prerequisites

Before starting, you'll need:

* Python 3.11 or higher installed
* Project ID and Personal Access Token (PAT)

## 2. Installation and Setup

### Installing the Python SDK

```bash
pip install jamaibase
```

### Basic Configuration

Get your Personal Access Token (PAT) here:

<figure><img src="/files/Xh4HeLypIqKJ1EKcDKHJ" alt=""><figcaption><p>How to generate PAT</p></figcaption></figure>

Get your Project ID here:

<figure><img src="/files/pz1Gr6LRLftmj8PXqcX0" alt=""><figcaption><p>How to get Project ID</p></figcaption></figure>

```python
from jamaibase import JamAI, types as t

PROJECT_ID = "your_project_id"
PAT = "your_PAT"

client = JamAI(
    project_id=PROJECT_ID,
    token=PAT
)
```

{% hint style="info" %}
You can use a .env file to manage your PROJECT\_ID and PAT.
{% endhint %}
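
One way to follow this hint without extra dependencies is to read the credentials from environment variables. The sketch below assumes the variable names `JAMAI_PROJECT_ID` and `JAMAI_PAT`, which are illustrative choices for this tutorial rather than names the SDK is known to require; `load_credentials` is likewise a hypothetical helper. A tool such as python-dotenv can populate these variables from a `.env` file before the function is called.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (project_id, pat) from the environment, failing early if unset."""
    # The variable names below are assumptions made for this sketch.
    names = ("JAMAI_PROJECT_ID", "JAMAI_PAT")
    values = [env.get(name, "").strip() for name in names]

    # Report every missing variable at once instead of one at a time.
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    return values[0], values[1]
```

With the variables set, `PROJECT_ID, PAT = load_credentials()` can replace the hard-coded strings in the configuration snippet above.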

## 3. Creating Your Action Table

For simplicity, you can set up your action table in the JamAI Base platform:

1. Navigate to your JamAI Base action table tab
2. Create a new action table named `"news_summarization"`

<figure><img src="/files/W03S51nU7WGHWXcnahaJ" alt=""><figcaption></figcaption></figure>

3. Create the following columns:

| Name     | Column Type | Data Type |
| -------- | ----------- | --------- |
| document | Input       | Document  |
| summary  | LLM Output  | Text      |
| tag      | LLM Output  | Text      |

4. Update the prompts of the LLM Output columns as follows:

<table><thead><tr><th width="113.199951171875">Column name</th><th>Prompt</th></tr></thead><tbody><tr><td>summary</td><td><p>Table name: "news_summarization"</p><p></p><p>document: ${document}</p><p></p><p>Summarize the article in not more than three sentences and not more than 50 words.</p><p>Provide the summary in the number form.</p><p>Be factual and do not hallucinate. Remember to act as a cell in a spreadsheet and provide concise, relevant information without explanations unless specifically requested.</p></td></tr><tr><td>tag</td><td><p>Table name: "news_summarization"</p><p></p><p>document: ${document}</p><p></p><p>Provide at most three tags that well represent the document. Each tag should not have more than three words.</p><p>Provide the tag in the number form.</p><p>Be factual and do not hallucinate. Remember to act as a cell in a spreadsheet and provide concise, relevant information without explanations unless specifically requested.</p></td></tr></tbody></table>

## 4. Basic Implementation

### 4.1 Simple Document Processor

```python
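def process_single_news(document_path):
    """Upload one document and return its generated summary and tags.

    Uses the `client` and `t` names created in Basic Configuration.
    """
    # Upload document file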
    file_response = client.file.upload_file(document_path)

    # Process in action table
    response = client.table.add_table_rows(
        table_type=t.TableType.ACTION,
        request=t.MultiRowAddRequest(
            table_id="news_summarization",
            data=[{"document": file_response.uri}],
            stream=False,
        ),
    )

    # Extract results
    return {
        "summary": response.rows[0].columns["summary"].text,
        "tag": response.rows[0].columns["tag"].text
    }
```

### 4.2 Complete Implementation with Error Handling

```python
import os
from typing import Dict, Optional


class DocumentProcessor:
    def __init__(self, project_id: str, pat: str):
        self.client = JamAI(
            project_id=project_id,
            token=pat
        )

    def validate_document(self, document_path: str) -> bool:
        """Validate if file exists and has correct extension"""
        if not os.path.exists(document_path):
            raise FileNotFoundError(f"Document not found: {document_path}")

        valid_extensions = [
            '.csv', '.tsv', '.txt', '.md', '.doc', '.docx', '.pdf', 
            '.ppt', '.pptx', '.xls', '.xlsx', '.xml', '.html', 
            '.json', '.jsonl'
        ]
        
        file_ext = os.path.splitext(document_path)[1].lower()
        if file_ext not in valid_extensions:
            raise ValueError(f"Unsupported file format. Use: {valid_extensions}")

        return True

    def process_document(self, document_path: str) -> Optional[Dict[str, str]]:
        """Process a single document file"""
        try:
            # Validate document
            self.validate_document(document_path)

            # Upload file
            print("Uploading document...")
            file_response = self.client.file.upload_file(document_path)
            print(f"Upload successful: {file_response.uri}")

            # Process in action table
            print("Processing document...")
            response = self.client.table.add_table_rows(
                table_type=t.TableType.ACTION,
                request=t.MultiRowAddRequest(
                    table_id="news_summarization",
                    data=[{"document": file_response.uri}],
                    stream=False,
                ),
            )

            # Extract and return results
            results = {
                "summary": response.rows[0].columns["summary"].text,
                "tag": response.rows[0].columns["tag"].text
            }
            print("Processing complete!")
            return results

        except Exception as e:
            print(f"Error processing document: {str(e)}")
            return None
```

## 5. Usage Examples

### 5.1 Basic Usage

```python
# Initialize processor
processor = DocumentProcessor(PROJECT_ID, PAT)

# Process a single news document
result = processor.process_document("path/to/news.txt")

if result:
    print(f"Summary: {result['summary']}")
    print(f"Tag: {result['tag']}")
```

### 5.2 Batch Processing

```python
def process_document_batch(document_folder: str):
    processor = DocumentProcessor(PROJECT_ID, PAT)
    results = []

    # Supported document extensions, defined once outside the loop
    valid_extensions = [
        '.csv', '.tsv', '.txt', '.md', '.doc', '.docx', '.pdf',
        '.ppt', '.pptx', '.xls', '.xlsx', '.xml', '.html',
        '.json', '.jsonl'
    ]

    for filename in os.listdir(document_folder):
        # Skip files without a supported document extension
        if any(filename.lower().endswith(ext) for ext in valid_extensions):
            document_path = os.path.join(document_folder, filename)
            result = processor.process_document(document_path)
            if result:
                results.append({
                    "filename": filename,
                    **result
                })

    return results

# Usage
results = process_document_batch("path/to/documents/folder")
for result in results:
    print(f"File: {result['filename']}")
    print(f"Summary: {result['summary']}")
    print(f"Tag: {result['tag']}")
    print("---")
```
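
The extension check in the loop above can also be pulled into a small helper so the supported list is defined in one place. This is a minimal sketch using only the standard library; `SUPPORTED_EXTENSIONS` and `find_supported_documents` are names introduced here for illustration, and the helper only looks at files directly inside the folder, not in subfolders.

```python
from pathlib import Path
from typing import List

# Same formats listed in the introduction of this tutorial.
SUPPORTED_EXTENSIONS = frozenset({
    ".csv", ".tsv", ".txt", ".md", ".doc", ".docx", ".pdf",
    ".ppt", ".pptx", ".xls", ".xlsx", ".xml", ".html",
    ".json", ".jsonl",
})


def find_supported_documents(folder: str) -> List[Path]:
    """Return supported files directly inside `folder`, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        # Directories are skipped even if their name ends in a known suffix.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Each returned path can be passed to `processor.process_document(str(path))`.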

## 6. Best Practices

1. **Error Handling**
   * Always validate input files
   * Handle network errors gracefully
2. **Performance**
   * Reuse the client instance
   * Consider batch processing for multiple files
   * Implement rate limiting for large batches
3. **Security**
   * Use environment variables for credentials

## Complete Standalone Example

```python
import os
import argparse
from jamaibase import JamAI, types as t
from typing import Dict, Optional

class DocumentProcessor:
    def __init__(self, project_id: str, pat: str):
        self.client = JamAI(
            project_id=project_id,
            token=pat
        )

    def validate_document(self, document_path: str) -> bool:
        if not os.path.exists(document_path):
            raise FileNotFoundError(f"Document not found: {document_path}")

        valid_extensions = [
            '.csv', '.tsv', '.txt', '.md', '.doc', '.docx', '.pdf', 
            '.ppt', '.pptx', '.xls', '.xlsx', '.xml', '.html', 
            '.json', '.jsonl'
        ]
        
        file_ext = os.path.splitext(document_path)[1].lower()
        if file_ext not in valid_extensions:
            raise ValueError(f"Unsupported file format. Use: {valid_extensions}")

        return True

    def process_document(self, document_path: str) -> Optional[Dict[str, str]]:
        try:
            self.validate_document(document_path)

            print(f"Processing document: {document_path}")
            print("Uploading document...")
            file_response = self.client.file.upload_file(document_path)
            print("Upload successful!")

            print("Extracting information...")
            response = self.client.table.add_table_rows(
                table_type=t.TableType.ACTION,
                request=t.MultiRowAddRequest(
                    table_id="news_summarization",
                    data=[{"document": file_response.uri}],
                    stream=False,
                ),
            )

            results = {
                "summary": response.rows[0].columns["summary"].text,
                "tag": response.rows[0].columns["tag"].text
            }
            return results

        except Exception as e:
            print(f"Error: {str(e)}")
            return None


def process_folder(folder_path: str, processor: DocumentProcessor) -> None:
    """Process all documents in a folder"""
    if not os.path.exists(folder_path):
        print(f"Folder not found: {folder_path}")
        return

    valid_extensions = [
        '.csv', '.tsv', '.txt', '.md', '.doc', '.docx', '.pdf', 
        '.ppt', '.pptx', '.xls', '.xlsx', '.xml', '.html', 
        '.json', '.jsonl'
    ]
    
    results = []
    for filename in os.listdir(folder_path):
        if any(filename.lower().endswith(ext) for ext in valid_extensions):
            document_path = os.path.join(folder_path, filename)
            result = processor.process_document(document_path)
            if result:
                results.append({
                    "filename": filename,
                    **result
                })

    # Print results in a formatted way
    print("\nProcessing Results:")
    print("-" * 50)
    for result in results:
        print(f"File: {result['filename']}")
        print(f"Summary: {result['summary']}")
        print(f"Tag: {result['tag']}")
        print("-" * 50)


def main():
    # Set up argument parser
    parser = argparse.ArgumentParser(description='Process documents using JamAI Base')
    parser.add_argument('--project-id', required=True, help='Your JamAI Base project ID')
    parser.add_argument('--pat', required=True, help='Your Personal Access Token')
    parser.add_argument('--input', required=True, help='Path to document file or folder')

    args = parser.parse_args()

    # Initialize processor
    processor = DocumentProcessor(args.project_id, args.pat)

    # Process input
    if os.path.isfile(args.input):
        # Single file processing
        result = processor.process_document(args.input)
        if result:
            print("\nResults:")
            print("-" * 50)
            print(f"Summary: {result['summary']}")
            print(f"Tag: {result['tag']}")
            print("-" * 50)
    else:
        # Folder processing
        process_folder(args.input, processor)


if __name__ == "__main__":
    main()
```

### How to Run

1. Save the code as `news_processor.py`
2. Install the required package:

```bash
pip install jamaibase
```

3. Run for a single news document:

```bash
python news_processor.py --project-id "your-project-id" --pat "your-pat" --input "path/to/news.txt"
```

4. Run for a folder of news documents:

```bash
python news_processor.py --project-id "your-project-id" --pat "your-pat" --input "path/to/news/folder"
```

### Example Output

```
Processing document: ./jamai_test/test_doc.docx
Uploading document...
Upload successful!
Extracting information...

Results:
--------------------------------------------------
Summary: 1. President Trump announced plans to visit Malaysia, Japan, and South Korea, aiming for a fair deal with China and a meeting with President Xi Jinping during an Asia-Pacific summit.  
2. The trip marks his first to the region since his second term began, with Japan expected from Oct 27 after a regional conference in Malaysia.  
3. Trump emphasized positive relations with Xi and expressed optimism despite ongoing trade tensions.
Tag: 1. Trump Asia Trip  
2. US China Relations  
3. Presidential Visit
--------------------------------------------------
```
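
Because the prompts ask for numbered output, the `summary` and `tag` values arrive as single strings like the ones above. If individual items are needed, a small parser can split them. This sketch assumes the model follows the `1. ...` pattern, which is not guaranteed; `split_numbered_items` is a name introduced here for illustration.

```python
import re
from typing import List

# Matches a leading item number such as "1." or "2)".
_ITEM_PREFIX = re.compile(r"^\d+[.)]\s*")


def split_numbered_items(text: str) -> List[str]:
    """Split numbered lines ('1. foo', '2. bar') into a list of item strings."""
    items: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if _ITEM_PREFIX.match(line):
            # New item: drop the number and keep the rest of the line.
            items.append(_ITEM_PREFIX.sub("", line, count=1))
        elif items:
            # Unnumbered line after an item: treat it as a continuation.
            items[-1] = f"{items[-1]} {line}"
        else:
            # Text before any number: keep it as its own item.
            items.append(line)
    return items
```

For example, `split_numbered_items(result["tag"])` turns the tag string into a list with one entry per tag.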

<figure><img src="/files/fAauJ7ro6bs7SLlWEazr" alt=""><figcaption><p>The news summarization table with the uploaded documents.</p></figcaption></figure>

This standalone example provides a complete, working implementation that you can use as a starting point for your own projects or modify according to your needs.


