Women Clothing Reviews Analysis Dashboard

Fashion Insights Made Easy:

Ever wondered how fashion brands stay ahead of the trends and keep their customers happy? It all starts with listening to what their audience has to say! Review analysis plays a crucial role in understanding customer feedback, allowing brands to improve products, enhance customer experience, and create marketing strategies that truly resonate.

By diving deep into reviews and analyzing style preferences, fit feedback, and overall satisfaction, brands can unlock a treasure trove of insights:

  • Tailored Products: Imagine knowing exactly what customers love (and don't love) about a particular dress. This detailed feedback enables targeted improvements that keep customers coming back for more.

  • Personalized Marketing: Understanding customer preferences allows brands to create marketing campaigns that speak directly to their audience, highlighting the styles and features they desire most.

  • Trendsetting Success: By analyzing review data, brands can quickly identify emerging trends and adapt their offerings to stay ahead of the curve in the ever-evolving fashion world.

Now, imagine having an AI assistant that can do all this for you! Introducing JamAI Base, your partner in unlocking the power of fashion insights. Forget about manually sifting through mountains of text – our cutting-edge LLMs will do the heavy lifting, providing you with actionable insights and recommendations for product and marketing improvements.

Here's the magic touch of JamAI Base:

  • Transforming Text into Treasure: JamAI Base converts unstructured review text into a beautifully organized, structured JSON format, making it easy to analyze and visualize the data.

  • Keywords at Your Fingertips: No more reading every single review! Our LLMs extract key themes and issues, giving you a quick overview of customer sentiment and recurring topics.

  • Visual Storytelling: With structured data, creating insightful graphs and charts becomes a breeze. Identify trends, patterns, and outliers like a data detective, all thanks to the power of JamAI Base.

Ready to unlock the fashion insights hidden within your review data? Let's embark on this stylish journey together!

Project Overview

Imagine having an AI assistant analyze hundreds of clothing reviews, extracting key features, sentiments, and trends, and presenting them in a beautiful dashboard. That's what we'll achieve!

Our goal:

  • Analyze women's clothing reviews using LLMs.

  • Extract features like fit, style, quality, and sentiment.

  • Identify trends and patterns in customer feedback.

  • Visualize the results in a user-friendly Streamlit dashboard.

Tools we'll use:

  • JamAI Base Action Table: Our AI-powered data powerhouse.

  • Python: The language of choice for interacting with JamAI Base and building our analysis.

  • Streamlit: Creates a stunning dashboard to showcase our findings.

Setting Up JamAI Base

  1. JamAI Base: All you need are the JamAI Base APIs!

  2. Eagle Client (optional): a Python SDK that adds extra functionality.

Full Code: https://github.com/EmbeddedLLM/jamaibase-cookbook/women_clothing_reviews
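
Before running the scripts, install the Python packages they import (a typical pip setup; the small protocol module providing the Page model ships with the full code above):

    $ pip install pandas httpx loguru streamlit matplotlib wordcloud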

Building the Backend with the JamAI Base Action Table

# jamai_action_table.py
import pandas as pd
import json
import time
import httpx
from httpx import Timeout
from loguru import logger
from protocol import Page

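# Point BASE_URL at your JamAI Base deployment; the host and port will differ on your setup.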
BASE_URL = "http://192.168.80.61:6969/api"

SCHEMA = {
    "features": {
        "fit": {
            "sizing": ["true to size", "runs small", "runs large", "other"],
            "silhouette": [
                "flattering",
                "comfortable",
                "loose",
                "boxy",
                "fitted",
                "flowy",
                "other",
            ],
            "body_area_fit": {
                "waist": [
                    "high-waisted",
                    "low-waisted",
                    "empire waist",
                    "natural waist",
                    "other",
                ],
                "hips": ["tight", "loose", "just right"],
                "length": ["long", "short", "midi", "maxi", "cropped", "other"],
            },
        },
        "style": {
            "aesthetic": [
                "classic",
                "trendy",
                "bohemian",
                "romantic",
                "vintage",
                "minimalistic",
                "other",
            ],
        },
        "quality": {
            "construction": ["well-made", "cheap", "other"],
        },
    },
    "issues": [
        "zipper_problems",
        "see-through",
        "fabric_quality",
        "inconsistent_sizing",
        "arm_hole_issues",
        "unflattering",
        "itchy",
        "too_short",
        "too_long",
        "too_tight",
        "too_loose",
        "shrinkage",
        "other",
    ],
    "sentiment": ["positive", "negative", "neutral"],
}

FULL_SCHEMA = {
    "clothing_id": "int",
    "age": "int",
    "title": "string",
    "review_text": "string",
    "rating": "int",
    "recommended": "bool",
    "positive_feedback_count": "int",
    "division_name": "string",
    "department_name": "string",
    "class_name": "string",
    "features": SCHEMA["features"],
    "issues": SCHEMA["issues"],
    "sentiment": SCHEMA["sentiment"],
}


DF = pd.read_csv("Womens Clothing E-Commerce Reviews.csv")


class Timer:
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.interval = self.end - self.start
        logger.info(
            f"Execution time: {self.interval:.2f} seconds | {self.interval/60:.2f} minutes | {self.interval/3600:.2f} hours"
        )


class ActionTableCommunicate:
    def __init__(self) -> None:
        self.client = httpx.Client(
            transport=httpx.HTTPTransport(retries=3), timeout=Timeout(5 * 60)
        )

    def create_table(
        self,
        table_id: str,
        cols_info: tuple[dict[str, str], dict[str, str]],
    ):

        schema = {
            "id": table_id,
            "cols": [{"id": k, "dtype": v} for k, v in cols_info[0].items()]
            + [{"id": k, "dtype": v} for k, v in cols_info[1].items()],
        }

        response = self.client.post(f"{BASE_URL}/v1/gen_tables/action", json=schema)
        response.raise_for_status()

    def add_row(self, table_id: str, row: dict[str, str]):
        response = self.client.post(
            f"{BASE_URL}/v1/gen_tables/action/rows/add",
            json={"table_id": table_id, "data": row, "stream": False},
        )
        logger.info(response.text)
        response.raise_for_status()

    def update_gen_config(self, gen_config: dict):
        response = self.client.post(
            f"{BASE_URL}/v1/gen_tables/action/gen_config/update",
            json=gen_config,
        )
        logger.info(response.status_code)
        response.raise_for_status()

    def get_table(self, table_id):
        response = self.client.get(f"{BASE_URL}/v1/gen_tables/action/{table_id}")
        response.raise_for_status()
        return response.json()

    def delete_table(self, table_id):
        response = self.client.delete(f"{BASE_URL}/v1/gen_tables/action/{table_id}")
        response.raise_for_status()

    def get_rows(self, table_id):
        response = self.client.get(f"{BASE_URL}/v1/gen_tables/action/{table_id}/rows")
        response.raise_for_status()
        page = Page[dict](**response.json())
        return page.items

  • We define the SCHEMA, which outlines the structure of the features, issues, and sentiment to extract from each review.

  • FULL_SCHEMA combines input data columns with the output columns for the extracted information.

  • ActionTableCommunicate class handles interactions with the Action Table API, allowing us to create tables, add rows, and configure LLM settings.

  • We specify the table_id and cols_info to define the table structure and create it with the create_table method (the sketch below shows roughly what this posts).
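
To make that concrete, here is roughly the payload create_table posts for a trimmed-down cols_info; the values are illustrative, but the shape matches the schema dict built above:

# Illustrative body for POST {BASE_URL}/v1/gen_tables/action
schema = {
    "id": "WomenClothingReviews_cookbook",
    "cols": [
        {"id": "Title", "dtype": "str"},        # input column
        {"id": "Review Text", "dtype": "str"},  # input column
        {"id": "Sentiment", "dtype": "str"},    # output column, filled by the LLM
    ],
}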

Adding Data and LLM Magic

Now comes the fun part – adding review data and letting LLMs work their magic!

  1. Prepare your data: Load your women's clothing review data into a Pandas DataFrame.

  2. Define LLM prompts: We'll use prompts to instruct the LLMs on what information to extract from the reviews.

def prompt_of_col(model_name: str, column_name: str):
    return {
        "id": "",
        "model": model_name,
        "messages": [
            {
                "role": "system",
                "content": "You are an artificial intelligent assistant created by EmbeddedLLM. You should give helpful, detailed, and polite answers to the human's questions.",
            },
            {
                "role": "user",
                "content": "${Features} \n\nJson data above is the customer review, get the information of "
                + f"{column_name}. If the information is null, just output null. Format the list into lowercase string seperated by comma. \nOnly output the string seperated by comma, not list, do not include any other information.",
            },
        ],
        "functions": [],
        "function_call": "auto",
        "temperature": 0.1,
        "top_p": 0.01,
        "stream": False,
        "stop": [],
        "max_tokens": 2000,
        "presence_penalty": 0,
        "frequency_penalty": 0,
    }


def main():

    MODEL_NAME = "anthropic/claude-3-haiku-20240307"

    atc = ActionTableCommunicate()

    table_id = "WomenClothingReviews_cookbook"

    try:
        # atc.delete_table(table_id)

        cols_info = (
            {
                "No": "int",
                "Clothing Id": "int",
                "Age": "int",
                "Title": "str",
                "Review Text": "str",
                "Rating": "int",
                "Recommended": "bool",
                "Positive Feedback Count": "int",
                "Division Name": "str",
                "Department Name": "str",
                "Class Name": "str",
            },
            {
                "Keywords": "str",
                "Features": "str",
                "Fit Sizing": "str",
                "Fit Silhouette": "str",
                "Fit Body Area Waist": "str",
                "Fit Body Area Hips": "str",
                "Fit Body Area Length": "str",
                "Style Aesthetic": "str",
                "Quality Construction": "str",
                "Issue": "str",
                "Sentiment": "str",
            },
        )

        atc.create_table(table_id, cols_info)

        column_map = {
            "Keywords": {
                "id": "",
                "model": MODEL_NAME,
                "messages": [
                    {
                        "role": "system",
                        "content": "You are an artificial intelligent assistant created by EmbeddedLLM. You should give helpful, detailed, and polite answers to the human's questions.",
                    },
                    {
                        "role": "user",
                        "content": "[Title]: \n${Title} \n\n[Review Text]: \n${Review Text}"
                        + "\n\n\nUnderstand the Title and Review Text, extract keywords to help people read better. "
                        + "List out the keywords, seperate by comma. Only output keywords, do not include any other information.",
                    },
                ],
                "functions": [],
                "function_call": "auto",
                "temperature": 0.1,
                "top_p": 0.01,
                "stream": False,
                "stop": [],
                "max_tokens": 2000,
                "presence_penalty": 0,
                "frequency_penalty": 0,
            },
            "Features": {
                "id": "",
                "model": "openai/gpt-3.5-turbo",
                "messages": [
                    {
                        "role": "system",
                        "content": "You are an artificial intelligent assistant created by EmbeddedLLM. You should give helpful, detailed, and polite answers to the human's questions.",
                    },
                    {
                        "role": "user",
                        "content": "[Title]: \n${Title} \n\n[Review Text]: \n${Review Text} \n\n[Schema]: \n"
                        + json.dumps(SCHEMA)
                        + "\n\n\nUnderstand the Title and Review Text, convert the result to structural format based on Schema."
                        + "\nIf there is no information in the Review Text, put Null. Only output structured json, do not include any other information.",
                    },
                ],
                "functions": [],
                "function_call": "auto",
                "temperature": 0.1,
                "top_p": 0.01,
                "stream": False,
                "stop": [],
                "max_tokens": 2000,
                "presence_penalty": 0,
                "frequency_penalty": 0,
            },
        }
        for out_col in cols_info[1].keys():
            if out_col not in ["Keywords", "Features"]:
                column_map[out_col] = prompt_of_col(MODEL_NAME, out_col)

        gen_config = {"table_id": table_id, "column_map": column_map}
        atc.update_gen_config(gen_config)

        col_map = {
            "Unnamed: 0": "No",
            "Clothing ID": "Clothing Id",
            "Recommended IND": "Recommended",
        }

        records = DF.to_dict("records")
        # --- Sample some rows, comment out to include all rows --- #
        records = records[::500]
        # --------------------------------------------------------- #
        with Timer() as t:
            for i, record in enumerate(records):
                try:
                    row = {
                        col_map[k] if k in col_map else k: str(v)
                        for k, v in record.items()
                    }
                    logger.info(f"Adding {i}, {row}")
                    atc.add_row(table_id, row)
                except Exception as e:
                    logger.warning(f"Skipping row {i}: {e}")
                    continue

        logger.info(f"Inserted {len(records)} rows.")

        # Check inserted row
        result_rows = atc.get_rows(table_id)
        logger.info(result_rows[0])

    except httpx.HTTPError as http_err:
        logger.error(f"HTTP error occurred: {http_err}")
    except Exception as err:
        logger.error(f"An error occurred: {err}")


if __name__ == "__main__":
    main()

  • cols_info defines the input and output columns; only the output columns need prompts.

  • The prompt_of_col function defines the prompt for a specific column, instructing the LLM on how to extract and format the information (see the inspection sketch after this list).

  • MODEL_NAME specifies the LLM you want to use. JamAI Base supports various LLMs, so choose your favorite!

  • column_map associates each output column with its corresponding prompt.

  • We update the Action Table's generation configuration with the defined prompts using update_gen_config.
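
To see what one of these prompts contains, print one; a quick sketch reusing the prompt_of_col function above:

# Illustrative: inspect the generated prompt for the "Fit Sizing" column.
cfg = prompt_of_col("anthropic/claude-3-haiku-20240307", "Fit Sizing")
print(cfg["messages"][1]["content"])
# -> "${Features} \n\nJson data above is the customer review, get the
#     information of Fit Sizing. ..."

Note how the user message starts with ${Features}: the Action Table substitutes the row's generated Features JSON there, so each per-feature column is extracted from the structured output rather than the raw review text.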

  3. Add rows and generate insights: Loop through your DataFrame, add each review as a row to the Action Table, and let the LLMs populate the output columns.

  • We convert the DataFrame into a list of dictionaries (records) and iterate through each record.

  • For each record, we create a row for the Action Table and call add_row to insert it. The LLMs will process the input data and populate the output columns based on the defined prompts.

  • We verify the inserted rows and generated outputs using get_rows (a spot-check sketch follows the run command below).

  4. Run the script and let the LLMs work their magic!

    $ python jamai_action_table.py
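
Once generation finishes, you can spot-check the LLM outputs before building the dashboard. A minimal sketch, assuming jamai_action_table.py is importable and that rows come back as flat column-to-value dicts (the dashboard below makes the same assumption):

# check_outputs.py: quick sanity check of a generated column
from collections import Counter

from jamai_action_table import ActionTableCommunicate

atc = ActionTableCommunicate()
rows = atc.get_rows("WomenClothingReviews_cookbook")

# Tally the LLM-filled Sentiment column across all rows.
print(Counter(str(r.get("Sentiment", "")).strip().lower() for r in rows))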

Building the Streamlit Dashboard

# dashboard.py
import sys

import httpx
import matplotlib.pyplot as plt
import pandas as pd
import streamlit as st
from httpx import Timeout
from loguru import logger
from wordcloud import WordCloud

from protocol import Page


def get_rows(table_id):
    BASE_URL = "http://192.168.80.61:6969/api"
    client = httpx.Client(
        transport=httpx.HTTPTransport(retries=3), timeout=Timeout(5 * 60)
    )

    response = client.get(f"{BASE_URL}/v1/gen_tables/action/{table_id}/rows")
    response.raise_for_status()
    page = Page[dict](**response.json())
    return page.items


def bar_chart(df_expanded, col1, col2):
    # Adjust the binning process
    if col1 == "Age":
        df_expanded[col1] = pd.cut(
            df_expanded[col1],
            bins=range(10, 101, 10),
            right=False,
            labels=[f"{i}-{i+9}" for i in range(10, 100, 10)],
        )
    elif col1 == "Positive Feedback Count":
        df_expanded[col1] = pd.cut(
            df_expanded[col1],
            bins=range(0, 21, 5),
            right=False,
            labels=[f"{i}-{i+4}" for i in range(0, 20, 5)],
        )

    # Group by the specified columns and count occurrences
    counts = df_expanded.groupby([col1, col2]).size().unstack(fill_value=0)

    if col1 == "Age" or "Positive Feedback Count":
        # Normalize the data by the total counts in each age group
        counts_normalized = counts.div(counts.sum(axis=1), axis=0)
    else:
        counts_normalized = counts
    fig, ax = plt.subplots()
    # Stacked bar chart for better visualization of proportions
    counts_normalized.plot(kind="bar", ax=ax, width=0.8, stacked=True)
    ax.set_title(f"Normalized Frequency of {col2} by {col1}")
    ax.set_xlabel(col1)
    ax.set_ylabel("Normalized Frequency")
    ax.legend(title=col2)
    ax.tick_params(labelrotation=0)
    
    # Use Streamlit to display the chart
    st.pyplot(fig)


def apps():
    FLATTEN_SCHEMA = {
        "Fit Sizing": ["true to size", "runs small", "runs large"],
        "Fit Silhouette": [
            "flattering",
            "comfortable",
            "loose",
            "boxy",
            "fitted",
            "flowy",
        ],
        "Fit Body Area Waist": [
            "high-waisted",
            "low-waisted",
            "empire waist",
            "natural waist",
        ],
        "Fit Body Area Hips": ["tight", "loose", "just right"],
        "Fit Body Area Length": ["long", "short", "midi", "maxi", "cropped"],
        "Style Aesthetic": [
            "classic",
            "trendy",
            "bohemian",
            "romantic",
            "vintage",
            "minimalistic",
        ],
        "Quality Construction": ["well-made", "cheap", "other"],
        "Issue": [
            "zipper_problems",
            "see-through",
            "fabric_quality",
            "inconsistent_sizing",
            "arm_hole_issues",
            "unflattering",
            "itchy",
            "too_short",
            "too_long",
            "too_tight",
            "too_loose",
            "shrinkage",
        ],
        "Sentiment": ["positive", "negative", "neutral"],
    }

    flatten_columns = [
        "Fit Sizing",
        "Fit Silhouette",
        "Fit Body Area Waist",
        "Fit Body Area Hips",
        "Fit Body Area Length",
        "Style Aesthetic",
        "Quality Construction",
        "Issue",
        "Sentiment",
    ]
    compare_columns = ["Age", "Rating", "Positive Feedback Count"]

    TABLE_ID = "WomenClothingReviews_cookbook"

    rows = get_rows(TABLE_ID)

    tabs = st.tabs(["Action Table", "Keywords"] + flatten_columns)
    tab_action_table, tab_analysis_age_keywords = tabs[0:2]
    tab_matrix = tabs[2:]

    df_ = pd.DataFrame(rows)

    with tab_action_table:
        st.table(df_)

    with tab_analysis_age_keywords:
        df = df_[["No", "Age", "Rating", "Keywords"]]
        df_expanded_age_keywords = df.drop("Keywords", axis=1).join(
            df["Keywords"]
            .str.split(", ", expand=True)
            .stack()
            .reset_index(level=1, drop=True)
            .rename("Keywords")
        )

        keyword_counts = df_expanded_age_keywords["Keywords"].value_counts().head(100)

        st.dataframe(
            keyword_counts.rename_axis("Keyword").reset_index(name="Frequency")
        )

        keyword_freq = keyword_counts.to_dict()

        wordcloud = WordCloud(width=800, height=400, background_color="white")

        # Generate the word cloud from frequencies
        wordcloud.generate_from_frequencies(keyword_freq)

        # Use matplotlib to create a figure
        fig, ax = plt.subplots()
        ax.imshow(wordcloud, interpolation="bilinear")
        ax.axis("off")  # Hide the axes

        st.pyplot(fig)

    for tab, col2 in zip(tab_matrix, flatten_columns):
        with tab:

            subtabs = st.columns([0.3, 0.3, 0.3])
            for subtab, col1 in zip(subtabs, compare_columns):

                matrix_name = f"{col1} vs {col2}"
                with subtab:
                    st.write(matrix_name)
                    df = df_[["No", col1, col2]]
                    df_expanded = df.drop(col2, axis=1).join(
                        df[col2]
                        .str.split(", ", expand=True)
                        .stack()
                        .reset_index(level=1, drop=True)
                        .rename(col2)
                    )

                    keywords_to_keep = FLATTEN_SCHEMA[col2]

                    # Filter the DataFrame
                    filtered_df = df_expanded[df_expanded[col2].isin(keywords_to_keep)]

                    bar_chart(filtered_df, col1, col2)
                    st.table(filtered_df)


def main():
    st.set_page_config(page_icon="🦊", layout="wide")
    st.title("Jamai Women Clothing Reviews Dashboard 🦊🌸")
    try:
        apps()
    except Exception as e:
        logger.exception(e)
        st.error(e)
        st.warning(
            "Sorry we seem to have encountered an internal issue 🥺🙏 \nPlease try refreshing your browser."
        )


if __name__ == "__main__":
    logger.remove()
    logger.add(sys.stderr, level="INFO")
    logger.add(
        "jamai_app.log",
        level="INFO",
        enqueue=True,
        backtrace=True,
        diagnose=True,
        rotation="50 MB",
    )
    main()

  • Helper functions like get_rows and bar_chart fetch the table rows and render the visualizations.

  • The apps function defines the layout and content of the Streamlit dashboard, displaying tables, charts, and word clouds to showcase the extracted insights (the keyword-expansion step is illustrated after this list).

  • The main function sets up logging and runs the Streamlit app.
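
If the split/stack/join chain inside apps looks opaque, here is the same transform on tiny synthetic data; it explodes a comma-separated column into one row per value so the values can be counted and plotted:

# explode_demo.py: standalone illustration of the keyword-expansion step
import pandas as pd

df = pd.DataFrame({"No": [1, 2], "Keywords": ["soft, flowy", "itchy"]})
expanded = df.drop("Keywords", axis=1).join(
    df["Keywords"]
    .str.split(", ", expand=True)
    .stack()
    .reset_index(level=1, drop=True)
    .rename("Keywords")
)
print(expanded)
#    No Keywords
# 0   1     soft
# 0   1    flowy
# 1   2    itchy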

Running Your Project

  1. Run the Streamlit app: Open your terminal, navigate to the directory containing your dashboard.py file, and run:

    $ streamlit run dashboard.py

  2. Explore your dashboard: Your browser will open, revealing your interactive dashboard with visualizations and insights generated from the women's clothing reviews!

Congratulations!

You've built a fantastic project using JamAI Base Action Table and explored the power of LLMs in data analysis. Experiment with different LLMs, prompts, and visualizations to further enhance your project and impress your friends with your AI-powered analysis skills!
