Displaying a DataFrame - .show()

Overview

The show() method prints the contents of a DataFrame to standard output as a text table. It is an action, so calling it triggers evaluation of the DataFrame, which makes it useful for inspecting data during development and debugging. Its parameters control how many rows are printed, whether long strings are truncated (and to what length), and whether rows are laid out horizontally or vertically.

Syntax of the show() method:

python
show(n=20, truncate=True, vertical=False)

Parameters:

  • n: The maximum number of rows to display. The default value is 20.
  • truncate: If True, strings longer than 20 characters are truncated and end with an ellipsis (...). If set to an integer greater than 1, strings are truncated to that length instead. If False, the entire string is displayed. The default value is True.
  • vertical: If True, each row is printed as its own block with one field per line. If False, rows are printed as a horizontal table. The default value is False.
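To make the truncate rule concrete, here is a minimal pure-Python sketch that approximates how a single cell is shortened. The helper name truncate_cell is hypothetical and is not part of PySpark; it only mirrors the observable behaviour, where truncate=True corresponds to a width of 20 and truncate=False to 0.

```python
def truncate_cell(value: str, truncate: int = 20) -> str:
    """Approximate how show() shortens one cell (illustrative helper, not a PySpark API)."""
    if truncate <= 0 or len(value) <= truncate:
        # truncate=False (0), or the value already fits: shown in full
        return value
    if truncate < 4:
        # Too narrow to fit an ellipsis: plain cut
        return value[:truncate]
    # Reserve three characters for "..." so the result is exactly `truncate` wide
    return value[: truncate - 3] + "..."


print(truncate_cell("San Francisco", 8))                      # San F...
print(truncate_cell("A fairly long description of a city"))   # A fairly long des...
```

Note that the ellipsis counts toward the width, so a truncated cell never exceeds the requested length.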

Usage

Assuming we have the following DataFrame:

python
from pyspark.sql import SparkSession

# Create a SparkSession (if not already created)
spark = SparkSession.builder.appName("ShowMethodExample").getOrCreate()

# Sample data as a list of tuples; column names are passed explicitly
# so the column order is guaranteed to be name, age, city
data = [
    ("Alice", 30, "New York"),
    ("Bob", 25, "San Francisco"),
    ("Charlie", 35, "Los Angeles"),
    # Add more rows as needed
]

# Create a DataFrame
df = spark.createDataFrame(data, ["name", "age", "city"])

1. Show up to the default number of rows (20):

python
df.show()

Output:

+-------+---+-------------+
|   name|age|         city|
+-------+---+-------------+
|  Alice| 30|     New York|
|    Bob| 25|San Francisco|
|Charlie| 35|  Los Angeles|
+-------+---+-------------+

2. Show only the first 3 rows:

python
df.show(3)

Output:

+-------+---+-------------+
|   name|age|         city|
+-------+---+-------------+
|  Alice| 30|     New York|
|    Bob| 25|San Francisco|
|Charlie| 35|  Los Angeles|
+-------+---+-------------+

3. Show all rows without truncation:

python
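# df.count() runs a separate job just to count the rows; avoid this on large DataFrames
df.show(n=df.count(), truncate=False)

Output (cells are left-aligned when truncation is disabled):

+-------+---+-------------+
|name   |age|city         |
+-------+---+-------------+
|Alice  |30 |New York     |
|Bob    |25 |San Francisco|
|Charlie|35 |Los Angeles  |
+-------+---+-------------+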

4. Show vertically:

python
df.show(vertical=True)

Output:

-RECORD 0-------------
 name | Alice
 age  | 30
 city | New York
-RECORD 1-------------
 name | Bob
 age  | 25
 city | San Francisco
-RECORD 2-------------
 name | Charlie
 age  | 35
 city | Los Angeles

By using the various parameters of the show() method, you can control the appearance and size of the DataFrame output, making it more convenient for data exploration and debugging.

📖👉 Official Doc