sort

Deadline

ended (2025-06-30 00:00 UTC)

Language

Python

GPU Types

A100, H100, L4, T4

Description

Implement a sort kernel that matches the reference implementation. The kernel should sort the input array in ascending order using a sort algorithm of your choice. Input arrays are generated as random floating-point numbers, where each row of a roughly square matrix is drawn from a normal distribution with a different mean value per row based on the seed and then flattened into a 1D array.

Reference Implementation

from utils import make_match_reference
import torch
from task import input_t, output_t


def ref_kernel(data: input_t) -> output_t:
    """
    Reference implementation of sort using PyTorch.
    Args:
        data: Input tensor to be sorted
    Returns:
        Sorted tensor
    """
    return torch.sort(data)[0]


def generate_input(size: int, seed: int) -> torch.Tensor:
    """
    Generates random input tensor where elements are drawn from different distributions.
    
    Args:
        size: Total size of the final 1D tensor
        seed: Base seed for random generation
    
    Returns:
        1D tensor of size `size` containing flattened values from different distributions
    """
    # Calculate dimensions for a roughly square 2D matrix
    rows = int(size ** 0.5)  # Square root for roughly square shape
    cols = (size + rows - 1) // rows  # Ceiling division to ensure total size >= requested size
    
    gen = torch.Generator(device='cuda')
    result = torch.empty((rows, cols), device='cuda', dtype=torch.float32)
    
    # Different seed for each row!
    for i in range(rows):
        row_seed = seed + i
        gen.manual_seed(row_seed)
        
        # Generate values for this row with mean=row_seed
        result[i, :] = torch.randn(cols, device='cuda', dtype=torch.float32, generator=gen) + row_seed
    
    # Flatten and trim to exact size requested
    return result.flatten()[:size].contiguous()


check_implementation = make_match_reference(ref_kernel)

Rankings

L4

Nader 🥇 15687.478μs submission.py
Dante 🥈 49099.568μs   +33412.090μs submission.py
ajhinh 🥉 1175091.112μs   +1125991.544μs l4.py

T4

Nader 🥇 25029.484μs submission.py
Dante 🥈 60884.220μs   +35854.736μs submission.py
ajhinh 🥉 1153206.609μs   +1092322.389μs t4.py

A100

Nader 🥇 3627.491μs submission.py
Dante 🥈 13311.998μs   +9684.507μs submission.py
mancala 🥉 15801.310μs   +2489.312μs submission.py
ajhinh 202852.265μs   +187050.955μs a100.py

H100

Nader 🥇 2275.541μs submission.py
Dante 🥈 6343.298μs   +4067.757μs submission.py
Seraphim 🥉 6552.651μs   +209.353μs submission.py
sajy 6582.328μs   +29.677μs submission.py
yechenzhi 6897.514μs   +315.186μs ref.py
mancala 7159.787μs   +262.273μs submission.py
ajhinh 139032.024μs   +131872.237μs h100.py