sort
Deadline
ended (2025-06-30 00:00 UTC)
Language
Python
GPU Types
A100, H100, L4, T4
Description
Implement a sort kernel that matches the reference implementation. The kernel should sort the input array in ascending order using a sort algorithm of your choice. Input arrays are generated as random floating-point numbers, where each row of a roughly square matrix is drawn from a normal distribution with a different mean value per row based on the seed and then flattened into a 1D array.
Reference Implementation
from utils import make_match_reference
import torch
from task import input_t, output_t
def ref_kernel(data: input_t) -> output_t:
"""
Reference implementation of sort using PyTorch.
Args:
data: Input tensor to be sorted
Returns:
Sorted tensor
"""
return torch.sort(data)[0]
def generate_input(size: int, seed: int) -> torch.Tensor:
"""
Generates random input tensor where elements are drawn from different distributions.
Args:
size: Total size of the final 1D tensor
seed: Base seed for random generation
Returns:
1D tensor of size `size` containing flattened values from different distributions
"""
# Calculate dimensions for a roughly square 2D matrix
rows = int(size ** 0.5) # Square root for roughly square shape
cols = (size + rows - 1) // rows # Ceiling division to ensure total size >= requested size
gen = torch.Generator(device='cuda')
result = torch.empty((rows, cols), device='cuda', dtype=torch.float32)
# Different seed for each row!
for i in range(rows):
row_seed = seed + i
gen.manual_seed(row_seed)
# Generate values for this row with mean=row_seed
result[i, :] = torch.randn(cols, device='cuda', dtype=torch.float32, generator=gen) + row_seed
# Flatten and trim to exact size requested
return result.flatten()[:size].contiguous()
check_implementation = make_match_reference(ref_kernel)
Rankings
L4
Nader 🥇 | 15687.478μs | submission.py |
Dante 🥈 | 49099.568μs +33412.090μs | submission.py |
ajhinh 🥉 | 1175091.112μs +1125991.544μs | l4.py |
T4
Nader 🥇 | 25029.484μs | submission.py |
Dante 🥈 | 60884.220μs +35854.736μs | submission.py |
ajhinh 🥉 | 1153206.609μs +1092322.389μs | t4.py |
A100
Nader 🥇 | 3627.491μs | submission.py |
Dante 🥈 | 13311.998μs +9684.507μs | submission.py |
mancala 🥉 | 15801.310μs +2489.312μs | submission.py |
H100
Nader 🥇 | 2275.541μs | submission.py |
Dante 🥈 | 6343.298μs +4067.757μs | submission.py |
Seraphim 🥉 | 6552.651μs +209.353μs | submission.py |