Splitting CSV files in PowerShell with Split-Csv

In the course of my work, I am often times dealing with large CSV files that need to be broken up into smaller files, for whatever reason.  If the CSV file has a header, I can’t simply use something that breaks up the file by number of lines because I need to retain the header for use in each file.  So, I wrote up a script that I call Split-Csv.

Split-Csv takes a file path and a batch size as parameters, Path and BatchSize, respectively:

Split-Csv <InputFilePath> -BatchSize <int>

Script contents:


Param(
    [Parameter(Mandatory=$True,Position=1,HelpMessage="File path")]
    [string]$Path,
    [Parameter(Mandatory=$True,HelpMessage="Batch size")]
    [int]$BatchSize
)

If(!(Test-Path $Path -ErrorAction SilentlyContinue)) {
    Write-Error "$Path not found."
    Return
}

$inputFile = Get-Item $Path

$ParentDirectory = Split-Path $Path

$strFilename = dir $Path | Select-Object BaseName,Extension

$strBaseName = $strFilename.BaseName
$strExtension = $strFilename.Extension

$objFile = Import-Csv $Path

$Count = [math]::ceiling($objFile.Count/$BatchSize)

$Length = "D" + ([math]::floor([math]::Log10($Count) + 1)).ToString()

1 .. $Count | ForEach-Object {
    $i = $_.ToString($Length)
    $Offset = $BatchSize * ($_ - 1)
    $outputFile = $ParentDirectory + "\" + $strBaseName + "-" + $i + $strExtension
    If($_ -eq 1) {
        $objFile | Select-Object -First $BatchSize | Export-Csv $outputFile -NoTypeInformation
    } Else {
        $objFile | Select-Object -First $BatchSize -Skip $Offset | Export-Csv $outputFile -NoTypeInformation
    }
}

Easy enough.

There are a couple of things of note.

1) I just drop these back into the same directory as the original; and,
2) I name the output files the same as the original appended with the number represented by their batch, zero-padded.

That’s it.

2 thoughts on “Splitting CSV files in PowerShell with Split-Csv

  1. Nice script, worked as expected, took me a little while to figure out what the “$Length = “D” + ([math]::floor([math]::Log10($Count) + 1)).ToString()” was doing, but after debugging it became clear

    Like

    1. @Steve V, the problem that I tend to have is that filenames are sorted alphabetically. So, I wanted it to be zero padded so that the number of digits was a consistent length and would sort properly. For those that read this later, the PowerShell formatting options offer the ability to pad digits by using D followed by the number of digits to pad; so if you need 5 digits in total and you have a number like 27, D5 will cause it to be formatted as 00027. Using Log10 with the “floor” average let’s me figure out the maximum number of digits that will be necessary.

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s