
How to Split JSON File into Multiple Files: Step-by-Step Guide

Imad Uddin

Full Stack Developer


If you work with JSON files regularly, you've almost certainly run into a file that's just too big. Maybe it's a massive data export from an API, a database dump with millions of records, or a log file that's grown out of control. Whatever the source, large JSON files cause real problems: they're slow to open in editors, they choke memory when you try to parse them, and plenty of APIs and tools have strict file size limits.

Splitting a large JSON file into smaller pieces is the practical solution. I've had to do this more times than I can count when processing bulk API exports and preparing datasets for upload to systems with file size restrictions. In this guide, I'll walk through every method I've used: Python scripts (the most flexible option), an online splitter tool (fastest for quick jobs), and command-line utilities like jq (ideal for automation on Linux and macOS). We'll also cover splitting nested structures and handling some of the common errors you might hit along the way.

Why Split a JSON File?

There are several concrete reasons why breaking a JSON file into smaller chunks makes sense:

Performance improvements. Smaller files are faster to read, parse, and process. If you're loading a JSON file into memory, the difference between a 10MB file and a 500MB file is dramatic. Your text editor will thank you, too.

Memory constraints. Some systems simply can't handle massive JSON files. If your application loads the entire file into memory before processing (which is how most JSON parsers work), a large enough file will exhaust your available RAM and crash the process.

Upload and API limits. Many services impose file size limits on uploads. AWS Lambda payloads are capped at 6MB for synchronous invocation. Firebase has import limits. Plenty of internal tools have similar restrictions. Splitting your data into smaller files lets you work within these constraints.

Better organization. Large files are unwieldy. Splitting records into logical groups, say by date, category, or region, makes the data easier to navigate and reason about.

Team collaboration. When multiple people need to work with different subsets of the data, splitting the file up front avoids everyone having to download and parse the entire thing.

Version control friendliness. Git doesn't handle large files well. Splitting data into smaller, meaningful chunks makes diffs readable and merges manageable.

Parallel processing. If you're running data through a processing pipeline, splitting the input lets you process multiple chunks concurrently, which can significantly speed up the overall workflow.
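A minimal sketch of that idea using a thread pool over in-memory chunks (process_chunk is a made-up stand-in for whatever work your pipeline actually does per chunk, such as validating or uploading):

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Stand-in for real per-chunk work (validate, transform, upload, ...)
    return len(chunk)

# Four chunks of five records each, as a splitting step might produce
chunks = [[{'id': j} for j in range(5)] for _ in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves chunk order, so results line up with inputs
    counts = list(pool.map(process_chunk, chunks))

total = sum(counts)
```

Threads work well when the per-chunk work is I/O-bound (uploads, API calls); for CPU-heavy transformations, a process pool is usually the better choice.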

JSON File Structures: What You're Splitting

Before you split a JSON file, you need to understand its structure, because the approach differs based on what you're working with.

An array of objects is the most common format you'll encounter:

JSON
[
  { "id": 1, "name": "Alice" },
  { "id": 2, "name": "Bob" },
  { "id": 3, "name": "Charlie" }
]

This is straightforward to split. You just slice the array into chunks of whatever size you need.
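Concretely, that slicing is a one-liner; here it is on a small made-up array:

```python
data = [{"id": n} for n in range(1, 8)]  # seven records
chunk_size = 3

# Step through the list in chunk_size strides; the last chunk may be shorter
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```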

A nested JSON object requires a different approach:

JSON
{
  "users": [
    { "id": 1, "name": "Alice" },
    { "id": 2, "name": "Bob" }
  ],
  "metadata": { "count": 2 }
}

Here, you might want to split the users array into chunks while preserving the metadata, or you might want to extract each top-level key into its own file.

Line-delimited JSON (JSONL) is already semi-split by design:

JSON
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}

Each line is a standalone JSON object, so splitting JSONL files is really just splitting them by line count.

Understanding which structure you're dealing with is the most important step before you start writing any code.
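That check is easy to automate. A rough heuristic, sketched as a function over raw text (it's not a validator: a genuinely malformed file will also land in the 'jsonl' bucket, so treat the result as a hint):

```python
import json

def detect_structure(text):
    """Classify raw JSON text as 'array', 'object', or 'jsonl' (rough heuristic)."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Multiple top-level values usually means line-delimited JSON,
        # but malformed JSON ends up here too
        return 'jsonl'
    if isinstance(parsed, list):
        return 'array'
    return 'object'
```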

Method 1: Split JSON File in Python

Python is my go-to for this task because it gives you the most control and scales well from small files to very large ones. Here's how to do it step by step.

Basic Array Splitting

Python
import json

with open('large_file.json') as f:
    data = json.load(f)

Once the data is loaded, split it into chunks:

Python
chunk_size = 100  # records per file

for i in range(0, len(data), chunk_size):
    chunk = data[i:i+chunk_size]
    with open(f'chunk_{i//chunk_size + 1}.json', 'w') as f:
        json.dump(chunk, f, indent=4)

This creates files named chunk_1.json, chunk_2.json, and so on, each containing up to 100 records. You can adjust chunk_size to whatever number makes sense for your use case.

When to use this approach: When you have a large array of objects and need full control over the output format. It's also the easiest to integrate into larger data pipelines.

Making It Reusable with argparse

If you find yourself splitting JSON files frequently, turning the script into a proper CLI tool saves time:

Python
import argparse
import json

parser = argparse.ArgumentParser(description='Split a JSON array into multiple files')
parser.add_argument('input_file', help='Path to the input JSON file')
parser.add_argument('--size', type=int, default=100, help='Number of records per output file')
parser.add_argument('--output-dir', default='.', help='Directory for output files')
args = parser.parse_args()

with open(args.input_file) as f:
    data = json.load(f)

for i in range(0, len(data), args.size):
    chunk = data[i:i+args.size]
    output_path = f'{args.output_dir}/chunk_{i//args.size + 1}.json'
    with open(output_path, 'w') as f:
        json.dump(chunk, f, indent=2)
    print(f'Wrote {len(chunk)} records to {output_path}')

files_written = -(-len(data) // args.size)  # -(-a // b) is ceiling division
print(f'\nDone. Split {len(data)} records into {files_written} files.')

Now you can run it like:

Bash
python split_json.py data.json --size 500 --output-dir ./chunks

Splitting by File Size Instead of Record Count

Sometimes you don't care about the number of records per file. You care about keeping each output file under a certain size (for example, under 5MB for an API upload limit). Here's how:

Python
import json
import os

def split_by_size(input_file, max_size_mb=5):
    """Split a JSON array into files that don't exceed max_size_mb."""
    with open(input_file) as f:
        data = json.load(f)

    max_bytes = max_size_mb * 1024 * 1024
    current_chunk = []
    chunk_num = 1

    for record in data:
        current_chunk.append(record)
        # Check size of current chunk; the len > 1 guard ensures a single
        # oversized record never produces an empty chunk file
        chunk_json = json.dumps(current_chunk, indent=2)
        if len(chunk_json.encode('utf-8')) >= max_bytes and len(current_chunk) > 1:
            # Remove last record, write chunk, start new one
            current_chunk.pop()
            with open(f'chunk_{chunk_num}.json', 'w') as f:
                json.dump(current_chunk, f, indent=2)
            print(f'chunk_{chunk_num}.json: {len(current_chunk)} records')
            current_chunk = [record]
            chunk_num += 1

    # Write remaining records
    if current_chunk:
        with open(f'chunk_{chunk_num}.json', 'w') as f:
            json.dump(current_chunk, f, indent=2)
        print(f'chunk_{chunk_num}.json: {len(current_chunk)} records')

split_by_size('large_data.json', max_size_mb=5)
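One caveat with split_by_size: it re-serializes the entire chunk on every append, which turns quadratic on large arrays. A faster approximation tracks a running byte count per record instead. This is a sketch (the helper name is made up, and sizes are estimated from compact serialization, so treat the limit as approximate rather than exact):

```python
import json

def split_by_size_fast(data, max_size_mb=5):
    """Approximate size-based chunking: per-record byte counts, no re-dumping."""
    max_bytes = max_size_mb * 1024 * 1024
    chunks, current, current_bytes = [], [], 2  # 2 bytes for '[' and ']'

    for record in data:
        # Estimate this record's serialized size plus a ', ' separator
        record_bytes = len(json.dumps(record).encode('utf-8')) + 2
        if current and current_bytes + record_bytes > max_bytes:
            chunks.append(current)
            current, current_bytes = [], 2
        current.append(record)
        current_bytes += record_bytes

    if current:
        chunks.append(current)
    return chunks
```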

Handling Very Large Files with Streaming

If your JSON file is so large that loading it entirely into memory isn't practical (we're talking gigabytes), you can use the ijson library to stream-parse it:

Bash
pip install ijson
Python
import ijson
import json

def split_large_json(input_file, chunk_size=1000):
    """Stream-split a large JSON array without loading it all into memory."""
    chunk = []
    chunk_num = 1

    with open(input_file, 'rb') as f:
        for item in ijson.items(f, 'item'):
            chunk.append(item)
            if len(chunk) >= chunk_size:
                with open(f'chunk_{chunk_num}.json', 'w') as out:
                    json.dump(chunk, out, indent=2)
                print(f'Wrote chunk_{chunk_num}.json ({len(chunk)} records)')
                chunk = []
                chunk_num += 1

    if chunk:
        with open(f'chunk_{chunk_num}.json', 'w') as out:
            json.dump(chunk, out, indent=2)
        print(f'Wrote chunk_{chunk_num}.json ({len(chunk)} records)')

split_large_json('massive_file.json', chunk_size=5000)

This approach uses minimal memory regardless of how large the input file is, because it reads one record at a time instead of loading the entire file.

Method 2: Use Our Online JSON Splitter Tool

If you don't want to write any code, our online splitter handles the most common use cases right in your browser.

Try it here: merge-json-files.com/json-file-splitter

How to use it:

  1. Upload your JSON file
  2. Choose the number of records per file
  3. Click Split
  4. Download the resulting files

No setup, no installation, works on any device with a browser. It's particularly useful for one-off splits where writing a script would be overkill. I use it when I need to quickly break up an API export before uploading it somewhere else.

The tool processes everything locally in your browser, so your data stays private. It validates the JSON before splitting and handles the output file naming automatically.

Method 3: Command Line with jq (Linux/macOS)

For developers who live in the terminal, jq is an incredibly fast and powerful option for splitting JSON files.

Install jq:

Bash
sudo apt install jq   # Debian/Ubuntu
brew install jq       # macOS

Split a JSON Array into Chunks:

Bash
jq -c '.[]' large_file.json | split -l 100 - chunk_

This converts each array element to a single line, then uses split to create files of 100 lines each. The output files will be named chunk_aa, chunk_ab, and so on.

Convert each split file back to a proper JSON array:

Bash
for file in chunk_*; do
  jq -s '.' "$file" > "$file.json"
  rm "$file"
  mv "$file.json" "$file"
done

Split into a specific number of files:

If you want exactly 10 output files regardless of how many records there are:

Bash
total=$(jq 'length' large_file.json)
chunk_size=$(( (total + 9) / 10 ))
jq -c '.[]' large_file.json | split -l $chunk_size - part_

Extract specific ranges:

Bash
# Get records 0-99
jq '.[0:100]' large_file.json > first_100.json

# Get records 100-199
jq '.[100:200]' large_file.json > second_100.json

jq is fast, works natively on Linux and macOS, and is great for incorporating into shell scripts and automation pipelines. The syntax takes some getting used to, but once you're comfortable with it, you can do surprisingly complex transformations in a single line.

Splitting Nested JSON Structures

Not all JSON files are flat arrays. If your file has multiple top-level keys with different data types, you'll want to split by key rather than by array index.

Say your file looks like this:

JSON
{
  "users": [...],
  "admins": [...],
  "settings": {...}
}

You can split it by key in Python:

Python
import json

with open('nested.json') as f:
    data = json.load(f)

for key, value in data.items():
    with open(f'{key}.json', 'w') as f:
        json.dump(value, f, indent=2)
    print(f'Wrote {key}.json')

This creates users.json, admins.json, and settings.json, each containing just the data from that key.

For more complex nested structures where you need to split an inner array while preserving the parent structure:

Python
import json

with open('nested.json') as f:
    data = json.load(f)

users = data['users']
chunk_size = 100

for i in range(0, len(users), chunk_size):
    output = {
        'users': users[i:i+chunk_size],
        'metadata': data.get('metadata', {}),
        'chunk_info': {
            'chunk_number': i // chunk_size + 1,
            'total_records': len(users),
            'records_in_chunk': len(users[i:i+chunk_size])
        }
    }
    with open(f'users_chunk_{i//chunk_size + 1}.json', 'w') as f:
        json.dump(output, f, indent=2)

This preserves the overall structure while splitting the large inner array into manageable pieces.

Real-World Use Case: JSON API Pagination

Many APIs return paginated results. Here's a practical workflow for collecting paginated data and splitting it for downstream processing:

Python
import requests
import json

all_data = []
for page in range(1, 6):
    response = requests.get(f'https://api.example.com/data?page={page}')
    all_data.extend(response.json())

# Save the combined result
with open('combined.json', 'w') as f:
    json.dump(all_data, f)

# Then split it into chunks for upload to another system
chunk_size = 200
for i in range(0, len(all_data), chunk_size):
    chunk = all_data[i:i+chunk_size]
    with open(f'upload_batch_{i//chunk_size + 1}.json', 'w') as f:
        json.dump(chunk, f)

If you later need to recombine these split files, check out our guide on how to merge JSON files.

Splitting JSONL Files

JSONL (JSON Lines) files are simpler to split because each line is already an independent JSON object. You can use standard text-splitting tools:

Using split on Linux/macOS:

Bash
split -l 1000 data.jsonl chunk_ --additional-suffix=.jsonl

Note that --additional-suffix is a GNU split flag. The BSD split that ships with macOS doesn't support it, so on a Mac either install coreutils (which provides gsplit) or rename the output files afterward.

Using Python:

Python
chunk_size = 1000
chunk_num = 1
current_chunk = []

with open('data.jsonl') as f:
    for line in f:
        current_chunk.append(line)
        if len(current_chunk) >= chunk_size:
            with open(f'chunk_{chunk_num}.jsonl', 'w') as out:
                out.writelines(current_chunk)
            current_chunk = []
            chunk_num += 1

if current_chunk:
    with open(f'chunk_{chunk_num}.jsonl', 'w') as out:
        out.writelines(current_chunk)

This is memory-efficient since it reads line by line, and works with files of any size.
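If your JSONL files might contain blank or malformed lines, it's worth validating each line as you split. A sketch of that idea, working over an in-memory list of lines for brevity (the function name is made up; in practice you'd iterate over a file object the same way):

```python
import json

def split_jsonl_lines(lines, chunk_size=2):
    """Split JSONL lines into chunks, skipping blank or malformed lines."""
    chunks, current, skipped = [], [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            json.loads(line)  # validate without keeping the parsed object
        except json.JSONDecodeError:
            skipped += 1
            continue
        current.append(line)
        if len(current) >= chunk_size:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks, skipped
```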

Tools Comparison Table

Method        | Best For                 | Technical Skill       | Scalability               | Flexibility
Python Script | Automation, full control | Intermediate/Advanced | Handles any size          | Fully customizable
Online Tool   | Quick one-off tasks      | Beginner              | Limited by browser memory | Basic splitting
jq (CLI)      | Linux users, large files | Intermediate          | Very fast                 | Good for arrays

Common Errors When Splitting JSON Files and How to Fix Them

JSONDecodeError usually means the input file is malformed. There might be a trailing comma, a missing bracket, or some other syntax issue. Validate the file with a JSON linter before trying to split it. Tools like jsonlint.com are helpful here.

TypeError: list indices must be integers means you're trying to use dictionary-style access on a list, or vice versa. Check your data structure. If you expected an array but got an object, your splitting logic needs to change.

PermissionError happens when you're trying to write output files to a directory you don't have access to. Change the output directory or adjust permissions.

MemoryError means your input file is too large for your system's available RAM. Use the streaming approach with ijson (covered above) or increase your system's memory. You could also split the file at the text level first using split before parsing as JSON.

UnicodeDecodeError occurs with files that aren't UTF-8 encoded. Open the file with the correct encoding (try encoding='utf-8-sig' for files with BOM markers, or encoding='latin-1' as a fallback) and re-save as UTF-8 before processing.

Best Practices When Splitting JSON Files

Validate before and after. Run the JSON through a validator before splitting to catch issues early. After splitting, validate each output file to make sure the chunks are valid JSON.

Keep backup copies. Always keep the original file intact. If something goes wrong during splitting, you want to be able to start over.

Use descriptive filenames. Names like users_part_1.json tell you more than chunk_aa. Include enough context in the filename that you can understand what's inside without opening it.

Add logging to your scripts. Print how many records went into each file and the total across all files. This makes it easy to verify that nothing was lost during the split.

Check the file size limit of your target platform. If you're splitting to meet an upload limit, verify that each chunk is actually under the limit. Record count doesn't always correlate directly with file size, especially if records vary in complexity.

Consider compression. If you're archiving or transferring the split files, compressing them with zip or tar.gz can save significant space and transfer time.

Preserve encoding. Make sure your output files use the same encoding as the input (UTF-8 in almost all cases). Don't let your text editor or script silently change the encoding.
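Several of these checks can be automated in one pass after a split. A minimal sketch, shown over in-memory strings so it stays self-contained (verify_chunks is a hypothetical helper; for real files you'd read each path and compare os.path.getsize against your limit):

```python
import json

def verify_chunks(chunk_texts, expected_total, max_bytes=None):
    """Validate each serialized chunk, sum record counts, and check sizes."""
    total = 0
    for i, text in enumerate(chunk_texts, start=1):
        records = json.loads(text)  # raises JSONDecodeError on an invalid chunk
        total += len(records)
        if max_bytes is not None and len(text.encode('utf-8')) > max_bytes:
            raise ValueError(f'chunk {i} exceeds {max_bytes} bytes')
    if total != expected_total:
        raise ValueError(f'expected {expected_total} records, found {total}')
    return total
```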

Recap

Python is best for automation and complex splitting logic. You get full control over how records are divided, the output format, and error handling.

Online tools are perfect for non-coders or quick one-off tasks where writing a script isn't worth the effort.

jq on the command line is fast and efficient for large file processing, especially if you're already working in a Linux or macOS terminal.

Pick the method that fits your project and your comfort level with the tools.

Final Thoughts

Learning to split JSON files is one of those practical skills that saves you real time and headaches once you need it. From quick online tools to full-blown Python automation and fast CLI utilities, you've got options no matter what your technical background. The key is understanding your JSON structure first, picking the right tool for the job, and always validating the results.

Try our free tool at merge-json-files.com/json-file-splitter for fast, no-code splitting when you just need it done.

If you're working with batch processing, API integrations, or database imports, mastering JSON file splitting will pay for itself many times over. And if you ever need to put those files back together, our guide on how to merge JSON files covers every method for that too.

Related Tools: If you're dealing with complex nested structures, our JSON Flattener can help simplify your data before splitting.
