Only downloading certain fields - chunk filesize for "full" TNG100 snapshot still significantly bigger than "mini" snapshots
Harley Brown
13 Aug
Hi,
Using a slight adaptation of the example code snippet from Task 11 of the web-based API cookbook, I've been downloading only certain fields, for only dark matter and stars, from several TNG100-1 snapshots. The only fields I'm downloading are "Coordinates" for dark matter, and "ParticleIDs", "Coordinates", "Masses", and "GFM_StellarFormationTime" for stars. Here's the code I'm using:
#%% Download settings
# Target snapshot index
targ_ss_ind = 50
# [43, 50, 56]

# Parameters / fields to download, per particle type
params = {'dm': 'Coordinates',
          'stars': 'ParticleIDs,Coordinates,Masses,GFM_StellarFormationTime'}

#%% API "get" func setup (copied directly from API Cookbook: https://www.tng-project.org/data/docs/api/)
import requests
import timeit  # for timing each chunk download

base_url = 'http://www.tng-project.org/api/TNG100-1/'

def get(path, params=None):
    # make HTTP GET request to path
    headers = {"api-key": "API Key that I've removed for post"}
    r = requests.get(path, params=params, headers=headers)
    # raise exception if response code is not HTTP SUCCESS (200)
    r.raise_for_status()
    if r.headers['content-type'] == 'application/json':
        return r.json()  # parse json responses automatically
    if 'content-disposition' in r.headers:
        # server supplies the filename; save the contents to disk
        filename = r.headers['content-disposition'].split("filename=")[1]
        with open(filename, 'wb') as f:
            f.write(r.content)
        return filename  # return the filename string
    return r

#%% Actual download (following cookbook example)
# Pull the count of files ("chunks") per simulation snapshot
sim_metadata = get(base_url)
chunk_per_ss_count = sim_metadata['num_files_snapshot']

# [For restarting interrupted downloads]
start = 0

# Loop over all snapshot chunks, downloading only the requested fields
for i in range(start, chunk_per_ss_count):
    ts = timeit.default_timer()
    file_url = base_url + "files/snapshot-" + str(targ_ss_ind) + "." + str(i) + ".hdf5"
    saved_filename = get(file_url, params)
    te = timeit.default_timer()
    print(saved_filename + " (" + str(i+1) + "/" + str(chunk_per_ss_count) + ") [ Time: " + str(round(te - ts, 5)) + "s ]")
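As a sanity check that the fields really are being passed as a query string, the prepared URL can be printed before requesting anything. This is an illustration only, not part of the download script; the chunk-0 URL for snapshot 50 below is just an example:

import requests

# Illustration only: requests encodes the params dict into the query string
req = requests.Request('GET',
                       'http://www.tng-project.org/api/TNG100-1/files/snapshot-50.0.hdf5',
                       params={'dm': 'Coordinates',
                               'stars': 'ParticleIDs,Coordinates,Masses,GFM_StellarFormationTime'})
print(req.prepare().url)
# e.g. ...snapshot-50.0.hdf5?dm=Coordinates&stars=ParticleIDs%2CCoordinates%2CMasses%2CGFM_StellarFormationTime
# (requests percent-encodes the commas as %2C)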
Strangely, I've noticed that the individual snapshot chunks this code downloads for SnapNum=50 are much larger than those for either SnapNum=43 or SnapNum=56 (~350 MB vs ~180 MB for both 43 and 56). It did occur to me that SnapNum=50 is one of the "full" snapshots, which have many more fields available than the "mini" snapshots and hence a total snapshot filesize roughly 4x bigger. But (assuming the above code is doing what I believe it's doing) I shouldn't be downloading any of those extra fields, so I'm confused as to why the filesizes for just this one snapshot are bigger anyway.
If I'm not downloading any of the additional fields offered by the "full" snapshots, why might my filesizes be bigger for SnapNum=50?
Dylan Nelson
13 Aug
Can you post the output of h5ls -rv chunk.hdf5 for two chunks of different sizes?
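(If the HDF5 command-line tools aren't installed, a rough h5py equivalent of h5ls -rv is sketched below; the filename is whatever the download script saved:)

import h5py

# Rough stand-in for `h5ls -rv`: walk the file and print each
# dataset's path, shape, dtype, and on-disk storage size
def list_datasets(filename):
    with h5py.File(filename, 'r') as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                mb = obj.id.get_storage_size() / 1024.0**2
                print("%s  shape=%s  dtype=%s  %.1f MB" % (name, obj.shape, obj.dtype, mb))
        f.visititems(visit)

list_datasets('chunk.hdf5')  # a downloaded chunk filename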
Harley Brown
13 Aug
Hi Dylan Nelson,
Here's the requested output for the 0th chunk of snapshot 43:
and here's the output for the 0th chunk of the (anomalously big) snapshot 50:
Dylan Nelson
13 Aug
It seems the difference is due to double (float64) precision Coordinates in the full snapshots, but single (float32) in the mini snapshots. I had forgotten about this difference. For some applications, e.g. reconstruction of the Voronoi mesh, double precision is important, but usually it is not.
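Since the download above is dominated by the dark-matter Coordinates, storing them at 8 bytes per component instead of 4 roughly doubles the chunk size, consistent with the ~350 MB vs ~180 MB figures. If single precision is sufficient for your analysis, the full-snapshot chunks can be shrunk after download by rewriting float64 datasets as float32. A minimal sketch with h5py follows (the filenames are hypothetical; keep the originals if you may need double precision later):

import h5py

def downcast_chunk(src, dst):
    # Copy a downloaded chunk, storing any float64 dataset as float32
    # and carrying over group/dataset attributes (e.g. the Header group's)
    with h5py.File(src, 'r') as fin, h5py.File(dst, 'w') as fout:
        def copy(name, obj):
            if isinstance(obj, h5py.Dataset):
                data = obj[...]
                if data.dtype.kind == 'f' and data.dtype.itemsize == 8:
                    data = data.astype('float32')
                new = fout.create_dataset(name, data=data)
            else:
                new = fout.require_group(name)
            for k, v in obj.attrs.items():
                new.attrs[k] = v
        fin.visititems(copy)

downcast_chunk('snap_050.0.hdf5', 'snap_050.0.f32.hdf5')  # hypothetical filenames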