Only downloading certain fields - chunk filesize for "full" TNG100 snapshot still significantly bigger than "mini" snapshots
Harley Brown
13 Aug
Hi,
Using a slight adaptation of the example code snippet from Task 11 of the web-based API cookbook, I've been downloading only certain fields, for only dark matter and stars, from several TNG100-1 snapshots. The only fields I'm downloading are "Coordinates" for dark matter, and "ParticleIDs", "Coordinates", "Masses", and "GFM_StellarFormationTime" for stars. Here's the code I'm using:
#%% Download settings
# Target snapshot index
targ_ss_ind = 50
# [43, 50, 56]

# Parameters / fields to download, per particle type
params = {'dm': 'Coordinates',
          'stars': 'ParticleIDs,Coordinates,Masses,GFM_StellarFormationTime'}

#%% API "get" func setup (copied directly from API Cookbook: https://www.tng-project.org/data/docs/api/)
import requests
import timeit  # for timing each chunk download

base_url = 'http://www.tng-project.org/api/TNG100-1/'

def get(path, params=None):
    # make HTTP GET request to path
    headers = {"api-key": "API Key that I've removed for post"}
    r = requests.get(path, params=params, headers=headers)
    # raise exception if response code is not HTTP SUCCESS (200)
    r.raise_for_status()
    if r.headers['content-type'] == 'application/json':
        return r.json()  # parse json responses automatically
    if 'content-disposition' in r.headers:
        # server supplies the filename; save the contents to disk
        filename = r.headers['content-disposition'].split("filename=")[1]
        with open(filename, 'wb') as f:
            f.write(r.content)
        return filename  # return the filename string
    return r

#%% Actual download (following cookbook example)
# Pull the count of files ("chunks") per simulation snapshot
sim_metadata = get(base_url)
chunk_per_ss_count = sim_metadata['num_files_snapshot']

# [For restarting interrupted downloads]
start = 0

# Loop over all snapshot chunks, downloading only the requested fields
for i in range(start, chunk_per_ss_count):
    ts = timeit.default_timer()
    file_url = base_url + "files/snapshot-" + str(targ_ss_ind) + "." + str(i) + ".hdf5"
    saved_filename = get(file_url, params)
    te = timeit.default_timer()
    print(saved_filename + " (" + str(i+1) + "/" + str(chunk_per_ss_count) + ") [ Time: " + str(round(te - ts, 5)) + "s ]")
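As a sanity check that the fields really are being passed as a query string, the prepared URL can be printed before requesting anything. This is an illustration only, not part of the download script; the chunk-0 URL for snapshot 50 below is just an example:

import requests

# Illustration only: requests encodes the params dict into the query string
req = requests.Request('GET',
                       'http://www.tng-project.org/api/TNG100-1/files/snapshot-50.0.hdf5',
                       params={'dm': 'Coordinates',
                               'stars': 'ParticleIDs,Coordinates,Masses,GFM_StellarFormationTime'})
print(req.prepare().url)
# e.g. ...snapshot-50.0.hdf5?dm=Coordinates&stars=ParticleIDs%2CCoordinates%2CMasses%2CGFM_StellarFormationTime
# (requests percent-encodes the commas as %2C)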
Strangely, I've noticed that the individual snapshot chunks this code downloads for SnapNum=50 are much larger than those for either SnapNum=43 or SnapNum=56 (~350 MB vs ~180 MB for both 43 and 56). It did occur to me that SnapNum=50 is one of the "full" snapshots, which have many more fields available than the "mini" snapshots and hence a total snapshot filesize roughly 4x bigger. But (assuming the above code is doing what I believe it's doing) I shouldn't be downloading any of those extra fields, so I'm confused as to why the filesizes for just this one snapshot are bigger anyway.
If I'm not downloading any of the additional fields offered by the "full" snapshots, why might my filesizes be bigger for SnapNum=50?
Dylan Nelson
13 Aug
Can you post the output of h5ls -rv chunk.hdf5 for two chunks of different sizes?
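(If the HDF5 command-line tools aren't installed, a rough h5py equivalent of h5ls -rv is sketched below; the filename is whatever the download script saved:)

import h5py

# Rough stand-in for `h5ls -rv`: walk the file and print each
# dataset's path, shape, dtype, and on-disk storage size
def list_datasets(filename):
    with h5py.File(filename, 'r') as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                mb = obj.id.get_storage_size() / 1024.0**2
                print("%s  shape=%s  dtype=%s  %.1f MB" % (name, obj.shape, obj.dtype, mb))
        f.visititems(visit)

list_datasets('chunk.hdf5')  # a downloaded chunk filename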
Harley Brown
13 Aug
Hi Dylan Nelson,
Here's the requested output for the 0th chunk of snapshot 43:
and here's the output for the 0th chunk of the (anomalously big) snapshot 50:
Dylan Nelson
13 Aug
It seems the difference is due to double (float64) precision Coordinates in the full snapshots, but single (float32) in the mini snapshots. I had forgotten about this difference. For some applications, e.g. reconstruction of the Voronoi mesh, double precision is important, but usually it is not.
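Since the download above is dominated by the dark-matter Coordinates, storing them at 8 bytes per component instead of 4 roughly doubles the chunk size, consistent with the ~350 MB vs ~180 MB figures. If single precision is sufficient for your analysis, the full-snapshot chunks can be shrunk after download by rewriting float64 datasets as float32. A minimal sketch with h5py follows (the filenames are hypothetical; keep the originals if you may need double precision later):

import h5py

def downcast_chunk(src, dst):
    # Copy a downloaded chunk, storing any float64 dataset as float32
    # and carrying over group/dataset attributes (e.g. the Header group's)
    with h5py.File(src, 'r') as fin, h5py.File(dst, 'w') as fout:
        def copy(name, obj):
            if isinstance(obj, h5py.Dataset):
                data = obj[...]
                if data.dtype.kind == 'f' and data.dtype.itemsize == 8:
                    data = data.astype('float32')
                new = fout.create_dataset(name, data=data)
            else:
                new = fout.require_group(name)
            for k, v in obj.attrs.items():
                new.attrs[k] = v
        fin.visititems(copy)

downcast_chunk('snap_050.0.hdf5', 'snap_050.0.f32.hdf5')  # hypothetical filenames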