I'm interested in the merger history of individual subhalos, in hopes of deriving aggregate results from the individual histories. The processing time required is large enough that writing Python code against downloaded group catalogs and merger tree files appears to be the way to go. To begin this analysis, I need to get all of the unique-across-all-snapshots subhalo IDs in each snapshot, but I can't figure out a way to do that without using the browser or web-based APIs, which I'd rather avoid because of said performance issues. Is there a way, then, to get all of those IDs using just Python and the downloaded data? Also, it would simplify my analysis if every ID'ed subhalo were included in a merger tree. Is that the case?
The term "IDs" (referring to halos or subhalos) is interchangeable with "indices". For example, a list of all the "SubfindIDs" (edited, this was always meant to say SubfindIDs) in a snapshot is just a list of integers from 0 to the total number of subhalos in that snapshot, minus one:
>>> import numpy as np
>>> import illustris_python as il
>>> basePath = './Illustris-1/'
>>> h = il.groupcat.loadHeader(basePath, 135)
>>> h['Nsubgroups_Total']
4366546
>>> inds = np.arange(h['Nsubgroups_Total'])
>>> inds.min(), inds.max(), inds.size
(0, 4366545, 4366546)
It isn't absolutely guaranteed that all SubfindIDs (across all snapshots) will exist in some tree, at least in the SubLink trees. But it should be rare, especially for objects that are large enough to be at all resolved.
I'd suggest just checking for this case and handling it if/when it comes up. E.g.
# f is assumed to hold the tree fields 'SubfindID' and 'SnapNum' as NumPy arrays
w = np.where((f['SubfindID'] == subid) & (f['SnapNum'] == snap))
if len(w[0]) == 0:
    raise Exception('This subhalo not in any tree (at least in file f).')
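As a self-contained variant of the check above, here is a minimal sketch that returns the matching rows instead of raising, so a missing subhalo can be skipped rather than stopping the whole analysis. The helper name `find_tree_rows` and the small arrays are made up for illustration; they only stand in for the 'SubfindID' and 'SnapNum' columns of a real tree file.

```python
import numpy as np

def find_tree_rows(subfind_id, snap_num, subid, snap):
    """Return the row indices where (SubfindID, SnapNum) == (subid, snap)."""
    mask = (subfind_id == subid) & (snap_num == snap)
    return np.flatnonzero(mask)

# Made-up stand-ins for the 'SubfindID' and 'SnapNum' columns of one tree file.
subfind_id = np.array([0, 1, 2, 0, 1, 0])
snap_num = np.array([135, 135, 135, 134, 134, 133])

# Subhalo 1 at snapshot 134 is present in the toy data.
rows = find_tree_rows(subfind_id, snap_num, subid=1, snap=134)

# Subhalo 2 at snapshot 134 is absent: the result is an empty array.
missing = find_tree_rows(subfind_id, snap_num, subid=2, snap=134)
if missing.size == 0:
    print('Subhalo (134, 2) is not in this tree file; skipping.')
```

An empty result only means the subhalo is not in the arrays that were searched, so with several tree files the same check would need to run over each of them.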
The term "IDs" (referring to halos or subhalos) is interchangeable with "indices", e.g. a list of all the "SubhaloIDs" in a snapshot is just a list of integers from 0 to the total number of subhalos in that snapshot, minus one:
This appears to imply that a single subhaloID is not unique across all snapshots, and as such, could not be "the unique ID (within the whole simulation) of the corresponding subhalo", which is what I believe I need as the 'id' input for, for example, sublink.loadTree(). If that's the case, what's the ID that I'd need for loadTree() and numMergers(), and is there a way to obtain all of the ones in a particular snapshot using only Python and downloaded data?
Dylan Nelson
29 Apr '16
Hi,
Apologies, this is the difference between SubfindID and SubhaloID.
The SubfindID is what you pass to the sublink.loadTree() function. It corresponds to the indices of subhalos in the group catalogs, as I described above.
The SubhaloID is essentially an internal value for the trees, and you can ignore it unless you are digging into how the trees are stored on disk and/or loaded efficiently.
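Since SubfindIDs are just per-snapshot indices, the full set of subhalos across several snapshots can be enumerated as (SnapNum, SubfindID) pairs. The following is only a sketch: `snapshot_keys` is a hypothetical helper, and the counts are made up. With the downloaded data, each count would be the `Nsubgroups_Total` header value for that snapshot.

```python
import numpy as np

def snapshot_keys(n_subhalos_by_snap):
    """Return an (N, 2) integer array of (SnapNum, SubfindID) pairs."""
    parts = [
        np.column_stack((np.full(n, snap), np.arange(n)))
        for snap, n in sorted(n_subhalos_by_snap.items())
    ]
    return np.concatenate(parts)

# Made-up subhalo counts per snapshot, for illustration only.
counts = {133: 3, 134: 2}
keys = snapshot_keys(counts)
print(keys)
```

For a full-size run this array would be very large, so in practice it may be preferable to loop over one snapshot at a time.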
Jonathan Mack
4 May '16
Thanks for all your time; it's really helped me out a lot. One more (hopefully last) question: even if SubhaloID isn't necessarily meant to be used as a unique-but-constant-across-all-snapshots-and-trees-of-a-run identifier, can I do so anyway? For instance:
It looks like I can use the combination of SubfindID and SnapNum to uniquely identify a subhalo, but it also appears that all the SubLink ...ID fields (MainLeafProgenitorID being the example above) are given as SubhaloIDs, implying that SubhaloID is what I should use to establish tree relationships between subhalos. Thoughts?
Dylan Nelson
4 May '16
Hi,
You're right, you can use either the (SubfindID, SnapNum) pair or the SubhaloID to uniquely identify a subhalo in a given simulation.
And as you say, fields such as LastProgenitorID or MainLeafProgenitorID refer to the SubhaloID of another entry in the tree.
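To make the two identifiers concrete, here is a minimal sketch that goes from a (SnapNum, SubfindID) pair to its SubhaloID, follows MainLeafProgenitorID, and converts the result back into a (SnapNum, SubfindID) pair. The field names are the ones discussed above; the toy values and the helper names `build_lookups` and `main_leaf_key` are invented for illustration.

```python
import numpy as np

# Made-up miniature tree: one row per subhalo entry.
tree = {
    'SubhaloID': np.array([1000, 1001, 1002, 1003, 1004]),
    'SnapNum': np.array([135, 134, 133, 134, 133]),
    'SubfindID': np.array([7, 5, 9, 12, 11]),
    'MainLeafProgenitorID': np.array([1002, 1002, 1002, 1004, 1004]),
}

def build_lookups(tree):
    """Map (SnapNum, SubfindID) -> SubhaloID, and SubhaloID -> row index."""
    by_key = {
        (int(s), int(i)): int(sid)
        for s, i, sid in zip(tree['SnapNum'], tree['SubfindID'], tree['SubhaloID'])
    }
    row_of = {int(sid): row for row, sid in enumerate(tree['SubhaloID'])}
    return by_key, row_of

def main_leaf_key(tree, snap, subfind_id):
    """Return the (SnapNum, SubfindID) of the main leaf progenitor."""
    by_key, row_of = build_lookups(tree)
    sid = by_key[(snap, subfind_id)]  # raises KeyError if not in the tree
    leaf_sid = int(tree['MainLeafProgenitorID'][row_of[sid]])
    leaf_row = row_of[leaf_sid]
    return int(tree['SnapNum'][leaf_row]), int(tree['SubfindID'][leaf_row])

leaf = main_leaf_key(tree, 135, 7)
print(leaf)
```

The dictionaries are rebuilt on every call to keep the sketch short; for many lookups it would make sense to build them once and reuse them.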
You've answered all my questions. Thanks so much!