I am trying to figure out how to download image data of as many galaxies as possible, as high resolution as possible. I'm particularly interested in the data being separated into types (e.g. "spirals", "ellipticals", etc.) if that is possible.
I'm having difficulty understanding the guide for accessing the data. Is there a simple way to do a bulk download like this?
Dylan Nelson
14 Sep '18
Hi Alex,
Are you just interested in PNGs, or do you need the images in actual scientific units?
There aren't any simple measurements of type (e.g. spiral vs elliptical) in the group catalogs, but there are a few supplementary catalogs which have some relevant information. If you download one of these, you can use it to make a selection of 'subhalo IDs' for each type, then you could run a wget command to download the images for all the subhalos in each list.
You can look at the "(b) Photometric Non-Parametric Stellar Morphologies" catalog, in which case you may need to read in the reference what "Gini, M20" means and how to use it to separate out the types you're interested in. Or you could use the "(c) Stellar Circularities" catalog, taking high values of the "CircAbove07Frac" parameter as indicators of a disk.
I am just interested in PNGs (at least, at the moment). This would be for an image processing experiment so scale is not super important.
I see the list of subhalo commands (I think) but I confess I'm not sure how to construct a wget or any other type of download right now. I'm at square one. The documentation is a little overwhelming. Is this the right page to be looking at for just basic downloading? Do I need to register for an API key? Forgive my ignorance.
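Hi Alex,
An easy way is maybe to wget a list of links that you first put in a text file.
The links will look like:
http://www.illustris-project.org/api/Illustris-1/snapshots/135/subhalos/132699/stellar_mocks/image_fof.png
where the number 132699 is the subhalo ID, and this is the only thing you need to change.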
To get a list of subhalo IDs, I suggest you download the Stellar Circularities supplemental catalog for Illustris-1 at z=0. This is an HDF5 file, described on the documentation page - one entry is SubfindID, this is what you need. You could just try to get the first 10 or 100 using this procedure, then later sort them by the CircAbove07Frac field as I mentioned above.
To actually run the wget command, you will need to register and get an API key, and follow the syntax e.g. shown at the top of the galaxy observatory page.
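A minimal sketch of the text-file approach described above. It only builds the list of links from the URL pattern quoted in the thread; the function names and the example IDs are hypothetical, and real IDs would come from a catalog's SubfindID field.

```python
from pathlib import Path

# URL pattern quoted in the thread; only the subhalo ID changes per line.
URL_TEMPLATE = (
    "http://www.illustris-project.org/api/Illustris-1/snapshots/135/"
    "subhalos/{subhalo_id}/stellar_mocks/image_fof.png"
)


def build_image_urls(subhalo_ids):
    """Return one mock-image URL per subhalo ID, preserving input order."""
    return [URL_TEMPLATE.format(subhalo_id=int(sid)) for sid in subhalo_ids]


def write_url_list(subhalo_ids, path):
    """Write the URLs one per line so the file can be passed to wget."""
    urls = build_image_urls(subhalo_ids)
    Path(path).write_text("\n".join(urls) + "\n")
    return urls


# Hypothetical IDs for illustration only.
example_urls = build_image_urls([132699, 132700])
```

The resulting file can then be given to wget with its `-i` option, together with a `--header` option carrying the API key. The exact header syntax is the one shown on the project site after registering and should be copied from there rather than from this sketch.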
Alexander Gruber
13 Oct '18
Thanks for all your help. I've got it working well enough to have downloaded a few pngs just using a wget on the first 100 subhalo IDs. I've gone through the data access guide for loading the group 135 data and have that working now, but I'm having trouble figuring out how to open that stellar circularities catalog.
Dylan Nelson
13 Oct '18
Hi Alexander,
This is just an HDF5 file. You can use "h5py" in python to load it, and its webpage has some simple tutorials to help there.
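A rough sketch of reading such a catalog with h5py and sorting by disk fraction. The group name "Snapshot_135" is an assumption about the file layout (check it with `list(f.keys())` on the real file); the field names SubfindID and CircAbove07Frac are the ones mentioned in the thread. A tiny stand-in file with made-up values is written first so the example runs without the real download.

```python
import os
import tempfile

import h5py
import numpy as np


def load_circularities(path, group="Snapshot_135"):
    """Read SubfindID and CircAbove07Frac from one group of the catalog."""
    with h5py.File(path, "r") as f:
        grp = f[group]  # assumed group name; inspect f.keys() to confirm
        return grp["SubfindID"][:], grp["CircAbove07Frac"][:]


# Stand-in file with made-up values, only so the sketch is runnable.
demo_path = os.path.join(tempfile.mkdtemp(), "stellar_circs_demo.hdf5")
with h5py.File(demo_path, "w") as f:
    grp = f.create_group("Snapshot_135")
    grp.create_dataset("SubfindID", data=np.array([0, 5, 9]))
    grp.create_dataset("CircAbove07Frac", data=np.array([0.1, 0.6, 0.3]))

ids, frac = load_circularities(demo_path)
order = np.argsort(frac)[::-1]   # highest disk fraction first
ids_by_diskiness = ids[order]
```

With the real file, `load_circularities("stellar_circs.hdf5")` would replace the demo path, provided the group name matches.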
Alexander Gruber
14 Oct '18
I figured out how to get a [SubfindID, CircAbove07Frac] list and sort it by CircAbove07Frac, then took the SubfindID entries and put them into a file list of the form you gave above. It seems like most of them are not getting found, though. Does every subhalo have an image?
Dylan Nelson
14 Oct '18
No, only relatively large subhalos have images, and probably your list is dominated by small things which aren't really of interest to you. If you apply the same criterion as the images (M* > 10^10 Msun), then you should find almost all of them.
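The mass cut could be applied along these lines. This is a sketch under several assumptions that are not stated in the thread: that stellar mass is taken from the SubhaloMassType field with stars at index 4, that catalog masses are stored in units of 10^10 Msun/h, and that h = 0.704 for Illustris. Which aperture the image selection actually used is not given here, so total stellar mass is a guess. The demo rows are made up.

```python
HUBBLE_H = 0.704          # assumed Illustris value of little h
STAR_TYPE = 4             # assumed index of stars in SubhaloMassType
MIN_STELLAR_MSUN = 1e10   # threshold quoted in the thread


def stellar_mass_msun(mass_type_row):
    """Convert one SubhaloMassType row (1e10 Msun/h) to stellar mass in Msun."""
    return mass_type_row[STAR_TYPE] * 1e10 / HUBBLE_H


def select_imaged(subfind_ids, mass_type_rows):
    """Keep the IDs whose stellar mass exceeds the image threshold.

    mass_type_rows is indexed by SubfindID, as the group catalog is.
    """
    return [
        sid for sid in subfind_ids
        if stellar_mass_msun(mass_type_rows[sid]) > MIN_STELLAR_MSUN
    ]


# Made-up catalog rows: only the stellar entry (index 4) is non-zero.
demo_rows = [
    [0.0, 0.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.01, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.8, 0.0],
]
kept = select_imaged([2, 1, 0], demo_rows)
```

Filtering before building the link list keeps the input order, so a list already sorted by CircAbove07Frac stays sorted.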
Alexander Gruber
14 Nov '18
Sorry to bother you again-- I have two more questions. If I should be posting them as separate threads, please let me know and I'll do so.
The first question is probably dumb, but how do I link up a supplementary data catalog to use with the main group catalog at the same time? Earlier, when I sorted the SubfindIDs by CircAbove07Frac, I was using stellar_circs.hdf5. If I want to get, say, the SubhaloMass associated with that SubfindID, the SubhaloMass is located in groups_135.0.hdf5. Can these hdf5s cooperate together so that this is possible?
Second, I noticed that there is some discussion of (relative) coordinates of particles in the properties for halos and subhalos. Does this imply that there is a way to access point cloud data-- specifically, to get a list of coordinates corresponding to the stars/other bodies that comprise a galaxy? Currently I've been trying to reverse engineer data in this form from the images via several methods (e.g. sampling points using the image as a histogram distribution), but it occurs to me that this data may exist already and I'm being redundant.
Dylan Nelson
14 Nov '18
Hi Alex,
Yes, the SubfindID is actually the index into the group catalog, at that snapshot. You can load all the fields of a single subhalo with the illustris_python.groupcat.loadSingle() function of the scripts (see examples). Just a caution that "groups_135.0.hdf5" is one of many "chunks", so be careful with the indexing.
Second, yes you can always obtain all the member particles/cells of a given subhalo/halo. If you have the snapshot downloaded, you can directly use the illustris_python.snapshot.loadSubhalo() function of the helper scripts. If you haven't downloaded the snapshot, you can get a "cutout" directly from the web-API using [base]/subhalos/{id}/cutout.hdf5 (see docs).
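To illustrate the chunk caveat above: because the SubfindID is a global index, reading a single chunk file directly requires converting it to a chunk number plus a local index. A small sketch, assuming the number of subhalos stored in each chunk is already known (the counts below are hypothetical; in practice they would be read from each chunk's header). The helper scripts handle this internally, so this is only needed when opening chunk files by hand.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_in_chunks(subfind_id, subhalos_per_chunk):
    """Map a global SubfindID to (chunk number, index within that chunk)."""
    # offsets[i] = number of subhalos stored in all chunks before chunk i
    offsets = [0] + list(accumulate(subhalos_per_chunk))
    if not 0 <= subfind_id < offsets[-1]:
        raise IndexError(
            f"SubfindID {subfind_id} is outside a catalog of {offsets[-1]} subhalos"
        )
    chunk = bisect_right(offsets, subfind_id) - 1
    return chunk, subfind_id - offsets[chunk]


# Hypothetical per-chunk counts; the middle chunk is empty on purpose.
demo_counts = [4, 0, 3]
```

For example, `locate_in_chunks(4, demo_counts)` gives chunk 2, local index 0, skipping the empty chunk.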
Alexander Gruber
7 Jan '19
Thanks-- I pulled the subhalos. A couple questions about the results. It seems like many of the subhalos consist of more than one galaxy that appear topologically separate, e.g. these two:
and also there are some (quite a few) subhalos consisting of only a few sparse stars, e.g.
I suppose this was to be expected as subhalos and galaxies aren't the same thing, and there are obvious difficulties in defining what constitutes a single galaxy based purely on coordinate data (particularly in the case of mergers or big interacting clusters). Is there a way among the other physical parameters to separate the stars into galaxy-like groups? In other words, I recall you saying there isn't a simple measurement of galaxy type-- is there a measurement to select discrete galaxies?
Dylan Nelson
7 Jan '19
Hi Alex,
I think you are seeing galaxies wrap around the periodic box, as here:
http://www.illustris-project.org/data/forum/topic/136/some-questions-on-particles/
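A quick way to test the wraparound hypothesis is to shift every particle to the periodic image nearest a reference point and recompute the extent. This is a sketch only: the box side of 75000 (ckpc/h) is an assumed value for Illustris, and the two demo points are made up to straddle the box edge.

```python
BOX_SIZE = 75000.0  # assumed Illustris box side in ckpc/h


def unwrap(positions, reference, box=BOX_SIZE):
    """Shift each coordinate to the periodic image nearest to `reference`."""
    unwrapped = []
    for pos in positions:
        shifted = []
        for x, x0 in zip(pos, reference):
            # Offset from the reference, folded into [-box/2, box/2)
            dx = (x - x0 + 0.5 * box) % box - 0.5 * box
            shifted.append(x0 + dx)
        unwrapped.append(tuple(shifted))
    return unwrapped


def extent(positions, axis):
    """Max minus min along one axis, as a quick localization check."""
    values = [p[axis] for p in positions]
    return max(values) - min(values)


# Two made-up particles on opposite sides of the box edge along x.
demo = [(10.0, 100.0, 100.0), (74990.0, 100.0, 100.0)]
raw_extent = extent(demo, 0)
fixed_extent = extent(unwrap(demo, demo[0]), 0)
```

If the extent stays large after unwrapping, the structure is genuinely extended and wraparound is not the explanation.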
You mean it could be the result of wraparound? That is certainly possible for a few of the examples I've seen, but I'm not sure it could be for most of them. For example the following set is rotated to be more or less viewed head-on through one of the coordinate planes, and there would definitely have to be more than one cluster, whether or not there is wraparound.
or, for a more extreme example, the subhalo with the greatest size point cloud looks like this on my machine
Have I misunderstood in some way how these are structured?
Dylan Nelson
9 Jan '19
Can you provide the run,snap,subhalo_Id of that first example?
Alexander Gruber
10 Jan '19
That would be subhalo 62 in snap 135 of Illustris 3.
Dylan Nelson
10 Jan '19
Hi Alex,
This is of course a satellite of the first group. I checked and loaded the dark matter, stars, and gas particles, but found them all localized and not quite as you show above. For instance, for the gas:
In [9]: pos_gas[:,0].max() - pos_gas[:,0].min()
Out[9]: 41.335938
In [10]: pos_gas[:,1].max() - pos_gas[:,1].min()
Out[10]: 32.875
In [11]: pos_gas[:,2].max() - pos_gas[:,2].min()
Out[11]: 42.847656
In [15]: pos_gas.shape
Out[15]: (143, 3)
In [16]: pos_gas[:,0].mean()
Out[16]: 74701.11
Maybe verify you have the same numbers above, and check that you are loading the correct particles?
This would just be for the star particles. I got them with
which I then exported to a csv. I'm not getting the same numbers, e.g.
Hi Alex,
Yes, your numbers are correct (halo 62, not subhalo 62), and it's true this halo spans ~1.5 Mpc. This is slightly large for its mass, and in this case I would guess a large halo-halo merger has just taken place, such that what you are effectively seeing is two (or more) halos "bridged" into a single object. This "FoF bridging" can always occur since the FoF algorithm will at some point link together two previously distinct structures. If you look at the merger tree of this halo, you'll probably be able to spot the distinct (large) progenitor branches. If you look at the subhalos, you see the first three have very large M ~ 10^11.1, and the fourth has M ~ 10^10.1 (i.e. the first true satellite, the other three being roughly equal mass, essentially central galaxies).
Alexander Gruber
11 Jan '19
I see-- so, I guess then that, in general, Subhalos more likely will correspond to distinct galaxies than Halos?
Is the intuition here that Subhalos correspond to clusters and Halos correspond to clusters that are close enough to be significantly interacting?
Dylan Nelson
11 Jan '19
Yes you should always use 'subhalos' as 'galaxies'. A general medium-sized halo will have a large central galaxy and many satellite galaxies, each of which will be picked up as separate subhalos.
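The halo/subhalo relationship can be made concrete with the group catalog's cross-reference fields. The sketch below assumes the field names GroupFirstSub, GroupNsubs and SubhaloGrNr, with GroupFirstSub set to -1 for halos that contain no subhalos; the small arrays are made up and stand in for the loaded catalog columns.

```python
def is_central(subfind_id, subhalo_grnr, group_first_sub):
    """True if this subhalo is the first (central) subhalo of its parent halo."""
    parent_halo = subhalo_grnr[subfind_id]
    return group_first_sub[parent_halo] == subfind_id


def subhalos_of_halo(halo_id, group_first_sub, group_nsubs):
    """List the SubfindIDs belonging to one halo (central first, then satellites)."""
    first = group_first_sub[halo_id]
    if first < 0:  # assumed convention: -1 marks a halo with no subhalos
        return []
    return list(range(first, first + group_nsubs[halo_id]))


# Made-up catalog: halo 0 holds subhalos 0-2, halo 1 holds subhalos 3-4.
group_first_sub = [0, 3]
group_nsubs = [3, 2]
subhalo_grnr = [0, 0, 0, 1, 1]

members = subhalos_of_halo(1, group_first_sub, group_nsubs)
```

This also shows why halo 62 and subhalo 62 are different objects: the two catalogs are indexed independently.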
Alexander Gruber
22 Mar '19
I just have one more question! I am looking at the nonparametric morphologies catalog and have downloaded the data. Now I am trying to match up the data to the previous Subhalo data I downloaded. The morphology catalog data gives me SubfindIDs. I got the SubhaloData using il.snapshot.loadSubhalo(...) with what I assume are SubhaloIDs. How do I figure out which SubhaloID is associated with a SubfindID?
Dylan Nelson
22 Mar '19
Same thing, we use "subhalo" and "subfind" as well as "id" and "index" all interchangeably.
I see-- that's what I had thought. What confused me is that some of the entries under SubfindID_camX went up to very high values (highest was 627325). My impression was the SubhaloIDs only went up to about 60k.
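So, do the SubfindIDs above 60604 in SubfindID_camX not correspond to Subhalos?
For example, in nonparametric_morphologies (loaded as "nm"), when I execute
it returns
however when I try to
I get the error
I get the same error from
I should mention that most of the SubfindIDs I get from nonparametric_morphologies are above 60604. I suspect something is happening with offsets here, but I'm not sure how to fix the issue.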
Dylan Nelson
24 Mar '19
Hi Alex,
Are you sure you're at the right sim (basePath)? Illustris-1, at z=0, has 4366546 subhalos and your load commands work ok for me.
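A cheap sanity check for this kind of mismatch is to compare the supplementary catalog's IDs against the number of subhalos in the group catalog being loaded. In this sketch the Illustris-1 count is the one quoted above; the smaller count of 60605 is only inferred from the "above 60604" remark in the thread and is an assumption.

```python
def ids_outside_catalog(subfind_ids, n_subhalos):
    """Return the IDs that cannot index a group catalog with n_subhalos entries."""
    return [sid for sid in subfind_ids if not 0 <= sid < n_subhalos]


N_SUBHALOS_ILLUSTRIS_1 = 4366546   # quoted in the thread for z=0
N_SUBHALOS_LOCAL = 60605           # assumed size of the catalog actually on disk

demo_ids = [100, 627325]
bad_for_illustris_1 = ids_outside_catalog(demo_ids, N_SUBHALOS_ILLUSTRIS_1)
bad_for_local = ids_outside_catalog(demo_ids, N_SUBHALOS_LOCAL)
```

Any non-empty result means the supplementary catalog and the basePath belong to different simulations or snapshots.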
Oh geez, that must be the issue. And now that really explains why I was having trouble with the circularities catalog earlier. I think what I have must be Illustris-3, since I got the file through following the Getting Started Guide.
I don't suppose Gini, M20, C data is available somewhere for Illustris-3? Or is there some way to convert it from Illustris-1? I read the Gini, M20 paper you suggested and I think it would be interesting to correlate that data with the results of the analysis I've been doing, but that evidently has all been on Illustris-3. If the Illustris-1 data is more detailed, redoing it may be somewhat computationally intractable (on this 2 dollar computer I'm using, at least).
Dylan Nelson
25 Mar '19
Hi Alex,
Many of the catalogs are available for Illustris-3 but unfortunately not Gini,M20 (nor the mock images). Galaxies at this resolution are getting pretty unresolved, so it isn't likely that we would scientifically trust such spatially resolved morphological measurements.
Alexander Gruber
3 Apr '19
I see, that does make sense. I lack the computing power to go beyond Illustris-3 on my personal machine, so I suppose I'll have to write up the code to do the analysis on Illustris-1 and make friends with somebody with a lot of GPUs.
Just one more question, and I think this should be my last for a while. What angles are the four cameras viewing from? Are they consistent between galaxies-- for example, cam0 always views from the xy plane, cam1 from the yz plane, etc.-- or do they change between galaxies-- for example, viewing from planes defined by PCA on the galaxy?
Dylan Nelson
3 Apr '19
Hi Alex,
You can find the details of these 4 cameras in Torrey+ (2015).
You may also want to consider testing out the new JupyterLab service, to see if it might help your work.
Wang Peng
14 Jun '20
Dear Dylan,
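How can I download hundreds of images for a given list of subhalo IDs from TNG300-1?
Hi,
As above, I think the easiest suggestion is: wget a list of links that you first put in a text file.
The links will look like:
http://www.tng-project.org/api/TNG300-1/snapshots/99/subhalos/331453/vis.png?partType=stars&partField=stellarBand-sdss_r
where the number 331453 is the subhalo ID, and this is the part you would change for each line of the text file.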
Note: there are no pre-generated "stellar_mocks" for TNG300, so these are stellar light visualizations on-the-fly. This is one option. Please see the documentation under {vis_query} for information.
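As an alternative to wget, the same requests could be prepared from Python's standard library. This is a sketch, not the project's own client: the "API-Key" header name is assumed from the wget syntax shown on the project site, the key value is a placeholder, and the query parameters are the ones from the example link. Nothing is downloaded here; the code only builds the request objects.

```python
from urllib.parse import urlencode
from urllib.request import Request

TNG_BASE = "http://www.tng-project.org/api/TNG300-1/snapshots/99/subhalos"
# Query from the example link; other options are described under {vis_query}.
VIS_QUERY = {"partType": "stars", "partField": "stellarBand-sdss_r"}


def vis_request(subhalo_id, api_key, query=VIS_QUERY):
    """Build (but do not send) a request for one on-the-fly visualization."""
    url = f"{TNG_BASE}/{int(subhalo_id)}/vis.png?{urlencode(query)}"
    # Assumed header name for the API key.
    return Request(url, headers={"API-Key": api_key})


# Placeholder key and the subhalo ID from the example link.
demo_request = vis_request(331453, "YOUR_API_KEY")
```

Passing each request to `urllib.request.urlopen` and writing the returned bytes to a `.png` file would perform the actual download.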
Thanks, Dylan.