I am trying to get all of the particles (gas, dm, stars) within a certain volume (e.g., a cube of a specified size centered on some specified location).
According to the example scripts, the following should work:
dm_pos = il.snapshot.loadSubset(basePath,99,'dm',['Coordinates'])
However, I am getting a memory error for TNG300.
I could try to find a way to use loadHalo around a list of FoF halos, but this would miss the "inner fuzz" and "outer fuzz" particles (http://www.tng-project.org/data/docs/specifications/ )
Can you suggest an algorithm to help? Is there a way to load the "fuzz" separately?
For TNG300-1, this load command requires exactly 2500^3 * 3 * 8 bytes (the 3 from x,y,z, the 8 from float64 each). This is about 350 GB, which is much more than I can allow on the JupyterLab interface, for instance.
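As a quick check of that estimate, the arithmetic can be written out in Python. Note that powers are written with `**` there (`^` is bitwise XOR in Python); the variable names are only illustrative.

```python
# Memory needed to hold all TNG300-1 dark matter coordinates at once.
n_part = 2500**3          # number of dark matter particles
n_bytes = n_part * 3 * 8  # 3 components (x, y, z), 8 bytes per float64

print(n_bytes)            # total number of bytes
print(n_bytes / 1024**3)  # the same in GiB (roughly 349)
print(n_bytes / 1e9)      # the same in decimal GB (375)
```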
If you have an analysis node with this much memory, and have downloaded the data there, then this will certainly work. I do indeed often work like this, as it will be the most efficient way.
But if you don't, there really isn't any need to load all the particles at the same time. You could e.g. load just 1% of the particles (requiring 3.5 GB of memory), select those within the volume of interest, compute whatever you need (e.g. a radial profile?), then discard those particles, load the next 1%, and so on.
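A minimal sketch of that chunked loop, assuming the coordinates can be sliced like an array (a NumPy array here, but an open h5py dataset behaves the same way). The function name, the synthetic positions, and the chunk size are all made up for illustration, and periodic wrapping at the box edges is ignored.

```python
import numpy as np

def count_in_cube_chunked(coords, center, half_width, chunk_size):
    """Count particles inside an axis-aligned cube, reading one chunk at a time.

    coords can be anything that supports len() and slicing, e.g. a NumPy array
    or an open h5py dataset, so only chunk_size rows are in memory at once.
    """
    center = np.asarray(center, dtype=float)
    n_total = len(coords)
    count = 0
    for start in range(0, n_total, chunk_size):
        chunk = np.asarray(coords[start:start + chunk_size])
        # A particle is inside if it is within half_width of the center on every axis.
        inside = np.all(np.abs(chunk - center) < half_width, axis=1)
        count += int(inside.sum())
        del chunk  # release the chunk before reading the next one
    return count

# Synthetic stand-in for the snapshot: random positions in a 100-unit box.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 100.0, size=(100_000, 3))
n_chunked = count_in_cube_chunked(positions, center=[50.0, 50.0, 50.0],
                                  half_width=10.0, chunk_size=1_000)
n_direct = int(np.all(np.abs(positions - 50.0) < 10.0, axis=1).sum())
print(n_chunked, n_direct)  # the two counts agree
```

Here the per-chunk computation is just a count, but any quantity that can be accumulated chunk by chunk (a histogram, a radial profile) fits the same loop.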
If your computation requires all particles in the volume simultaneously, then you could do it in two phases: first, locate the indices of all particles in the volume with the chunked approach as above and save this index set; then load only the properties of those particles.
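The two phases could look roughly like the sketch below. It again uses NumPy arrays as stand-ins for the on-disk datasets, and the function names are hypothetical. With h5py, reading rows by an index list expects the indices in increasing order, which the first phase produces naturally because the chunks are visited in order.

```python
import numpy as np

def find_indices_chunked(coords, center, half_width, chunk_size):
    """Phase 1: return the global indices of all particles inside the cube."""
    center = np.asarray(center, dtype=float)
    found = []
    for start in range(0, len(coords), chunk_size):
        chunk = np.asarray(coords[start:start + chunk_size])
        inside = np.all(np.abs(chunk - center) < half_width, axis=1)
        local = np.nonzero(inside)[0]
        found.append(local + start)  # shift chunk-local indices to global ones
    return np.concatenate(found) if found else np.empty(0, dtype=np.int64)

def load_by_index(dataset, indices):
    """Phase 2: read only the selected rows of another property."""
    return np.asarray(dataset[indices])

# Synthetic stand-ins for the Coordinates and Velocities datasets.
rng = np.random.default_rng(1)
positions = rng.uniform(0.0, 100.0, size=(50_000, 3))
velocities = rng.normal(size=(50_000, 3))

idx = find_indices_chunked(positions, [20.0, 30.0, 40.0], 5.0, chunk_size=4_096)
vel_in_cube = load_by_index(velocities, idx)
print(len(idx), vel_in_cube.shape)
```

Only the index array and the selected rows are ever held in memory together, so the cost scales with the size of the volume rather than the size of the snapshot.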
Hi Chris and Dylan,
I am now running into the same issue as Chris, i.e., getting all of the particles (gas, dm, stars) within a certain volume (e.g., a cube of a specified size centered on some specified location), for TNG300-1 for example.
Do you have an example script that does the "chunked approach"?
I am trying the approach you suggested in another post "Loading the TNG100-1 data", but the "snapNumChunks" function is missing.
Thanks in advance.
I am reusing a few functions from illustris_python.snapshot to create a subset dictionary, then modifying its offset and length values to chunk up the particle files before passing it to il.snapshot.loadSubset. This seems to be working:
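import h5py
import numpy as np
import illustris_python as il
from illustris_python.snapshot import getSnapOffsets, snapPath, getNumPart
from illustris_python.util import partTypeNum

basePath = '../sims.TNG/TNG100-3/output/'
partType = 'dm'
len_range = 10000000  # particles per chunk (example value, adjust to the available memory)

# Start from a valid subset dictionary; its offset and length are overwritten in the loop.
subset = getSnapOffsets(basePath, 84, 0, "Subhalo")
with h5py.File(snapPath(basePath, 84), 'r') as f:
    header = dict(f['Header'].attrs.items())
nPart = getNumPart(header)
ptNum = partTypeNum(partType)

startpoint = 0
sum0 = 0  # running total of loaded particles, as a sanity check
while startpoint < nPart[ptNum]:
    subset['offsetType'][ptNum] = startpoint
    subset['lenType'][ptNum] = min(len_range, nPart[ptNum] - startpoint)  # last chunk may be shorter
    dm = il.snapshot.loadSubset(basePath, 84, partType, fields=['Velocities','Coordinates'], subset=subset)
    sum0 += len(dm['Coordinates'])
    startpoint += len_range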
Below is a code snippet for my TNG volume search:
import h5py
import numpy as np
from astropy.table import Table

# My machine would need 350 GB of memory (95 available) to read in the entire set of HDF particle files at once.
# Instead, we read 1/10th of them at a time (thus the length).
part_len = 15625000000  # = 2500^3 particles
loops = 10
bin_size = part_len // loops

# We are looking at a 5x5x5 Mpc box around each location. We need to be really careful about where "h" lives in the units, else we get this wrong.
# For Illustris, they are pretty good at keeping everything in inverse h distance units.
# Note: vol_size is compared against |dx|, |dy|, |dz| below, so it acts as a half-width.
vol_size = 5000.0

# centers, clusters and front are defined elsewhere. Assumption: centers[j] holds the (x, y, z)
# position of halo j in the same units as the coordinates, and clusters[j] holds its halo id.
for i in range(loops):
    start = i * bin_size
    stop = (i + 1) * bin_size  # slices exclude the end index, so no "- 1" is needed
    # print(i, start, stop)
    # We use the simple access mechanism to access the data (h5py). Usage is described here: http://www.tng-project.org/data/docs/specifications/
    # Note that particles for any given halo can live in any of the HDF snapshot files. In other words, halo #0 might have particles that live
    # in an HDF file that is not used for the first 1562500000 particles, and so we would only catch them on the second (or third...tenth) loop.
    # For now, we therefore save all of the particles for EACH HALO at A SPECIFIC LOOP over the particles. This means that we will need to
    # do a second round of data cleaning where we combine the particles from each loop for any specific halo back into one single particle file
    # for that halo. Below, str(clusters[j]) puts the halo id in the filename while str(i) denotes the loop #.
    with h5py.File('simulation.hdf5', 'r') as f:
        dm_positions = f['/Snapshots/99/PartType1/Coordinates'][start:stop, :]
    for j in range(len(centers)):
        # np.where returns a tuple; take its first element to get the index array.
        w = np.where((np.abs(dm_positions[:, 0] - centers[j][0]) < vol_size) &
                     (np.abs(dm_positions[:, 1] - centers[j][1]) < vol_size) &
                     (np.abs(dm_positions[:, 2] - centers[j][2]) < vol_size))[0]
        if len(w) > 0:
            filename = front + '/particles/halo_' + str(clusters[j]) + '_positions_' + str(i) + '.fits'
            t = Table([dm_positions[w, 0] / 1000.0, dm_positions[w, 1] / 1000.0, dm_positions[w, 2] / 1000.0], names=('px', 'py', 'pz'))
            t.write(filename, overwrite=True)
            # We cannot read in both the velocities and the positions at the same time or else we run out of memory. We could just shrink the size of
            # each chunk (i.e., increase the number of loops). Instead, I choose to write out the list of member indices. I will use these later on.
            # These indices are relative to the start of chunk i.
            filename = front + '/particles/halo_' + str(clusters[j]) + '_members_' + str(i) + '.fits'
            t = Table([w], names=('w',))
            t.write(filename, overwrite=True)
    # I have to delete the variable before I even try to read in the next set, else we get memory issues.
    del dm_positions
# To get the velocities, I use the vector of saved members, which goes a lot faster.
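The member lists saved per loop hold indices relative to the start of each chunk, so the chunk offset has to be added before they can be used against the full dataset. A minimal sketch of that second round of cleaning, assuming the per-loop arrays for one halo have already been read back into a dictionary keyed by loop number (the function name and the dictionary layout are hypothetical, not part of the snippet above):

```python
import numpy as np

def merge_chunk_members(members_by_loop, bin_size):
    """Combine chunk-local member indices into one sorted array of global indices."""
    parts = [np.asarray(local, dtype=np.int64) + loop * bin_size
             for loop, local in sorted(members_by_loop.items())]
    if not parts:
        return np.empty(0, dtype=np.int64)
    return np.concatenate(parts)

# Toy example: three loops with a chunk length of 10 particles each.
members = {0: np.array([3, 7]), 2: np.array([0, 5]), 1: np.array([9])}
global_idx = merge_chunk_members(members, bin_size=10)
print(global_idx)  # global indices 3, 7, 19, 20, 25
```

The resulting array is in increasing order, which is what an indexed read of the velocities from the HDF5 file would need.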
Thank you Chris!