I am using the Jupyter workspace, and am trying to extract information about subhalos whose centres of mass are within a certain distance of the centre of mass of a target subhalo. I am trying to do this for a specific list of subhalos in TNG100-1 and in TNG300-1, at multiple snapshots.
This is painfully slow. I left this running over the weekend and have only retrieved information for one snapshot in TNG100-1, and I'm pretty sure that running this for TNG300-1 causes the kernel to become unresponsive. For part of my code, see below.
Is there a more efficient way to recursively pick out all the subhalos in multiple local volumes as I'm doing? Would appreciate any useful advice.
reqfields = ['SubhaloMassType', 'SubhaloCM']
sh1=il.groupcat.loadSubhalos('sims.TNG/TNG100-1/output/', 99, fields=reqfields)
sh3=il.groupcat.loadSubhalos('sims.TNG/TNG300-1/output/', 99, fields=reqfields)
CM100 = sh1['SubhaloCM'] / h
CM300 = sh3['SubhaloCM'] / h
MH100 = sh1['SubhaloMassType'][:,1] * 1e10 / 0.6774
MH300 = sh3['SubhaloMassType'][:,1] * 1e10 / 0.6774
ids100 = np.genfromtxt('ids100.txt')
ids300 = np.genfromtxt('ids300.txt')
env100 = 
for i in ids100:
obj = il.groupcat.loadSingle('sims.TNG/TNG100-1/output/', 99, subhaloID=i)
cm = obj['SubhaloCM'] / h
D = np.absolute(CM100 - cm)
Dm = np.max(D, axis=1)
R = np.sqrt(np.sum((CM100-cm)**2, axis=1))
w = np.where((Dm>0) & (Dm<1000) & (R>0))
if len(w) > 0:
MHw = MH100[w]
There are two issues here.
First, a good rule is "never load anything from disk, once per halo". So you need to remove the il.groupcat.loadSingle call from the loop. You can replace this with il.groupcat.loadSubhalos(..., fields=['SubhaloCM']) before the loop starts, and then just access the correct index of this array in the loop.
Second, your algorithm is a nearest neighbor search, and under the techniques listed there you are doing a "linear search". This has a poor complexity of O(N^2), which can be improved using one of the "space partioning" ideas described there. (In practice, e.g. scipy.cKDTree). If the size of ids is greater than ~1 million to ~1 billion, such an improvement might also be needed.
First of all, I tried using the loadSubhalos function but it doesn't seem to have an ID field, hence why I resorted to loadSingle. Maybe I'm being stupid, but how do you raise this field, assuming it's possible?
The cKDTree thing looks very useful though - I'll give that a try.
The loadSubhalos() function loads fields for all subhalos, so you don't pass an ID. You have already loaded the data actually, all you need to do is replace
cm = obj['SubhaloCM'] / h
cm = CM100[i,:]
Excellent, thanks for your help!