Selection of fof halos (Lab loading data slow)

Johanna Paine
  • 10 Mar

Hi, I wrote this function to select some FoF halos by mass in a web-based JupyterLab session. Last week (March 1-3) I could select about 100 halos in 5 minutes. As of March 6th, the same selection ran for hours and never completed. I then tried selecting just one halo, which took about 5 minutes for the same mass range. Perhaps there are other big runs happening?
Here is my function:

import os

import h5py
import numpy as np

def find_fof_halos(
    base=baseHydro / "output",
    h=0.6774,
    Mmin=1e10,
    Mmax=1e11,
    n=10,
):
    """Return the first n FoF haloes with Mmin < M < Mmax (masses in Msun)."""
    results = []
    offset = 0  # number of FoF groups in the chunks read so far
    # Visit chunks in numeric order: a plain sorted() puts chunk 10 before chunk 2,
    # which would make the running offset (and hence FoF_ID) wrong.
    # Assumes file names end in ".<chunk>.hdf5".
    fnames = [f for f in os.listdir(base) if f.endswith(".hdf5")]
    for fname in sorted(fnames, key=lambda f: int(f.split(".")[-2])):
        with h5py.File(os.path.join(base, fname), "r") as h5:
            if "Group" not in h5 or "GroupMass" not in h5["Group"]:
                continue  # this chunk holds no FoF groups
            grp = h5["Group"]
            M = grp["GroupMass"][:] * 1e10 / h  # Msun
            sel = np.where((M > Mmin) & (M < Mmax))[0]
            for i in sel:
                fof_id = offset + int(i)  # global FoF index across all chunks
                results.append({
                    "R_crit200": float(grp["Group_R_Crit200"][i] / h),  # ckpc
                    "FoF_ID": fof_id,
                    "M_FoF": float(M[i]),
                    "CentralSubhaloID": int(grp["GroupFirstSub"][i]),
                    "FoF_Position": (grp["GroupPos"][i] / h).astype(float),
                })
                if len(results) >= n:
                    return results
            offset += len(M)
    return results

halos = find_fof_halos(base=baseHydro/"output"/f"groups_{snap:03d}", Mmin=0.8e12, Mmax=1.2e12, n=1)
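
A side note on the running offset in a loop like this: it only gives the right global FoF index if the chunk files are visited in numeric order. Below is a minimal sketch of the difference between a lexicographic and a numeric sort. The file names are illustrative and assume a ".<chunk>.hdf5" suffix; `chunk_number` is a hypothetical helper, not part of any library.

```python
import re

def chunk_number(fname):
    """Extract the integer chunk index from a name ending in '.<chunk>.hdf5'."""
    match = re.search(r"\.(\d+)\.hdf5$", fname)
    if match is None:
        raise ValueError(f"no chunk index found in {fname!r}")
    return int(match.group(1))

# Illustrative file names only (assumed naming pattern).
names = [f"fof_subhalo_tab_099.{i}.hdf5" for i in (0, 1, 2, 10, 11)]

lexicographic = sorted(names)             # string order: chunk 10 sorts before chunk 2
numeric = sorted(names, key=chunk_number)  # order by the integer chunk index

print([chunk_number(f) for f in lexicographic])
print([chunk_number(f) for f in numeric])
```

With more than ten chunks, the lexicographic order interleaves them, so any offset accumulated across files would no longer match the catalog index.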

These calls are also taking much longer and I am not sure why:

    Hh = il.groupcat.loadHalos(bh, snap, fields=["GroupFirstSub","GroupPos","GroupMass","Group_R_Crit200"])
    Hd = il.groupcat.loadHalos(bd, snap, fields=["GroupPos","GroupMass","Group_R_Crit200"])

    Sh = il.groupcat.loadSubhalos(bh, snap, fields=["SubhaloPos"])
    Sd = il.groupcat.loadSubhalos(bd, snap, fields=["SubhaloPos","SubhaloGrNr"])
Dylan Nelson
  • 10 Mar

I am not sure I fully understand find_fof_halos(). It loads the HDF5 files directly and avoids the helper functions. Is there a particular reason for that?

Why not just

GroupMass = il.groupcat.loadHalos(path, snap, fields=["GroupMass"])
M = GroupMass * 1e10 / h  # Msun
fof_IDs = np.where((M > Mmin) & (M < Mmax))[0]
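
To get the same per-halo quantities as find_fof_halos() this way, one could load all four fields at once and index them with the selected IDs. A minimal sketch, assuming `halos` is the dict of arrays that loadHalos returns when given several fields; `select_halos_by_mass` is a hypothetical helper name, and the unit conversions simply mirror those in the original function.

```python
import numpy as np

def select_halos_by_mass(halos, h=0.6774, Mmin=1e10, Mmax=1e11, n=None):
    """Select FoF haloes with Mmin < M < Mmax (Msun) from a dict of catalog arrays.

    `halos` is assumed to hold the arrays GroupMass, Group_R_Crit200,
    GroupFirstSub and GroupPos for the full catalog.
    """
    M = halos["GroupMass"] * 1e10 / h  # Msun
    ids = np.where((M > Mmin) & (M < Mmax))[0]  # catalog index = FoF ID
    if n is not None:
        ids = ids[:n]  # keep only the first n matches
    return {
        "FoF_ID": ids,
        "M_FoF": M[ids],
        "R_crit200": halos["Group_R_Crit200"][ids] / h,  # ckpc
        "CentralSubhaloID": halos["GroupFirstSub"][ids],
        "FoF_Position": halos["GroupPos"][ids] / h,
    }

# Intended usage (not executed here; needs illustris_python and the data):
# fields = ["GroupMass", "Group_R_Crit200", "GroupFirstSub", "GroupPos"]
# halos = il.groupcat.loadHalos(path, snap, fields=fields)
# sel = select_halos_by_mass(halos, Mmin=0.8e12, Mmax=1.2e12, n=1)
```

Because the arrays cover the whole catalog, the position in the array is already the FoF ID and no per-file offset is needed.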
Dylan Nelson
  • 10 Mar

I do see the performance issue; clearly something is very slow right now. I will look into it.
