Note

This tutorial was generated from an IPython notebook that can be downloaded here.

# Astrometric fitting¶

In this tutorial we’ll walk through the simplest astrometric example with exoplanet and then explain how to build up a more complicated example with parallax measurements. For our dataset, we’ll use astrometric and radial velocity observations of a binary star system.

Astrometric observations usually consist of measurements of the separation and position angle of the secondary star (or directly imaged exoplanet), relative to the primary star as a function of time. The simplest astrometric orbit (in terms of number of parameters), describes the orbit using a semi-major axis a_ang measured in arcseconds, since the distance to the system is assumed to be unknown. We’ll work through this example first, then introduce the extra constraints provided by parallax information.

## Data¶

First, let’s load and examine the data. We’ll use the astrometric measurements of HR 466 (HD 10009) as compiled by Pourbaix 1998. The speckle observations are originally from Hartkopf et al. 1996.

from astropy.io import ascii
from astropy.time import Time

# grab the formatted data and do some munging
dirname = "https://gist.github.com/iancze/262aba2429cb9aee3fd5b5e1a4582d4d/raw/c5fa5bc39fec90d2cc2e736eed479099e3e598e3/"

dirname + "astro.txt", format="csv", fill_values=[(".", "0")]
)

# convert UT date to JD
astro_dates = Time(astro_data_full["date"].data, format="decimalyear")

# Following the Pourbaix et al. 1998 analysis, we'll limit ourselves to the highest quality data
# since the raw collection of data outside of these ranges has some ambiguities in swapping
# the primary and secondary star
ind = (
(astro_dates.value > 1975.0)
& (astro_dates.value < 1999.73)
)

astro_data = astro_data_full[ind]

astro_yrs = astro_data["date"]
astro_dates.format = "jd"
astro_jds = astro_dates[ind].value


Many of these measurements in this heterogeneous dataset do not have reported error measurements. For these, we assume a modest uncertainty of $$1^\circ$$ in position angle and $$0.01^{\prime\prime}$$ in separation for the sake of specifying something, but we’ll include a jitter term for both of these measurements as well. The scatter in points around the final solution will be a decent guide of what the measurement uncertainties actually were.

import numpy as np

# Convert all masked frames to be raw np arrays, since theano has issues with astropy masked columns
rho_data = np.ascontiguousarray(astro_data["rho"], dtype=float)  # arcsec
rho_err = np.ascontiguousarray(astro_data["rho_err"], dtype=float)

# The position angle measurements come in degrees in the range [0, 360].
# We'll convert this to radians in the range [-pi, pi]
deg = np.pi / 180.0
theta_data = np.ascontiguousarray(astro_data["PA"] * deg, dtype=float)
theta_data[theta_data > np.pi] -= 2 * np.pi

theta_err = np.ascontiguousarray(astro_data["PA_err"] * deg)  # radians


## Astrometric conventions¶

The conventions describing the orientation of the orbits are described in detail in the exoplanet paper; we summarize them briefly here. Generally, we follow the conventions from Pourbaix et al. 1998, which are a consistent set conforming to the right-hand-rule and the conventions of the visual binary field, where the ascending node is that where the secondary is receeding from the observer (without radial velocity information, there is a $$\pi$$ degeneracy in which node is ascending, and so common practice in the literature is to report a value in the range $$[0,\pi]$$). The orbital inclination ranges from $$[0, \pi$$]. $$i = 0$$ describes a face-on orbit rotating counter-clockwise on the sky plane, while $$i=\pi$$ describes a face-on orbit rotating clockwise on the sky. $$i = \pi/2$$ is an edge-on orbit.

The observer frame $$X$$, $$Y$$, $$Z$$ is oriented on the sky such that $$+Z$$ points towards the observer, $$X$$ is the north axis, and $$Y$$ is the east axis. All angles are measured in radians, and the position angle is returned in the range $$[-\pi, \pi]$$, which is the degrees east of north (be sure to check your data is in this format too!) The radial velocity is still defined such that a positive radial velocity corresponds to motion away from the observer.

In an astrometric-only orbit, it is common practice in the field to report $$\omega = \omega_\mathrm{secondary}$$, whereas with an RV orbit it is generally common practice to report $$\omega = \omega_\mathrm{primary}$$. The result is that unless the authors specify what they’re using, in a joint astrometric-RV orbit there is an ambiguity to which $$\omega$$ the authors mean, since $$\omega_\mathrm{primary} = \omega_\mathrm{secondary} + \pi$$. To standardize this across the exoplanet package, in all orbits (including astrometric-only) $$\omega = \omega_\mathrm{primary}$$.

import matplotlib.pyplot as plt

# Make a plot of the astrometric data on the sky
# The convention is that North is up and East is left
fig, ax = plt.subplots(nrows=1, figsize=(4, 4))

xs = rho_data * np.cos(theta_data)  # X is north
ys = rho_data * np.sin(theta_data)  # Y is east
ax.plot(ys, xs, ".k")
ax.set_ylabel(r"$\Delta \delta$ ['']")
ax.set_xlabel(r"$\Delta \alpha \cos \delta$ ['']")
ax.invert_xaxis()
ax.plot(0, 0, "k*")
ax.set_aspect("equal", "datalim") The plot on the sky is helpful to look at, but the “raw” measurements are the values of $$\rho$$ (separation) and $$\theta$$ (also called P.A., position angle) that we listed in our data table, and that the measurement uncertainties live on these values as nice Gaussians. So, to visualize this space more clearly, we can plot $$\rho$$ vs. time and P.A. vs. time.

fig, ax = plt.subplots(nrows=2, sharex=True)
ax.errorbar(astro_yrs, rho_data, yerr=rho_err, fmt=".k", lw=1, ms=5)
ax.set_ylabel(r'$\rho\,$ ["]')

ax.errorbar(astro_yrs, theta_data, yerr=theta_err, fmt=".k", lw=1, ms=5)
_ = ax.set_xlabel("time [years]") ## Fitting the astrometric orbit with exoplanet¶

To get started, let’s import the relative packages from exoplanet, plot up a preliminary orbit from the literature, and then sample to find the best parameters.

Note

Orbits in exoplanet generally specify the semi-major axis in units of solar radii R_sun. For transits and RV orbits, you usually have enough external information (e.g., estimate of stellar mass from spectral type) to put a physical scale onto the orbit. For the most basic of astrometric orbits without parallax information, however, this information can be lacking and thus it makes sense to fit for the semi-major axis in units of arcseconds. But, exoplanet is modeling a real orbit (where semi-major axis is in units of R_sun), so we do need to at least provide a fake parallax to convert from arcseconds to R_sun.

import pymc3 as pm
import theano
import theano.tensor as tt

import exoplanet as xo
from exoplanet.distributions import Angle

from astropy import constants

# conversion constant from au to R_sun
au_to_R_sun = (constants.au / constants.R_sun).value

# Just to get started, let's take a look at the orbit using the best-fit parameters from Pourbaix et al. 1998

# Orbital elements from Pourbaix et al. 1998
# For the relative astrometric fit, we only need the following parameters
a_ang = 0.324  # arcsec
parallax = 1  # arcsec (meaningless choice for now)
a = a_ang * au_to_R_sun / parallax
e = 0.798
i = 96.0 * deg  # [rad]
omega = 251.6 * deg - np.pi  # Pourbaix reports omega_2, but we want omega_1
Omega = 159.6 * deg
P = 28.8 * 365.25  # days

T0 = Time(1989.92, format="decimalyear")
T0.format = "jd"
T0 = T0.value  # [Julian Date]

# instantiate the orbit
orbit = xo.orbits.KeplerianOrbit(
a=a, t_periastron=T0, period=P, incl=i, ecc=e, omega=omega, Omega=Omega
)

# The position functions take an optional argument parallax to convert from
# physical units back to arcseconds
t = np.linspace(T0 - P, T0 + P, num=200)  # days
rho, theta = theano.function([], orbit.get_relative_angles(t, parallax))()

# Plot the orbit
fig, ax = plt.subplots(nrows=1, figsize=(4, 4))

xs = rho * np.cos(theta)  # X is north
ys = rho * np.sin(theta)  # Y is east
ax.plot(ys, xs, color="C0", lw=1)

# plot the data
xs = rho_data * np.cos(theta_data)  # X is north
ys = rho_data * np.sin(theta_data)  # Y is east
ax.plot(ys, xs, ".k")

ax.set_ylabel(r"$\Delta \delta$ ['']")
ax.set_xlabel(r"$\Delta \alpha \cos \delta$ ['']")
ax.invert_xaxis()
ax.plot(0, 0, "k*")
ax.set_aspect("equal", "datalim")
ax.set_title("initial orbit")

fig, ax = plt.subplots(nrows=2, sharex=True, figsize=(6, 6))
ax.errorbar(astro_jds, rho_data, yerr=rho_err, fmt=".k", lw=1, ms=5)
ax.plot(t, rho, color="C0", lw=1)
ax.set_ylabel(r'$\rho\,$ ["]')
ax.set_title("initial orbit")

ax.errorbar(astro_jds, theta_data, yerr=theta_err, fmt=".k", lw=1, ms=5)
ax.plot(t, theta, color="C0", lw=1)
_ = ax.set_xlabel("time [JD]")  Now that we have an initial orbit, we can set the model up using PyMC3 to do inference.

yr = 365.25

# for predicted orbits
t_fine = np.linspace(astro_jds.min() - 500, astro_jds.max() + 500, num=1000)

def get_model(parallax=None):
with pm.Model() as model:

if parallax is None:
# Without an actual parallax measurement, we can model the orbit in units of arcseconds
# by providing a fake_parallax and conversion constant
plx = 1  # arcsec
else:
# Below we will run a version of this model where a measurement of parallax is provided
# The measurement is in milliarcsec
m_plx = pm.Bound(pm.Normal, lower=0, upper=100)(
"m_plx", mu=parallax, sd=parallax, testval=parallax
)
plx = pm.Deterministic("plx", 1e-3 * m_plx)

a_ang = pm.Uniform("a_ang", 0.1, 1.0, testval=0.324)
a = pm.Deterministic("a", a_ang / plx)

# We expect the period to be somewhere in the range of 25 years,
# so we'll set a broad prior on logP
logP = pm.Normal(
"logP", mu=np.log(25 * yr), sd=10.0, testval=np.log(28.8 * yr)
)
P = pm.Deterministic("P", tt.exp(logP))

# For astrometric-only fits, it's generally better to fit in
# p = (Omega + omega)/2 and m = (Omega - omega)/2 instead of omega and Omega
# directly
omega0 = 251.6 * deg - np.pi
Omega0 = 159.6 * deg
p = Angle("p", testval=0.5 * (Omega0 + omega0))
m = Angle("m", testval=0.5 * (Omega0 - omega0))
omega = pm.Deterministic("omega", p - m)
Omega = pm.Deterministic("Omega", p + m)

# For these orbits, it can also be better to fit for a phase angle
# (relative to a reference time) instead of the time of periasteron
# passage directly
phase = Angle("phase", testval=0.0)
tperi = pm.Deterministic("tperi", T0 + P * phase / (2 * np.pi))

# Geometric uiform prior on cos(incl)
cos_incl = pm.Uniform(
"cos_incl", lower=-1, upper=1, testval=np.cos(96.0 * deg)
)
incl = pm.Deterministic("incl", tt.arccos(cos_incl))
ecc = pm.Uniform("ecc", lower=0.0, upper=1.0, testval=0.798)

# Set up the orbit
orbit = xo.orbits.KeplerianOrbit(
a=a * au_to_R_sun,
t_periastron=tperi,
period=P,
incl=incl,
ecc=ecc,
omega=omega,
Omega=Omega,
)
if parallax is not None:
pm.Deterministic("M_tot", orbit.m_total)

# Compute the model in rho and theta
rho_model, theta_model = orbit.get_relative_angles(astro_jds, plx)
pm.Deterministic("rho_model", rho_model)
pm.Deterministic("theta_model", theta_model)

# Add jitter terms to both separation and position angle
log_rho_s = pm.Normal(
"log_rho_s", mu=np.log(np.median(rho_err)), sd=5.0
)
log_theta_s = pm.Normal(
"log_theta_s", mu=np.log(np.median(theta_err)), sd=5.0
)
rho_tot_err = tt.sqrt(rho_err ** 2 + tt.exp(2 * log_rho_s))
theta_tot_err = tt.sqrt(theta_err ** 2 + tt.exp(2 * log_theta_s))

# define the likelihood function, e.g., a Gaussian on both rho and theta
pm.Normal("rho_obs", mu=rho_model, sd=rho_tot_err, observed=rho_data)

# We want to be cognizant of the fact that theta wraps so the following is equivalent to
# pm.Normal("obs_theta", mu=theta_model, observed=theta_data, sd=theta_tot_err)
# but takes into account the wrapping. Thanks to Rob de Rosa for the tip.
theta_diff = tt.arctan2(
tt.sin(theta_model - theta_data), tt.cos(theta_model - theta_data)
)
pm.Normal("theta_obs", mu=theta_diff, sd=theta_tot_err, observed=0.0)

# Set up predicted orbits for later plotting
rho_dense, theta_dense = orbit.get_relative_angles(t_fine, plx)
rho_save = pm.Deterministic("rho_save", rho_dense)
theta_save = pm.Deterministic("theta_save", theta_dense)

# Optimize to find the initial parameters
map_soln = model.test_point
map_soln = xo.optimize(map_soln, vars=[log_rho_s, log_theta_s])
map_soln = xo.optimize(map_soln, vars=[phase])
map_soln = xo.optimize(map_soln, vars=[p, m, ecc])
map_soln = xo.optimize(map_soln, vars=[logP, a_ang, phase])
map_soln = xo.optimize(map_soln)

return model, map_soln

model, map_soln = get_model()

optimizing logp for variables: [log_theta_s, log_rho_s]
10it [00:01,  8.70it/s, logp=1.471440e+02]
message: Optimization terminated successfully.
logp: 104.85554109304441 -> 147.14399186005338
optimizing logp for variables: [phase]
22it [00:01, 18.18it/s, logp=1.676422e+02]
message: Optimization terminated successfully.
logp: 147.14399186005338 -> 167.64220598197195
optimizing logp for variables: [ecc, m, p]
34it [00:00, 35.67it/s, logp=2.100634e+02]
message: Optimization terminated successfully.
logp: 167.64220598197198 -> 210.06340668297904
optimizing logp for variables: [phase, a_ang, logP]
11it [00:00, 12.19it/s, logp=2.105014e+02]
message: Optimization terminated successfully.
logp: 210.06340668297906 -> 210.50136989148027
optimizing logp for variables: [log_theta_s, log_rho_s, ecc, cos_incl, phase, m, p, logP, a_ang]
35it [00:01, 27.10it/s, logp=2.150212e+02]
message: Optimization terminated successfully.
logp: 210.5013698914803 -> 215.021177422116


Now that we have a maximum a posteriori estimate of the parameters, let’s take a look at the results to make sure that they seem reasonable.

ekw = dict(fmt=".k", lw=0.5)

fig, ax = plt.subplots(nrows=4, sharex=True, figsize=(6, 8))
ax.set_ylabel(r'$\rho\,$ ["]')
ax.set_ylabel(r"$\rho$ residuals")
ax.set_ylabel(r"P.A. residuals")

tot_rho_err = np.sqrt(rho_err ** 2 + np.exp(2 * map_soln["log_rho_s"]))
tot_theta_err = np.sqrt(theta_err ** 2 + np.exp(2 * map_soln["log_theta_s"]))

ax.errorbar(astro_jds, rho_data, yerr=tot_rho_err, **ekw)
ax.plot(t_fine, map_soln["rho_save"], "C1")

ax.axhline(0.0, color="0.5")
ax.errorbar(
astro_jds, rho_data - map_soln["rho_model"], yerr=tot_rho_err, **ekw
)

ax.plot(t_fine, map_soln["theta_save"], "C1")
ax.errorbar(astro_jds, theta_data, yerr=tot_theta_err, **ekw)

ax.axhline(0.0, color="0.5")
ax.errorbar(
astro_jds, theta_data - map_soln["theta_model"], yerr=tot_theta_err, **ekw
)

ax.set_xlim(t_fine, t_fine[-1])
_ = ax.set_title("map orbit") Now let’s sample the posterior.

np.random.seed(1234)
with model:
trace = pm.sample(
tune=5000,
draws=4000,
start=map_soln,
cores=2,
chains=2,
)

Multiprocess sampling (2 chains in 2 jobs)
NUTS: [log_theta_s, log_rho_s, ecc, cos_incl, phase, m, p, logP, a_ang]
Sampling 2 chains, 0 divergences: 100%|██████████| 18000/18000 [02:25<00:00, 123.95draws/s]
The number of effective samples is smaller than 25% for some parameters.


First we can check the convergence for some of the key parameters.

pm.summary(
trace, varnames=["P", "tperi", "a_ang", "omega", "Omega", "incl", "ecc"]
)

mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
P 10379.755 122.012 10173.625 10618.094 2.807 1.992 1890.0 1876.0 2407.0 1884.0 1.0
tperi 2447861.436 19.844 2447823.030 2447897.694 0.278 0.197 5098.0 5098.0 5106.0 4692.0 1.0
a_ang 0.318 0.008 0.303 0.332 0.000 0.000 4329.0 4305.0 4413.0 4485.0 1.0
omega 1.235 0.014 1.211 1.262 0.000 0.000 4222.0 4222.0 4245.0 3989.0 1.0
Omega 2.787 0.011 2.765 2.808 0.000 0.000 5074.0 5074.0 5075.0 5602.0 1.0
incl 1.691 0.006 1.680 1.703 0.000 0.000 4303.0 4303.0 4306.0 4945.0 1.0
ecc 0.776 0.012 0.755 0.799 0.000 0.000 4125.0 4110.0 4172.0 4392.0 1.0

That looks pretty good. Now here’s a corner plot showing the covariances between parameters.

import corner

samples = pm.trace_to_dataframe(trace, varnames=["ecc"])
samples["$P$ [yr]"] = trace["P"] / yr
samples["$T_\mathrm{peri} - T_0$ [day]"] = trace["tperi"] - T0
samples["$a$ [arcsec]"] = trace["a_ang"]
samples["$\omega$ [deg]"] = (trace["omega"] / deg) % 360
samples["$\Omega$ [deg]"] = (trace["Omega"] / deg) % 360
samples["$i$ [deg]"] = (trace["incl"] / deg) % 360
samples["$e$"] = samples["ecc"]
del samples["ecc"]
_ = corner.corner(samples) Finally, we can plot the posterior constraints on $$\rho$$ and $$\theta$$. This figure is much like the one for the MAP solution above, but this time the orange is a contour (not a line) showing the 68% credible region for the model.

ekw = dict(fmt=".k", lw=0.5)

fig, ax = plt.subplots(nrows=2, sharex=True, figsize=(6, 6))
ax.set_ylabel(r'$\rho\,$ ["]')

tot_rho_err = np.sqrt(
rho_err ** 2 + np.exp(2 * np.median(trace["log_rho_s"], axis=0))
)
tot_theta_err = np.sqrt(
theta_err ** 2 + np.exp(2 * np.median(trace["log_theta_s"], axis=0))
)

ax.errorbar(astro_jds, rho_data, yerr=tot_rho_err, **ekw)
q = np.percentile(trace["rho_save"], [16, 84], axis=0)
ax.fill_between(t_fine, q, q, color="C1", alpha=0.8, lw=0)

ax.errorbar(astro_jds, theta_data, yerr=tot_theta_err, **ekw)
q = np.percentile(trace["theta_save"], [16, 84], axis=0)
ax.fill_between(t_fine, q, q, color="C1", alpha=0.8, lw=0)

ax[-1].set_xlim(t_fine, t_fine[-1])
_ = ax.set_title("posterior inferences") As we can see from the narrow range of orbits (the orange swath appears like a thin line), the orbit is actually highly constrained by the astrometry. We also see two outlier epochs in the vicinity of 2445000 - 2447000, since adjacent epochs seem to be right on the orbit. It’s likely the uncertainties were not estimated correctly for these, and the simlplistic jitter model we implemented isn’t sophisticated to apply more weight to only these discrepant points.

## Including parallax¶

While this is encouraging that we fit an astrometric orbit, a simple astrometric fit to just $$\rho$$ and $$\theta$$ isn’t actually that physically satisfying, since many of the orbital parameters simply have to do with the orientation relative to us ($$i$$, $$\omega$$, and $$\Omega$$). The only truely intrinsic parameters are $$P$$ and $$e$$. To learn more about some of the physical parameters, such as the total mass of the system, we’d like to incorporate distance information to put a physical scale to the problem.

The Gaia DR2 parallax is $$\varpi = 24.05 \pm 0.45$$ mas.

We can use exactly the same model as above with only an added parallax constraint:

plx_model, plx_map_soln = get_model(parallax=[24.05, 0.45])

optimizing logp for variables: [log_theta_s, log_rho_s]
10it [00:00, 10.29it/s, logp=1.499286e+02]
message: Optimization terminated successfully.
logp: 107.64015029566296 -> 149.92860106267193
optimizing logp for variables: [phase]
22it [00:01, 21.39it/s, logp=1.704268e+02]
message: Optimization terminated successfully.
logp: 149.92860106267193 -> 170.4268151845905
optimizing logp for variables: [ecc, m, p]
34it [00:01, 33.83it/s, logp=2.128480e+02]
message: Optimization terminated successfully.
logp: 170.42681518459054 -> 212.8480158855976
optimizing logp for variables: [phase, a_ang, logP]
11it [00:01, 10.87it/s, logp=2.132860e+02]
message: Optimization terminated successfully.
logp: 212.8480158855976 -> 213.28597909410485
optimizing logp for variables: [log_theta_s, log_rho_s, ecc, cos_incl, phase, m, p, logP, a_ang, m_plx]
244it [00:01, 179.40it/s, logp=2.178059e+02]
message: Desired error not necessarily achieved due to precision loss.
logp: 213.28597909410485 -> 217.80586833503276

np.random.seed(5432)
with plx_model:
plx_trace = pm.sample(
tune=5000,
draws=4000,
start=plx_map_soln,
cores=2,
chains=2,
step=xo.get_dense_nuts_step(target_accept=0.9, start=plx_map_soln),
)

Multiprocess sampling (2 chains in 2 jobs)
NUTS: [log_theta_s, log_rho_s, ecc, cos_incl, phase, m, p, logP, a_ang, m_plx]
Sampling 2 chains, 0 divergences: 100%|██████████| 18000/18000 [02:18<00:00, 130.22draws/s]


Check the convergence diagnostics.

pm.summary(
plx_trace,
varnames=[
"P",
"tperi",
"a_ang",
"omega",
"Omega",
"incl",
"ecc",
"M_tot",
"plx",
],
)

mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
P 10382.983 123.380 10179.682 10632.856 2.503 1.777 2430.0 2411.0 3038.0 2049.0 1.0
tperi 2447861.559 19.831 2447825.268 2447899.013 0.279 0.197 5051.0 5051.0 5055.0 5124.0 1.0
a_ang 0.318 0.008 0.304 0.333 0.000 0.000 4638.0 4605.0 4778.0 4377.0 1.0
omega 1.235 0.014 1.209 1.261 0.000 0.000 4505.0 4500.0 4522.0 4365.0 1.0
Omega 2.786 0.012 2.765 2.809 0.000 0.000 5355.0 5354.0 5352.0 5132.0 1.0
incl 1.691 0.006 1.680 1.703 0.000 0.000 4670.0 4670.0 4690.0 4665.0 1.0
ecc 0.776 0.012 0.753 0.798 0.000 0.000 4212.0 4189.0 4308.0 3703.0 1.0
M_tot 2.865 0.272 2.353 3.358 0.004 0.003 5258.0 5175.0 5357.0 5428.0 1.0
plx 0.024 0.000 0.023 0.025 0.000 0.000 6641.0 6641.0 6644.0 5030.0 1.0

And make the corner plot for the physical parameters.

samples = pm.trace_to_dataframe(plx_trace, varnames=["ecc"])
samples["$P$ [yr]"] = plx_trace["P"] / yr
samples["$T_\mathrm{peri} - T_0$ [day]"] = plx_trace["tperi"] - T0
samples["$a$ [au]"] = plx_trace["a"]
samples["$M_\mathrm{tot}$ [$M_\odot$]"] = plx_trace["M_tot"]
samples["$e$"] = plx_trace["ecc"]
del samples["ecc"]
_ = corner.corner(samples)

Substituting with a symbol from Computer Modern.
Substituting with a symbol from Computer Modern.
Substituting with a symbol from Computer Modern.
Substituting with a symbol from Computer Modern. 