A&A, Volume 654, October 2021
Article Number: A156
Number of pages: 18
Section: Planets and planetary systems
DOI: https://doi.org/10.1051/0004-6361/202140989
Published online: 25 October 2021
Chaotic diffusion of the fundamental frequencies in the Solar System^{★}
Astronomie et Systèmes Dynamiques, Institut de Mécanique Céleste et de Calcul des Éphémérides, CNRS UMR 8028, Observatoire de Paris, Université PSL, Sorbonne Université, 77 Avenue Denfert-Rochereau, 75014 Paris, France
e-mail: nam.hoanghoai@obspm.fr
Received: 2 April 2021
Accepted: 4 May 2021
The long-term variations in the orbit of the Earth govern the insolation on its surface and hence its climate. The use of the astronomical signal, whose imprint has been recovered in the geological records, has revolutionized the determination of the geological timescales. However, the orbital variations beyond 60 Myr cannot be reliably predicted because of the chaotic dynamics of the planetary orbits in the Solar System. Taking this dynamical uncertainty into account is necessary for a complete astronomical calibration of geological records. Our work addresses this problem with a statistical analysis of 120 000 orbital solutions of the secular model of the Solar System ranging from 500 Myr to 5 Gyr. We obtain the marginal probability density functions of the fundamental secular frequencies using kernel density estimation. The uncertainty of the density estimation is also obtained here in the form of confidence intervals determined by the moving block bootstrap method. The results of the secular model are shown to be in good agreement with those of the direct integrations of a comprehensive model of the Solar System. Application of our work is illustrated on two geological data sets: the Newark-Hartford records and the Libsack core.
Key words: chaos / diffusion / celestial mechanics / methods: statistical
The density estimations of the fundamental frequencies and their combinations are only available at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/cat/J/A+A/654/A156
© N. H. Hoang et al. 2021
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Milankovitch (1941) hypothesized that some of the past large climate changes on the Earth originated from the long-term variations in its orbital and rotational elements. These variations are imprinted along the stratigraphic sequences of sediments. Using their correlations with an orbital solution (Laskar et al. 2004, 2011a; Laskar 2020), some of the geological records can be dated with precision. This method, named astrochronology, has become a standard practice in the stratigraphic community and has proven to be a powerful tool for reconstructing the geological timescale (e.g., Gradstein et al. 2004, 2012; Gradstein & Ogg 2020).
The climate rhythms found in the geological records are directly related to the Earth’s precession constant and to the fundamental secular frequencies of the Solar System: the precession frequencies of the planet perihelia and the precession frequencies of their ascending nodes. The evolution of these fundamental frequencies is accurately determined up to 60 Myr (Laskar et al. 2004, 2011a; Laskar 2020). Beyond this limit, even with the current highest precision ephemerides, it is hopeless to obtain a precise past history of the Solar System simply via numerical integration. This limit does not lie in the precision of the determination of the initial conditions but originates in the chaotic nature of the Solar System (Laskar 1989, 1990; Laskar et al. 2011b). However, because the astronomical signal is recorded in the geological data, it appears to be possible to trace back the orbital variations in the geological record and thus to constrain the astronomical solutions beyond their predictability horizon (Olsen & Kent 1999; Ma et al. 2017; Olsen et al. 2019). Nevertheless, a deterministic view of the Solar System is no longer reliable beyond 60 Myr, and a statistical approach should be adopted. Geological constraints should likewise be retrieved in a statistical setting. In this spirit, a recent Bayesian Markov chain Monte Carlo (MCMC) approach has been proposed to provide geological constraints on the fundamental frequencies (Meyers & Malinverno 2018). For such a Bayesian approach to give any meaningful constraint, proper prior distributions of the fundamental secular frequencies are required, and therefore a statistical study of the orbital motion of the Solar System planets is needed. This constitutes the motivation for the present study.
Laskar (2008) performed the first statistical analysis of the long-term chaotic behavior of the planetary eccentricities and inclinations in the Solar System. Mogavero (2017) reconsidered the problem from the perspective of statistical mechanics. Our study is a follow-up of Laskar (2008). We study fundamental frequencies instead of orbital elements because they are more robust and are closer to the proxies that can be traced in the geological records. This study is based on the numerical integrations of 120 000 different solutions of the averaged equations of the Solar System over 500 Myr, 40 000 of which were integrated up to 5 Gyr. The initial conditions of the solutions are sampled closely around a reference value that is compatible with our present knowledge of planetary motion.
2 Dynamical model
2.1 Secular equations
We used the secular equations of motion of Laskar (1985, 1990, 2008, and references therein). They were obtained via series expansions in planetary masses, eccentricities, and inclinations, as well as through second-order analytical averaging over the rapidly changing mean longitudes of the planets. The expansion was truncated at second order with respect to the masses and at degree 5 in eccentricities and inclinations. The equations include corrections from general relativity and the Earth–Moon gravitational interaction. This leads to the following system of ordinary differential equations:

dω∕dt = i [Γω + Φ_{3}(ω, ω̄) + Φ_{5}(ω, ω̄)], (1)
where ω = (z_{1}, …, z_{8}, ζ_{1}, …, ζ_{8}) with z_{k} = e_{k} exp(iϖ_{k}) and ζ_{k} = sin(i_{k}∕2) exp(iΩ_{k}). The variable ϖ_{k} is the longitude of the perihelion, Ω_{k} is the longitude of the ascending node, e_{k} is the eccentricity, and i_{k} is the inclination. The functions Φ_{3}(ω, ω̄) and Φ_{5}(ω, ω̄) gather the terms of degree 3 and 5, respectively. The 16 × 16 matrix Γ is that of the linear Laplace-Lagrange system, which is slightly modified to make up for the higher-order terms in the outer Solar System. With an adapted initial condition, the secular solution is very close to the solution of direct integration over 35 Myr (Laskar et al. 2004). The major advantage of the secular system over direct integration is speed. Numerically integrating the averaged equations is 2000 times faster than the non-averaged ones due to the much larger step size: 250 yr instead of 1.8265 days. It is thus desirable to employ the secular equations to study the statistics of the Solar System. However, we also compare their predictions to those of a non-averaged comprehensive dynamical model in Sect. 4.
2.2 Frequency analysis
We employed the frequency analysis (FA) technique proposed by Laskar (1988, 1993) to extract the fundamental secular frequencies from the integrated solutions. The method finds a quasiperiodic approximation of a function f(t) over a time interval [0, T]. It first finds the strongest mode, whose frequency ν_{1} corresponds to the maximum of the modulus of the function:

φ(ν) = (1∕T) ∫_{0}^{T} f(t) e^{−iνt} χ(t) dt, (2)
where χ(t) is a weight function that improves the precision of the maximum determination; it was chosen to be the Hanning window filter, that is, χ(t) = 1 + cos(πt∕T). The next step is the Gram-Schmidt orthogonalization. The complex amplitude a_{1} of the first frequency ν_{1} is calculated via the orthogonal projection of the function f(t) onto e^{iν_{1}t}. This mode is then subtracted from the function f(t) to get a new function, f_{1}(t) = f(t) − a_{1}e^{iν_{1}t}. The process is then repeated with this newly obtained function until the N desired strongest modes are obtained. This technique works very well for weakly chaotic systems such as the Solar System, whose variables can be decomposed into quasiperiodic modes over a sufficiently short period of time. It has been proven that this algorithm converges toward the true frequencies much faster than the classical fast Fourier transform (Laskar 2005). Therefore, it is a good tool for studying the chaotic diffusion of the fundamental frequencies. In this work we used the routine naftab, written in the publicly available computer algebra software TRIP (Gastineau & Laskar 2011), developed at IMCCE, to directly apply the frequency analysis.
To extract the fundamental secular frequencies of the Solar System, we applied the FA to the proper variables of the secular equations (Laskar 1990). Each fundamental frequency is obtained as the frequency of the strongest Fourier component of the corresponding proper variable. To track the time evolution of the frequencies, the FA was applied with a moving interval whose sliding step was 1 Myr. The interval sizes were 10 Myr, 20 Myr, and 50 Myr.
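The iterative project-and-subtract scheme described above can be sketched in a few lines of Python. This is a simplified illustration only (a grid search with local refinement, and no Gram-Schmidt orthogonalization between successive modes, which the actual naftab routine does perform); the signal, frequency range, and grid sizes are arbitrary choices for the demonstration:

```python
import numpy as np

def naff_sketch(f, t, T, n_modes=2, nu_min=-5.0, nu_max=5.0, n_grid=2001):
    """Simplified sketch of frequency analysis: repeatedly locate the
    strongest mode of the weighted spectrum phi(nu), project onto it,
    and subtract it from the signal."""
    dt = t[1] - t[0]
    chi = 1.0 + np.cos(np.pi * t / T)       # Hanning-type weight on [0, T]
    res = f.astype(complex).copy()
    modes = []
    for _ in range(n_modes):
        def spectrum(grid):
            # |phi(nu)| = |(1/T) * sum res(t) e^{-i nu t} chi(t) dt|
            E = np.exp(-1j * np.outer(grid, t))
            return np.abs(E @ (res * chi)) * dt / T
        coarse = np.linspace(nu_min, nu_max, n_grid)
        k = int(np.argmax(spectrum(coarse)))
        step = coarse[1] - coarse[0]
        # refine the maximum on a fine local grid around the coarse peak
        fine = np.linspace(coarse[k] - step, coarse[k] + step, 401)
        nu1 = fine[int(np.argmax(spectrum(fine)))]
        # complex amplitude: weighted projection onto e^{i nu1 t}
        a1 = np.sum(res * np.exp(-1j * nu1 * t) * chi) * dt / T
        modes.append((nu1, a1))
        res -= a1 * np.exp(1j * nu1 * t)    # subtract the extracted mode
    return modes

# demo: recover the two modes of a synthetic quasiperiodic signal
t = np.arange(0.0, 100.0, 0.05)
f = 1.3 * np.exp(1j * 0.7 * t) + 0.4 * np.exp(-1j * 1.9 * t)
modes = naff_sketch(f, t, T=100.0)
```

For well-separated frequencies the Hanning weight suppresses the leakage between modes, and the recovered frequencies are far more accurate than the raw grid resolution.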
3 Estimation of probability density functions
The samples of this study consist of the secular frequencies of the astronomical solutions that were obtained by integrating the secular equations (Eq. (1)) from very close initial conditions. Due to this initial proximity, the correlation of the solutions in the samples lasts for a long period of time but will eventually diminish. Our objective is to obtain a robust estimation of the marginal probability density functions (PDFs) from these correlated samples. In fact, this correlation is the main motivation for our use of the estimation methods in this section. Details of our samples are described in the first part of Sect. 4.
We used kernel density estimation (KDE) to estimate the time-evolving marginal PDFs of the fundamental frequencies of the Solar System. In addition, the statistical uncertainty of our density estimations (i.e., PDF estimations) was measured by the moving block bootstrap (MBB) method. To our knowledge, this application of the MBB to the KDE of a time-evolving sample whose correlation changes over time is new. Therefore, in order to ensure the validity of the method, we carried out several tests (see Sect. 3.3).
3.1 Kernel density estimation
We chose KDE, also known as the Parzen-Rosenblatt window method, as our preferred nonparametric estimator of the PDFs of the fundamental frequencies of the Solar System because of its high convergence rate and its smoothing property (Rosenblatt 1956; Parzen 1962). We briefly present the method here. Let X = {X_{1}, X_{2}, …, X_{n}} be a univariate independent and identically distributed (i.i.d.) sample drawn from an unknown distribution P with density function p(x). The KDE of the sample is then defined as:

p̂_{h}(x) = (1∕(nh)) Σ_{i=1}^{n} K((x − X_{i})∕h), (3)
where K is a nonnegative kernel function and h is the bandwidth of the KDE. The choice of the bandwidth h is much more important than the choice of the kernel K, which was chosen to be the standard normal distribution in this paper. In this work, we consider bandwidths of the following form:

BW_{β} = 0.9 min(σ̂, IQR∕1.34) n^{−β}, (4)

where σ̂ is the standard deviation of the sample, IQR is its interquartile range, and β is a constant of choice. The bandwidth with β = 1∕5 corresponds to Silverman’s rule of thumb (Silverman 1986). BW_{β=1∕5} is a version of the optimal choice of bandwidth for Gaussian-distributed data that is slightly modified for better adaptation to non-Gaussian data. The bias error and variance error of the KDE with this bandwidth are of the same order of magnitude. Undersmoothing, that is, choosing a smaller bandwidth, shrinks the bias so that the total error is dominated by the variance error, which can then be estimated by the bootstrap method (Hall et al. 1995); the common value of β for undersmoothing is 1∕3.
When the sample is identically distributed but correlated, KDE is still valid under some mixing conditions (Robinson 1983; Hart 1996). Indeed, in the case of observations that are not too highly correlated, the dependence among the data that fall in the support of the kernel function K can actually be much weaker than it is among the entire sample. This principle is known as “whitening by windowing” (Hart 1996). Therefore, the correlation in the sample does not invalidate the use of KDE and only impacts the variability of the estimation (see Sect. 3.2). With regard to the choice of bandwidth, Hall et al. (1995) suggested that using the asymptotically optimal bandwidth for the independent data is still a good choice, even for some strongly dependent data sequences. The samples generated by a chaotic measurepreserving dynamical system (as in the case of the numerical integration of the Solar System) resemble those of mixing stochastic processes; therefore, the theory of KDE should also be applicable for dynamical systems, although the formulation might be different (Bosq & Guégan 1995; MaumeDeschamps 2006; Hang et al. 2018).
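A minimal Gaussian-kernel KDE with a BW_β-type bandwidth can be written as follows. The constants 0.9 and 1.34 are those of the standard Silverman rule of thumb, which is assumed here to be the form of Eq. (4):

```python
import numpy as np

def bw(sample, beta=0.2):
    """Bandwidth BW_beta = 0.9 * min(sigma, IQR/1.34) * n**(-beta).
    beta = 1/5 is the rule of thumb; beta = 1/3 undersmooths."""
    n = len(sample)
    sigma = np.std(sample, ddof=1)
    q75, q25 = np.percentile(sample, [75, 25])
    return 0.9 * min(sigma, (q75 - q25) / 1.34) * n ** (-beta)

def kde(x, sample, h):
    """Gaussian-kernel density estimate (Eq. (3)) evaluated at points x."""
    u = (x[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(sample) * h * np.sqrt(2.0 * np.pi))

# demo: density of a standard normal sample
rng = np.random.default_rng(0)
sample = rng.normal(size=5000)
x = np.linspace(-4.0, 4.0, 201)
h = bw(sample)                  # beta = 1/5: rule-of-thumb bandwidth
p_hat = kde(x, sample, h)
```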
3.2 Moving block bootstrap
Since the seminal paper by Efron (1979), the bootstrap has become a standard resampling technique for evaluating the uncertainty of a statistical estimation. Bias and variance errors of the KDE with the choice of bandwidth BW_{1∕5} (Eq. (4)) are on the same order of magnitude, and hence one should either undersmooth (i.e., choose a smaller bandwidth) to minimize the bias (Hall et al. 1995) or use an explicit bias correction with appropriate Studentizing (Cheng et al. 2019; Calonico et al. 2018). However, naively applying the i.i.d. bootstrap procedure on dependent data could underestimate the true uncertainty because all the dependence structure would be lost just by “scrambling” the data together (Hart 1996). To remedy the problem, Kunsch (1989) and Liu et al. (1992) independently devised the MBB, which later became standard practice for evaluating the uncertainty in dependent data (see Kreiss & Lahiri 2012 for a review). Although the MBB for smooth functionals has been intensively studied, the literature on the MBB for the KDE of dependent data is very limited. Recently, Kuffner et al. (2019) formulated an optimality theory of the block bootstrap for KDE under an assumption of weak dependence. They proposed both undersmoothing and an explicit bias correction scheme to obtain the sampling distribution of the KDE. However, good tuning parameters, which are generally difficult to find if the data are from an unknown distribution, are required to provide a decent result. In this paper we propose overcoming this problem with an inductive approach: the optimal tuning parameters obtained in a known model are tested on different models and then extrapolated to the subject of our study, the Solar System.
Procedure of MBB
We briefly describe the undersmoothed MBB for the KDE method (a more detailed description can be found in Kuffner et al. 2019). We suppose that X = {X_{1}, X_{2}, …, X_{n}} is a dependent sample of a mixing process with an underlying density function p(x). The KDE of the sample is p̂_{h}(x); the hat above a given quantity denotes its estimated value. We used the MBB to estimate the distribution of δ(x) = p̂_{h}(x) − p(x), where x ∈ Ω and Ω is the domain of interest. Let l be an integer satisfying 1 ≤ l ≤ n. Then B_{i,l} = {X_{i}, X_{i+1}, …, X_{i+l−1}} with i ∈ {1, …, n − l + 1} denotes all the possible overlapping blocks of size l of the sample. Supposing, for the sake of simplicity, that l divides n, then b = n∕l. The MBB samples are obtained by selecting b blocks randomly with replacement from {B_{1,l}, …, B_{n−l+1,l}}. Serial concatenation of the b blocks gives n bootstrap observations: X* = {X*_{1}, …, X*_{n}}. By choosing sufficiently large values of l (preferably larger than the correlation length), the MBB sample can retain the structure of the sample dependence. For k > 0, the KDE of the bootstrap sample is p̂*_{k}(x) and its bootstrap expectation is E*[p̂*_{k}(x)]. We define

δ*(x) = p̂*_{k}(x) − E*[p̂*_{k}(x)], (5)
such that if h is chosen properly to reduce the bias to be asymptotically negligible with respect to the stochastic variation, then the MBB distribution of δ*(x) is a consistent estimator of the error distribution P(δ(x)) when h → 0, nh → ∞, k → 0, lk → ∞, and n∕l → ∞. We note that if l = 1, then k = h and the MBB reverts to the undersmoothing procedure for the i.i.d. sample studied by Hall et al. (1995). The efficiency of this estimator depends sensitively on two tuning parameters, l and k. We are interested in the uncertainty of the KDE, which is characterized by the confidence interval CI_{1−α}(x) and the confidence band CB_{1−α}, which are defined as:

P(|p̂_{h}(x) − p(x)| ≤ CI_{1−α}(x)) = 1 − α, (6)

P(sup_{x∈Ω} |p̂_{h}(x) − p(x)| ≤ CB_{1−α}) = 1 − α, (7)
where α denotes the level of uncertainty; for example, α = 0.05 corresponds to a 95% CI.
They can be estimated by the MBB distribution as:

ĈI_{1−α}(x) = Q_{1−α}(|δ*(x)|), (8)

ĈB_{1−α} = Q_{1−α}(sup_{x∈Ω} |δ*(x)|), (9)

where Q_{1−α}(⋅) denotes the (1 − α)-quantile over the MBB realizations.
In this paper we also use CB and CI without the hat overhead to denote estimated values.
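The block-resampling procedure above can be sketched as follows, assuming the block length l and the bootstrap bandwidth k are given; the bootstrap mean over replicates stands in for the expectation E*, and the AR(1) demo sequence is an arbitrary choice of mildly correlated data:

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kde(x, sample, h):
    """Gaussian-kernel density estimate evaluated at points x."""
    u = (x[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(sample) * h * np.sqrt(2.0 * np.pi))

def mbb_ci(sample, x, k, l, n_boot=300, alpha=0.1):
    """Moving block bootstrap for the KDE uncertainty: resample b = n/l
    overlapping blocks of length l, re-estimate the density with the
    bootstrap bandwidth k, and take quantiles of
    delta*(x) = p*_k(x) - E*[p*_k(x)]."""
    n = len(sample)
    b = n // l
    blocks = np.array([sample[i:i + l] for i in range(n - l + 1)])
    boot = np.empty((n_boot, len(x)))
    for j in range(n_boot):
        idx = rng.integers(0, n - l + 1, size=b)   # pick b blocks with replacement
        boot[j] = gauss_kde(x, blocks[idx].ravel(), k)
    delta = boot - boot.mean(axis=0)               # delta*(x) per replicate
    ci = np.quantile(np.abs(delta), 1 - alpha, axis=0)        # pointwise CI half-width
    cb = np.quantile(np.abs(delta).max(axis=1), 1 - alpha)    # uniform band half-width
    return ci, cb

# demo on a mildly correlated AR(1) sequence
eps = rng.normal(size=2000)
ar = np.empty(2000)
ar[0] = eps[0]
for i in range(1, 2000):
    ar[i] = 0.5 * ar[i - 1] + eps[i]
x = np.linspace(-3.0, 3.0, 61)
ci, cb = mbb_ci(ar, x, k=0.35, l=10)
```

By construction the band half-width is at least as large as the largest pointwise half-width, since the supremum dominates each pointwise deviation.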
Our choice of the parameters of the MBB procedure, l and k, is based on the effective sample size n_{eff}, defined as (Kass et al. 1998):

n_{eff} = n ∕ (1 + 2 Σ_{k=1}^{∞} ρ(k)), (10)
where ρ(k) is the sample autocorrelation at lag k. The block length l is chosen to be the sample correlation size l_{corr}:

l = l_{corr} = n∕n_{eff}, (11)
and the bootstrap bandwidth is parametrized as

k = h c_{0}^{1 − l_{corr}^{−γ}}, (12)
where γ and c_{0} are two optimizing constants. The reason for this choice of parametrization is twofold. First, when l_{corr} → 1, the sample becomes independent, and then k → h. Secondly, the rate of change of k with respect to l should be greater when l_{corr} is small than when it is large. Therefore, when l_{corr} ≫ 1, the optimal value of k should be quite stable. We also observed experimentally that the optimal value of k is indeed relatively stable at around 2h as long as l = l_{corr} ≫ 1. We therefore simply chose γ = 1 and c_{0} = 2. This choice of parameters turns out to be quite robust, as demonstrated by the two numerical experiments in Sect. 3.3.
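The parameter choice can be sketched as follows. The truncation of the autocorrelation sum at the first non-positive lag and the exact form k = h·c₀^(1 − l_corr^(−γ)) are assumptions, chosen to be consistent with the limits stated above (k → h when l_corr → 1, k ≈ c₀h when l_corr ≫ 1):

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation rho(k) for k = 1..max_lag."""
    x = np.asarray(x, float) - np.mean(x)
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:-k], x[k:]) / (len(x) * var)
                     for k in range(1, max_lag + 1)])

def mbb_parameters(sample, h, gamma=1.0, c0=2.0, max_lag=None):
    """Block length l and bootstrap bandwidth k from the effective sample
    size n_eff = n / (1 + 2 * sum rho(k)) (Kass et al. 1998)."""
    n = len(sample)
    if max_lag is None:
        max_lag = n // 10
    rho = autocorr(sample, max_lag)
    # truncate the (noisy) sum at the first non-positive autocorrelation
    stop = int(np.argmax(rho <= 0)) if np.any(rho <= 0) else len(rho)
    n_eff = n / (1.0 + 2.0 * rho[:stop].sum())
    l_corr = max(n / n_eff, 1.0)
    l = int(round(l_corr))
    k = h * c0 ** (1.0 - l_corr ** (-gamma))   # assumed parametrization of Eq. (12)
    return l, k, n_eff

# demo: AR(1) with coefficient 0.5 has n_eff ~ n/3 and l ~ 3
rng = np.random.default_rng(3)
eps = rng.normal(size=20000)
ar = np.empty(20000)
ar[0] = eps[0]
for i in range(1, 20000):
    ar[i] = 0.5 * ar[i - 1] + eps[i]
l, k, n_eff = mbb_parameters(ar, h=0.1)
```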
The literature on KDE and MBB focuses on stationary, weakly dependent sequences. The data in our case, however, are different: They are not strictly stationary; the formulation of the mixing condition might be different (Hang et al. 2018); the correlation in the sample is not constant but evolving with time; finally, and most importantly, the data structure is different. Our sample units, which are the orbital solutions, are ordered based on their initial distances in phase space. The solutions evolve over time but their order remains unchanged. Therefore, statistical notions such as correlation and stationarity should be considered within this framework for the sample at fixed values of time. Because of the differences presented, an optimality theory, which is not currently available, might be needed for this case. However, we assume that it would not differ significantly from the orthodox analysis and that a decent working MBB procedure could be obtained with some good choice of parameters. This is tested in the section below.
Fig. 1 Kernel density estimation (red line) with bandwidth h = BW_{1∕3} and its 90% pointwise CI (red band) from an MCMC sample of n = 10^{4} units with l_{corr} = 140 of a DGMM (Eq. (13); black line).
3.3 Numerical experiments
We performed the KDE-MBB procedure on two numerical experiments to ensure its validity. The double Gaussian mixture model (DGMM) with different degrees of correlation was used to calibrate the algorithm of the KDE-MBB method, that is, to calibrate and test the tuning parameters (Eqs. (11)–(12)). The resulting algorithm was then applied to the Fermi map as an example of a real dynamical system for assessment.
Double Gaussian mixture model
The KDE-MBB calibration was done on MCMC sequences that sample a double Gaussian distribution. Our particular choice of the DGMM, inspired by Cheng et al. (2019), is

f_{DG}(x) = (1∕2) ϕ(x + 2) + (1∕2) ϕ(x − 2), (13)
where ϕ(x) is the standard normal distribution. The MCMC sequence X_{1}, …, X_{n} was obtained by the Metropolis-Hastings algorithm with a Gaussian proposal distribution whose standard deviation σ_{p} characterizes the correlation length of the sequence (Metropolis et al. 1953; Hastings 1970). We could then either vary σ_{p} or perform thinning to obtain a sequence of a desired correlation length. The size of each MCMC sequence was n = 10^{4}. The initial state of the sequence was directly sampled from the distribution f_{DG} itself, so that burn-in was not necessary and the whole sequence was usable.
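A minimal random-walk Metropolis-Hastings sampler of this kind can be sketched as follows; the symmetric double Gaussian f_dg used here (equal-weight normals at ±2) is an illustrative stand-in for the specific mixture of Eq. (13), and the starting point x0 = 0 replaces the exact sampling of the initial state:

```python
import numpy as np

rng = np.random.default_rng(42)

def f_dg(x, mu=2.0):
    """Illustrative double Gaussian mixture: equal-weight unit normals
    centered at +mu and -mu (a stand-in for Eq. (13))."""
    phi = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return 0.5 * phi(x - mu) + 0.5 * phi(x + mu)

def metropolis_hastings(target, n, sigma_p, x0=0.0):
    """Random-walk Metropolis-Hastings chain of length n; the proposal
    width sigma_p controls the correlation length of the chain
    (smaller sigma_p -> more strongly correlated sequence)."""
    chain = np.empty(n)
    x = x0
    px = target(x)
    for i in range(n):
        y = x + sigma_p * rng.normal()          # symmetric Gaussian proposal
        py = target(y)
        if rng.random() < py / px:              # acceptance ratio py/px
            x, px = y, py
        chain[i] = x
    return chain

chain = metropolis_hastings(f_dg, n=20000, sigma_p=1.5)
```

The target mixture has mean 0 and variance 1 + μ² = 5, which a sufficiently long, well-mixing chain should reproduce.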
On each MCMC sequence, we applied the KDE-MBB procedure with 500 MBB samples and the parameters specified above (Eqs. (11)–(12)), with the undersmoothing bandwidth h = BW_{1∕3}, to get the uncertainty estimation – the standard error, the confidence interval (CI), and the confidence band (CB). Figure 1 shows an example of the density estimation of an MCMC sequence of 10^{4} units with correlation length l_{corr} = 140; the sequence was generated by choosing σ_{p} = 0.25. It is clear in the figure that the true PDF (in black) lies inside the range of the 90% CI estimated by MBB.
To assess the precision of the MBB uncertainty estimation, at each value of l_{corr} we sampled 1000 different MCMC sequences and applied the KDE-MBB process to each of them to obtain their estimations of the CI. We thus obtained a distribution of CIs estimated by MBB, which is depicted by the bands of mean ± standard deviation (Fig. 2). With the knowledge of the true PDF (Eq. (13)), the true CI and CB values were obtained from this collection of KDEs at each correlation length (Eqs. (6)–(7)). Figure 2 compares the true values of the pointwise 90% CI with the distribution of its values estimated by MBB at various correlation lengths. We see that the 90% CI of the KDE estimated by MBB is accurate across a very wide range of l_{corr}, that is, the true values almost always lie inside the estimation bands.
Fig. 2 Estimation of the 90% CI of KDEs from different samples with different correlation lengths. Solid lines are the true values, and the bands denote the distribution of MBB estimations (mean ± standard deviation); both are computed from 10^{3} different samples, and each is composed of n = 10^{4} units generated from the DGMM by the MCMC algorithm with the same l_{corr}. 
Fermi map
The second test for the MBB scheme is a two-dimensional chaotic Hamiltonian system: the Fermi map (Fermi 1949; Murray et al. 1985). This simple toy model, originally designed to model the movement of cosmic particles, describes the motion of a ball bouncing between two walls, one of which is fixed and the other oscillating sinusoidally. The equations of the Fermi map read:

u_{t+1} = |u_{t} + sin ψ_{t}|,
ψ_{t+1} = ψ_{t} + 2πM∕u_{t+1} (mod 2π), (14)
where u_{t} and ψ_{t} are the normalized velocity of the ball and the phase of the moving wall right before the t^{th} collision, respectively; the stochastic parameter of the system M was chosen here to be 10^{4}. The system was studied in the region of large-scale chaos. Our sample was obtained by evolving Eq. (14) from n = 10^{3} initial conditions. The initial conditions, u_{0} and ψ_{0}, were drawn from uniform distributions on [90 − 10^{−5}, 90 + 10^{−5}] and [0, 2π], respectively; they were then sorted by ascending value of ψ_{0}. The sorting step is imperative for quantifying the sample dependence, as the initially ordered neighboring solutions still tend to be correlated afterward and the autocorrelation function can be calculated from ordered samples. The large initial phase variation guarantees that chaotic diffusion is immediately perceptible and that the PDF of the velocity centers around its initial distribution. At each time t, we computed the KDE of the sample with the bandwidth BW_{1∕3}. The KDE uncertainty was estimated by applying the MBB 400 times with the same parameters specified above (Eqs. (11)–(12)). The whole process was then repeated 300 times with different sets of initial conditions. In the Fermi map experiment, the true analytical form of the density is not available. A numerical “true” density, obtained by calculating the KDE with bandwidth BW_{1∕5} from an n = 10^{6} sample, was used instead to assess the validity of the MBB uncertainty estimation. From this “true” density and the KDEs from the 300 samples, we were able to determine the true CB (Eqs. (6)–(7)).
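The ensemble experiment can be sketched as follows. The map is written here in the standard simplified Fermi-Ulam form, which is assumed to match Eq. (14); the number of iterations and the ensemble size mirror the values quoted in the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def fermi_map(u0, psi0, n_steps, M=1e4):
    """Iterate the simplified Fermi map: velocity kick |u + sin(psi)|
    followed by the phase advance 2*pi*M/u of the oscillating wall."""
    u = np.asarray(u0, float).copy()
    psi = np.asarray(psi0, float).copy()
    for _ in range(n_steps):
        u = np.abs(u + np.sin(psi))
        psi = np.mod(psi + 2.0 * np.pi * M / u, 2.0 * np.pi)
    return u, psi

# ensemble mimicking the experiment: tight spread in u0, full spread in psi0
n = 1000
u0 = rng.uniform(90.0 - 1e-5, 90.0 + 1e-5, n)
psi0 = np.sort(rng.uniform(0.0, 2.0 * np.pi, n))  # sorted so neighbors stay correlated
u30, _ = fermi_map(u0, psi0, 30)
```

After a few tens of iterations the velocity sample has diffused far beyond its initially microscopic spread, which is the regime in which the KDE-MBB machinery is exercised.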
Figure 3 shows an example of a KDE and its CB estimated by MBB at t = 30. Although the estimated CB is valid, as the true curve lies completely in the band, the KDE itself looks quite jagged. The jaggedness is the result of the undersmoothing bandwidth h = BW_{1∕3}, with which the bias is dominated by the variance error. In fact, with the rule-of-thumb bandwidth h = BW_{1∕5}, we obtain a smoother KDE (cf. the dashed blue line in Fig. 3). This choice is valid because its uncertainty is always smaller than that estimated with h = BW_{1∕3}. Therefore, we chose to use the uncertainty estimation computed with h = BW_{1∕3} as an upper bound of the uncertainty of a KDE with the rule-of-thumb bandwidth. For example, a 95% CB of a KDE with h = BW_{1∕3} represents an upper bound of the 95% CB of a KDE with h = BW_{1∕5} (Fig. 4). The same also applies to the CI.
Also from Fig. 4, we can see that the estimation of the CB follows the true value well. This is remarkable because, first, we can extend the use of MBB to measure the uncertainty of the density estimation from solutions of the chaotic dynamical system, where the correlation of the sample defined by initial distances in phase space changes over time, and second, our simple choice of MBB parameters appears to work well across very different models.
Fig. 3 Density estimation of the Fermi map. The black line (“true” density) is the KDE with bandwidth h = BW_{1∕5} from a sample of the velocity u of n = 10^{6} solutions of the Fermi map at t = 30. The red line denotes the KDE with bandwidth h = BW_{1∕3}, with its 95% CB estimated by MBB (red band), from a sample of n = 10^{3} solutions at the same time. The dashed blue line represents the KDE with bandwidth h = BW_{1∕5} from the same sample.
Fig. 4 Comparison of the CB estimation with its true value. The blue region represents the mean ± 3 standard deviations of the CBs estimated by the MBB method of KDEs with bandwidth h = BW_{1∕3} from 300 different samples, each consisting of n = 10^{3} solutions of the Fermi map; the true values of the CB are also calculated from the KDEs with bandwidth h = BW_{1∕5} (dotted red line) and the KDEs with bandwidth h = BW_{1∕3} (dotted blue line) from the same 300 samples.
3.4 Combining samples
Having obtained a well-tested uncertainty estimator of the KDE for correlated data, the practical question arose of how to efficiently combine various samples of different correlation lengths l_{corr}. Assuming we have m samples of the same size n, X^{(1)}, …, X^{(m)}, where the correlation within each sample is different, the KDEs of these m samples are p̂_{h}^{(1)}, …, p̂_{h}^{(m)}, and their pointwise standard errors can be estimated by MBB as σ̂_{1}(x), …, σ̂_{m}(x).
If all the samples conform to the same probability density p, we can simply use inverse variance weighting to get our combined KDE:

p̂_{c}(x) = Σ_{i=1}^{m} w_{i}(x) p̂_{h}^{(i)}(x) ∕ Σ_{i=1}^{m} w_{i}(x), with w_{i}(x) = 1∕σ̂_{i}^{2}(x), (15)

where p̂_{h}^{(i)}(x) and σ̂_{i}^{2}(x) are the individual KDE and its pointwise estimated variance, respectively. The variance of the weighted average will be σ̂_{c}^{2}(x) = 1∕Σ_{i=1}^{m} w_{i}(x). With this choice, the variance of the weighted average is minimized, and we thereby have the most accurate estimation of the density p (Hartung et al. 2011).
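The inverse-variance combination amounts to a few lines of array arithmetic; a sketch, with the KDE values and their MBB variances passed in as arrays of shape (m, n_x):

```python
import numpy as np

def combine_kde(kdes, variances):
    """Inverse-variance weighted combination of m pointwise KDEs:
    weights w_i = 1/sigma_i^2, combined variance 1/sum_i(w_i)."""
    kdes = np.asarray(kdes, float)           # shape (m, n_x): KDE values
    w = 1.0 / np.asarray(variances, float)   # shape (m, n_x): weights
    combined = (w * kdes).sum(axis=0) / w.sum(axis=0)
    var = 1.0 / w.sum(axis=0)                # minimized combined variance
    return combined, var

# demo: two KDEs on a two-point grid, with unequal variances at the first point
kdes = np.array([[0.2, 0.4],
                 [0.4, 0.4]])
variances = np.array([[1.0, 1.0],
                      [3.0, 1.0]])
combined, var = combine_kde(kdes, variances)
# -> combined = [0.25, 0.4], var = [0.75, 0.5]
```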
This inverse variance weighting is only applicable if we assume that all our samples follow the same underlying distribution, as for example in the DGMM. In the case of the Solar System, different samples come from different sets of initial conditions. The densities evolving from different initial conditions will be different. Although they might converge toward a common density function after a large period of time, the assumption that different samples follow the same distribution is not generally true. However, if the differences between the density estimations from different samples are small compared to their in-sample density uncertainty, then we can assume a common true density that all the samples share, and therefore the inverse variance weighted mean and its minimized variance will in this case capture this assumed density function. A more conservative approach consists in not trying to estimate the common true density, but rather capturing the variability stemming from different samples and combining it with their in-sample uncertainty. In this case, a combined KDE p̂_{c}(x) can be introduced as a pointwise random variable whose distribution function is given by:

P_{c}(x) = (1∕m) Σ_{i=1}^{m} P_{i}(x), (16)

where P_{i}(x), centered on p̂_{h}^{(i)}(x) with the spread of δ*(x) defined in Eq. (5), is the MBB estimation of the distribution of the pointwise KDE from sample X^{(i)}. If the m samples are good representatives of any other sample taken from a certain population, then p̂_{c}(x) is a reasonable choice for the KDE from an arbitrary sample generated from that same population. We sample p̂_{c}(x) in a pointwise manner. For each value of x, the same number of realizations is drawn from each of the m MBB distributions, which are assumed here to be Gaussian for practical purposes. It should be noted that a realization of p̂_{c} is not a continuous curve. When needed, we can choose the pointwise median of p̂_{c} as the nominal continuous combined KDE.
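The conservative combination can be sketched as a pointwise equal-weight mixture of the m MBB distributions, approximated as Gaussians centered on each KDE value with the MBB standard error as spread; the input arrays here are hypothetical placeholders for the per-sample results:

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_combined_kde(means, sigmas, n_draws):
    """Draw pointwise realizations of the combined KDE: at each x, draw
    equally from the m MBB distributions (approximated as Gaussians),
    forming an equal-weight mixture."""
    means = np.asarray(means, float)     # (m, n_x): pointwise KDE values
    sigmas = np.asarray(sigmas, float)   # (m, n_x): MBB standard errors
    m, n_x = means.shape
    draws = rng.normal(means[None], sigmas[None], size=(n_draws, m, n_x))
    return draws.reshape(n_draws * m, n_x)   # mixture realizations, per x

def nominal_kde(means, sigmas, n_draws=500):
    """Pointwise median of the mixture, used as the nominal combined KDE."""
    return np.median(sample_combined_kde(means, sigmas, n_draws), axis=0)

# demo: two nearly deterministic "distributions" at 0 and 1 -> median ~ 0.5
means = np.array([[0.0], [1.0]])
sigmas = np.full((2, 1), 1e-9)
nom = nominal_kde(means, sigmas)
```

A realization of the mixture is not a continuous curve, which is why the pointwise median serves as the nominal estimate, exactly as described in the text.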
4 Application to the Solar System
The goal of this paper is to obtain a consistent statistical description of the propagation of the dynamical uncertainty of the fundamental secular frequencies of the Solar System induced by its chaotic behavior: that is, simply put, to obtain their time-evolving marginal PDFs. We first sampled the initial orbital elements of the Solar System planets close to the reference value, and then numerically integrated the secular equations (Eq. (1)) from those initial conditions to obtain a sample of orbital solutions. Kernel density estimation was then used to estimate the marginal PDF of the frequencies of the sample at a fixed value of time, and finally the MBB method was applied to estimate the uncertainty of the density estimation.
The evolution of our sample can be divided into two stages: Lyapunov divergence and chaotic diffusion. In the first stage, because the initial density is extremely localized around the reference value, all solutions essentially follow the reference trajectory; the difference between the solutions is very small but diverges exponentially with a characteristic Lyapunov exponent of ~ 1∕5 Myr^{−1} (Laskar 1989). The solutions in the first stage are almost indistinguishable, and the correlation between them is so great that, regardless of how many solutions in the sample we integrated, the effective size of the sample is close to one. The second stage begins when the differences between the solutions are large enough to become macroscopically visible. The Lyapunov divergence saturates and gives way to chaotic diffusion. The correlation between the solutions starts to decrease, the distribution of the sample settles, and the memory of the initial conditions fades. It can take several hundred million years for the sample to forget its initial configuration. Contrary to the exponential growth in the first stage, the dispersion of the samples expands slowly, following a power law in time (see Fig. 10). The time boundary between the two stages depends on the dispersion of the initial conditions: the wider they are, the sooner the second stage comes, and vice versa. If they are chosen to represent the uncertainty of the current ephemeris, the second stage should take place around 60 Myr in the complete model of the Solar System (Laskar et al. 2011a,b).
In this section, we focus on the statistical description of the fundamental frequencies of the Solar System in the second stage. The aim is to obtain a valid estimation of the time-evolving PDFs of the frequencies beyond 60 Myr. However, the PDF evolution generally depends on the choice of initial conditions. Moreover, the simplification of the secular equations compared to the complete model of the Solar System could, a priori, provide results that are not sufficiently accurate.
Table 1. Offsets of the initial eccentricities of the four planets {Mercury, Venus, Earth, Mars}, which correspond to i = {1, 2, 3, 4}.
4.1 Choice of initial conditions
For a complete model of the Solar System, the initial conditions should be sampled in such a way that they are representative of the current planetary ephemeris uncertainty. Nevertheless, for a simplified secular model (Eq. (1)), the difference from the complete model is greater than the ephemeris uncertainty. An optimized secular solution follows the complete solution initially but departs from it long before 60 Myr (Laskar et al. 2004). A direct adaptation of the current planetary ephemeris uncertainty to the initial conditions of the secular model can thus be misleading. Therefore, we adopted a more cautious approach, that is, to study first the effect of sampling the initial conditions on the PDF estimation.
Initial conditions can be sampled in many ways, especially in a high-dimensional system such as the Solar System. Our choice of initial conditions is somewhat particular, but it encompasses different possible ways of sampling initial conditions (Table 1). There were three batches, {X_{i}}, {Y_{i}}, and {Z_{i}}, which correspond to three different variation sizes ϵ of the initial conditions. Each batch was composed of four different sets of samples. Each set contained 10 000 initial conditions, where the eccentricity of the associated planet is linearly spaced from the reference value, with the spacing ϵ corresponding to the batch it belongs to. We then integrated the secular equations (Eq. (1)) from these initial conditions 500 Myr into the past and 500 Myr into the future. For the batch {Z_{i}}, the integration time is 5 Gyr in both directions. The frequencies were then extracted using FA (Sect. 2.2). It should be noted that we do not aim to obtain the joint probability distribution of all the fundamental frequencies, but rather their individual marginal PDFs (i.e., the PDF of one frequency at a time). The marginal PDFs of the frequencies were estimated by the KDE with the rule-of-thumb bandwidth (BW_{1∕5}); upper bounds of their 95% CIs are measured by the MBB method with the bandwidth h = BW_{1∕3} and the optimized parameters (Eqs. (11)–(12)) from 1000 MBB samples.
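As an illustration, the KDE-with-MBB procedure can be sketched in a few lines of Python. This is a minimal sketch, not the paper's pipeline: it relies on scipy.stats.gaussian_kde (whose default Scott bandwidth is an n^{−1∕5} rule of thumb, analogous to BW_{1∕5}), uses a fixed block length instead of the optimized parameters of Eqs. (11)–(12) and the same bandwidth for the bootstrap replicates rather than BW_{1∕3}, and runs on a toy correlated sample in place of the 10 000 secular frequency values.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_with_mbb_ci(x, grid, block_len, n_boot=200, seed=0):
    """Gaussian KDE with a pointwise 95% CI from a moving block bootstrap.

    The MBB resamples contiguous blocks of the sample so that short-range
    correlation between neighbouring solutions survives in each replicate.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    base = gaussian_kde(x)(grid)          # Scott's rule: an n**(-1/5) bandwidth
    n_blocks = int(np.ceil(n / block_len))
    boots = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, n_blocks)
        idx = (starts[:, None] + np.arange(block_len)).ravel()[:n]
        boots[b] = gaussian_kde(x[idx])(grid)
    lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)
    return base, lo, hi

# toy correlated sample standing in for the frequency values of one set
rng = np.random.default_rng(1)
x = 0.05 * np.cumsum(rng.normal(size=2000)) + rng.normal(size=2000)
grid = np.linspace(x.min() - 1.0, x.max() + 1.0, 200)
f, lo, hi = kde_with_mbb_ci(x, grid, block_len=50)
```

The block resampling is the essential point: an ordinary bootstrap that draws individual values would destroy the correlation between neighbouring solutions and underestimate the CI width.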
We first compare the evolution of the density of the four sets in each batch in Sect. 4.2; a second test then compares the statistics between the batches in Sect. 4.3. The robustness of the secular statistics is assessed by these two tests, which additionally shed light on the initial-condition dependence of the statistics. All the density estimations from these sets are compared with those of the 2500 complete solutions obtained in the previous work of Laskar & Gastineau (2009) to test the accuracy of the secular statistics. It should be recalled that this numerical experiment required 8 million hours of CPU time, the output of which was saved and could thus be used in the present study.
When comparing two sets of different sizes, because the rates of divergence in the first stage are similar, the wider set reaches the chaotic diffusion phase faster than the more compact set; hence, it is essentially diffusing ahead for a certain time in the second stage. Therefore, in order to have a relevant comparison, a proper time shift was introduced to compensate for this effect. We shifted {X_{i}} and {Y_{i}} ahead by 30 Myr and 20 Myr, respectively, while keeping the time of {Z_{i}} as reference. This choice was motivated by the fact that the transition to the chaotic diffusion of {Z_{i}} is around 50 – 60 Myr, which is indicated by the direct integration of the Solar System (Laskar et al. 2011a,b).
Fig. 5 Density estimation with 95% pointwise CI of the fundamental frequencies of the four sets of the batch {Z_{i}}. Frequency values are obtained by FA over an interval of 20 Myr centered at 150 Myr in the future.
4.2 First test: different samples of the same variation size ϵ
Comparing the density estimation of different sets in the same batch was the first test of prediction robustness from our secular model. The evolution of the density estimations is sensitive to how the initial conditions are sampled, and this initial-condition sensitivity must therefore be quantified for a valid prediction. In this first test we compared the time-evolving PDFs whose initial conditions are sampled with the same variation size ϵ but in different variables.
The result is quite clear. The different sets of the same batch slowly lose the memory of their initial differences due to chaos and then converge toward the same distribution. This convergence is illustrated by Fig. 5, which shows that the density estimations of the four sets of the batch {Z_{i}} nearly overlap with one another at 150 Myr in the future. The rates of convergence of different batches are different. So although {X_{i}} and {Y_{i}} exhibit the same behavior as {Z_{i}}, they converge at disparate rates: At around 100–150 Myr in the future, the density estimations of the frequencies of {X_{i}} nearly overlap with one another; this occurs at 150–200 Myr for {Y_{i}}, depending on the frequency. Interestingly, for the samples that are integrated into the past, the rates of convergence are higher and the overlap generally happens at around −100 Myr (see Fig. 6).
4.3 Second test: different samples of different variation sizes
Comparing the density estimation of the three batches, {X_{i}}, {Y_{i}}, and {Z_{i}}, was our second test of robustness. Although the initial conditions of the three batches were varied around the same reference values, the ways they were sampled differ since the variation sizes differ. Differences in the initial variation sizes mean that the batches enter the diffusion stage at different times and also at different points in the phase space, so the convergence between batches, if it exists, takes longer. The result of our test is summarized by the density estimation of the frequencies at two times in the past, −100 Myr and −200 Myr (Fig. 6). At −100 Myr, the density estimations from the different sets of each batch cluster around one another, as described in the previous test. Each batch forms a cluster of density estimations, and the differences between the three clusters are noticeable. Moreover, the estimation uncertainty, depicted by the colored band, is quite large. Fast-forward 100 Myr of chaotic mixing: at −200 Myr, the density estimations of the frequencies spread out, the estimation uncertainty shrinks, and, most importantly, the differences between the three batches are much smaller and continue to diminish with time. In the opposite time direction, the same phenomenon is observed, but the rate of convergence between the batches is slower (Fig. 7). At 150 Myr in the future, the density estimations of the frequencies of {X_{i}} and {Z_{i}} have practically converged, those of {Y_{i}} are still converging, and differences between the three batches remain noticeable. However, the estimations of all 12 sets from the three batches nearly overlap with one another at 350 Myr, which demonstrates that the effect of different initial samplings vanishes via chaotic mixing. It should be noted that, when looking at some specific properties of the PDF, such as means and variances, the differences between the sets are small.
For example, the differences between the means of the PDF estimations of the 12 sets are generally smaller than 0.1″ yr^{−1} for most of the fundamental frequencies at −100 Myr; at −200 Myr, the differences are roughly halved.
Fig. 6 Density estimation with 95% pointwise CI of the fundamental frequencies of the 12 sets coming from the three batches: {X_{i}} (yellow colors), {Y_{i}} (blue colors), and {Z_{i}} (green colors). Estimations of sets from the same batch have the same color. Frequencies are obtained by FA over an interval of 20 Myr centered at 100 Myr in the past (first and third rows) and 200 Myr in the past (second and fourth rows). The time reference is the time of {Z_{i}}, while the solutions of {X_{i}} and {Y_{i}} are shifted ahead by 30 Myr and 20 Myr, respectively. 
4.4 Final test: comparison with the complete model
The convergence of the density estimations from different sets of initial conditions (seen in the first two tests) does not guarantee convergence between the secular model and the complete model of the Solar System. As a result, whether the secular statistics resemble those of the complete model is still not clear. In this final test, we address this question by comparing the density estimation from our secular solutions with those obtained from the 2500 solutions of the complete model integrated into the future (Laskar & Gastineau 2009). The initial conditions of the complete solutions were sampled in a way similar to the present work, that is, one variable (the semi-major axis of Mercury) was linearly spaced with a spacing of 0.38 mm over a range of about one meter. We shifted the complete solutions backward by 70 Myr to adjust for the difference in the initial variation sizes and also for the difference between the initial divergence rates.
The density estimations of the frequencies of the 2500 complete solutions are shown along with the estimations from the secular model in Fig. 7. The estimations of the complete model are either close to or overlap with those of the secular model, even at 150 Myr. Both models even predict the same minor features in the frequency density, for example the second peaks of g_{3} or the tails of g_{4}. The origins of these features are related to the resonances associated with g_{3} and g_{4}, so having these same features indicates that the secular model can capture the resonance dynamics of the complete model. At 350 Myr, for most of the frequencies, the differences between the results of the two models are very small; for some frequencies, such as g_{1}, it is difficult to distinguish between the two models. If we look at some specific properties of the PDF, such as its mean, the differences between the secular model and the complete model are of the same order as the variability of the results within the secular model itself. These differences in the means of the PDFs are generally smaller than 0.05″ yr^{−1} at 350 Myr. Although some minor differences between the two models are still visible, in s_{1} for example, these differences diminish with time. This convergence between the two models strongly suggests the compatibility of the secular system with the direct integration of the realistic model of the Solar System used in Laskar & Gastineau (2009).
Fig. 7 Density estimation with 95% pointwise CI of the fundamental frequencies of the 12 sets coming from the three batches, {X_{i}} (yellow colors), {Y_{i}} (blue colors), and {Z_{i}} (green colors), and from the 2500 solutions of the complete model of Laskar & Gastineau (2009) (red colors), which is denoted as La09. Estimations of sets from the same batch have the same color. Frequencies are obtained by FA over an interval of 20 Myr centered at 150 Myr (first and third rows) and 350 Myr (second and fourth rows) in the future. The time reference is the time of {Z_{i}}, while the solutions of {X_{i}}, {Y_{i}}, and La09 are shifted by 30 Myr, 20 Myr, and −70 Myr, respectively.
Fig. 8 Density estimation with 95% pointwise CI of the fundamental frequencies of the set Z_{1}, which are obtained by FA over an interval of 10 Myr (blue colors), 20 Myr (yellow colors), and 50 Myr (green colors) centered at −150 Myr (top row) and −350 Myr (bottom row).
4.5 A complementary test on frequency analysis
The fundamental frequencies of the Solar System are central to our work, and the method used to obtain them is thus essential. In this section, we briefly examine the FA method (Sect. 2.2). The FA method searches for a quasi-periodic approximation of the solution of the Solar System with constant frequencies over a time window Δt. Unique frequencies at time t are thus extracted from an oscillating sequence in the time interval [t − Δt∕2, t + Δt∕2]. For a quasi-periodic solution, the longer the time interval Δt, the more accurate the extracted frequencies. Nevertheless, the fundamental frequencies of the Solar System are expected to vary over a few Lyapunov times, that is, over a few times 5 Myr. Therefore, choosing Δt involves a trade-off between the extraction accuracy and the time variation of the frequencies: when Δt is too large, the obtained frequency tends toward the average of its variation over the same period. We chose Δt = 20 Myr as the standard FA interval. In some circumstances that require the detection of rapid changes in frequencies, such as a resonance transition, a smaller Δt is more favorable (see Sect. 6.3).
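The principle of extracting a frequency over a window Δt can be illustrated with a minimal sketch: locate the peak of the windowed Fourier spectrum, then refine it by maximizing the Hann-windowed amplitude. This is not the actual FA algorithm of Sect. 2.2 (which iteratively extracts a full quasi-periodic decomposition); the toy signal, sampling, and function names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def strongest_frequency(z, dt):
    """Dominant frequency (rad per time unit) of a complex signal z sampled
    every dt: an FFT peak refined by maximising the Hann-windowed amplitude
    |sum_k z_k w_k exp(-i*omega*t_k)|."""
    n = len(z)
    t = np.arange(n) * dt
    w = np.hanning(n)
    spec = np.abs(np.fft.fft(z * w))
    freqs = 2.0 * np.pi * np.fft.fftfreq(n, dt)
    w0 = freqs[np.argmax(spec)]           # coarse estimate: best FFT bin
    dw = 2.0 * np.pi / (n * dt)           # one FFT bin width
    amp = lambda om: -np.abs(np.sum(z * w * np.exp(-1j * om * t)))
    res = minimize_scalar(amp, bounds=(w0 - dw, w0 + dw),
                          method="bounded", options={"xatol": 1e-14})
    return res.x

# toy secular term e * exp(i * (g t + phase)) with g = 5.59 arcsec/yr
arcsec = np.pi / (180 * 3600)             # rad per arcsec
dt, n = 1000.0, 20000                     # 1 kyr sampling over 20 Myr
t = np.arange(n) * dt
z = 0.02 * np.exp(1j * (5.59 * arcsec * t + 0.3))
g_found = strongest_frequency(z, dt) / arcsec   # back to arcsec/yr, ~5.59
```

For a pure tone, the refinement recovers the frequency far below the FFT bin resolution, which is why an interval as short as a few tens of Myr suffices for the secular frequencies.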
The extracted frequencies are sensitive to the choice of Δt, and yet their density estimation is relatively robust. Figure 8 compares the density estimations of the eccentricity frequencies extracted via FA with three different Δt at two different times. The differences are generally small but still notable for g_{2} and g_{4} at −150 Myr, and they diminish with time.
5 Parametric fitting
From Sect. 4, we see that the long-term PDFs of the secular frequencies possess a distinct Gaussian-like shape that flattens as time passes. It is interesting to approximate these densities by a simple parametric model, such that its parameters can characterize the shape of the density and summarize its evolution. The model with its fitting parameters can also be used as an approximation of the numerical densities for later applications. For this purpose, we used the density estimation of the secular frequencies of the Solar System from the batch {Z_{i}}, which is composed of 40 000 different orbits over 5 Gyr. The inner fundamental frequencies (i.e., the frequencies of the inner planets) are obtained by FA over an interval of 20 Myr. For the outer fundamental frequencies (i.e., the frequencies of the outer planets), the FA interval is 50 Myr. The frequency s_{5} is zero due to the conservation of the total angular momentum of the Solar System.
5.1 Skew Gaussian mixture model
Laskar (2008) found that the 250 Myr-averaged marginal PDFs of the eccentricity and inclination of the inner Solar System planets are described quite accurately by the Rice distribution, which is essentially the distribution of the length of a 2D vector whose individual components follow independent Gaussian distributions. In our case, the density estimations of the fundamental frequencies of the Solar System resemble Gaussian distributions, but many of them become skewed as time passes; this is especially true for the inner frequencies. To account for this skewness, we propose the skew normal distribution as the fitting distribution for the density estimation of the frequencies:

f_{SG}(x) = (2∕σ_{0}) φ((x − μ_{0})∕σ_{0}) Φ(α(x − μ_{0})∕σ_{0}), (17)

where α is the parameter characterizing the skewness, φ denotes the standard normal probability density function, μ_{0} and σ_{0} are the location and scale parameters, and Φ(x) is the standard normal cumulative distribution function,

Φ(x) = [1 + erf(x∕√2)]∕2, (18)

where erf denotes the error function.
Some of the frequencies, interestingly, have several secondary modes in their density estimation apart from the primary one. Most of the secondary modes, if they exist, are quite small compared to the primary one. They are also often short lived; most of them emerge at the beginning of the diffusion stage and disappear quickly thereafter. Therefore, they are not included in this parametric model, the aim of which is to fit the long-term PDF of the fundamental frequencies. However, some persist for a long time and have a small but non-negligible amplitude. To account for these secondary modes, we simply added Gaussian functions to the skew Gaussian distribution and adjusted their amplitudes, so that the fitting distribution is ultimately the skew Gaussian mixture model:

f(x) = A_{0} f_{SG}(x) + Σ_{k=1}^{m} A_{k} N(x; μ_{k}, σ_{k}), (19)

where N(x; μ_{k}, σ_{k}) is the Gaussian density with mean μ_{k} and standard deviation σ_{k}, the weights satisfy A_{0} + Σ_{k=1}^{m} A_{k} = 1, and m is the number of secondary modes. The secondary modes are much smaller than the primary mode: A_{k} ≪ A_{0} for k ≥ 1. In our case here, m = 1 for g_{4} and s_{3}, while the asymptotic secondary modes of the other frequencies can be considered negligible.
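For reference, a skew Gaussian mixture of this kind is straightforward to evaluate with scipy.stats.skewnorm, whose pdf coincides with the skew normal of Eq. (17). The parameter values below are purely illustrative (loosely inspired by the discussion of g_{4} and its secondary mode); they are not the fitted values of Table 2.

```python
import numpy as np
from scipy.stats import norm, skewnorm

def skew_gaussian_mixture(x, A0, mu0, sig0, alpha0, modes=()):
    """Eq. (19): a skew normal primary mode (Eq. (17)) plus m small
    Gaussian secondary modes given as (A_k, mu_k, sig_k) triples."""
    pdf = A0 * skewnorm.pdf(x, alpha0, loc=mu0, scale=sig0)
    for Ak, muk, sigk in modes:
        pdf = pdf + Ak * norm.pdf(x, loc=muk, scale=sigk)
    return pdf

# illustrative g4-like density: a skewed main peak plus one secondary mode
x = np.linspace(15.0, 20.0, 500)
pdf = skew_gaussian_mixture(x, 0.97, 17.4, 0.25, -2.0,
                            modes=[(0.03, 18.2, 0.1)])
# since the weights sum to one, the mixture integrates to one
```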
For the outer fundamental frequencies, we do not observe significant skewness or secondary modes in their density estimations. Therefore, the density estimations of the outer fundamental frequencies can be approximated by a simple Gaussian distribution.
The density estimations of the frequencies are shown at three well-separated times in Fig. 9. The diffusion is clearly visible as the density estimations become more and more dispersed over time. Moreover, the density estimations get more skewed as time goes by. For g_{3} and g_{4}, the skewness of the density estimations is small, so we assume α_{0} = 0 (Eq. (17)) for these frequencies for the sake of simplicity. The fundamental frequencies of the outer planets are very stable. Their variations are much smaller than those of the inner frequencies. Taking as an example the most unstable of the outer frequencies, g_{6}, its standard deviation at −4 Gyr is only about 2 × 10^{−3}″ yr^{−1}, while that of the most stable inner frequency, g_{2}, is 15 times larger. Therefore, when considering a combination involving an inner fundamental frequency and an outer one, the latter can effectively be regarded as a constant.
The result of our fitting model is also plotted in Fig. 9. It is remarkable that the density estimations of the frequencies are well approximated by the fitting curve over three different epochs. It should be noted that the base of our fitting model – the skew Gaussian distribution – has only three parameters. Additional parameters are only needed for some frequencies, for example g_{4} and s_{3}. Nevertheless, such additional parameters only account for minor features, and three parameters are sufficient to represent the bulk of the density estimations over a long timescale.
5.2 Evolution of the parameters
The parameters of our fitting models are extracted by the method of least squares, implemented by the routine curve_fit in the scipy package in Python. To retrieve the statistical distribution (mean and standard deviation) of the parameters of a given model, we implemented a bootstrap approach based on Eq. (16), with the assumption that pointwise standard errors of the KDE estimated by MBB are independent. We remark that, with such an assumption, the variance of the fitting parameters tends to be underestimated.
The time evolution of the mean of the parameters of the fitting models is shown in Fig. 10, along with their ±2 standard errors, for both the past and the future. It turns out that the evolution of σ_{0} is robustly fitted from 200 Myr to 5 Gyr in the past by the power-law function

σ_{0}(T) = a T^{b}, (20)
where T = t∕(1 Gyr), as shown in Fig. 10. For all the other parameters, we performed a linear fit. All these fits are summarized in Table 2.
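A least-squares fit of a power law of this form with curve_fit can be sketched as follows on synthetic data; the values a = 0.3 and b = 0.7 are illustrative stand-ins (b < 1 mimicking the subdiffusion reported below for the inner frequencies), not results from Table 2.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(T, a, b):
    """Eq. (20): sigma_0 as a function of T = t / (1 Gyr)."""
    return a * T**b

# synthetic sigma_0(T) values standing in for the fitted KDE widths
T = np.linspace(0.2, 5.0, 25)             # 200 Myr to 5 Gyr
rng = np.random.default_rng(2)
sigma0 = 0.3 * T**0.7 * (1 + 0.02 * rng.normal(size=T.size))
(a_fit, b_fit), cov = curve_fit(power_law, T, sigma0, p0=(1.0, 1.0))
# a_fit ~ 0.3 and b_fit ~ 0.7, up to the injected 2% noise
```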
The differences between past and future evolutions are small and generally tend to decrease with time. Therefore, the fits for the parameters in the past given in Table 2 should also be representative of their future evolution. In general, the parameters follow relatively smooth curves with distinct tendencies. The skewness parameters α_{0}, increasing in absolute value, show that the PDFs of the inner fundamental frequencies get more and more skewed over time. The center of the distributions, indicated by μ_{0}, does not change significantly compared to α_{0} and σ_{0} (Fig. 10). The secondary modes of g_{4} and s_{3} are also quite stable.
The diffusion of the frequencies is quantified by the increasing σ_{0}, which is closely linked to the variance of their distribution. As the exponents b of the power laws in Table 2 differ from unity, the chaotic diffusion of the fundamental frequencies turns out to be an anomalous diffusion process. Interestingly, all the inner frequencies clearly undergo subdiffusion, that is, the exponent b of the power law is smaller than 1 (b = 1 corresponding to Brownian diffusion). Therefore, an extrapolation of the variance of the inner frequencies based on the assumption of linear diffusion over a short time interval would generally lead to its overestimation over longer times. In contrast, the exponents b are either smaller or larger than unity for the outer frequencies. It should be noted that, because the variations in the outer frequencies are very small, the values of the corresponding exponents b might be overestimated due to the finite precision of the FA.
Fig. 9 Density estimation with 95% pointwise CI of the fundamental frequencies of the Solar System from the {Z_{i}} solutions in the past at 500 Myr (red band), 1 Gyr (green band), and 4 Gyr (blue band); the dashed curves with the corresponding colors denote their fitting distribution (Eq. (19) and Table 2). The frequencies are obtained by FA over an interval of 20 Myr.
6 Geological application
The aim of this project is to have a reliable statistical picture of the secular frequencies of the Solar System beyond 60 Myr. It is interesting to put recent geological results in this astronomical framework. First, it can serve as a geological test of our study; secondly, the application provides a glimpse of how astronomical data could be used in a cyclostratigraphy study. We first show how the uncertainty of a widely used dating tool in astrochronology can be quantified, and we then apply our work to two recent geological results, from the Newark–Hartford data (Olsen et al. 2019) and the Libsack core (Ma et al. 2017).
Fig. 10 Evolution of the parameters of the skew Gaussian distribution in the fitting model (Eq. (19)) for the density estimation of the fundamental frequencies of the Solar System of the {Z_{i}} solutions in the past (blue line) and in the future (green line). The bands denote ±2 standard errors. The dashed orange lines denote the power-law fits for σ_{0} and the linear fits for μ_{0} and α_{0}.
6.1 Astronomical metronomes
Although it is not possible to recover the precise planetary orbital motion beyond 60 Myr, some astronomical forcing components are stable and prominent enough that they can be used to calibrate geological records in the Mesozoic Era or beyond (see Laskar 2020 for a review). The most widely used is the 405 kyr eccentricity cycle g_{2} − g_{5}, which is the strongest component of the evolution of Earth’s eccentricity. The inclination cycle s_{3} − s_{6} has also recently been suggested for the time calibration of stratigraphic sequences (Boulila et al. 2018; Charbonnier et al. 2018). Although s_{3} − s_{6} is not the strongest among the obliquity cycles, it is quite isolated from the other cycles and thus easy to identify (Laskar 2020). The main reason that g_{2} − g_{5} and, possibly, s_{3} − s_{6} can be used as astronomical metronomes is their stability. Indeed, their uncertainty has to be small for them to be reliably used in practical applications.
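The conversion between a secular frequency and its metronome period is a one-liner: a full cycle is 360 × 3600 = 1 296 000″, so a combination with frequency f (in ″ yr^{−1}) has a period of 1 296 000∕f yr. With the approximate current value g_{2} − g_{5} ≈ 3.2″ yr^{−1}, this recovers the 405 kyr metronome:

```python
ARCSEC_PER_CYCLE = 360 * 3600     # 1 296 000 arcsec in a full cycle

def period_kyr(freq_arcsec_per_yr):
    """Period (kyr) of a secular term whose frequency is in arcsec/yr."""
    return ARCSEC_PER_CYCLE / freq_arcsec_per_yr / 1000.0

period_kyr(3.2)                   # g2 - g5  ->  ~405 kyr
```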
In previous work, the uncertainty of the frequency combination was derived from the analysis of a few solutions (Laskar et al. 2004, 2011a; Laskar 2020). Here we used a much larger number of solutions and the KDEMBB method to derive both the PDF of the frequencies and their statistical errors. Starting from the results of all the sets of orbital solutions (Sect. 4), we produced the compound density estimation of the fundamental frequencies and their relevant combinations following the conservative approach in Eq. (16). These data are archived with the paper.
With this result, we can reliably estimate the uncertainty of the astronomical metronomes. The uncertainty of the cycles g_{2} − g_{5} and s_{3} − s_{6} is in fact the density width of the period of these frequency combinations, which is shown in Fig. 11. As time goes by, the two metronomes become more uncertain as their density spreads due to the chaotic diffusion, but they are still reliable. At 400 Myr in the past, the relative standard deviations (ratio of standard deviation to mean) of the g_{2} − g_{5} and s_{3} − s_{6} metronomes are approximately 0.4% and 1.26%, respectively.
Fig. 11 Density estimation with 95% pointwise CI of the period of g_{2} − g_{5} (top row) and s_{3} − s_{6} (bottom row) from the combined solutions of {X_{i}}, {Y_{i}}, and {Z_{i}}. Frequencies are obtained by FA over an interval of 20 Myr centered at −100 Myr (left column), −210 Myr (middle column), and −400 Myr (right column).
Fig. 12 Density estimation with 95% pointwise CI (blue colors) of the secular frequencies (right column) and their combinations (left column) of the combined solutions from {X_{i}}, {Y_{i}}, and {Z_{i}}, along with the corresponding values extracted from the La2010d solution (yellow lines), the Newark–Hartford data (red lines; Olsen et al. 2019), and the current values (green lines). The frequencies are obtained by FA over an interval of 20 Myr centered at 210 Myr in the past. The time reference is the time of {Z_{i}}.
6.2 NewarkHartford data
In Olsen et al. (2019), the astronomical frequencies were retrieved from a long and well-preserved lacustrine deposit. The frequency signals of the Newark–Hartford data are very similar to those of the astronomical solution La2010d, which was taken from the 13 available astronomical solutions (Laskar et al. 2011a). Having 120 000 astronomical solutions at hand, we can derive a more precise statistical analysis of this result. The fundamental frequencies from the geological record were obtained in the following way. The data were dated by the relatively stable 405 kyr cycle (g_{2} − g_{5}) and additionally verified by a zircon U–Pb-based age model (Olsen et al. 2019). An FA (Sect. 2.2) was performed on the geological data as well as on the eccentricity of the astronomical solution La2010d to retrieve their strongest frequencies. The FA of the geological data was cross-checked against that of the astronomical solution to identify its orbital counterpart. For example, the third strongest frequency from the Newark data, 12.989″ yr^{−1}, was identified with the second strongest frequency from La2010d, 12.978″ yr^{−1}, which is g_{3} − g_{5}. The frequency g_{3} was then derived by summing the obtained combination with g_{5}, which is nearly constant. The same goes for g_{1}, g_{2}, and g_{4}. Definite values of the astronomical fundamental frequencies were thus determined. Unfortunately, an uncertainty estimation of the geological frequencies is not yet available.
Our density estimation of the frequencies was obtained by combining all our samples following the conservative approach outlined in Sect. 3.4. Figure 12 compares the secular frequencies (right) and their combinations (left) extracted from the Newark–Hartford data with our density estimation around the same time in the past. For g_{1}, g_{2}, and g_{3}, both the geological and La2010d frequencies lie relatively close to the main peak of their corresponding density, while their values for g_{4} are near the tail. The frequency combinations tell the same story, since the geological values involving g_{4} lie away from the main peak. Yet they are all consistent with our density estimation, as there is a non-negligible probability of finding a secular solution that agrees with the geological data. It should be noted that certain frequencies are significantly correlated, g_{3} and g_{4} for example. We cannot assume that they are independent; we therefore have to calculate the density estimation of their combination directly.
Given the unavailability of the uncertainty in the geological frequencies, the probability of finding the geological frequencies in a numerical orbital solution cannot be obtained directly. However, we can use La2010d, which is the solution from the complete model of the Solar System that best matches the Newark–Hartford data, as the benchmark for our secular statistics. There are several criteria for determining how good a solution is (i.e., how well it could match the geological data). A simple and rather straightforward criterion that we used is the distance δ between the frequencies g_{i} of an astronomical solution and the corresponding frequencies derived from the geological data. A better-suited solution has a smaller δ, and vice versa. We found that, in the range from −200 Myr to −220 Myr, around 5000 of the 120 000 solutions (roughly 4.2% of the total) have a smaller δ than that of La2010d at −210 Myr, which is the epoch originally used to compare with the geological data. It should be noted that La2010d is one of the 13 available complete solutions. The 95% CI of the probability of obtaining such a well-matching solution from the complete model of the Solar System is thus (1.37%, 33.31%) (Wilson 1927). Therefore, with the criterion δ, our result is statistically compatible with that of Olsen et al. (2019).
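The quoted interval can be reproduced with the Wilson score formula for a binomial proportion (here 1 matching solution out of the 13 available complete solutions):

```python
import math

def wilson_ci95(k, n):
    """95% Wilson score interval for a binomial proportion k/n."""
    z = 1.96
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

lo, hi = wilson_ci95(1, 13)       # one good match out of 13 solutions
print(f"({100 * lo:.2f}%, {100 * hi:.2f}%)")   # -> (1.37%, 33.31%)
```

The Wilson interval is preferred over the naive normal approximation here because the count (k = 1) is far too small for the latter to be valid.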
Fig. 13 Density estimation with 95% pointwise CI (red colors) of g_{4} − g_{3} (left column) and g_{4} (right column) at −90 Myr (first row) and −300 Myr (second row) of {Z_{i}}, overlaid with their histograms (gray blocks) for better visualization. Frequency values are obtained by FA over an interval of 10 Myr.
Fig. 14 Percentage of the solutions (black line) whose g_{4} − g_{3} > 0.9″ yr^{−1} at each time (left) and over a 10 Myr interval (right), with their 90% CI estimation uncertainty (larger gray band) and their standard errors (smaller gray band). The time interval of the Libsack record, which is 80 Myr to 90 Myr, is denoted by the red band in the left plot and by a red line in the right plot.
6.3 Libsack core
Laskar (1990, 1992) presented several secular resonances to explain the origin of chaos in the Solar System. In particular, the argument of the resonance (s_{4} − s_{3}) − 2(g_{4} − g_{3}) is currently in a librational state, that is,

(s_{4} − s_{3}) − 2(g_{4} − g_{3}) ≈ 0, (21)

and moves out to the rotational state around −50 Myr. The dynamics can even switch to the librational state of a new resonance:

(s_{4} − s_{3}) − (g_{4} − g_{3}) ≈ 0. (22)

This transition corresponds to a change from the 2:1 resonance to the 1:1 resonance of the two secular terms g_{4} − g_{3} and s_{4} − s_{3}.
Ma et al. (2017) found a sudden change in the period of a long cycle from 2.4 Myr to 1.2 Myr in the Libsack core of the Cretaceous basin from around − 90 Myr to − 83 Myr. This change was also visible in the La2004 astronomical solution, and the long cycle was attributed to the frequency combination g_{4} − g_{3}, which is visible from the spectrum of the eccentricity of the Earth. Although the exact value before and especially after the transition is not clear, the change in the period is visible from the band power of the core (Fig. 1 of Ma et al. 2017).
This change in g_{4} − g_{3} observed in the Libsack core corresponds to a transition from the resonance (s_{4} − s_{3}) − 2(g_{4} − g_{3}), which is the resonance that the Solar System is currently in, to the resonance (s_{4} − s_{3}) − (g_{4} − g_{3}). With a large number of astronomical solutions, we can better understand this phenomenon. The transition is usually very fast: the frequency changes quickly to another value and then reverts just as quickly. Therefore, to study the transition we used a smaller window for the FA, 10 Myr instead of 20 Myr. Figure 13 shows the density estimation of g_{4} − g_{3} at −90 Myr as well as at −300 Myr. Both show a principal population and a small but non-negligible one centered around 1.0″ yr^{−1}, which corresponds to the small chunk of g_{4} centered at 18.2″ yr^{−1}. The observed transition is a sudden jump in frequency from the main population to the secondary one of g_{4}, and therefore of g_{4} − g_{3} as well.
The size of the secondary population is defined as the proportion of solutions with g_{4} − g_{3} > 0.9″ yr^{−1} at a given time; the rate of transition is defined as the proportion of solutions with g_{4} − g_{3} > 0.9″ yr^{−1} at some point within a 10 Myr interval. Both are shown in Fig. 14. During the predictable period, that is, from now until −50 Myr, no transition is observed. After −50 Myr, a transition can occur; its rate rises until around −100 Myr, after which the percentage of the secondary population stays relatively stable at around 2.1% ± 0.5% at any given time, while the rate of transition is 8% ± 1% every 10 Myr during this period. At 80–90 Myr in the past, when the transition was detected in the Libsack core, the rate of transition found with our numerical solutions is 7.7% ± 4.5%.
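The two statistics defined above can be sketched as follows; the array layout, sampling step, and function name are illustrative assumptions, not the paper's actual post-processing.

```python
import numpy as np

def transition_stats(g43, threshold=0.9, window=10):
    """g43: array of shape (n_solutions, n_times) holding g4 - g3 in
    arcsec/yr, sampled on a regular time grid (here 1 Myr steps).
    Returns the fraction of solutions above the threshold at each time,
    and the fraction that exceed it at least once in each window."""
    above = g43 > threshold
    frac_t = above.mean(axis=0)                         # at each time
    n_win = above.shape[1] // window
    trimmed = above[:, :n_win * window]
    win = trimmed.reshape(above.shape[0], n_win, window)
    frac_win = win.any(axis=2).mean(axis=0)             # per window
    return frac_t, frac_win

# tiny example: one of two solutions jumps above 0.9 once
g43 = np.array([[0.5, 1.0, 0.5, 0.5],
                [0.5, 0.5, 0.5, 0.5]])
frac_t, frac_win = transition_stats(g43, window=2)
# frac_t -> [0. , 0.5, 0. , 0. ];  frac_win -> [0.5, 0. ]
```

The windowed fraction is necessarily larger than the instantaneous one, which is consistent with the 8% per 10 Myr transition rate versus the ~2% instantaneous secondary population.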
7 Conclusion
In this work we give a statistical description of the evolution of the fundamental frequencies of the Solar System beyond 60 Myr, that is, beyond the predictability horizon of the planetary motion, with the aim of quantifying the uncertainty induced by its chaotic behavior. Our analysis is based on 120 000 orbital solutions of the secular model of the Solar System. The PDFs of the frequencies are estimated by KDE, and their uncertainty is evaluated by the MBB; both methods are tested via numerical experiments.
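The KDE-plus-MBB pipeline can be illustrated with a minimal sketch, assuming a Gaussian kernel and a correlated sample (an AR(1) process standing in for an MCMC chain); this is not the paper's implementation, and all parameter values are illustrative. The moving block bootstrap resamples contiguous blocks so that short-range correlation is preserved inside each block, which ordinary bootstrap resampling would destroy.

```python
# Sketch: Gaussian KDE with a moving-block-bootstrap (MBB) pointwise CI.
import math
import random

def kde(sample, x, h):
    """Gaussian kernel density estimate evaluated at x with bandwidth h."""
    n = len(sample)
    norm = n * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm

def mbb_resample(sample, block_len):
    """Glue together randomly chosen contiguous blocks of the sample."""
    n, out = len(sample), []
    while len(out) < n:
        start = random.randrange(n - block_len + 1)
        out.extend(sample[start:start + block_len])
    return out[:n]

def mbb_ci(sample, x, h, block_len, n_boot=200, level=0.90):
    """Pointwise bootstrap confidence interval of the KDE at x."""
    vals = sorted(kde(mbb_resample(sample, block_len), x, h)
                  for _ in range(n_boot))
    lo = vals[int((1 - level) / 2 * n_boot)]
    hi = vals[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

random.seed(1)
# Correlated sample from an AR(1) process, a stand-in for an MCMC chain.
sample, z = [], 0.0
for _ in range(2000):
    z = 0.9 * z + random.gauss(0.0, 0.1)
    sample.append(z)

lo, hi = mbb_ci(sample, 0.0, 0.05, block_len=50)
print(round(lo, 3), round(hi, 3))
```

The block length plays the same role as the correlation length l_{corr} discussed in the paper: blocks shorter than the correlation scale underestimate the variability of the estimate.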
We benchmarked the secular model by sampling the initial conditions in different ways, then comparing the density estimations of their solutions with one another and finally with those of the complete model. The results are twofold. First, regardless of how the initial conditions are sampled, their density estimations converge toward a single PDF; once they overlap, a robust estimation is guaranteed. Second, the density estimation of the secular model is compatible with that of the complete model of the Solar System. This agreement means that the results of the secular model, which is computationally far simpler, can be applied to geological data.
We observe that the density estimations of the fundamental frequencies are well fitted by skew Gaussian mixture models. The time evolution of the fitted parameters related to the frequency variances follows power-law functions. Interestingly, for the inner fundamental frequencies, the exponents of these power laws are all smaller than 1, which indicates that the frequencies undergo subdiffusion.
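The subdiffusion criterion amounts to fitting σ²(t) ∝ D t^α and checking whether α < 1 (α = 1 would be normal diffusion). A hedged sketch on synthetic data, assuming this power-law form and using a plain log-log least-squares fit (the paper's actual fitting procedure may differ):

```python
# Sketch: fit sigma^2(t) = D * t^alpha on synthetic variance-growth data;
# alpha < 1 indicates subdiffusion.
import math

def fit_power_law(ts, ys):
    """Least-squares fit of log y = log D + alpha * log t."""
    xs = [math.log(t) for t in ts]
    ls = [math.log(y) for y in ys]
    n = len(xs)
    mx, ml = sum(xs) / n, sum(ls) / n
    alpha = (sum((x - mx) * (l - ml) for x, l in zip(xs, ls))
             / sum((x - mx) ** 2 for x in xs))
    d = math.exp(ml - alpha * mx)
    return d, alpha

# Synthetic variance growth with an assumed subdiffusive exponent of 0.7.
ts = [100 * k for k in range(1, 50)]   # time in Myr
ys = [2e-4 * t ** 0.7 for t in ts]     # variance in (arcsec/yr)^2

d, alpha = fit_power_law(ts, ys)
print(round(alpha, 3))  # recovers the exponent used to generate the data
```

On noiseless data generated from an exact power law, the log-log fit recovers the exponent exactly; with real ensemble variances, the fitted α carries a standard error, as shown by the bands in Fig. 10.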
We show several examples of how this result can be used in geological applications. First, the uncertainty of any astronomical frequency signal is fully quantified, so that, for example, a quantitative answer can be given to the question of how stable the astronomical metronomes are. Within this statistical framework, previous results from geological records beyond 60 Myr can also be reinterpreted more comprehensively: one can address not only the possibility but also the probability of the occurrence of an astronomical signal in geological data. Beyond these direct applications, a more systematic approach could make full use of the density estimation of the frequencies. The TimeOpt method of Meyers & Malinverno (2018), for example, shows that it is possible to combine the uncertainty of astronomical signals with geological records to derive an effective constraint on both. Indeed, any similar Bayesian method could use the density estimations of the frequencies as proper priors.
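As an illustration of the last point, here is a hedged toy sketch of a Bayesian update in which a frequency PDF plays the role of the prior and a (synthetic) geological measurement supplies the likelihood, sampled with a basic Metropolis scheme. The two-population prior, the likelihood, and all numerical values are assumptions for illustration; this is not TimeOpt nor the paper's method.

```python
# Sketch: frequency PDF as a Bayesian prior, combined with a synthetic
# geological likelihood, sampled with a basic Metropolis algorithm.
import math
import random

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def prior(f):
    """Stand-in for the KDE of a secular frequency (two populations)."""
    return 0.98 * gauss_pdf(f, 0.4, 0.05) + 0.02 * gauss_pdf(f, 1.0, 0.05)

def likelihood(f):
    """Stand-in likelihood of a frequency recovered from a sediment record."""
    return gauss_pdf(f, 0.45, 0.1)

def metropolis(n, start=0.4, step=0.05):
    random.seed(2)
    f, chain = start, []
    p = prior(f) * likelihood(f)
    for _ in range(n):
        g = f + random.gauss(0.0, step)
        q = prior(g) * likelihood(g)
        if q > 0 and random.random() < q / p:  # accept with prob min(1, q/p)
            f, p = g, q
        chain.append(f)
    return chain

chain = metropolis(20000)
print(round(sum(chain) / len(chain), 2))  # posterior mean between the peaks
```

The posterior concentrates between the prior and likelihood peaks, weighted by their widths; a real application would replace `prior` with the tabulated density estimation and `likelihood` with the record's spectral evidence.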
Acknowledgements
N.H. is supported by a PhD scholarship of the CFM Foundation for Research. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Advanced Grant AstroGeo-885250) and from the French Agence Nationale de la Recherche (ANR) (Grant AstroMeso ANR-19-CE31-0002-01).
References
 Bosq, D., & Guégan, D. 1995, Stat. Probab. Lett., 25, 201
 Boulila, S., Vahlenkamp, M., De Vleeschouwer, D., et al. 2018, Earth Planet. Sci. Lett., 486, 94
 Calonico, S., Cattaneo, M. D., & Farrell, M. H. 2018, J. Am. Stat. Assoc., 113, 767
 Charbonnier, G., Boulila, S., Spangenberg, J. E., et al. 2018, Earth Planet. Sci. Lett., 499, 266
 Cheng, G., Chen, Y.-C., et al. 2019, Electron. J. Stat., 13, 2194
 Efron, B. 1979, Ann. Stat., 7, 1
 Fermi, E. 1949, Phys. Rev., 75, 1169
 Gastineau, M., & Laskar, J. 2011, ACM Commun. Comput. Algebra, 44, 194
 Gradstein, F., & Ogg, J. 2020, in Geologic Time Scale (Amsterdam: Elsevier), 21
 Gradstein, F. M., Ogg, J. G., Smith, A. G., Bleeker, W., & Lourens, L. J. 2004, Episodes, 27, 83
 Gradstein, F. M., Ogg, J. G., Schmitz, M., & Ogg, G. 2012, The Geologic Time Scale (Amsterdam: Elsevier)
 Hall, P., Horowitz, J. L., & Jing, B.-Y. 1995, Biometrika, 82, 561
 Hang, H., Steinwart, I., Feng, Y., & Suykens, J. A. 2018, J. Mach. Learn. Res., 19, 1260
 Hart, J. D. 1996, J. Nonparametric Stat., 6, 115
 Hartung, J., Knapp, G., & Sinha, B. K. 2011, Statistical Meta-analysis with Applications (Hoboken: John Wiley & Sons), 738
 Hastings, W. K. 1970, Biometrika, 57, 97
 Kass, R. E., Carlin, B. P., Gelman, A., & Neal, R. M. 1998, Am. Stat., 52, 93
 Kreiss, J.-P., & Lahiri, S. N. 2012, in Handbook of Statistics (Amsterdam: Elsevier), 30, 3
 Kuffner, T. A., Lee, S. M.-S., & Young, G. A. 2019, ArXiv e-prints [arXiv:1909.02662]
 Kunsch, H. R. 1989, Ann. Stat., 17, 1217
 Laskar, J. 1985, A&A, 144, 133
 Laskar, J. 1988, A&A, 198, 341
 Laskar, J. 1989, Nature, 338, 237
 Laskar, J. 1990, Icarus, 88, 266
 Laskar, J. 1992, IAU Symp., 152, 1
 Laskar, J. 1993, Physica D, 67, 257
 Laskar, J. 2005, in Hamiltonian Systems and Fourier Analysis: New Prospects for Gravitational Dynamics, eds. D. Benest, C. Froeschle, & E. Lega (Cambridge: Cambridge Scientific Publishers Ltd)
 Laskar, J. 2008, Icarus, 196, 1
 Laskar, J. 2020, in Geologic Time Scale (Amsterdam: Elsevier), 139
 Laskar, J., & Gastineau, M. 2009, Nature, 459, 817
 Laskar, J., Robutel, P., Joutel, F., et al. 2004, A&A, 428, 261
 Laskar, J., Fienga, A., Gastineau, M., & Manche, H. 2011a, A&A, 532, A89
 Laskar, J., Gastineau, M., Delisle, J.-B., Farrés, A., & Fienga, A. 2011b, A&A, 532, L4
 Liu, R. Y., Singh, K., et al. 1992, Explor. Limits Bootstrap, 225, 248
 Ma, C., Meyers, S. R., & Sageman, B. B. 2017, Nature, 542, 468
 Maume-Deschamps, V. 2006, Stoch. Dyn., 6, 535
 Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. 1953, J. Chem. Phys., 21, 1087
 Meyers, S. R., & Malinverno, A. 2018, Proc. Natl. Acad. Sci. U.S.A., 115, 6363
 Milankovitch, M. 1941, Konigl. Serb. Akad. Beograd Spec. Publ., 132
 Mogavero, F. 2017, A&A, 606, A79
 Murray, N. W., Lieberman, M. A., & Lichtenberg, A. J. 1985, Phys. Rev. A, 32, 2413
 Olsen, P. E., & Kent, D. V. 1999, Philos. Trans. Roy. Soc. London A: Math. Phys. Eng. Sci., 357, 1761
 Olsen, P. E., Laskar, J., Kent, D. V., et al. 2019, Proc. Natl. Acad. Sci. U.S.A., 201813901
 Parzen, E. 1962, Ann. Math. Stat., 33, 1065
 Robinson, P. M. 1983, J. Time Ser. Anal., 4, 185
 Rosenblatt, M. 1956, Proc. Natl. Acad. Sci. U.S.A., 42, 43
 Silverman, B. W. 1986, Density Estimation for Statistics and Data Analysis (Cambridge: CRC Press), 26
 Wilson, E. B. 1927, J. Am. Stat. Assoc., 22, 209
All Tables
Offsets of the initial eccentricities of the four planets {Mercury, Venus, Earth, Mars}, corresponding to i = {1, 2, 3, 4}.
All Figures

Fig. 1 Kernel density estimation (red line) with bandwidth h = BW_{1∕3} and its 90% pointwise CI (red band) from an MCMC sample of n = 10^{4} units with l_{corr} = 140 of a DGMM (Eq. (13), black line).

Fig. 2 Estimation of the 90% CI of KDEs from different samples with different correlation lengths. Solid lines are the true values, and the bands denote the distribution of MBB estimations (mean ± standard deviation); both are computed from 10^{3} different samples, each composed of n = 10^{4} units generated from the DGMM by the MCMC algorithm with the same l_{corr}.

Fig. 3 Density estimation of the Fermi map. The black line (“true” density) is the KDE with bandwidth h = BW_{1∕5} from a sample of the velocity u of n = 10^{6} solutions of the Fermi map at t = 30. The red line denotes the KDE with bandwidth h = BW_{1∕3} with its 95% CB estimated by MBB (red band) from a sample of n = 10^{3} solutions at the same time. The dashed blue line represents the KDE with bandwidth h = BW_{1∕5} from the same sample.

Fig. 4 Comparison of the CB estimation with its true value. The blue region represents the mean ± 3 standard deviations of the CBs estimated by the MBB method of KDEs with bandwidth h = BW_{1∕3} from 300 different samples, each consisting of n = 10^{3} solutions of the Fermi map; the true values of the CB are also calculated from the KDEs with bandwidth h = BW_{1∕5} (dotted red line) and KDEs with bandwidth h = BW_{1∕3} (dotted blue line) from the same 300 samples.

Fig. 5 Density estimation with 95% pointwise CI of the fundamental frequencies of the four sets of the batch. Frequency values are obtained by FA over an interval of 20 Myr centered at 150 Myr in the future.

Fig. 6 Density estimation with 95% pointwise CI of the fundamental frequencies of the 12 sets coming from the three batches: {X_{i}} (yellow colors), {Y_{i}} (blue colors), and {Z_{i}} (green colors). Estimations of sets from the same batch have the same color. Frequencies are obtained by FA over an interval of 20 Myr centered at 100 Myr in the past (first and third rows) and 200 Myr in the past (second and fourth rows). The time reference is the time of {Z_{i}}, while the solutions of {X_{i}} and {Y_{i}} are shifted ahead by 30 Myr and 20 Myr, respectively.

Fig. 7 Density estimation with 95% pointwise CI of the fundamental frequencies of the 12 sets coming from the three batches, {X_{i}} (yellow colors), {Y_{i}} (blue colors), and {Z_{i}} (green colors), and from the 2500 solutions of the complete model of Laskar & Gastineau (2009) (red colors), denoted La09. Estimations of sets from the same batch have the same color. Frequencies are obtained by FA over an interval of 20 Myr centered at 150 Myr (first and third rows) and 350 Myr (second and fourth rows) in the future. The time reference is the time of {Z_{i}}, while the solutions of {X_{i}}, {Y_{i}}, and La09 are shifted by 30 Myr, 20 Myr, and −70 Myr, respectively.

Fig. 8 Density estimation with 95% pointwise CI of the fundamental frequencies of the set Z_{1}, obtained by FA over an interval of 10 Myr (blue colors), 20 Myr (yellow colors), and 50 Myr (green colors) centered at −150 Myr (top row) and −350 Myr (bottom row).

Fig. 9 Density estimation with 95% pointwise CI of the fundamental frequencies of the Solar System from the {Z_{i}} solutions in the past at 500 Myr (red band), 1 Gyr (green band), and 4 Gyr (blue band); the dashed curves with the corresponding colors denote their fitted distributions (Eq. (19) and Table 2). The frequencies are obtained by FA over an interval of 20 Myr.

Fig. 10 Evolution of the parameters of the skew Gaussian distribution in the fitting model (Eq. (19)) for the density estimation of the fundamental frequencies of the Solar System from the {Z_{i}} solutions in the past (blue line) and in the future (green line). The bands denote ± 2 standard errors. The dashed orange lines denote the power-law fits and the linear fits for μ_{0} and α_{0}.

Fig. 11 Density estimation with 95% pointwise CI of the period of g_{2} − g_{5} (top row) and s_{3} − s_{6} (bottom row) from the combined solutions of {X_{i}}, {Y_{i}}, and {Z_{i}}. Frequencies are obtained by FA over an interval of 20 Myr centered at −100 Myr (left column), −210 Myr (middle column), and −400 Myr (right column).

Fig. 12 Density estimation with 95% pointwise CI (blue colors) of the secular frequencies (right column) and their combinations (left column) of the combined solutions from {X_{i}}, {Y_{i}}, and {Z_{i}}, along with the corresponding values extracted from the La2010d solution (yellow lines), the Newark-Hartford data (red lines; Olsen et al. 2019), and the current values (green lines). The frequencies are obtained by FA over an interval of 20 Myr centered at 210 Myr in the past. The time reference is the time of {Z_{i}}.

Fig. 13 Density estimation with 95% pointwise CI (red colors) of g_{4} − g_{3} (left column) and g_{4} (right column) at −90 Myr (first row) and −300 Myr (second row) of {Z_{i}}, overlaid with their histograms (gray blocks) for better visualization. Frequency values are obtained by FA over an interval of 10 Myr.

Fig. 14 Percentage of the solutions (black line) whose g_{4} − g_{3} > 0.9″ yr^{−1} at each time (left) and over a 10 Myr interval (right), with their 90% CI estimation uncertainty (larger gray band) and their standard errors (smaller gray band). The time interval of the Libsack record, 80 Myr to 90 Myr, is denoted by the red band in the left plot and by a red line in the right plot.