Quant Finance Notes
1. Alternative Data
- 
- Dessaint, O., T. Foucault, and L. Fresard (2022). Does alternative data improve financial forecasting? The horizon effect. Working paper.
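- Suppose analysts, when forecasting earnings, must optimally allocate effort across forecast horizons so as to minimize the sum of forecast error and the cost of acquiring information at each horizon. Alternative data lowers the cost of acquiring short-horizon information and raises its precision. Analysts therefore shift effort toward acquiring and analyzing short-horizon information, which improves short-term forecast accuracy. Because analyst effort is limited, the side effect is lower accuracy of their long-term forecasts.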
 
- Wolfe Research | Global Stock Selection with Proprietary Global Trademark Data - USPTO, foreign applications, Madrid filing, international registration
- Trademarks are not a proxy for advertising spending: firms with very high advertising spending may overspend and suffer from agency problems.
- Signal transformations: growth rate, vintage ratio (long term number / short term number), long/short lookback window; residualizing size (log revenue) and sector.
- Use life cycle of trademark to construct signals: applications, success rate of application, renewal ratio, trademark age, age dispersion, dispersion in trademark category, secretive foreign priority application (to keep trademark information under protection, and re-file in US using international registration)
- Trademark category to construct linkage.
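A minimal sketch of two of the signal transformations listed above (growth rate and vintage ratio). The function names, window lengths, and filing counts are made up for illustration; residualizing on size and sector is not shown.

```python
def growth_rate(counts, lookback):
    """Growth of the latest count versus the count `lookback` periods earlier."""
    prev, curr = counts[-1 - lookback], counts[-1]
    return (curr - prev) / prev if prev else None


def vintage_ratio(counts, long_window, short_window):
    """Long-term total divided by short-term total, as described in the note."""
    long_total = sum(counts[-long_window:])
    short_total = sum(counts[-short_window:])
    return long_total / short_total if short_total else None


# Hypothetical yearly trademark application counts for one firm.
annual_filings = [4, 5, 5, 8, 10]
g = growth_rate(annual_filings, lookback=1)
v = vintage_ratio(annual_filings, long_window=5, short_window=2)
```

Varying `lookback`, `long_window`, and `short_window` gives the long/short lookback variants mentioned above.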
 
- Wolfe Research | The Intangible Asset Premia - KC (Knowledge Capital): accumulate past R&D spending with discount (perpetual inventory method) 
- OC (Organizational Capital): accumulate 30% of past SG&A expenses. 
- IAI (Intangible Asset Intensity), KCI, OCI: 
- OCI+KCI combined has been consistently more relevant than the growth factor (as a risk factor).
- Firms with higher IAI scores have higher momentum scores.
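A minimal sketch of the perpetual inventory accumulation behind KC and OC: the stock at t is the depreciated prior stock plus the new spending. The 20% depreciation rate and the spending figures are placeholder assumptions; only the 30% SG&A share comes from the note.

```python
def perpetual_inventory(flows, depreciation, share=1.0, initial=0.0):
    """Accumulate spending: stock_t = (1 - depreciation) * stock_{t-1} + share * flow_t."""
    stock, path = initial, []
    for flow in flows:
        stock = (1.0 - depreciation) * stock + share * flow
        path.append(stock)
    return path


rd_spending = [100.0, 100.0, 100.0]   # hypothetical annual R&D
sga_spending = [200.0, 200.0]         # hypothetical annual SG&A

kc = perpetual_inventory(rd_spending, depreciation=0.2)              # knowledge capital
oc = perpetual_inventory(sga_spending, depreciation=0.2, share=0.3)  # organizational capital
```

Dividing the resulting stocks by a scale variable (for example total assets) would give intensity measures; the exact scaling used for KCI, OCI, and IAI is not recorded in these notes.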
 
- Wolfe Research | Patent, Innovation, and Alpha - Innovation industry: all industries where at least 50% of firms have patent grants. High R&D spending (log R&D residualized by log market cap) is negatively correlated with future returns overall, but positively correlated with future returns within the innovation industry group.
- Signals: - number of unique patent classes / number of patents
- Inward citation number, unique inward citation assignee; Outward citation number, unique outward citation assignee
- Citation as linkage; patent class as linkage.
- Patent maintenance.
 
 
- Wolfe Research | Innovation Relevance - Total citations of all patents that firm $i$ has cited form firm $i$'s knowledge base (citations can be normalized within each patent class): $KB_{i,t} = \sum_j \mathbb{1}_{i,j}\, c_{j,t}$, where $\mathbb{1}_{i,j}$ indicates whether patent $j$ has been cited by firm $i$, and $c_{j,t}$ is the total number of citations of patent $j$ at time $t$. Technological obsolescence is then defined as the rate of change of $KB_{i,t}$ over a window $w$.
- Investors disagree most on high innovation relevance firms, so we can assign a higher risk budget to these firms.
 
- 
- Patent Data Challenges: Patent data is valuable but can be problematic due to truncation (not all patents are granted by the end of the study period) and changes in inventor composition over time. Both patent grants and patent citations are affected.
- Biases in Aggregation: When patent data is aggregated at the firm level, biases can persist even after common adjustment methods are applied.
- Correlation with Firm Characteristics: Patent and citation biases are correlated with firm characteristics such as size, market-to-book ratio, and R&D intensity.     
 
- 
- This paper attempts to study the causal effect of examiner busyness on patent quality and firm value. Using a broad set of patent quality measures, we find strong evidence that patents allowed by busy examiners exhibit significantly lower quality.
 
- 
- Technological Linkage (II)
- Methodology: - Patent text analysis: use external sources such as Wikipedia and professional dictionaries to establish the professional terminology for every patent in our sample. This process enables us to define two patents as similar if they share the same professional terminology.
- Remove “boilerplates” (i.e., long lists of terminology used in patent texts to illustrate the invention generality).
- Patent similarity: Two patents' distance is represented as a vector of their common terms weighted by a TF-IDF (term frequency-inverse document frequency) variant.
- Firm similarity: log of the sum of similar patent pairs discounted by the age of the newer patent in each pair and normalize it by the log of the product of the total number of patents for each firm in the pair.
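A minimal sketch of TF-IDF-weighted patent similarity. It uses plain TF-IDF with cosine similarity, which is an assumption: the paper uses an unspecified TF-IDF variant. The term lists are invented and assume terminology extraction and boilerplate removal have already been done.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf * idf} dictionary."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical professional terms extracted from three patents.
patents = [
    ["lithium", "anode", "electrolyte"],
    ["lithium", "anode", "separator"],
    ["antibody", "protein", "assay"],
]
vecs = tfidf_vectors(patents)
sim_battery = cosine(vecs[0], vecs[1])   # two battery patents share terms
sim_cross = cosine(vecs[0], vecs[2])     # no shared terms
```

Firm-level similarity would then aggregate similar patent pairs across two firms' portfolios, with the age discount and normalization described above.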
 
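- Fundamental explanation: For R&D-to-Total Assets and ROA terms, peer firms have both contemporaneous correlation and prediction power.
- Information diffuses slowly because investors pay insufficient attention, not because they are completely unaware of the linkage. Insufficient attention implies that investors will eventually recognize the linkage, so information keeps diffusing and linkage momentum arises. Complete unawareness would mean investors never see the linkage at all, in which case there would be no linkage momentum effect.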
 
- 
- Data - The collected news articles are classified into media reservoirs: General, Corporate, FX, and Country Equity.
- Articles are classified into 347 narratives: including 53 pre-specified Journal of Economic Literature (JEL) narratives and around 300 additional narratives. The 347 narratives are classified into 14 narrative tags including Geopolitics, Macro, Micro, etc. The narrative series are provided by MKT MediaStats, LLC.
 
- Construction - Narrative Intensities: Negative (positive) intensity is the fraction of negative (positive) sentiment articles pertaining to a narrative out of the overall discussion, with a value in [0,1].
- Narrative market beta: tests whether narratives can explain excess market returns, using univariate regressions of one-month market excess returns on contemporaneous one-month intensity changes.
- Stock-level narrative betas: univariate regressions of stock returns on intensity changes.
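A minimal sketch of the intensity and beta construction above: intensity as a fraction of articles, then a univariate OLS slope of returns on contemporaneous intensity changes. All article counts and returns are invented.

```python
def intensity(sentiment_count, total_count):
    """Fraction of articles with the given sentiment, in [0, 1]."""
    return sentiment_count / total_count if total_count else 0.0


def univariate_beta(y, x):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


# Hypothetical monthly counts of negative articles for one narrative.
negative_articles = [10, 20, 15, 30, 25]
all_articles = [100, 100, 100, 100, 100]
levels = [intensity(n, t) for n, t in zip(negative_articles, all_articles)]
changes = [b - a for a, b in zip(levels, levels[1:])]

# Hypothetical excess returns over the same four months as `changes`.
excess_returns = [-0.02, 0.01, -0.03, 0.01]
beta = univariate_beta(excess_returns, changes)
```

The same `univariate_beta` call with a single stock's returns gives a stock-level narrative beta.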
 
- Conclusion: - Financial analysts also tend to underreact to narrative-sensitive stocks
- Narrative momentum is different from price momentum.
 
 
- 
- Firms are more likely to manipulate their announcements when media coverage is more extensive.
- Negative news is more likely to be reported than positive news.
- The presence of financial journalists can lead to more efficient pricing.
 
- 
- Proxies for Corporate Sales: The authors construct three proxies for real-time corporate sales using distinct information sources: in-store foot traffic (IN-STORE), web traffic to companies' websites (WEB), and consumers' interest level in corporate brands and products (BRAND).
- Predict SUE, SUR, Analyst forecast error.
- Check analyst coverage, media exposure, and market attention to see whether the market consensus really matters, i.e., whether to trade the surprise relative to consensus or just the quarterly change.
 
- Wolfe Research | Global shipping and supply chain alpha - S&P Global Panjiva Supply Chain Intelligence (US, Brazil, India, Mexico); FactSet (US only)
- BOL: bill of lading form.
- Features: - shipping volume (predict sales), (level/growth)
- supplier, product, country of origin (diversity)
- supply-chain network (company's position in supply chain)
- shipping network momentum
 
 
- Wolfe Research | Alpha insights from global job postings data - RavenPack dataset (most likely use LinkUp)
- Term construction: job postings level/growth (we can use SOC median salary as importance weight); technical skill intensity, level/growth, uniqueness in skills, adoption of new technical skills, skill importance (TF-IDF).
 
- DB Research | Macro and Micro JobEconomics - LinkUp job posting dataset: scrape from company website. Data since 2007, description data since 2014. Includes SOC job classification, geolocation data, and technical skills data. Most coverage in the USA.
- Term construction similar for micro.
- Macro: use to predict employment, PMI index, CPI, retail, consumer sentiment.
 
- The Use and Usefulness of Big Data in Finance: Evidence from Financial Analysts - The paper offers robust evidence that the integration of alternative data into analyst reports is both increasing and economically valuable.
- By improving earnings forecast accuracy and bolstering brokerages' trading commission revenue (a proxy for sell-side analysts' value, taken from ANcerno's institutional investor trade data), alternative data serves as a tool to enhance the relevance of sell-side research in a rapidly evolving data landscape.
 
- 
- Digital traffic (website pageview, visit, visit duration, other statistics) can predict quarterly earnings KPIs. It provides additional information to analyst consensus and market consensus (positive stock return).
- All these effects only apply to firms with consumer-oriented websites. A website is defined as consumer-oriented if it satisfies at least one of the following conditions: (i) it allows direct transactions by customers. Examples include amazon.com, nike.com, and bestbuy.com; (ii) the website itself is the product, and visits to these websites are de facto consumption of products. Examples include nytimes.com, google.com, facebook.com, netflix.com; (iii) while the website is not directly geared towards customer transactions, the website hosts detailed product information and is likely to attract visits from existing and prospective customers. Examples include nissanusa.com and bmwusa.com.
- The authors define alternative value measures: market-to-visits and market-to-pageviews.
- Data from SimilarWeb.
 
2. Analyst
- Analyst Forecast Bundling Intensity and Earnings Surprise 
- The authors explore how financial analysts convey information about a company's earnings without necessarily making full revisions to their earnings forecasts. They achieve this by increasing what they term 'bundling intensity,' which refers to the extent to which an analyst's report that includes an earnings forecast revision also includes revisions to price targets and/or recommendations that have the same direction as the earnings forecast revision. 
- The researchers have developed a measure called BF_Score at the firm level to quantify bundling intensity. Their findings suggest that BF_Score is a significant predictor of earnings surprises based on analyst forecasts. These surprises often result from biases in consensus earnings forecasts, which are influenced by the information analysts communicate through bundling intensity. In the definition of BF_Score, TP denotes the target price.
- The use of bundling and the predictive power of BF_Score increase during times of higher macroeconomic uncertainty, when analysts have greater incentives to avoid bold revisions to their earnings forecasts. 
 
- 
- Firms covered by a larger number of analysts generate fewer patents and patents with lower impact.
- The evidence is consistent with the hypothesis that analysts exert too much pressure on managers to meet short-term goals, impeding firms' investment in long-term innovative projects.
 
- 
- The paper extends the literature on analyst herding by demonstrating that sell-side analysts not only mimic peers’ forecasts for the same firm but also update their beliefs based on information gleaned from peers covering different firms within their portfolio.
 
- 
- Our empirical analysis shows that while the average analyst does not generate statistically significant alpha relative to the returns of a long-only portfolio benchmark, a subset of analysts exhibits persistent alpha. Motivated by this heterogeneity, we introduce a "fund-of-analysts" framework that first predicts analyst performance and then dynamically allocates weights across analysts based on predicted analyst performances.
 
3. Anomalies
- Crowdsourced employer reviews and stock returns
- Extrapolative beliefs in the cross-section: What can we learn from the crowds?
- Chinese Stock Market Shell Value
- Size and Value in China
- Time Series Momentum
- Tracking Retail Investor Activity
- Overnight Return Reversal
- Idiosyncratic Volatility
- Volume
- ESG
- Loughran, Tim, and Bill McDonald. "Measuring firm complexity." Journal of Financial and Quantitative Analysis (2023): 1-28. - Measure firm complexity: use 10-K filing text data. - RHS: 374 pre-defined words related to firm complexity.
- LHS: use audit fees (adjusted by size and industry) as complexity proxy.
- Run lasso regression -> identify 50+ words as the final firm-complexity word set.
- Complexity = share of the words in the 10-K filing that belong to the complexity word set.
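A minimal sketch of the final scoring step only: the share of filing tokens that fall in the complexity word set. The three-word list and the sample sentence are made up, and the tokenizer is a simplifying assumption; the lasso selection step is not shown.

```python
import re


def complexity_score(text, complexity_words):
    """Share of alphabetic tokens in `text` that belong to `complexity_words`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in complexity_words)
    return hits / len(tokens)


# Hypothetical stand-in for the lasso-selected word set.
word_list = {"subsidiary", "derivative", "lease"}
sample = "The subsidiary entered a lease and a derivative contract."
score = complexity_score(sample, word_list)
```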
 
 
- Cohen, Lauren, and Dong Lou. "Complicated firms." Journal of Financial Economics 104.2 (2012): 383-400. - A firm's complexity is measured by its income segments.
- The more complicated the firm, the more pronounced the return predictability. In addition, we find that sell-side analysts are subject to these same information processing constraints, as their forecast revisions of easy-to-analyze firms predict their future revisions of more complicated firms.
 
4. Asset Pricing
- 
- The paper provides evidence that the capital asset pricing model (CAPM) seems to hold on days when influential firms announce earnings, challenging the conventional wisdom that the beta-return relationship is generally flat in the market. The findings have implications for investors, suggesting that strategic trading around earnings announcements could yield significant returns.
 
- 
- Relative Value Metric - The paper defines the company’s relative value as MV / TA, where MV is the market value of the company (constructed from total assets adjusted by replacing book equity with market capitalization for the common stock component) and TA is total assets.
- The authors collect 23 descriptive measures (or “descriptors”) of firm characteristics covering eight broad categories: - Profitability: e.g., realized return on assets (RoA) and analyst forecast RoA.
- Growth: e.g., long-term growth (LTG) forecasts, one-year and five-year historical growth rates.
- Investment: e.g., capital expenditure ratios, R&D spending, retained earnings.
- Liquidity: e.g., working capital ratio, slack ratio, various liquidity ratios, and trading liquidity.
- Leverage: Measured by book debt-to-equity ratio.
- Market Risk: Captured by the firm’s stock beta.
- Size: Usually expressed as the natural logarithm of total assets.
- Momentum: Measured by stock return momentum over six- and 12-month horizons.
 
- The authors combine similar descriptors within each category into single valuation factors. They do this by first standardizing each descriptor (using winsorization and z-score transformation) and then averaging them to form a factor. When multiple descriptors exist within a category, a Bayesian weighted approach is implemented: Weights are estimated via a constrained regression (imposing that weights sum to one) with an equal-weighting prior. 
- Once the eight factors are constructed, the relative value of each company is modeled through a cross-sectional contemporaneous regression at each date, q_t = G_t b_t + F_t c_t + e_t (b_t, the industry-dummy coefficients, is a label introduced in these notes), where: - q_t is the vector of standardized logarithmic relative value across companies.
- G_t is a matrix of industry dummy variables (based on 49 industry groups from SIC codes) to serve as a local bias correction.
- F_t represents the matrix of valuation factors.
- c_t contains the cross-sectional slope estimates (i.e., the market pricing of each valuation factor).
- e_t is the regression residual that represents the temporary misvaluation (mispricing) of individual companies.
 
- The regression residuals (e_t) from the model, which capture the deviation of a company’s actual relative value from the “fair” value predicted by the model, are interpreted as temporary mispricing. A long-short portfolio that goes long on companies with the lowest residuals (undervalued) and short on those with the highest residuals (overvalued) delivers strong performance.
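A minimal sketch of the descriptor-combination step described earlier in this block: winsorize, z-score, then average descriptors into one factor. Fixed winsorization bounds and equal weights are simplifying assumptions (the paper estimates weights with a constrained regression and an equal-weighting prior); all numbers are invented.

```python
from statistics import mean, pstdev


def winsorize(values, lower, upper):
    """Clip each value to the interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def zscore(values):
    """Cross-sectional z-scores using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values] if sigma else [0.0] * len(values)


def combine(descriptors, weights=None):
    """Weighted average of standardized descriptors; equal weights by default."""
    k = len(descriptors)
    weights = weights or [1.0 / k] * k
    n = len(descriptors[0])
    return [sum(w * d[i] for w, d in zip(weights, descriptors)) for i in range(n)]


# Hypothetical cross-section of four firms.
realized_roa = [0.02, 0.05, 0.08, 0.90]   # last value is an outlier
forecast_roa = [0.03, 0.04, 0.09, 0.10]

z_realized = zscore(winsorize(realized_roa, lower=0.0, upper=0.10))
z_forecast = zscore(forecast_roa)
profitability = combine([z_realized, z_forecast])
```

Passing estimated `weights` that sum to one would replace the equal-weight default.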
 
5. Behavioral Finance
- Behavioral Finance: an Introduction
- Short- and Long-Horizon Behavioral Factors
- Nominal Price Illusion
- PEAD
- Financial Statement Related
- Wolfe Research | Seeking alpha from insider transactions - Form 4 filings from the EDGAR database
- Findings: - Insider purchases are more informative than sales (sales might reflect personal liquidity needs)
- Insider purchases after a positive earnings surprise are a strong confirmatory signal
- Collective insider purchases by multiple executives are a strong signal
- Infrequent insider transactions are more informative than recurring trades.
 
 
- 
- Institutional investors' trading behavior is influenced more by news sentiment than by the anomalies themselves. Institutions tend to trade “in the wrong direction.” In the sample of overreaction anomalies, they are net sellers of the long leg relative to the short leg.
 
- 
- Confirmatory bias: the tendency to overweight evidence that supports one’s prior views and disregard contradicting information.
- It is evidenced by excess volume, excess volatility, momentum, and bubbles/crashes.
- Forecast revisions: Analysts are ~2 pp less likely to revise in the direction of a current surprise when its sign differs from the surprise one or two quarters earlier.
- Analyst heterogeneity: When an analyst’s own prior revision sign conflicts with a new earnings surprise, she is ~20 pp less likely to update in that direction. 
- Forecast dispersion: Cross-analyst dispersion in annual forecasts is higher following sign-changes in the last two SUEs. 
 
6. Crypto
- Decentralized mining in centralized pools
- Majority is not enough: Bitcoin mining is vulnerable
- Blockchain without waste: Proof-of-stake
- A Survey of Attacks on Ethereum Smart Contracts
- Multi-factor in Cryptocurrency
- Kogan, Shimon, et al. "Are cryptos different? Evidence from retail trading." Journal of Financial Economics 159 (2024): 103897. - While investors exhibit contrarian behavior in stocks and gold, they follow a momentum-like strategy with cryptocurrencies, holding onto their investments even after large price movements.
- Retail investors may view cryptocurrency price changes as indicators of the likelihood of future widespread adoption, leading them to update their price expectations in the direction of the price change.
 
7. Event
- Lottery-like Stocks
- Bargeron, Leonce, and Alice Bonaime. "Why do firms disagree with short sellers? Managerial myopia versus private information." Journal of Financial and Quantitative Analysis 55.8 (2020): 2431-2465.- Disagreement Definition: The paper defines disagreement as situations where firms engage in significant share repurchases while short interest increases.
- The authors explore whether such repurchases are driven by managerial myopia (an attempt to defend inflated stock prices for short-term gains) or by private information (managers possessing positive, value-relevant information that the market is not yet aware of).
- The paper finds that repurchases are more likely motivated by managers' private information rather than agency issues or a defense of overvalued stock.
 
- Boudoukh, Jacob, et al. "Information, trading, and volatility: Evidence from firm-specific news." The Review of Financial Studies 32.3 (2019): 992-1033. - Identified news (relevant to firm events) explains approximately 20%-40% of overnight volatility and 6% during trading hours.
 
8. Fundamental
- 
- Traditional earnings like GAAP earnings: high transitory volatility
- Use Street Earnings:- Adjusts a company’s reported earnings to exclude non-recurring, non-operational, or one-time items. It aims to reflect the underlying, sustainable profitability of a business by filtering out short-term noise, providing investors and analysts with a clearer picture of long-term value creation.
- Derived from analyst adjustments (e.g., I/B/E/S consensus estimates from Thomson Reuters)
 
 
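- PE / PB / PS - PE (price-to-earnings ratio): Core logic: PE reflects how efficiently the market prices a firm's earning power, and suits industries with stable, predictable earnings. - Consumer sector (food and beverage, home appliances, retail) - Rigid demand, stable cash flow, low earnings volatility (e.g., Yili, Kweichow Moutai)
- Example: consumer-sector PE is usually computed from long-run stable net profit, so PE is suitable for judging whether valuation is high or low.

- Pharmaceuticals and healthcare - Demand keeps growing as the population ages, and mature innovative-drug companies have stable earnings (e.g., Hengrui Medicine)
- Note: biopharma companies still in the R&D stage may be loss-making, so other metrics (such as PS) are needed.

- Traditional manufacturing (machinery, auto parts) - Mature technology, a clear competitive landscape, and steady earnings growth (e.g., Sany Heavy Industry)

- Utilities (electricity, gas) - Strong monopoly position, earnings subject to policy regulation; PE can reflect long-term cash-flow value


- PB (price-to-book ratio): Core logic: PB measures the value of a firm's net assets, and suits asset-intensive industries or industries with volatile earnings. - Strongly cyclical industries (non-ferrous metals, steel, chemicals) - Earnings depend heavily on commodity prices; PE breaks down at the cycle trough, while PB is more stable (e.g., China Shenhua)

- Financials (banks, brokers, insurers) - Large, easily quantified asset bases (such as bank credit assets); PB reflects asset quality and margin of safety (e.g., ICBC)

- Asset-heavy industries (real estate, airlines, shipping) - High share of fixed assets; PB can be used to assess liquidation value (e.g., Vanke A)

- Technology hardware manufacturing (semiconductors, consumer electronics) - Equipment, patents, and similar assets carry significant value, but watch for technology-iteration risk (e.g., SMIC)


- PS (price-to-sales ratio): Core logic: PS focuses on revenue growth potential, and suits industries with heavy investment and high growth but lagging earnings. - Emerging technology (artificial intelligence, cloud computing, semiconductors) - Large early R&D outlays and long payback periods; revenue growth replaces earnings as the core metric (e.g., NVIDIA, Tesla)

- Biopharma and medical devices - Losses during the innovative-drug R&D stage but large market potential; PS reflects pipeline value (e.g., Moderna)

- New energy and advanced manufacturing (lithium batteries, photovoltaics) - The expansion phase requires heavy capital expenditure; PS measures the ability to compete for market share (e.g., CATL)

- Internet and platform economy (e-commerce, social media) - User growth takes priority over earnings; PS is combined with user-value assessment (e.g., Amazon, ByteDance)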
 
- 
- Disclosure Patterns and Timing: - Withholding of Negative News Early in the Quarter: - Managers are less likely to issue a revenue forecast (i.e., voluntary disclosure) when real-time abnormal revenue is negative during the early part of the fiscal quarter. 
- Increased Disclosure as the Quarter Progresses: - As the quarter moves closer to the mandatory earnings announcement, the withholding of negative news diminishes. This change is likely driven by an increase in litigation risk, heightened analyst scrutiny, and the expectation that the impending public revelation will force managers to disclose bad news. 
- Asymmetry in Disclosure: - The analysis shows that it is primarily the “bad news” (i.e., weeks with abnormal revenues in the bottom quartile) that is withheld early, whereas there is no significant increase in the voluntary disclosure of good news. This finding is consistent with classic disclosure models where economic incentives lead firms to delay negative information until external disciplinary mechanisms (like investor reaction or legal risk) compel disclosure. 
 
- Market Reaction and Insider Trading: - Delayed Incorporation into Stock Prices: - Although the real-time revenue measure is strongly informative—as evidenced by its positive correlation with future abnormal returns—the market does not fully and immediately price in the information. The gradual “leakage” of performance signals over the quarter suggests a dynamic process of information disclosure. 
- Insider Trading Behavior: - The paper documents that in weeks where there is significant abnormally negative real-time revenue and no corresponding public disclosure, insider managers are more likely to sell their shares. This behavior implies that managers might use their private information for personal gain when they choose not to disclose. 
 
- Role of Disciplinary Mechanisms: - The withholding and subsequent release of information are found to be more pronounced in firms characterized by:- High Analyst Coverage: Greater monitoring by equity analysts increases pressure on managers to eventually release negative information.
- High Institutional Ownership: With more sophisticated investors monitoring performance, non-disclosure becomes costlier.
- High Litigation Risk: The prospect of legal or reputational consequences forces managers to disclose adverse information as the quarter’s end nears.
 
 
 
- 
- Managerial Disclosure Behavior: Post-quarter sales (PQS) is sales activity occurring after quarter-end but before the earnings announcement. The behavior captured by the PQS measure indicates that managers do not fully disclose all privately held post-quarter performance information at the earnings announcement. Instead, they understate positive signals, resulting in lower-than-expected announcement returns and delayed price adjustments in the post-announcement period. This may be driven, in part, by personal trading motivations.
 
9. Linkage
- Shared Analyst Coverage: Unifying Momentum Spillover Effects 
- Market data linkage - Sarmento, Simão Moraes, and Nuno Horta. "Enhancing a pairs trading strategy with the application of machine learning." Expert Systems with Applications 158 (2020): 113490.
- Stock mktdata -> dimension reduction -> clustering -> within each cluster, keep pairs that satisfy:
- The pair’s constituents are cointegrated.
- The pair’s spread Hurst exponent reveals a mean-reverting character. (H < 0.5)
- The pair’s spread diverges and converges within convenient periods. (1 < half-life < 12)
- The pair’s spread reverts to the mean with enough frequency. (yearly mean crossings >= 12)
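A minimal sketch of two of the pair filters above: the half-life of mean reversion from an AR(1) fit of the spread, and the number of mean crossings. The AR(1) half-life formula is a common convention and an assumption here, as are the units of the thresholds; cointegration and Hurst-exponent tests are not shown.

```python
import math


def ols_slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def half_life(spread):
    """Half-life implied by the AR(1) coefficient phi: ln(0.5) / ln(phi)."""
    lagged = spread[:-1]
    delta = [b - a for a, b in zip(spread, spread[1:])]
    phi = 1.0 + ols_slope(lagged, delta)   # slope of delta on lagged level, plus one
    if not 0.0 < phi < 1.0:
        return math.inf                    # no monotone mean reversion detected
    return math.log(0.5) / math.log(phi)


def mean_crossings(spread):
    """Count how often the spread moves from one side of its mean to the other."""
    mu = sum(spread) / len(spread)
    sides = [s > mu for s in spread if s != mu]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)


def keep_pair(hl, crossings):
    """Thresholds from the note: 1 < half-life < 12 and at least 12 yearly crossings."""
    return 1.0 < hl < 12.0 and crossings >= 12


decaying_spread = [1.0, 0.5, 0.25, 0.125, 0.0625]   # halves every period
hl = half_life(decaying_spread)
crossings = mean_crossings([1.0, -1.0, 1.0, -1.0, 1.0])
```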
 
 
- 
- Multi-industry classification using business description (10K + broker report + earnings call) + Bag of Words + LDA
 
- Bagnara, Matteo, and Milad Goodarzi. "Clustering-based sector investing." (2023). - Data: 94 firm characteristics for CRSP from Dacheng Xiu's Paper (Empirical Asset Pricing with Machine Learning)
- Bisecting K-means clustering
- Clustering feature importance: rank feature variation/PCA on cluster-centroid vector
 
- TT Shi et al. (2023) Production Complementarity and Information Transmission Across Industries - Sales segment + industry input-output relation to construct linkage (Benchmark Input-Output Surveys of the Bureau of Economic Analysis to identify product complementary relationships)  
 
- Linkage and information discreteness: 
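- The "frog in the pan" in behavioral finance: this human shortcoming is called limited attention. Because cognitive resources are limited, at any given moment our brains prefer to process the most salient and important information and ignore factors that are not salient or have weak economic effects. A series of frequent but small changes attracts far less attention than a few large ones; investors therefore underreact to price changes caused by continuous information.
- Information discreteness (ID): momentum with low ID (i.e., highly continuous information) is high-quality momentum.
 - Da et al. (2014) show that, compared with traditional momentum, high-quality momentum selected with the ID factor earns higher excess returns, and those returns persist longer out of sample (which helps lower rebalancing frequency, reduce turnover, and save transaction costs).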
- Information discreteness (ID) serves as a cognitive trigger that reduces investor inattention and improves inter-firm news transmission. 
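A minimal sketch of the ID measure, assuming the Da et al. (2014) definition ID = sgn(PRET) × (%neg − %pos), where PRET is the formation-period cumulative return and %neg, %pos are the shares of days with negative and positive returns. The return series are invented.

```python
def information_discreteness(daily_returns):
    """ID = sign(cumulative return) * (share of down days - share of up days)."""
    n = len(daily_returns)
    pct_pos = sum(r > 0 for r in daily_returns) / n
    pct_neg = sum(r < 0 for r in daily_returns) / n
    cumulative = 1.0
    for r in daily_returns:
        cumulative *= 1.0 + r
    pret = cumulative - 1.0
    sign = (pret > 0) - (pret < 0)
    return sign * (pct_neg - pct_pos)


# Same direction of cumulative return, very different paths.
continuous = [0.01] * 10                  # many small gains
discrete = [-0.001] * 9 + [0.12]          # one large jump

id_continuous = information_discreteness(continuous)
id_discrete = information_discreteness(discrete)
```

Lower values mean more continuous information, so `continuous` is the "frog in the pan" case that the notes flag as high-quality momentum.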
 
- 
- Cross-stock momentum: Based on asymmetry in lead-lag linkages and differences between long-run and short-run co-movements. 
- Factor momentum: The phenomenon where returns of certain factors (like size, value, or industry factors) exhibit momentum. 
- The asymmetry in cross-stock linkages is a key differentiator from factor momentum. The paper shows that cross-stock momentum is not entirely driven by factor momentum. 
- The author uses the Principal Portfolio (PP) methodology introduced in Kelly (JoF 2023), Principal Portfolios. The PP methodology optimizes portfolio returns by leveraging cross-stock predictability captured in the prediction matrix. This approach extends traditional asset pricing by incorporating cross-stock signals.
- The prediction matrix, $\Pi$, aggregates the relationship between lagged signals ($S$) and returns ($R$) over a rolling window $T$: $\hat{\Pi}_t = \frac{1}{T} \sum_{\tau=t-T}^{t-1} R_{\tau+1} S_\tau^\top$. - Diagonal elements: capture own-stock predictability, $\Pi_{ii} = E[R_{i,t+1} S_{i,t}]$. - Off-diagonal elements: represent cross-stock predictability, $\Pi_{ij} = E[R_{i,t+1} S_{j,t}]$ for $i \neq j$. - Signals ($S$) are normalized within a bounded range to reduce noise and manage outliers. The matrix is lagged by one period to ensure independence between returns and predictors.
- SVD: The prediction matrix is decomposed via SVD: $\Pi = U \Sigma V^\top$. - $U$, $V$: orthogonal matrices with left ($u_k$) and right ($v_k$) singular vectors. - $\Sigma$: diagonal matrix of singular values ($\sigma_k$) ranked by importance. - This decomposition simplifies identifying key patterns in cross-stock return predictability.
- Portfolio construction: Portfolio weights ($w_t$) are derived by combining signals and the prediction matrix, $w_t^\top = S_t^\top L$. Optimal weights maximize expected return $E[S_t^\top L R_{t+1}]$ subject to a constraint on the matrix norm, $\|L\| \le 1$. - $K$: number of leading components retained for dimensionality reduction (the typical value is missing from these notes). - Principal Portfolios (PPs): constructed by linear combinations of $u_k$ and $v_k$, balancing computational simplicity and predictive power. - The PP return is: $PP^k_{t+1} = S_t^\top v_k\, u_k^\top R_{t+1}$.
 
- 
- The paper leverages a unique feature of stock display on trading platforms in China, where the order of stock display is determined by the stock's listing code. This feature creates an attention spillover effect, where investors are more likely to notice and trade stocks with listing codes adjacent to those of stocks they currently hold.
- The authors propose that overconfident investors, following positive investment experiences, are likely to increase their trading activities and are more likely to direct their attention to neighboring stocks on the display.
 
- 
- Conference call transcripts -> topic model -> firm similarity -> linkage signals
 
- 
- cross-firm similarity measure based on the various topics extracted from Management Discussion and Analysis texts
 
- 
- Main idea: linkage from bond market credit-rating comovements.
- This study identifies a "market segmentation" effect between the equity and bond markets, showing that information from bond markets is often not incorporated promptly by equity market investors.
- Firms are connected through "credit-rating comovements," defined as instances when two firms' bond ratings are updated in the same direction within a ±10-day window.
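A minimal sketch of the link definition above: two firms are linked when their ratings move in the same direction within a ±10-day window. The event tuples and integer day indices are invented; whether the window counts calendar or trading days is not stated in these notes.

```python
from itertools import combinations


def rating_links(events, window=10):
    """Link firm pairs with same-direction rating changes at most `window` days apart."""
    links = set()
    for (firm_a, day_a, sign_a), (firm_b, day_b, sign_b) in combinations(events, 2):
        if firm_a != firm_b and sign_a == sign_b and abs(day_a - day_b) <= window:
            links.add(tuple(sorted((firm_a, firm_b))))
    return links


# Hypothetical rating events: (firm, day index, +1 upgrade / -1 downgrade).
events = [
    ("A", 100, +1),
    ("B", 105, +1),   # same direction as A, 5 days later -> linked
    ("C", 108, -1),   # opposite direction -> not linked
    ("D", 130, +1),   # same direction but outside the window -> not linked
]
links = rating_links(events)
```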
 
- Chen, Xin, and Huaixin Wang. "News Links and Predictable Returns." Available at SSRN 4458612 (2023). - Main idea: news-implied linkages in China where firms are connected based on shared media coverage.
- News-based links were established by identifying instances where two firms were mentioned in the same article within a 12-month window.
- The authors perform robustness checks to validate these results, including a placebo test using shared media platforms, demonstrating that only specific news stories—not general media coverage—predict future returns.
- They explore linkage complexity, showing stronger predictability when linkages are more complex (e.g., higher numbers of shared stories or connections).
 
- 
- Main idea: high overnight returns for peer stocks predict elevated opening prices for focal stocks, followed by intraday reversals, while peer intraday returns consistently predict positive future intraday returns for focal stocks.
- Two investor types drive the pattern: retail investors, who trade primarily on overnight information due to news salience, and professional investors, who trade intraday and correct the market.
- Predictable patterns arise not only from underreaction but from a systematic interplay between the different investor types. Retail-driven overnight price distortions are followed by intraday reversals managed by professionals.
 
- Data-driven graph learning 
- features: 8 in total, MOM and MACD.  
- From Kalofolias (AAAI 2016) How to learn a graph from smooth signals, we can define a convex optimization problem: where is the feature matrix with -days lookback window, where is a diagonal matrix with . The graph adjacency matrix we want to estimate represents the network at day t for constructing network momentum, with the -th entry measuring the strength of similarity of individual momentum between asset i and asset j. In the objective function, the first trace term measures the spectral variations of on the learned graph adjacency matrix , encouraging connections between nodes with similar features. It is derived from Laplacian smoothness under the mild assumption that each column of  is a low-pass graph signal. 
- The above is derived from: Consider a matrix , where each row resides on one of m nodes of an undirected graph G. In this way, each of the n columns of X can be seen as a signal on the same graph. A simple assumption about data residing on graphs, but also the most widely used one is that it changes smoothly between connected nodes. An easy way to quantify how smooth is a set of vectors on a given weighted undirected graph is through the function - where denotes the weight of the edge between nodes i and j and is the graph Laplacian, being the diagonal weighted degree matrix. In words, if two vectors and from a smooth set reside on two well connected nodes (i.e. is large), they are expected to have a small distance so that is small. 
- In our empirical analysis, we combine distinct graphs learned from from five different lookback windows such that trading days as follows: (graph ensemble) . - To mitigate the effects of scale differences in constructing network momentum, which may arise due to the difference in the number of connections certain assets have - with some connected to numerous other assets and others only to a few - we also apply a graph normalisation as follows: - (graph normalisation) , where is a diagonal matrix with . 
- Another way to solve the above optimisation problem is described in Pu et al. (2023), Learning to Learn Financial Networks for Optimising Momentum Strategies.
- The L2G algorithm reformulates optimisation-based graph learning into an unrolling neural network. By leveraging the inherent modularity of neural networks, where different layers can be easily stacked for forward propagation, the authors incorporate an additional layer into L2G for directly constructing network momentum.
- The original L2G algorithm is described in Pu et al. (NeurIPS 2021), Learning to learn graph topologies.
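The Laplacian smoothness term and the graph normalisation step can be checked numerically. The sketch below is only an illustration on random data: the function names are hypothetical, the normalisation is assumed to be the symmetric degree normalisation, and "network momentum" is taken here as a neighbour-weighted sum of individual momentum, which is an assumption rather than the papers' exact construction.

```python
import numpy as np

def laplacian_smoothness(W, X):
    """Return tr(X^T L X) with L = D - W for a symmetric weight matrix W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    return float(np.trace(X.T @ L @ X))

def pairwise_smoothness(W, X):
    """Return 0.5 * sum_ij W_ij * ||x_i - x_j||^2 (rows of X live on nodes)."""
    diff = X[:, None, :] - X[None, :, :]      # shape (m, m, n)
    sq_dist = (diff ** 2).sum(axis=2)         # shape (m, m)
    return float(0.5 * (W * sq_dist).sum())

def normalise_graph(A):
    """Symmetric normalisation D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows."""
    deg = A.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    inv_sqrt[mask] = 1.0 / np.sqrt(deg[mask])
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

# Toy data: 5 assets, 8 features per asset (values are random, not real signals).
rng = np.random.default_rng(0)
m, n = 5, 8
W = rng.random((m, m))
W = (W + W.T) / 2.0            # make the graph undirected
np.fill_diagonal(W, 0.0)       # no self-loops
X = rng.standard_normal((m, n))

trace_form = laplacian_smoothness(W, X)
pair_form = pairwise_smoothness(W, X)   # equals trace_form up to rounding

A_norm = normalise_graph(W)
individual_momentum = X[:, 0]                     # hypothetical per-asset momentum
network_momentum = A_norm @ individual_momentum   # neighbour-weighted aggregate
```

The two smoothness values agree because the trace form is just the pairwise form summed over feature columns.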
 
- 
- Methodology: The authors use a novel test statistic to discriminate between two hypotheses: increased product differentiation (H1) or increased standardization (leading to decreased return comovement) and cost-cutting (leading to increased return comovement) (H2). They exploit changes in stock return comovement following tariff cuts to infer strategic reactions.
- Empirical Findings: The study finds that tariff cuts lead to a significant increase in return comovement, particularly among "followers" within an industry, suggesting a move towards greater standardization and cost-cutting strategies (H2) rather than increased product differentiation (H1).
- Leader definition: sales-based market shares, financial ratios and R&D.
 
- 
- Consider a firm’s competitiveness based on the manner by which other firms mention it on their 10-K filings.
- C-Rank: 10-K cross-mention graph + page-rank algo -> preferred measure of firm-level competition rank
- A firm’s effective competition status stems mostly from competing with companies outside of its sector.
- C-Rank might identify an element of a firm’s risk profile. If the firm is “targeted” by strong competitors, it can increase the uncertainty about the firm’s future performance and value, then the outperformance of high C-Rank firms might manifest compensation for risk.
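A minimal sketch of the cross-mention graph plus PageRank idea, in plain Python. The firm names, the damping factor of 0.85 and the uniform treatment of firms that mention nobody are assumptions for illustration; the paper's exact C-Rank construction may differ.

```python
def pagerank(mentions, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph.

    mentions maps each firm to the list of firms it mentions in its 10-K,
    so an edge points from the mentioning firm to the mentioned firm.
    """
    nodes = sorted(set(mentions) | {v for vs in mentions.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Firms that mention nobody spread their rank uniformly (dangling nodes).
        dangling = sum(rank[u] for u in nodes if not mentions.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = mentions.get(u, [])
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical cross-mentions: firms A, B and C all name D as a competitor.
mentions = {"A": ["D"], "B": ["D", "A"], "C": ["D"], "D": ["A"]}
c_rank = pagerank(mentions)
top_firm = max(c_rank, key=c_rank.get)
```

In this toy graph the most-mentioned firm D receives the highest rank, and the ranks sum to one.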
 
- 
- Use FactSet supply chain data.
- Customer Momentum: assume that company $i$ has $N$ customers, and let $w_{ij}$ be the sales ratio to customer $j$; customer momentum is then defined as $\mathrm{CMOM}_i = \sum_{j=1}^{N} w_{ij}\, r_j$, where $r_j$ is the return of customer $j$.
- Weighting Method Based on Network Centrality: almost all of the sales ratios are unavailable, so network centrality from network theory is used as the weight of customer momentum instead. Let $c_{ij}$ be the edge betweenness centrality between supplier $i$ and customer $j$.
- Multilayer Customer Information  
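The weighting idea can be sketched as a weighted average of customer returns, where the weights are sales ratios when available and centralities otherwise. The function name, the toy numbers and the renormalisation of weights to sum to one are assumptions for illustration; the centralities are taken as given inputs rather than computed from a graph.

```python
def customer_momentum(customer_returns, weights):
    """Weighted average of customer returns for one supplier.

    customer_returns: {customer: past return}
    weights: {customer: non-negative weight}, e.g. sales ratios or, when these
             are missing, edge betweenness centralities of supplier-customer links.
    Weights are renormalised over customers that have a positive weight.
    """
    common = [c for c in customer_returns if weights.get(c, 0.0) > 0.0]
    total = sum(weights[c] for c in common)
    if total == 0.0:
        return None  # no usable customer link
    return sum(weights[c] / total * customer_returns[c] for c in common)

# Hypothetical supplier with three customers.
returns = {"cust1": 0.10, "cust2": -0.02, "cust3": 0.04}
sales_ratio = {"cust1": 0.5, "cust2": 0.3, "cust3": 0.2}
centrality = {"cust1": 2.0, "cust2": 1.0, "cust3": 1.0}

cmom_sales = customer_momentum(returns, sales_ratio)
cmom_centrality = customer_momentum(returns, centrality)
```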
 
- Earnings Propagation Effects through the Global Supply Chain Network - In this paper, we separate the regression model that examines propagation from customers and the regression model that examines propagation from suppliers and then estimate the coefficients of the following two regression models:
 

- Deep GNN methods
- Wu, Mian, et al. "Firm connection and equity return predictability–Graph-based machine learning methods." The British Accounting Review (2024): 101436.
- Four types of linkage: analyst co-coverage, geographical, industrial, and technological linkage.
- GAT for each linkage type and only through connected firms under each linkage -> aggregate all four linkage output -> LSTM -> output
 
- Cheng, Rui, and Qing Li. "Modeling the momentum spillover effect for stock prediction via attribute-driven graph attention networks." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 1. 2021.
- Merge technical indicators and textual media features while preserving their interactions -> RNN -> GAT -> output
 
- FinHGNN: A conditional heterogeneous graph learning to address relational attributes for stock predictions
- Node type: firm (fundamentals on LSTM embedding), analyst (CBOW embedding), news theme (CBOW embedding)
- Edge type: firm-firm, theme-theme, analyst-analyst, theme-firm, analyst-firm
- GAT on different link type then aggregate
 
- Graph representation learning for similarity stocks analysis
- Knowledge graph construction -> graph representation learning -> stock embedding, similarity, momentum spillover
 
 
- 
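- Previous HAR-DRD method: decompose the return covariance matrix into the diagonal matrix of realized volatilities and the correlation matrix:
  $$H_t = D_t R_t D_t,$$
  where $D_t$ is the diagonal matrix with the square roots of the diagonal elements of $H_t$ on the main diagonal, that is, $D_t = \operatorname{diag}(\sqrt{h_{11,t}}, \dots, \sqrt{h_{NN,t}})$, and $v_t = (h_{11,t}, \dots, h_{NN,t})^\top$ is the vector of realized variances. $R_t$ is the correlation matrix. We can estimate the above using
  $$v_t = \alpha^{v} + \beta_d^{v} v_{t-1} + \beta_w^{v} v_{t-5:t-1} + \beta_m^{v} v_{t-22:t-1} + u_t^{v},$$
  $$x_t = \alpha^{x} + \beta_d^{x} x_{t-1} + \beta_w^{x} x_{t-5:t-1} + \beta_m^{x} x_{t-22:t-1} + u_t^{x},$$
  where $x_t$ is the $N(N-1)/2$-dimensional vectorized version of the lower triangular part of $R_t$ and $v_{t-5:t-1}$ (resp. $v_{t-22:t-1}$) is computed as $\frac{1}{5}\sum_{k=1}^{5} v_{t-k}$ (resp. $\frac{1}{22}\sum_{k=1}^{22} v_{t-k}$).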
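- Using graph information, we can estimate variance and correlation as follows,
  $$v_t = \alpha^{v} + \beta_d^{v} v_{t-1} + \beta_w^{v} v_{t-5:t-1} + \beta_m^{v} v_{t-22:t-1} + \gamma_d^{v} W v_{t-1} + \gamma_w^{v} W v_{t-5:t-1} + \gamma_m^{v} W v_{t-22:t-1} + u_t^{v},$$
  where $W = O^{-1/2} A O^{-1/2}$ is the normalized adjacency matrix. Specifically, $A$ is an $N \times N$ adjacency matrix indicating the connections between assets with diagonal elements as 0, and $O = \operatorname{diag}(n_1, \dots, n_N)$, where $n_i = \sum_j A_{ij}$. Therefore $W v_{t-1}$, $W v_{t-5:t-1}$, $W v_{t-22:t-1}$ represent the neighborhood aggregation over daily, weekly, and monthly horizons, and $\gamma_d^{v}, \gamma_w^{v}, \gamma_m^{v}$ represent the effects from connected neighbors over different horizons. Moreover, we apply the idea of the graph effect to modeling correlations according to the model
  $$x_t = \alpha^{x} + \beta_d^{x} x_{t-1} + \beta_w^{x} x_{t-5:t-1} + \beta_m^{x} x_{t-22:t-1} + \gamma_d^{x} W_x x_{t-1} + \gamma_w^{x} W_x x_{t-5:t-1} + \gamma_m^{x} W_x x_{t-22:t-1} + u_t^{x},$$
  where $W_x = O_x^{-1/2} A_x O_x^{-1/2}$ is the normalized adjacency matrix. Specifically, $A_x$ is an $\frac{N(N-1)}{2} \times \frac{N(N-1)}{2}$ adjacency matrix indicating the connections between pairwise correlations with diagonal elements as 0, and $O_x = \operatorname{diag}(n^x_1, \dots, n^x_{N(N-1)/2})$, where $n^x_i = \sum_j A_{x,ij}$.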
- Choices of graphs - Variance: Complete, Sector, Graph-Lasso
- Correlation: Complete, Line graph
 - Given a graph $G$, its line graph $L(G)$ is a graph such that
- each node of $L(G)$ represents an edge of $G$;
- two nodes of $L(G)$ are adjacent if and only if their corresponding edges share a common endpoint in $G$.
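To make the graph choices concrete, the sketch below builds a complete graph on four assets for the variance equation and its line graph for the correlation equation. The symmetric degree normalisation, the helper names and the toy variance numbers are assumptions for illustration, not the paper's code.

```python
import numpy as np
from itertools import combinations

def normalised_adjacency(A):
    """W = O^{-1/2} A O^{-1/2}, with O the diagonal matrix of node degrees."""
    deg = A.sum(axis=1).astype(float)
    safe = np.where(deg > 0, deg, 1.0)               # avoid division by zero
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe), 0.0)
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

def line_graph(edges):
    """Adjacency matrix of the line graph: one node per edge of G, and two nodes
    are adjacent iff the underlying edges share an endpoint."""
    k = len(edges)
    A = np.zeros((k, k))
    for a, b in combinations(range(k), 2):
        if set(edges[a]) & set(edges[b]):
            A[a, b] = A[b, a] = 1.0
    return A

N = 4
A_var = np.ones((N, N)) - np.eye(N)            # complete graph on N assets
W_var = normalised_adjacency(A_var)
v_lag = np.array([0.04, 0.01, 0.02, 0.03])     # hypothetical lagged realized variances
neighbour_term = W_var @ v_lag                 # neighbourhood aggregation W v_{t-1}

pairs = list(combinations(range(N), 2))        # one node per pairwise correlation
A_corr = line_graph(pairs)
W_corr = normalised_adjacency(A_corr)
```

With a complete graph on four assets, each of the six correlation pairs is adjacent to the four other pairs that share an asset with it.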
 
- Extension: Graph neural networks for forecasting multivariate realized volatility with spillover effects 
 
- He, Wei, et al. "Similar stocks." Available at SSRN 3815595 (2021). - Similarity between two stocks is measured by the distance between their characteristics such as price, size, book-to-market, operating profitability, and investment-to-assets.
- Retail investor behavior, including attention spillover and categorical trading, plays a significant role. Retail order imbalance increases for high similar-stock return portfolios, reflecting stronger demand from individual investors.
- The similarity effect is stronger among firms with low institutional ownership, suggesting retail investors are primary drivers.
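One way to sketch the similarity measure: standardise the characteristics cross-sectionally, take Euclidean distances, and average the past returns of the nearest neighbours. The choice of Euclidean distance, the number of neighbours `k` and the toy numbers are assumptions; the paper's exact distance and portfolio construction may differ.

```python
import numpy as np

def similar_stock_return(chars, past_returns, k=1):
    """For each stock, average the past returns of its k nearest neighbours in
    standardised characteristic space (the stock itself is excluded)."""
    z = (chars - chars.mean(axis=0)) / chars.std(axis=0)
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # never match a stock with itself
    nearest = np.argsort(dist, axis=1)[:, :k]
    return past_returns[nearest].mean(axis=1)

# Five hypothetical stocks described by two characteristics (e.g. size, book-to-market).
chars = np.array([[1.0, 0.5], [1.1, 0.5], [5.0, 2.0], [5.1, 2.0], [9.0, 4.0]])
past_returns = np.array([0.01, 0.02, 0.03, 0.04, 0.05])
sim_ret = similar_stock_return(chars, past_returns, k=1)
```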
 
- 
- Joint news generates more attention spillover than self-mentioned news, as measured by the increase in Google search activity and EDGAR filing activity for connected firms.
- Define the degree of investor attention spillover to a given firm i, from firms connected to i through joint coverage, as the centrality-weighted (node centrality) sum of abnormal joint news coverage across the connected firms.
- Both JointNews and Market-JointNews (aggregation to market level) strongly and negatively predict the one-month-ahead (market) return. Joint news increases overall attention to the firms, resulting in high valuations and low future stock returns.
 
- 
- When a firm’s economically linked firms have good (bad) news, and its stock price is near (far from) the 52-week high, it has an underreaction to the good (bad) news about economically linked firms.
- The nearness to the 52-week high significantly moderates the predictability of supplier returns based on customer returns. Long-term returns for firms close to their 52-week high are higher when customers exhibit strong performance.
 
- 
- The extent of cross-predictability is negatively related to the level of information in the market, measured by the level of analyst coverage or by the level of institutional ownership.
- Analyst coverage measure: analyst is considered actively engaged in a stock for a 12-month period after making an EPS forecast on that stock.
- Institutional ownership measure: sum the holdings of institutional investors in the stock at a given quarter-end report date and then divide by the number of outstanding shares. An alternative proxy is the number of different institutional investors in the stock.
 
10. Machine Learning
- Liu, Quan, et al. "PREDICTION OF EARNING SURPRISE USING DEEP LEARNING TECHNIQUE." 
- 
- Textual Factor (TF) Generation: The authors generate TFs through three main steps:
- Representing text using vector word embedding (Word2Vec).
- Clustering these vectors using Locality-Sensitive Hashing (LSH) to identify topics.
- Applying topic modeling to identify interpretable textual factors. (Use topic exposure as latent factors, apply standard factor analysis framework)
 
 
- Detecting Misreported Accounting A Machine Learning Approach using Text Data - 10-K filing MD&A part -> extract text -> train on SEC AAERs misreported identifier
 
- Global and local fitting - 
- Combine the local model with the global model using a two-stage hard transfer: the global model is fit in the first stage, and the local model is fit in the second stage with an L2 penalty on deviations from the global values.
- Alternatively, this can be done as a one-stage soft transfer learning (p. 21 in the paper).
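A minimal sketch of the two-stage hard transfer for a linear model, assuming the second stage is a ridge regression shrunk toward the global coefficients. The function names, the linear specification and the simulated data are assumptions for illustration.

```python
import numpy as np

def fit_global(X, y):
    """Stage 1: pooled OLS on all groups."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_local(X, y, beta_global, lam):
    """Stage 2: minimise ||y - X b||^2 + lam * ||b - beta_global||^2 (closed form)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * beta_global)

# Simulated data: a large pooled sample and a small local sample whose first
# coefficient deviates from the global one.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -0.5, 0.2])
X_all = rng.standard_normal((500, 3))
y_all = X_all @ beta_true + 0.1 * rng.standard_normal(500)
beta_g = fit_global(X_all, y_all)

X_loc = rng.standard_normal((50, 3))
y_loc = X_loc @ (beta_true + np.array([0.3, 0.0, 0.0])) + 0.1 * rng.standard_normal(50)
beta_ols = fit_local(X_loc, y_loc, beta_g, lam=0.0)    # no shrinkage: plain local OLS
beta_loc = fit_local(X_loc, y_loc, beta_g, lam=10.0)   # shrunk toward the global fit
```

Setting `lam` to zero recovers the purely local fit, and a very large `lam` recovers the global coefficients, so the penalty interpolates between the two.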
 
- 
- The Hybrid model, which combines a uniform model with industry-specific adjustments, performs best.
- Hybrid models are trained on returns in excess of industry medians, and features are normalized within industries, which reduces noise along the cross industry dimension. And unlike the industry-specific Specialist models, the Hybrid approach avoids partitioning the cross-section of stocks into small subsamples.
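The two Hybrid preprocessing steps (returns in excess of industry medians, features normalised within industry) can be sketched in plain Python. The row layout, the field names and the use of a within-industry z-score as the normalisation are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median, mean, pstdev

def industry_adjust(rows):
    """rows: list of dicts with keys 'industry', 'ret', 'feature'.
    Returns new rows with the return in excess of the industry median and the
    feature z-scored within industry."""
    by_ind = defaultdict(list)
    for r in rows:
        by_ind[r["industry"]].append(r)
    out = []
    for r in rows:
        peers = by_ind[r["industry"]]
        med = median([p["ret"] for p in peers])
        mu = mean([p["feature"] for p in peers])
        sd = pstdev([p["feature"] for p in peers])
        z = (r["feature"] - mu) / sd if sd > 0 else 0.0
        out.append({**r, "excess_ret": r["ret"] - med, "feature_z": z})
    return out

# Hypothetical cross-section with two industries.
rows = [
    {"industry": "tech", "ret": 0.10, "feature": 1.0},
    {"industry": "tech", "ret": 0.02, "feature": 2.0},
    {"industry": "tech", "ret": 0.06, "feature": 3.0},
    {"industry": "energy", "ret": -0.01, "feature": 10.0},
    {"industry": "energy", "ret": 0.03, "feature": 20.0},
]
adjusted = industry_adjust(rows)
```

All stocks stay in one training sample after this adjustment, which is the point of the Hybrid approach: industry information enters through the normalisation rather than through separate subsamples.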
 
 
- 
11. Macro
- 
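  - When subjective expectations are revised upward, more financially constrained firms have lower future expected returns. When subjective expectations are revised downward, more financially constrained firms have higher future expected returns.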
- Factor timing
 
- BAB beta factor - 
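- The CAPM beta is decomposed into two components: one reflecting news about the market's future cash flows, the other reflecting news about the market's discount rates.
- The cash-flow beta measures the correlation between stock returns and shocks to fundamental cash flows. Such shocks reflect the effect of long-lived, permanent changes (corporate earnings, dividend policy, industry outlook) on stock value. It measures a stock's exposure to long-run fundamental risk, corresponds to the "bad beta", and requires a high premium as compensation.
- The discount-rate beta measures the correlation between stock returns and shocks to the market discount rate. Such shocks reflect short-term adjustments in how investors price the risk of expected future cash flows (e.g. interest-rate changes, swings in risk appetite) and cause temporary price fluctuations. For example, a Fed rate hike (a higher discount rate) may depress all stock valuations in the short run but has limited long-run impact. It measures a stock's exposure to short-term market sentiment or policy risk, corresponds to the "good beta", and carries a lower premium.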
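The decomposition can be written compactly as follows. The notation is the standard one from Campbell and Vuolteenaho (2004), "Bad beta, good beta", which is assumed here because the notes do not give the formulas: $N_{CF}$ and $N_{DR}$ denote market cash-flow news and discount-rate news, and $r^e_M$ is the market excess return.

```latex
\begin{aligned}
r^e_{M,t+1} - E_t\!\left[r^e_{M,t+1}\right] &= N_{CF,t+1} - N_{DR,t+1}, \\
\beta_{i,CF} &= \frac{\operatorname{Cov}\!\left(r_{i,t},\, N_{CF,t}\right)}{\operatorname{Var}\!\left(r^e_{M,t} - E_{t-1}[r^e_{M,t}]\right)}, \\
\beta_{i,DR} &= \frac{\operatorname{Cov}\!\left(r_{i,t},\, -N_{DR,t}\right)}{\operatorname{Var}\!\left(r^e_{M,t} - E_{t-1}[r^e_{M,t}]\right)}, \\
\beta_{i,M} &= \beta_{i,CF} + \beta_{i,DR}.
\end{aligned}
```

The last line follows because the two covariances share the same denominator and the unexpected market return is the sum of $N_{CF}$ and $-N_{DR}$.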
  
- 
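- Leverage constraints (real-world background) - Constraints on institutional investors: mutual funds, pension funds and similar institutions often face strict leverage limits (regulatory requirements or internal risk-control rules) and cannot borrow freely to scale up their investments.
- Constraints on individual investors: ordinary investors may be unable to use leverage effectively because of margin requirements, credit limits or risk aversion.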
 
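- When leverage is constrained, investors cannot amplify risk directly by borrowing, so they achieve a similar effect indirectly by tilting their asset allocation: - Overweight high-beta assets: high-beta assets (e.g. small caps, high-volatility stocks) rise more when the market rises and fall more when it falls. By adding to these assets, investors can "mimic" leverage without borrowing and pursue higher return potential.
- Underweight low-beta assets: low-beta assets (e.g. large-cap blue chips, bonds) carry lower risk but offer less upside sensitivity.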
 
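- Demand pushes high-beta asset prices too high: when many investors crowd into high-beta assets, their prices are bid up and their expected returns fall. Low-beta assets are undervalued: low-beta assets are sold off for lack of demand, so their prices are depressed and their expected returns rise.
- The BAB factor does not contradict the small-cap effect: the size factor SMB and the value factor HML are (after controlling for market beta) independent of market beta.
- The BAB factor builds a zero-beta portfolio by shorting high-beta stocks and going long low-beta stocks (with leverage).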
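A one-period sketch of the BAB construction: the low-beta leg is levered up to a beta of one and the high-beta leg is de-levered to a beta of one, so the spread has zero ex-ante beta. The rank-based weights follow the usual Frazzini and Pedersen scheme, which is assumed here; inputs are excess returns, both leg betas are assumed positive, and the numbers are made up.

```python
def bab_return(betas, excess_returns):
    """One-period BAB return with rank-based weights.

    Returns (factor_return, beta_low, beta_high), where beta_low and beta_high
    are the weighted betas of the long and short legs before scaling.
    """
    n = len(betas)
    order = sorted(range(n), key=lambda i: betas[i])
    ranks = [0.0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    mean_rank = sum(ranks) / n
    dev = [r - mean_rank for r in ranks]
    k = 2.0 / sum(abs(d) for d in dev)           # makes each leg's weights sum to 1
    w_high = [k * max(d, 0.0) for d in dev]
    w_low = [k * max(-d, 0.0) for d in dev]
    beta_low = sum(w * b for w, b in zip(w_low, betas))
    beta_high = sum(w * b for w, b in zip(w_high, betas))
    r_low = sum(w * r for w, r in zip(w_low, excess_returns))
    r_high = sum(w * r for w, r in zip(w_high, excess_returns))
    # Scale each leg by 1 / beta so both legs have beta one before differencing.
    factor_return = r_low / beta_low - r_high / beta_high
    return factor_return, beta_low, beta_high

# Four hypothetical stocks.
betas = [0.5, 0.8, 1.2, 1.5]
excess_returns = [0.010, 0.012, 0.015, 0.016]
bab, beta_low, beta_high = bab_return(betas, excess_returns)
```

In this toy example the levered low-beta leg earns more than the de-levered high-beta leg, so the factor return is positive.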
 
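- Herculano, Miguel C. "Betting Against (Bad) Beta." arXiv preprint arXiv:2409.00416 (2024). - Combines the two papers above to construct the BABB factor.
- Double sorting selects stocks with both low beta and low cash-flow beta:
- First sort: split stocks into high, medium and low groups by market beta (β);
- Second sort: within each group, sort again by cash-flow beta (β_CF) to build 3×3 portfolios;
- Final strategy: go long the low-beta/low-cash-flow-beta portfolio and short the high-beta/high-cash-flow-beta portfolio.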
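The double sort can be sketched as a dependent 3x3 sort in plain Python. Tercile breakpoints by rank, the helper names and the toy betas are assumptions for illustration; the paper may use different breakpoints or weighting.

```python
def tercile_labels(values):
    """Assign 0 (low), 1 (mid) or 2 (high) by rank, in three near-equal groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for pos, i in enumerate(order):
        labels[i] = min(3 * pos // len(values), 2)
    return labels

def babb_legs(beta, beta_cf):
    """Dependent 3x3 sort: terciles of market beta, then terciles of cash-flow
    beta within each beta tercile. Returns (long_ids, short_ids)."""
    first = tercile_labels(beta)
    long_ids, short_ids = [], []
    for g in range(3):
        members = [i for i, lab in enumerate(first) if lab == g]
        second = tercile_labels([beta_cf[i] for i in members])
        for i, lab in zip(members, second):
            if g == 0 and lab == 0:
                long_ids.append(i)       # low beta, low cash-flow beta
            elif g == 2 and lab == 2:
                short_ids.append(i)      # high beta, high cash-flow beta
    return long_ids, short_ids

# Nine hypothetical stocks.
beta = [0.4, 0.5, 0.6, 0.9, 1.0, 1.1, 1.4, 1.5, 1.6]
beta_cf = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.9, 0.7, 0.8]
long_ids, short_ids = babb_legs(beta, beta_cf)
```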
 
 
 
- 
12. Microstructure
13. Miscellaneous
- Information Aggregation 
- Missing Financial Data 
- Interest Rate 
- Information Search 
- Fund Research 
- 
- A strong seasonal February reversal exists. The reversal is associated with a spike in turnover for recent loser stocks, which we attribute to an appetite for lottery-like stocks by retail investors around the Chinese New Year season.
- Excluding February, there is a strong mid-to-long term momentum signal in the Chinese stock market.
- Momentum construction: at the beginning of month T+1, calculate the signal as the closing price at the end of month T-1 divided by the highest price in the past 52 weeks. One month is skipped because of the strong short-term reversal in China.
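A small sketch of the signal with the one-month skip. It assumes the 52-week window ends on the same date as the price in the numerator (end of month T-1), uses daily closes rather than intraday highs, and takes 252 trading days as 52 weeks; all three are assumptions, as are the function name and toy prices.

```python
def high_52w_signal(closes, month_end_idx, t, window=252):
    """Signal used to trade in month T+1, where t indexes month T in month_end_idx.

    closes: daily closing prices (oldest first)
    month_end_idx: positions in closes of each month's last trading day
    The price is taken at the end of month T-1 (one month skipped) and divided by
    the highest close over the `window` trading days ending on that same date.
    """
    if t < 1:
        raise ValueError("need at least one earlier month to skip")
    end = month_end_idx[t - 1]
    start = max(0, end - window + 1)
    return closes[end] / max(closes[start:end + 1])

# Toy series: two "months" of three trading days each.
closes = [10.0, 12.0, 11.0, 9.0, 8.0, 10.0]
month_end_idx = [2, 5]
signal = high_52w_signal(closes, month_end_idx, t=1)   # uses the close at index 2
```

A value close to one means the stock was trading near its 52-week high at the end of month T-1.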
 
- 
- China theme/concept linkage.
 
- 
- Main idea: This study introduces a decomposition model to quantify these effects, aiming to clarify the contributions of each component to stock price movements.
- The authors develop a return variance decomposition model that categorizes stock price variance into four components:
- Noise: Non-informational price movements due to liquidity issues, overreaction, and other trading frictions.
- Private Firm-Specific Information: Informed trading that reveals proprietary insights about a firm.
- Public Firm-Specific Information: Information from publicly available sources, such as news and announcements.
- Market-Wide Information: Broad economic or market news that impacts all firms.
 
 
14. Momentum and Factor Timing
- Structural Breaks: Advances in Financial Machine Learning Chapter 17 - CUSUM tests: These test whether the cumulative forecasting errors significantly deviate from white noise.
- Explosiveness tests: Beyond deviation from white noise, these test whether the process exhibits exponential growth or collapse, as this is inconsistent with a random walk or stationary process, and it is unsustainable in the long run.
- Right-tail unit-root tests: These tests evaluate the presence of exponential growth or collapse, while assuming an autoregressive specification.
- Sub/super-martingale tests: These tests evaluate the presence of exponential growth or collapse under a variety of functional forms.
 
- A tug of war - Lou, Dong, Christopher Polk, and Spyros Skouras. "A tug of war: Overnight versus intraday expected returns." Journal of Financial Economics 134.1 (2019): 192-213.
- Lou, Dong, Christopher Polk, and Spyros Skouras. "The day destroys the night, night extends the day: A clientele perspective on equity premium variation." London School of Economics Working Paper (2022).
- Main idea: overnight momentum and intraday reversal.
- High overnight returns tend to continue overnight in future months but exhibit a reversal during the intraday period, suggesting a “tug of war” effect.
- Clientele Hypothesis: The study attributes these effects to different types of investors. This segmentation results in a predictable pattern where:
- Retail Investors drive demand at the open, impacting overnight returns.
- Institutional Investors provide liquidity intraday, which causes reversal effects.
 
- The study reveals a robust negative relation between past overnight returns and future intraday returns, a pattern they describe as "the day destroys the night." Conversely, intraday returns positively forecast overnight returns ("night extends the day"), reflecting a continuation effect.
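The overnight and intraday components used in these papers come from splitting each close-to-close return at the open. The sketch below shows that split; the function name and prices are hypothetical, and prices are assumed to be adjusted for splits and dividends.

```python
def decompose_returns(opens, closes):
    """Split each close-to-close return into overnight and intraday parts.

    intraday_t  = close_t / open_t - 1
    overnight_t = open_t / close_{t-1} - 1
    so that (1 + overnight_t) * (1 + intraday_t) = close_t / close_{t-1}.
    """
    out = []
    for t in range(1, len(closes)):
        intraday = closes[t] / opens[t] - 1.0
        overnight = opens[t] / closes[t - 1] - 1.0
        out.append((overnight, intraday))
    return out

# Three hypothetical trading days.
opens = [100.0, 102.0, 101.0]
closes = [101.0, 100.0, 103.0]
parts = decompose_returns(opens, closes)
```

On the second day the stock gaps up overnight and then falls intraday, which is the sign pattern the tug-of-war description refers to.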
 
- 
- CNN on OHLC charts: images of open, high, low, and close prices, trading volume, and moving average price over the past 5, 20, and 60 days are used to forecast short (five-day), medium (20-day), and long (60-day) horizon returns.
- Transfer learning: they show that the predictive patterns identified by the CNN from daily U.S. stock data transfer well to international markets and to other time scales.
 
- 
- Construct moving averages with various lag lengths to cover different time horizons, ranging from short-term (3–20 days) to long-term (up to 1000 days).
- Each month, the expected return for each stock is predicted using a cross-sectional regression of returns on the normalized MA signals.
- Then use the estimated coefficients for next month return prediction.
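The three steps above can be sketched with NumPy on simulated prices. Normalising each moving average by the latest price, the lag set, the single-step toy "month" and the use of one cross-section's coefficients (rather than an average over several months) are assumptions for illustration.

```python
import numpy as np

def ma_signals(prices, lags):
    """prices: (T, N) array of daily prices. Returns an (N, len(lags)) array of
    moving averages ending on the last day, each divided by the last price."""
    last = prices[-1]
    return np.column_stack([prices[-lag:].mean(axis=0) / last for lag in lags])

def cross_sectional_betas(signals, next_returns):
    """OLS of next-period returns on an intercept and the MA signals."""
    X = np.column_stack([np.ones(len(signals)), signals])
    return np.linalg.lstsq(X, next_returns, rcond=None)[0]

def predict(signals, betas):
    """Expected returns from current signals and previously estimated betas."""
    X = np.column_stack([np.ones(len(signals)), signals])
    return X @ betas

# Simulated geometric random-walk prices for 30 stocks.
rng = np.random.default_rng(1)
T, N = 60, 30
lags = (3, 5, 10, 20)
prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal((T + 1, N)), axis=0))

signals_prev = ma_signals(prices[:-1], lags)     # signals known one step earlier
realised = prices[-1] / prices[-2] - 1.0         # returns over the following step
betas = cross_sectional_betas(signals_prev, realised)

signals_now = ma_signals(prices, lags)           # latest signals
forecast = predict(signals_now, betas)           # forecast for the next step
```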
 
- Momentum with ML/DL - Lim, Bryan, Stefan Zohren, and Stephen Roberts. "Enhancing time series momentum strategies using deep neural networks." arXiv preprint arXiv:1904.04912 (2019).
- Using 8 MOM + MACD features with some look-back window, train a DNN to optimize Sharpe ratio/expected return.
 
 
- 
- This paper investigates how differences in investor beliefs (disagreement) evolve in response to large information shocks. The authors use a unique dataset and focus on securities constrained by short selling. 
- Example  
- Therefore, with short-sell restriction, the price at t0 will always reveal the view of optimistic agents. 
- After a positive price shock, the beliefs of the most optimistic agents are too optimistic at time 0 and decay towards rational beliefs over a roughly 5-year period, which results in strong, persistent negative abnormal returns. 
- After a negative price shock, the beliefs of the most optimistic agents are also too optimistic, suggesting these agents initially underreact to the new negative information, but this underreaction is resolved after only 1 year, which results in shorter-lived negative abnormal returns.  
 
- 
- Momentum is strongly linked to time-varying risk exposure. The conditional factor model explains a significant portion of the momentum premium. 
- Candidate models are residual momentum, traditional momentum and the conditional factor model defined by IPCA. The IPCA estimator is defined as the conditional expectation of the factor component of returns, $\mu_{i,t} = \beta_{i,t}^\top \lambda_t$, where $\lambda_t = E_t[f_{t+1}]$. To focus squarely on the role of time-varying risk exposures, our analysis treats the expected factor return as constant: $\lambda_t = \lambda$. 
- IPCA, Kelly, Pruitt, and Su (JFE 2019), explained: $r_{i,t+1} = z_{i,t}^\top \Gamma_\beta f_{t+1} + \epsilon_{i,t+1}$, where $r_{i,t+1}$ is the future asset return in period $t+1$, $z_{i,t}$ is the vector of time-varying observable asset characteristics (instrument vector), $\Gamma_\beta$ is a fixed mapping from observable characteristics to exposures on the latent risk factors, and $f_{t+1}$ is the latent factor return.
- In traditional asset pricing, $\beta_i$ is the factor exposure for asset i estimated by Fama-MacBeth and $f_{t+1}$ is the (double-sorted) factor (portfolio) return (FF 3/5 factors).
- IPCA allows time-varying observable characteristics, draws inference from more than the return covariance alone, and is parameter-efficient (it estimates the mapping rather than each asset's factor exposure). It is similar to the BARRA model but maps to a low-dimensional latent risk factor space.
- Kelly, Pruitt, and Su (JFE 2019) propose to estimate $\Gamma_\beta$ and $f_{t+1}$ recursively (by alternating least squares).
  - The linear form IPCA has been extended to DNN in Gu, Shihao, Bryan Kelly, and Dacheng Xiu. "Autoencoder asset pricing models." Journal of Econometrics 222.1 (2021): 429-450.
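A compact sketch of an alternating least squares loop for the linear IPCA model, on simulated data. It assumes a balanced panel, no intercept term and no identification normalisation of the mapping matrix, so it illustrates the alternation only and is not the authors' estimator; the function name and all data are hypothetical.

```python
import numpy as np

def ipca_als(Z, R, K, n_iter=50, seed=0):
    """Alternating least squares for r_{t+1} = Z_t Gamma f_{t+1} + eps.

    Z: (T, N, L) characteristics observed at t; R: (T, N) returns realised at t+1.
    Returns Gamma (L, K), factors F (T, K) and the squared-error loss per iteration.
    """
    T, N, L = Z.shape
    rng = np.random.default_rng(seed)
    Gamma = rng.standard_normal((L, K))
    F = np.zeros((T, K))
    losses = []
    for _ in range(n_iter):
        # Step 1: given Gamma, each f_{t+1} is a cross-sectional regression on Z_t Gamma.
        for t in range(T):
            F[t] = np.linalg.lstsq(Z[t] @ Gamma, R[t], rcond=None)[0]
        # Step 2: given the factors, vec(Gamma) is a pooled regression on kron(f', Z_t).
        X = np.vstack([np.kron(F[t][None, :], Z[t]) for t in range(T)])
        vec_gamma = np.linalg.lstsq(X, R.reshape(-1), rcond=None)[0]
        Gamma = vec_gamma.reshape((L, K), order="F")   # column-major vec
        fitted = np.stack([Z[t] @ Gamma @ F[t] for t in range(T)])
        losses.append(float(((R - fitted) ** 2).sum()))
    return Gamma, F, losses

# Simulated panel: 20 periods, 30 assets, 4 characteristics, 2 latent factors.
rng = np.random.default_rng(0)
T, N, L, K = 20, 30, 4, 2
Z = rng.standard_normal((T, N, L))
Gamma_true = rng.standard_normal((L, K))
F_true = rng.standard_normal((T, K))
R = np.einsum("tnl,lk,tk->tn", Z, Gamma_true, F_true) + 0.1 * rng.standard_normal((T, N))

Gamma_hat, F_hat, losses = ipca_als(Z, R, K)
```

Each step is an exact least-squares minimisation with the other block held fixed, so the loss cannot increase from one iteration to the next.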
 
 
- 
- Public news chiefly moves prices overnight, whereas private information (revealed through trading) drives intraday returns.
- Intraday-based portfolios exhibit strong momentum over both short and long horizons, without subsequent long-run reversal; overnight-based portfolios show long-run reversal but no short-run momentum.
 
- 
- Individual investors are more likely to sell stocks near the 52WH due to the disposition effect (a tendency to sell assets that have increased in value) and anchoring bias (relying too heavily on the 52WH as a reference point).
- Stocks that experience high levels of limit order selling by individual investors at and around the 52WH tend to have abnormally high returns in the period following the 52WH.
 
- 
- ML model inputs: 12 monthly cumulative returns; target (LHS): the upcoming month's excess return.
- The ML model prediction is significantly positive even after controlling for popular risk factors.
 
- 
- Trade long-term momentum when it agrees with short-term momentum (1- or 3-month return).
- Turning points occur when slow and fast signals disagree, marking potential trend reversals that tend to inflict "whipsaw" losses on static strategies.
- Corrections (slow ≥ 0, fast < 0) tend to revert: ~61% are followed by Bull/up months. Thus, fast short bets in Corrections often lose, favoring slower positions.
- Rebounds (slow < 0, fast ≥ 0) tend to continue: ~56% are followed by up months, so fast long bets in Rebounds are rewarded.
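The four slow/fast combinations can be encoded directly. In this sketch the label "Bear" for the case where both signals are negative and the rule of staying flat on disagreement are assumptions based on the first bullet above, not a statement of the paper's exact strategy.

```python
def sign(x):
    """Sign convention used here: non-negative counts as +1."""
    return 1 if x >= 0 else -1

def market_state(slow, fast):
    """Classify a month from its slow (long-lookback) and fast (short-lookback)
    momentum signals."""
    if slow >= 0:
        return "Bull" if fast >= 0 else "Correction"
    return "Rebound" if fast >= 0 else "Bear"

def agreement_position(slow, fast):
    """Trade the slow signal only when the fast signal agrees; otherwise stay flat."""
    return sign(slow) if sign(slow) == sign(fast) else 0

# Hypothetical (slow, fast) signal pairs.
examples = [(0.05, 0.02), (0.05, -0.01), (-0.02, 0.01), (-0.03, -0.02)]
states = [market_state(s, f) for s, f in examples]
positions = [agreement_position(s, f) for s, f in examples]
```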
 
- Wang, Huaixin. "Style Switching and Asset Pricing." Available at SSRN 4686997 (2024). - When investors extrapolate past returns, they switch between "twin" styles (assets with similar characteristics) and "dual" styles (assets with opposite characteristics), inducing cross-asset momentum among twins and cross-asset reversals among duals.
 
15. NLP
- Wolfe Research | Text mining unstructured corporate filing data
- EDGAR 10-K and 10-Q filings (by section comparison)
- Features:
- Sentiment and tone analysis
- changes in sentiment
- distance measures (YoY embedding/BoW)
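A year-over-year bag-of-words distance can be sketched with the standard library. The tokeniser, the use of cosine similarity and the sample sentences are assumptions for illustration; the report may use a different distance or an embedding model.

```python
import math
import re
from collections import Counter

def bow(text):
    """Lower-cased bag of words using a simple alphabetic tokeniser."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical risk-factor sections from two consecutive annual filings.
last_year = "Risk factors include competition and supply chain disruption."
this_year = "Risk factors include competition, litigation and supply chain disruption."

similarity = cosine_similarity(bow(last_year), bow(this_year))
yoy_distance = 1.0 - similarity     # larger value = more change in the section
```

Comparing the same section across years keeps the distance focused on what the firm changed rather than on differences between sections.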
 
 
- Learning Fundamentals from Text - Use an attention mechanism to weigh the importance of different paragraphs in a document, focusing on those that are most relevant to market reactions. The document-level aggregated vector is then used to predict the target variable, which is the direction of stock returns around the filing date.
 
- 
- Investor interactive platforms (IIPs) in China 
- The authors take a random sample of around 50,000 questions and then employ a state-of-the-art BERT-based algorithm to classify the remaining 2.45 million postings.
- Findings: About 80% of questions seek clarification or explanation about specific items in financial reports or company operations, 16.6% are comments or suggestions to management, and the remaining questions pertain to verifying rumors or addressing misunderstandings.
 
- These platforms alleviate common investor challenges, stimulate trading, improve market liquidity, and enhance the informativeness of stock prices. 
 
- Cohen, Lauren, and Quoc Nguyen. "Moving Targets." Available at SSRN 4736129 (2024).
- Managers publicize performance targets (e.g., revenue, same-store sales, product metrics) in earnings-call presentations. When they fail to hit a given target, they often shift the discussion to a different metric, "moving the target" to ensure they still clear a self-set hurdle.
- There is no immediate announcement reaction, but firms underperform after moving targets. Shifts in non-financial targets (e.g., subscriber counts, product units) predict larger underperformance than purely financial ones.
- Analyst attention: when analysts explicitly question a dropped target and management is forced to address it, the underperformance effect is attenuated, indicating that inattention drives the gradual price drift.