Multivariate Statistical Modeling of Stock Investments During the COVID-19 Outbreak (2021-EU-30MP-767)

Level: Intermediate

Chi-hong Ho, Student, STEAMS Training Group
Mason Chen, Student, Stanford OHS

In late February 2020, the COVID-19 pandemic began gaining momentum, and the stock market subsequently crashed. Many factors may have contributed to the fall of stock prices, but the authors believed the pandemic may have been the main cause. The authors’ objectives for this project were to learn about stock investments, earn money in the stock market, and find a model to help determine the timing and amount of trading. All of the data was Z-standardized to help eliminate bias and for ease of comparison. Specifically, the authors used three Z-standardization values: Z (within stock), Z (NASDAQ Ratio) and Z (Group NASDAQ Ratio). These compare the current stock price with the stock's previous prices, the NASDAQ stock price, and the group NASDAQ price average, respectively. The authors then combined the three Z-values into a stock index, which greatly decreases the data bias and helps reduce investment risk. Outlier detection is used to identify the timing of investment. Quantile Range Outliers is easily affected by skewness, because it is usually applied to normally distributed data. Robust Fit Outliers is a better tool, because it can ignore the skew factor. The authors established a model to help people invest the right amount of money.

Auto-generated transcript...

Speaker	Transcript
Chi-hong Ho	Okay, hello everyone. I'm Chi-hong Ho, a junior at Henry M Gunn High School. My partner is Mason Chen; he is a sophomore at Stanford Online High School. Our project is multivariate statistical modeling of stock investment during the COVID-19 outbreak.
	In late February, as the coronavirus pandemic spread around the world, the stock markets started crashing.
	There are several factors causing this year's stock market crash, such as the COVID-19 pandemic,
	the OPEC/Russia/USA oil price war, and the $2.2 trillion bailout package from the US government.
	Some companies also laid off their employees. The unemployment rate is more than 30% due to the pandemic. Let's compare that with the unemployment rate of the 1929 Great Depression.
	At that time it was about 25%. Also, in November the US has its presidential election. And the manufacturing supply chain shut down because the coronavirus pandemic had spread in China earlier.
	Yeah, the COVID-19 pandemic influenced the stock market like the past stock market crashes of the 1929 Great Depression and 1987 Black Monday.
	Because the COVID-19 pandemic had a huge impact on the world, causing lots of deaths, the stock market should be influenced even more by the pandemic. Looking at US history, the Great Depression of 1929
	and Black Monday in 1987 were both crashes that continued for a long period. The COVID-19 crash did not. This year, the stock market decreased by 25% from the peak, from March to April.
	Before March, the COVID-19 pandemic in the US was not spreading as fast as in other countries. After the COVID spread became global, lots of countries were locked down. National and global lockdowns would affect the stock market.
	Look at the graph in the left corner. That is the situation that happened in Korea. Asian countries got the COVID pandemic before America.
	We use the Asian countries' situation to predict what will happen to the US. Based on this graph, the colored region marks the COVID-19 inflection point for Phase II. It may
	impact the stock curve significantly, because it seems that the growth in cases slows down a little in this short period.
	Okay. On the left side is the graph of cases versus date. This is the graph for the US.
	Cases started adding up in early February and grew much faster in late March and early April.
	The right graph is the stock market decline by date, which shows that the case curve and the stock curve are related to each other. When the cases grew a lot, the stock market started to crash, and the lowest point was more than 35% down.
	Comparing China, South Korea and the US, we knew that the Asian countries got the COVID pandemic before the US. So we could predict what will happen in the next few months in the US.
	Based on that specific table and the data below, I found that the duration of Phase III is really short. After that it's safe for us and we can go back to work.
	Compare that with the duration of Phase IV, which is double the time of Phase III, so by that time we feel much safer going back to work.
	After we look at the Asian countries' pandemic phases, we can predict the Phase III and Phase IV durations for the US. We would estimate the end of Phase III in the US should be on
	April 15 and the end of Phase IV should be on May 25 for the best case. In the worst case, the end of Phase III is around April 30 and the end of Phase IV is on June 10.
	Our recent project is to define a stock investment strategy. Our objectives are learning and experiencing stock investment, earning money in the stock market, and building a model for judging the times
	to trade or exchange stock. Firstly, we own eight high technology stocks which were purchased in 2008 and 2009, and the average gain is about 400% in March.
	Some of the stocks are in the top 20 of the Standard and Poor's 500 stocks, with an average gain of more than 800%. We want to find a time to sell those stocks and get money back.
	Because of the COVID-19 situation and the stock market crash, the stock price was not as high as in March. So we wanted to sell quickly.
	After selling the high tech stocks, we look at 23 COVID impacted stocks, which are stocks that lost ground because of the pandemic.
	We want to wait for those stocks to surge after a few months.
	Our choice of COVID impacted stocks should have a minimum market cap of 3 billion. When we get down to trading stock, we could do some exchanges. We sold one tumbling stock and bought one rising stock for balance. We need to make sure our stocks will surge in a few months.
	We separate the stock transaction decision chart into three levels. The first level is to decide
	what we will buy, what we will sell and what we will exchange. In level two, we pick the stocks from the selling group and the buying group, and the exchange pairs from the two groups. The third level is the tools we will use. We may use the Z index, and we will use the outlier detection tools.
	The function of standardization is to give us an idea of how far the real data points are from the mean. Why do we need to use it? Because we need to convert the actual data to an index that's easier for us to compare. Standardization also eliminates the bias of the raw data.
	On the left side, there are blue boxes, purple boxes and red boxes. The real data
	is the input we collect, which is in the blue box. The new index after the Z standardization is found in the red box. That is our output.
	The Z standardization is a tool we use, which is in the purple box. In the blue box the cubicle represent NASDAQ stocks,
	which is popular and lots of people will invest in. High tech stocks had grown a large amount during five years.
	The range of Z standardization is from -3 standard deviation up to +3 standard deviation.
	After the Z standardization, we get Z within stock, Z NASDAQ ratio, and Z group NASDAQ ratio.
	The Z within stock compares the stock price with the stock's own price over the past five years. The Z NASDAQ ratio compares the stock price with the NASDAQ stock price. The Z group NASDAQ ratio compares the stock price with the group NASDAQ
	mean. We use the Z standardization to help us look at the risk. In the end we combine all three Z scores into a new stock index. The stock index can help us lower the risk of transactions.
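A minimal Python sketch of the three Z values and the combined index described above. The prices are made up for illustration, and two details are assumptions because the talk does not spell them out: each ratio is taken as the stock price divided by the NASDAQ (or group average) price, and the three Z scores are combined by simple addition.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-standardize a series: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(x - mu) / sigma for x in values]


# Hypothetical daily prices, oldest first (not data from the talk).
stock = [100.0, 104.0, 110.0, 108.0, 120.0]
nasdaq = [8000.0, 8100.0, 8300.0, 8200.0, 8400.0]
group_avg = [50.0, 51.0, 53.0, 52.0, 54.0]

# Z (within stock): the stock against its own price history.
z_within = z_scores(stock)
# Z (NASDAQ ratio): assumed here to be the stock / NASDAQ price ratio.
z_nasdaq = z_scores([s / n for s, n in zip(stock, nasdaq)])
# Z (group NASDAQ ratio): assumed here to be the stock / group average ratio.
z_group = z_scores([s / g for s, g in zip(stock, group_avg)])

# Combined stock index: a plain sum of the three Z scores (an assumption).
stock_index = [a + b + c for a, b, c in zip(z_within, z_nasdaq, z_group)]

for day, value in enumerate(stock_index, start=1):
    print(f"day {day}: stock index = {value:+.2f}")
```

A large positive index on a given day means the price is high on all three comparisons at once, which is the kind of reading the talk treats as a selling signal.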
	Here's the data table after we standardized the raw data. We can use the stock price index change.
	US stocks have been trending downward since a peak around mid February. Some stocks are more robust and certain ones are impacted by COVID-19.
	We established this modeling algorithm on March 7-8 and the database on March 14-15. The index values marked red in this figure
	represent a good time to sell those stocks, when we can earn more money than at the index values not marked red.
	Also, there are some index values marked in blue. That is the time when we can consider buying the 23 COVID impacted stocks, because we can lower the cost and gain more in the future.
	The reason why we use the outlier algorithm is that the outlier is helping us determine the timing of trading stocks.
	The best way to determine the outliers is to use quantile range outliers. Firstly, we find the interquartile range (IQR), which is quartile 3 minus quartile 1.
	The algorithm is: a value is an outlier if it is below Q1 - x * IQR or above Q3 + x * IQR, with x equal to 1.5 for regular or 3 for extreme.
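A short Python sketch of the quantile range rule just described, using only the standard library. The index values are hypothetical, and the quartiles use the "inclusive" interpolation method of `statistics.quantiles`, which is one of several common conventions and may differ slightly from the tool used in the talk.

```python
from statistics import quantiles


def quantile_range_outliers(values, multiplier=1.5):
    """Return values outside [Q1 - x*IQR, Q3 + x*IQR].

    multiplier=1.5 gives regular outliers, multiplier=3 gives extreme ones.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = q1 - multiplier * iqr
    high = q3 + multiplier * iqr
    return [x for x in values if x < low or x > high]


# Hypothetical index values with a long right tail (not data from the talk).
index_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0, 40.0]

regular = quantile_range_outliers(index_values, multiplier=1.5)
extreme = quantile_range_outliers(index_values, multiplier=3.0)

print("regular outliers:", regular)  # [20.0, 40.0]
print("extreme outliers:", extreme)  # [40.0]
```

With this data Q1 is 3.5 and Q3 is 8.5, so the regular fences are -4 and 16 and the extreme fences are -11.5 and 23.5. The extreme setting flags fewer points, so a flag under it is a stronger signal.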
	Why do we need to choose the extreme outlier? Because the regular outlier cannot show the longer timing that we want.
	Extreme values are found using a multiplier of the interquartile range, the distance between two specified quantiles. So the extreme outlier gives a longer
	detection level that we can use in the investment, which can help us reduce our risk. But technically the quantile range outlier algorithm is meant for normally distributed data.
	The stock market does not follow the normal distribution very closely, so the outliers will be influenced by the skew factor.
	Thus we need to use more powerful tools that are not influenced by the skew factor.
	Because of the stock performance, we would not care more about the tails than the center of the distribution.
	The next tool we use is robust fit outliers. We use robust fit outliers to ignore the skew factor.
	Outliers and distribution skewness are very much related. If you have many so-called outliers in one tail of the distribution, then you'll have skewness in that tail.
	In quantile range outlier detection, the assumption is normal distribution. So skewness in the distribution will introduce an inaccuracy in the outlier detection methodology. If the distribution is significantly skewed, like it probably is in stock market data,
	the robust fit outliers are a better method to find the outliers accurately, because they tend to ignore the skew factor. The robust fit outlier method estimates
	a robust center and spread. Outliers are defined as those values that are more than K times the robust spread from the robust center.
	The robust fit outliers provide several options for computing the robust estimates and the multiplier K, as well as tools to manage the outliers found. We use K=2.7 for regular and 4.7 for extreme outliers.
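A rough Python sketch of the "K times the robust spread from the robust center" rule. The talk does not say which robust estimator was used, so this sketch swaps in the median as the center and the scaled median absolute deviation (MAD) as the spread; that choice, the 1.4826 scaling constant and the sample values are all assumptions for illustration.

```python
from statistics import median


def robust_fit_outliers(values, k=2.7):
    """Flag values more than k robust spreads away from the robust center.

    Center = median, spread = 1.4826 * MAD (a stand-in estimator, assumed here).
    k=2.7 is the regular setting from the talk, k=4.7 the extreme one.
    """
    center = median(values)
    mad = median([abs(x - center) for x in values])
    spread = 1.4826 * mad  # makes MAD comparable to a standard deviation
    if spread == 0:
        return []  # no usable spread estimate, so flag nothing
    return [x for x in values if abs(x - center) > k * spread]


# Hypothetical skewed index values (not data from the talk).
index_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0, 40.0]

regular = robust_fit_outliers(index_values, k=2.7)
extreme = robust_fit_outliers(index_values, k=4.7)

print("regular (K=2.7):", regular)  # [20.0, 40.0]
print("extreme (K=4.7):", extreme)  # [40.0]
```

Because the median and MAD barely move when a few values sit far out in one tail, the thresholds are not dragged toward the skewed side the way a mean and standard deviation would be.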
	After we use the regular robust fit outliers, we can find out the outlier in the selling index data. Look at the right graph.
	There are so many shaded red cells in the F5 and F8 columns, indicating that we can consider selling those stocks to maximize our profit, because the stock price is above average. Each column shows one stock's index change by day, going down the column.
	The reason why we use the extreme outlier for the buying index is that the buying index is dropping. That means it is really difficult to detect the outliers of the buying index, unlike the selling index.
	The selling index is rising, which makes it easier for us to determine the outliers.
	On this page, there are some color blocks in the data table, like B6, B13, B15 and B19, which indicates that we can consider to buy the stocks.
	Lots of people make money by investing in stocks, and most people may choose the right stock to invest in for a reasonable ROI.
	But investors are challenged to find the right amount of money to invest. Also, human psychological factors can bias us toward certain investments.
	We can determine the amount of stock we buy, sell or exchange based on this model, which can minimize personal investment bias and reduce the overall financial risk.
	The model provides two ways to judge the amount of investment. The first one is the color block analysis. In that analysis,
	the blocks with dark green are good to sell and the blocks with orange or red are good to buy or to exchange.
	And then at the bottom, there are the transaction levels we define. L10 is the least investment amount and L1 is the greatest investment amount.
	If we want to sell the stocks, we will not sell too much, so we sell the L5 amount. For the exchange, we also choose the L5 amount. And if we are buying, we can consider buying more, so we buy the L2 amount.
	So based on this model, you can manage your investments and reduce your financial risk.
	Okay, Phase III is the exchanging part. We are also using Z standardization to convert the data points into an index, but this is the exchange index.
	We set up the exchange threshold: the Z exchange index should be greater than 15. This is an average index we calculate, which can tell the investor the timing.
	On the left side, there's the line chart, which shows the change of each exchange pair. Based on this line chart, we can see that S5-B1 is about 15.8 and S5-B14 is about 15.16.
	So that means we can consider doing the exchange with the S5-B1 or S5-B14 pair.
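A small Python sketch of the threshold check described above. The two pair values and the threshold of 15 are the ones quoted in the talk; the third pair, `S5-B7`, and the function name are hypothetical and only there to show a pair being filtered out.

```python
EXCHANGE_THRESHOLD = 15.0  # threshold quoted in the talk

# Exchange index by sell-buy pair.
exchange_index = {
    "S5-B1": 15.8,    # quoted in the talk
    "S5-B14": 15.16,  # quoted in the talk
    "S5-B7": 12.4,    # hypothetical pair below the threshold
}


def exchange_candidates(index_by_pair, threshold=EXCHANGE_THRESHOLD):
    """Return (pair, index) tuples above the threshold, highest index first."""
    hits = [(pair, z) for pair, z in index_by_pair.items() if z > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


candidates = exchange_candidates(exchange_index)
for pair, z in candidates:
    print(f"{pair}: exchange index {z}")
```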
	After Z standardization, we get the stock selling index and the exchange index.
	The selling index compares the stock price with the past five years' stock price. The
	Z NASDAQ ratio compares the stock index with the average stock price. The exchange index compares the stock selling index with the stock buying index.
	We use Z standardization to help us look at the risk. We consider about 184 choices, and we need to make sure our investments are made at the right time and that we pick the right pair to do the exchange.
	We're also using the quantile range outlier algorithm to help us determine the timing.
	A small value of Q gives a more radical set of outliers than a large value.
	Look at the right side table. We use the quantile range outlier method and get the top three outliers, whose exchange index values are greater than 19.
	This is the second time we consider the exchange index. We found the top index values, which are the signal that it's the best timing for us to do the exchange.
	They are the S5-B14 pair at 19.27, the S5-B13 pair at 19.12, and the S5-B12 pair at 19.07.
	On the left side, we have the timing prediction model. This model is presented in a color-coded box style.
	The blocks with dark green are good for us to sell, and dark orange or red is good for buying or exchange. The best time is in bold and shown in the graph. It is April 6, which is the best day to do the exchange since we computed the exchange data from February 2015.
	We consider the exchange pair twice, which gives us double insurance that we can make more money in the stock market and greatly reduce the investment risk.
	On April 7, the exchange index had little change compared with April 6. On that day, the S5-B1 pair had a 19.18 exchange index.
	The right graph shows the exchange stock information. On April 8, we sold the KLAC stock at $154.32. The market stock price was $148.85, so we gained about 3.5% on the sale.
	Also on the same day, we bought the Delta stock at $22.42.
	The stock market price was $22.92, so we gained about 2.2% on the purchase. We sold and bought the same amount of stock for balance.
	The amount of stock sold and bought was equal, at a quantity of 65. After one day, the exchange pair has helped us gain 5.7%.
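The percentages quoted for the April 8 exchange can be checked with a few lines of Python. The prices are the ones given in the talk. The talk does not say which price is used as the base for each percentage, so dividing by the higher of the two prices is an assumption, chosen because it reproduces the quoted 3.5%, 2.2% and 5.7%.

```python
def edge_percent(our_price, market_price):
    """Gap between our trade price and the market price, in percent.

    Assumption: the gap is expressed relative to the higher of the two prices.
    """
    higher = max(our_price, market_price)
    return abs(our_price - market_price) / higher * 100


# Sold KLAC at $154.32 when the market price was $148.85.
sell_edge = edge_percent(154.32, 148.85)
# Bought Delta at $22.42 when the market price was $22.92.
buy_edge = edge_percent(22.42, 22.92)
total_edge = sell_edge + buy_edge

print(f"sell edge:  {sell_edge:.1f}%")   # 3.5%
print(f"buy edge:   {buy_edge:.1f}%")    # 2.2%
print(f"total edge: {total_edge:.1f}%")  # 5.7%
```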
	Oh yeah, all stock buyers will focus on their stock trends. My partner Mason and I monitored the NASDAQ stock daily range outliers from late February to mid-March 2020.
	We separate the daily trade window into time slots of 30 minutes each, and we want to find the best time for trading.
	There were 24 peak and valley points we detected, and the upper threshold is 2.7%. In the right corner figure, we can see the stock price at the open, close, high and low times. We also count the price range and rank it when we do the stock price peak and valley detection.
	??? considered the discrete number of sample size. Among the 24 peak/valley points we detected, the data shows that 17 out of 24 points, about 70%-71%, happened in the
	first or last hour. We set up a one-proportion test to decide whether we should wake up early or have a stock lunch session to do trading.
	The null hypothesis assumes a uniform distribution probability. Look at the left corner table: we can see the null proportion value is 0.34, which is greater than 0.05, so
	we cannot reject the null hypothesis. The first and last hours are four slots among the 13 slots available. Look at the right corner table or figure, which shows the distribution of the peak times and the valley times.
	In our research we provide a new model to pick the right stocks and the ??? the amount of buying and selling, and also the exchange index.
	Timing is a really important factor in investment. This model of stock investment is accurate most of the time during the COVID-19 pandemic.
	Our research group invested in the stock market and gained 2.5% after we finished the project. We may use the model for prediction in the future if the pandemic doesn't end. Based on our research, early-bird or last-minute stock trading is favored and can earn more money.
	Thank you.