Sunday, July 21, 2013

Statistical Research and the Ethics Behind It




Following the Age of Reason, the concept of ethics has come to dominate the academic world. Strict ethical standards are in place to combat unethical research. Nurses have to abide by the ethics of their field, just as biologists have to strictly follow their ethical code when conducting research. This is the general rule in every field, and the same standard of ethics applies to statistics and data analysis. Statisticians work with large numbers of data sets, and because most data sets must be processed with computer programs, statisticians can manipulate the data so that the outcome matches their original prediction.

Siddharth Kalla notes, "It is relatively simple to manipulate and hide data, projecting only what one desires and not what the numbers actually speak..." (2010). Statistics is a game of numbers: a correlation cannot honestly be claimed unless the statistical analysis actually supports it. For instance, suppose the Heritage Foundation, a conservative think tank, asks its statisticians to research how Americans view the State of Israel. The Foundation is widely known for its pro-Israel philosophy, so one can easily guess that its initial intention would be to present a message that speaks on behalf of Israel. However, when the statisticians run the data, the results might suggest otherwise. In that situation, the statisticians still control the data, and they can manipulate it however they want.
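
To see how easily a correlation can be manufactured, here is a minimal Python sketch using made-up numbers (the data and the `pearson` helper are hypothetical, written only for illustration). It computes the Pearson correlation on all eight observations, then again after quietly keeping only the four that fit a desired story.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: eight observations of two variables.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 7, 2, 6, 3, 8, 4, 2]

full_r = pearson(x, y)

# "Cherry-picking": silently keep only the observations that fit the desired story.
kept = [0, 2, 4, 6]
picked_r = pearson([x[i] for i in kept], [y[i] for i in kept])

print(f"all data:    r = {full_r:.2f}")
print(f"hand-picked: r = {picked_r:.2f}")
```

With all the data the correlation is close to zero (about 0.08); after four inconvenient points are dropped it becomes a perfect 1.00. Nothing in the final number reveals that half of the data were discarded.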



Such problems are most often seen in public service and government. Statisticians, like most other people, have deeply held political beliefs. For many statisticians who work for state or federal government, political parties, interest groups, and other politically active groups, the interests of their employer could easily outweigh the ethical standards of statistics.

During election seasons, the Department of Labor's jobs report is more likely to carry a positive message. Greg Hunter, who writes the watchdog blog USAWatchdog, argues that the Department of Labor always uses seasonally adjusted job figures when producing a positive jobs report. According to him, "seasonal adjustment jobs are created out of thin air and are not really there for people" (Hunter, 2011). He also claims that the unemployment rate has been increasing in the U.S., contrary to President Obama's claim that it is decreasing. Hunter's claim cannot easily be dismissed, since combining numbers with politics can bring substantial benefits to those in power.
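
For readers unfamiliar with the term, seasonal adjustment removes the regular within-year pattern (holiday hiring, summer jobs, and so on) so that one month can be compared with another. The Python sketch below uses a deliberately simple additive method and hypothetical numbers; the `seasonally_adjust` helper is an assumption for illustration and is not the procedure the Department of Labor actually uses, which is far more elaborate.

```python
def seasonally_adjust(values, period=12):
    """Simple additive seasonal adjustment (illustrative, not an official method).

    Each calendar month's seasonal factor is that month's average minus the
    overall average; the factor is then subtracted from every observation.
    """
    overall = sum(values) / len(values)
    factors = []
    for month in range(period):
        same_month = values[month::period]
        factors.append(sum(same_month) / len(same_month) - overall)
    adjusted = [v - factors[i % period] for i, v in enumerate(values)]
    return adjusted, factors


# Hypothetical monthly employment (thousands) for two years, with a summer peak
# and a year-over-year increase of 12 in every month.
pattern = [90, 95, 100, 105, 110, 115, 120, 115, 110, 105, 100, 95]
raw = pattern + [v + 12 for v in pattern]

adjusted, factors = seasonally_adjust(raw)
print([round(v, 1) for v in adjusted])
```

Here the adjusted series is flat within each year (105 in the first, 117 in the second), so the underlying trend is visible once the summer bump is removed. In this simplified version the seasonal factors sum to zero over a full year, so the adjusted and raw annual totals are the same.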


I personally believe that statisticians who work for politically active groups are more likely to manipulate data. Extreme political polarization characterizes today's political culture, and this has developed into a culture in which neutral data can easily be turned into biased data. Bureaucrats are also highly prone to such activities, since what they report can determine whether they stay in power. This does not mean that bureaucrats and politically active people always report biased findings, but their chance of doing so is much higher than that of statisticians who do not work for politically active groups. By contrast, the Census Bureau employs many statisticians, and those statisticians have no interest in manipulating the data, given that their work has a neutral outlook.



Such unethical acts are not limited to politics; a telephone company, for example, can present data that helps it attract more customers. Suppose the company carries out a customer satisfaction survey but picks only 20 customers to participate. A sample of 20 hand-picked customers is far too small and too selective to represent the company's customer base, so the conclusion of the survey cannot be trusted. Driven by private interests and corporate profits, anyone can do this. It is therefore essential for statisticians to put ethics ahead of any unethical policy their employer may have.
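
To put a number on "too small", the usual approximate 95% margin of error for a survey proportion is z·sqrt(p(1 − p)/n). The Python sketch below assumes a simple random sample, the worst case p = 0.5, and the normal approximation (which is itself rough at n = 20); the sample sizes are illustrative.

```python
from math import sqrt


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * sqrt(p * (1 - p) / n)


for n in (20, 100, 1000):
    print(f"n = {n:4d}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

With 20 respondents the margin is roughly ±22 percentage points, versus about ±3 for 1,000. And if the company hand-picks those 20 customers rather than sampling at random, selection bias adds error that no formula can correct.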

In short, there are many areas in which statisticians can engage in unethical acts. But every statistician has to remember that the conclusions they draw from a data set can affect the lives of many people. The right conclusion reached by an epidemiologist or a biostatistician can prevent the spread of disease. Similarly, government and corporate statisticians have to report what the numbers suggest, not what their employers would have wanted the numbers to suggest. Only the proper evaluation and interpretation of numbers will lead us toward the truth, which is the only goal of statistical research.





References 

Hunter, Greg. "9% Unemployment Rate is a Statistical Lie." Real News from Greg Hunter’s USAWatchdog. N.p., 2 July 2011. Web. 22 July 2013. <http://usawatchdog.com/9-unemployment-rate-is-a-statistical-lie/>.

Kalla, Siddharth. "Ethics in Statistics - Avoid Mispresentation of Data in Research." Explorable.com. N.p., 16 Apr. 2010. Web. 22 July 2013. <http://explorable.com/ethics-in-statistics>.





















Issues with Surveys


                                     


Since the 1960s, the use of surveys has increased rapidly across the globe. While social scientists conduct surveys using proper survey techniques, the evaluation and interpretation phases of any survey design require the application of statistical methods. If a social scientist carries out a survey on the rise of atheism in the U.S. and Europe, his or her entire work will be in vain without a good knowledge of statistical methodology. A proper knowledge of statistics enables survey researchers to evaluate and interpret results correctly.

Social scientists have been relentless in developing new survey methods whose results can be statistically accurate. The Center for the Study of Politics and Governance concluded, "The methods for conducting surveys of public opinion are undergoing a period of rapid and potentially fundamental change." The Center notes that automated and internet-based surveys have gained prominence over telephone surveys. Social scientists have also developed new theories of survey technique, including approaches to selecting the right respondents.

However, despite all of the improvements in survey design, recent news suggests that the public in the U.S. does not readily trust survey results. The Town of East Hampton, New York, recently conducted an aerial survey to count the total deer population in the town. The survey concluded that there were 877 deer, whereas roadside distance sampling in 2006 had concluded that there were 3,293. Given this alarming drop, Rohma Abbas wrote a piece titled "East Hampton Town Deer Survey Results May Not Reflect True Population" (2013). Throughout her article, Abbas repeatedly suggests that the survey's conclusion may be inaccurate because of the type of survey that was used. While the 2006 roadside distance sampling was considered accurate, the recent aerial survey did not achieve the 2006 survey's level of accuracy.
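
The two East Hampton methods estimate abundance in different ways. As a rough illustration of the idea behind distance sampling (not the model used in the 2006 East Hampton survey, whose details the article does not give), the Python sketch below assumes the textbook line-transect setup: observers travel a line, record each animal's perpendicular distance from it, and fit a half-normal detection function to correct for animals missed farther away. The `line_transect_density` helper and the sightings are hypothetical.

```python
from math import pi, sqrt


def line_transect_density(distances_m, transect_length_m):
    """Animals per square kilometre from perpendicular sighting distances.

    Assumes a half-normal detection function g(x) = exp(-x^2 / (2 * sigma^2)),
    certain detection on the line itself, and no truncation of distances.
    """
    n = len(distances_m)
    # Maximum-likelihood estimate of the half-normal scale parameter.
    sigma = sqrt(sum(d * d for d in distances_m) / n)
    # Effective strip half-width: the integral of g(x) from 0 to infinity.
    esw = sigma * sqrt(pi / 2)
    # n animals detected in an effective area of 2 * L * esw square metres.
    density_per_m2 = n / (2 * transect_length_m * esw)
    return density_per_m2 * 1_000_000


# Hypothetical: four deer seen from a 1 km road at 10, 20, 30 and 40 m.
print(round(line_transect_density([10, 20, 30, 40], 1000), 1))
```

The detection correction is what separates this from a raw count: a method that only counts what observers happen to see, without any such correction, will tend to report fewer animals than are actually present.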

Consider a scenario that illustrates the complexity of surveys. Imagine that a survey on alcohol consumption by college students is carried out using two different methods. As in the East Hampton deer case, the outcome of one method could differ significantly from the outcome of the other. Because every sample varies somewhat by chance, a small gap between the two is expected; but if the outcome of method "A" differs from that of method "B" by more than sampling variation can explain, there are grounds to suspect errors in the data, the interpretation, or the survey technique.
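
One standard way to ask whether two survey methods disagree by more than chance is a two-proportion z-test. The Python sketch below assumes both methods draw independent random samples; the helper name and the student counts are hypothetical. Note that the test only addresses sampling error: it cannot tell which method is biased.

```python
from math import erf, sqrt


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test of whether survey methods A and B estimate the same proportion."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical: 400 students per method; A finds 60% drink weekly, B finds 50%.
z, p = two_proportion_z_test(240, 400, 200, 400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A 10-point gap between samples of 400 gives z of about 2.84, well beyond what chance alone usually produces, so something about the methods deserves scrutiny. A gap of only about 1 point (205 versus 200 out of 400) would be entirely consistent with sampling noise.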

How can one survey method be more accurate than another, and how should researchers choose between them? These questions pose a serious challenge for modern survey scientists and researchers. Although the academic community has produced many new survey techniques over the last decade, it has not been able to identify a single best technique among them.

As mentioned earlier, new survey techniques have overtaken the telephone survey method. According to the Center for the Study of Politics and Governance, the main advantages the new techniques have over telephone surveys are cost and speed: automated and online surveys are less expensive and can be carried out faster. This raises another question: do automated and online surveys give us the right information?

Let us imagine that a group of researchers want to study the persistence of poverty in the U.S. and have decided to conduct an online survey. Their target population would be poor Americans. However, an online survey is much less likely to reach them, because many poor Americans may not have access to computers or the internet; the survey would miss the very people it aims to study. The academic community therefore has to choose among the many survey techniques it has produced so that the public always gets accurate information.
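
This kind of coverage error can be quantified with simple arithmetic. The Python sketch below uses an entirely hypothetical population and hypothetical internet-access rates, and makes the best-case assumption that everyone who can be reached online responds.

```python
def online_only_estimate(poor, not_poor, access_poor, access_not_poor):
    """Poverty rate seen by an online-only survey in which everyone with access responds."""
    reached_poor = poor * access_poor
    reached_not_poor = not_poor * access_not_poor
    return reached_poor / (reached_poor + reached_not_poor)


# Hypothetical population of 10,000 residents, 1,500 of them poor (15%).
# Assume 50% of poor and 90% of non-poor residents have internet access.
true_rate = 1_500 / 10_000
online_rate = online_only_estimate(1_500, 8_500, 0.50, 0.90)
print(f"true rate: {true_rate:.1%}, online estimate: {online_rate:.1%}")
```

Even with perfect response, the online survey reports a poverty rate of about 8.9% instead of 15%. Collecting more online responses does not help, because the error comes from who can be reached, not from how many.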

Therefore, academicians have to develop and embrace one reliable method of survey sampling. Otherwise, faulty information will reach the doorstep of every American. If the academic community does not act in time, every American might come to share the view in the saying Mark Twain popularized (and attributed to Benjamin Disraeli): "There are three kinds of lies: lies, damned lies, and statistics." If a data set is not interpreted properly and the right survey method is not used, the data may simply serve as a medium to institutionalize lies.


References 

Abbas, Rohma. "East Hampton Town Deer Survey Results May Not Reflect True Population." 27east: The Hamptons. N.p., 11 June 2013. Web. 21 July 2013. <http://www.27east.com/news/article.cfm/General-Interest-EH/15883/East-Hampton-Town-Deer-Survey-Results-Are-In>.









Sunday, July 14, 2013

SYSTAT 13.1

SYSTAT Software has recently released SYSTAT 13.1, a new version of its statistical software. While statistical methods such as linear regression and analysis of variance have been included in most statistical packages in the past, SYSTAT 13.1 adds newer methods such as mixed model analysis and advanced regression models. On its website, the company states that the new software has many advantages over other statistical software, since it includes new and advanced statistical features such as "ARCH & GARCH Models in Time Series, best subsets regression, confirmatory factor analysis, environment variables in best statistics, polynomial regression, enhancements to existing methods such as ANOVA, Bootstrapping, Crosstabulation, Fitting Distributions, Hypothesis Testing and more" ("SYSTAT"). The program should be especially useful for data scientists, as it has been designed to analyze large data sets. The company also claims that the new software can compute "statistical methods up to 10 times faster than older versions on most problems" ("SYSTAT"). Because multivariable regression models are among the trickiest parts of statistics, the software has been designed to help statisticians deal with them effectively: it searches a multivariate data set for the best model by fitting many candidate models and choosing the one that fits best.
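
Best subsets regression, one of the features listed above, can be sketched in a few dozen lines. The post does not describe how SYSTAT scores candidate models, so this pure-Python version is an assumption for illustration: it fits ordinary least squares (with an intercept) for every non-empty subset of predictors and keeps the subset with the lowest AIC. The helper names and data are hypothetical.

```python
from itertools import combinations
from math import log


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def residual_sum_of_squares(columns, y):
    """Fit ordinary least squares (with intercept) via the normal equations; return the RSS."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def best_subset(predictors, y):
    """Try every non-empty subset of predictors; return (indices, AIC) of the lowest-AIC model."""
    n = len(y)
    best, best_aic = None, float("inf")
    for size in range(1, len(predictors) + 1):
        for subset in combinations(range(len(predictors)), size):
            rss = residual_sum_of_squares([predictors[j] for j in subset], y)
            # Gaussian AIC up to a constant; size + 1 coefficients including the intercept.
            aic = n * log(max(rss, 1e-12) / n) + 2 * (size + 1)
            if aic < best_aic:
                best, best_aic = subset, aic
    return best, best_aic


# Hypothetical data: y is built from x1 and x2; x3 plays no part in generating y.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [1, 1, 2, 2, 1, 1, 2, 2]
noise = [0.1, 0.0, -0.1, 0.0, 0.1, 0.0, -0.1, 0.0]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]

print(best_subset([x1, x2, x3], y))
```

Every model that leaves out x1 or x2 fits badly, so the search settles on a subset containing both. Exhaustive search grows as 2^k in the number of predictors, which is one reason speed matters for software that offers it.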

Another striking feature of this software is that it can display statistical data on maps. This is particularly appealing, since none of the statistical software I have used in the past (SPSS, Minitab) could present statistical data on maps. There was a time when social scientists were unable to use statistical software because it relied so heavily on mathematics. Some software requires excellent knowledge of mathematics, while other software does not; SYSTAT 13.1 stands as a bridge between these two approaches. It can easily be used by people whose proficiency in mathematical concepts is fairly low, as well as by people with excellent mathematical knowledge.



A third notable feature is that the software can open SPSS (Statistical Package for the Social Sciences) and SAS files. Since its menu and command interfaces are linked to one another, data scientists can choose whichever best fits their model. Sociologists and criminologists mostly use SPSS, so social scientists are especially likely to benefit from SYSTAT's ability to open SPSS files. The software also has a considerable price advantage over SPSS: unlike SPSS, which charges its users separately for commercial licensing, SYSTAT 13.1 includes commercial licensing in its base price. John A. Wass, who writes for Scientific Computing, noted, "While many programs containing the newer simulation and data mining software (SYSTAT 13 contains a highly efficient Classification & Regression Tree algorithm) come with astronomical price tags, this software is still quite reasonable." Not only does this software offer features similar to those of SPSS, but it is also inexpensive and recently upgraded.



















To conclude, SYSTAT Software has done a tremendous job in developing SYSTAT 13.1. The software will be used by people in diverse fields, including criminologists, sociologists, data scientists, biostatisticians, and many others. A demo video of SYSTAT 13.1 is available at http://www.systat.com/videoPlayer/VideoDemo.aspx

References

"SYSTAT." Statistical and Graphical Software. N.p., n.d. Web. 15 July 2013.

Wass, John. "SYSTAT 13(2)." Scientific Computing. N.p., 3 Nov. 2010. Web. 15 July 2013. <http://www.scientificcomputing.com/articles/2010/11/systat-132>.
http://www.gfxtra.com/software/123348-systat-130-portable.html.

The Modern Application of Statistics

The American Statistical Association defines statistics as "the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation essential for controlling the course of scientific and societal advances" (qtd. in "What Is Statistics?"). Statistics plays a pivotal role in fields ranging from customer satisfaction studies to the most extensive epidemiological research. One of the major tasks of statisticians is to collect data and then analyze it, trying to understand what the data convey. They look for patterns in the data, which can suggest a relationship between a possible cause and an effect, although a correlation alone does not prove causation. For instance, suppose the U.S. Office of Immigration and Naturalization Services hires a statistician to study the growth of immigrant communities over the last decade. The statistician would collect data from the immigration services and determine the most populous immigrant group. Using many variables, the statistician would then explain the increase in the number of immigrant communities. Most interestingly, using the current state of immigrants in the U.S., statisticians can also make predictions about the future. This is the most important part, as it depends on the interpretation of the data. Interpretation requires accuracy, as inaccurate data always lead to faulty results.
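
The simplest version of "predicting the future" from past counts is to fit a straight-line trend by least squares and extend it one step. The Python sketch below uses hypothetical counts for a single immigrant community; real projections would use many more variables, and a straight line assumes the recent trend simply continues.

```python
def fit_trend(years, counts):
    """Least-squares straight line counts ~ a + b * year; returns (a, b)."""
    n = len(years)
    mean_x, mean_y = sum(years) / n, sum(counts) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts)) / sum(
        (x - mean_x) ** 2 for x in years
    )
    a = mean_y - b * mean_x
    return a, b


# Hypothetical size of one immigrant community, in thousands.
years = [2008, 2009, 2010, 2011, 2012]
counts = [100, 104, 109, 113, 118]

a, b = fit_trend(years, counts)
print(f"growth ~ {b:.1f} thousand per year; 2013 projection ~ {a + b * 2013:.1f} thousand")
```

The fitted slope is 4.5 thousand per year, giving a 2013 projection of about 122.3 thousand. The projection is only as good as the data and the assumption behind it, which is why interpretation matters so much.
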
Statistics is one of those professions that require strict adherence to shared values and professional ethics. In its Declaration on Professional Ethics, the International Statistical Institute lists "respect, professionalism, truthfulness and integrity" as the most important tenets of the field ("Declaration on Professional Ethics," 2010). Since statisticians often analyze data that involve individuals' private information, they must respect the privacy of every individual. For instance, many media outlets have reported that reliance on food stamps has been steadily increasing since 2001, and most of these reports are well supported by statistics. While statisticians drew the general conclusion that reliance on food stamps has been increasing, they never disclosed the names of those benefiting from the program. Furthermore, statisticians should always be professional while analyzing data so that the outcome of any test can be used for the advancement and betterment of society. Statisticians continually develop new methods to tackle new challenges, as complex problems keep emerging in fields such as psychology, bioscience, and logistics. Field surveys were once the easiest way to conduct surveys, but with the new technologies of the last two to three decades, phone and email surveys have become the most viable methods today.
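
The food stamp example rests on a simple principle: publish aggregates, not individuals. The Python sketch below shows one common disclosure-limitation technique, small-cell suppression, in which any group too small to hide its members is withheld. The records, field names, and the threshold of 10 are hypothetical.

```python
from collections import Counter


def aggregate_with_suppression(records, group_key, min_cell=10):
    """Count records per group, publishing no names and hiding groups smaller than min_cell."""
    counts = Counter(record[group_key] for record in records)
    return {group: (n if n >= min_cell else None) for group, n in counts.items()}


# Hypothetical benefit records: names are used internally but never published.
records = [{"name": f"resident {i}", "county": "North"} for i in range(25)]
records += [{"name": f"resident {i}", "county": "South"} for i in range(25, 28)]

print(aggregate_with_suppression(records, "county"))
```

The output reports 25 recipients in the North county and withholds the South count, since a published figure of 3 could point to specific households.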

The International Statistical Institute clearly states that statisticians "produce statistical results using our science and are not influenced by pressure from politicians or funders" ("Declaration on Professional Ethics," 2010). This is very important, given that statisticians often have to work under the direction of corporate offices or politicians. Given the liberal nature of the Obama administration, it is very important that the Department of Labor's statisticians do not succumb to liberal interests by producing job reports that might help Obama and the Democrats stay in power. While avoiding the influence of politicians is one of the cherished shared values among statisticians, another important value is that statisticians should use the measures most likely to produce accurate results. Accuracy matters the most.

Although statistics was initially used for recording population data, the field has now taken on a very different shape. As globalization advances, banks and businesses are becoming multinational. Statistics plays a central role in these changes, since statisticians use probability and risk analysis to predict the future of banks and businesses. Besides its use in industry and banking, statistics has been gaining prominence in public health, where scientists use statistical methods to find new ways to stop diseases from killing people. Statistics is heavily used even in comparative politics, a subfield of political science, and polling agencies turn to statisticians during elections. Sociologists, too, have embraced statistical methods to predict societal conditions and changes. Statistics has therefore developed into one of the most respected fields in the world today.
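
As a small example of probability-based risk analysis in banking, the standard credit-risk expected-loss calculation multiplies the probability that a loan defaults by the share of the balance lost if it does and by the amount at risk. The Python sketch below uses a hypothetical two-loan portfolio; real bank risk models are far richer, and this shows only the basic building block.

```python
def expected_loss(prob_default, loss_given_default, exposure):
    """Expected credit loss: probability of default x share lost on default x amount at risk."""
    return prob_default * loss_given_default * exposure


# Hypothetical two-loan portfolio.
portfolio = [
    (0.02, 0.45, 100_000),  # 2% chance of default, 45% of the balance lost if it does
    (0.10, 0.60, 20_000),   # riskier, smaller loan
]

total = sum(expected_loss(*loan) for loan in portfolio)
print(f"expected loss: {total:,.0f}")
```

The two loans contribute 900 and 1,200, so the portfolio's expected loss is 2,100: the smaller loan carries more expected loss because its default probability is five times higher.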



References

"Declaration on Professional Ethics." International Statistical Institute. 23 July, 2010. Web. 07/14/2013.

"What Is Statistics?." Home. N.p., n.d. Web. 14 July 2013. .

http://www.bls.gov/ooh/math/statisticians.htm#tab-2.

http://rlv.zcache.com/statisticians_dont_wait_mugr93c71ebb904a4f2f93a5fc94bbfee4c2_x7jg9_8byvr_152.jpg

https://www.youtube.com/watch?feature=player_embedded&v=7hm6alVClCk.

https://www.youtube.com/watch?v=m-DjLSuT5BI