Python for Data Analysis : (Record no. 40605)

MARC details
000 -LEADER
fixed length control field 14931nam a22002297a 4500
003 - CONTROL NUMBER IDENTIFIER
control field CUTN
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20231202112738.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 231202b |||||||| |||| 00| 0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9789352136414
041 ## - LANGUAGE CODE
Language English
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Edition number 23
Classification number 005.133
Item number MCK
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name McKinney, Wes
245 ## - TITLE STATEMENT
Title Python for Data Analysis :
Remainder of title Data Wrangling with Pandas, NumPy, and IPython /
Statement of responsibility, etc Wes McKinney,
250 ## - EDITION STATEMENT
Edition statement 2nd Edition
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc Navi Mumbai :
Name of publisher, distributor, etc Shroff Publishers,
Date of publication, distribution, etc 2018.
300 ## - PHYSICAL DESCRIPTION
Extent xvi, 522 p. :
500 ## - GENERAL NOTE
General note All Indian Reprints of O'Reilly are printed in Grayscale.

Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You'll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process.

Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It's ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.

Use the IPython shell and Jupyter notebook for exploratory computing
Learn basic and advanced features in NumPy (Numerical Python)
Get started with data analysis tools in the pandas library
Use flexible tools to load, clean, transform, merge, and reshape data
Create informative visualizations with matplotlib
Apply the pandas groupby facility to slice, dice, and summarize datasets
Analyze and manipulate regular and irregular time series data
Learn how to solve real-world data analysis problems with thorough, detailed examples

About the Author

Wes McKinney is a New York-based software developer and entrepreneur. After finishing his undergraduate degree in mathematics at MIT in 2007, he went on to do quantitative finance work at AQR Capital Management in Greenwich, CT. Frustrated by cumbersome data analysis tools, he learned Python and started building what would later become the pandas project. He's now an active member of the Python data community and is an advocate for the use of Python in data analysis, finance, and statistical computing applications.

Wes was later the co-founder and CEO of DataPad, whose technology assets and team were acquired by Cloudera in 2014. He has since become involved in big data technology, joining the Project Management Committees for the Apache Arrow and Apache Parquet projects in the Apache Software Foundation. In 2016, he joined Two Sigma Investments in New York City, where he continues working to make data analysis faster and easier through open source software.
505 ## - FORMATTED CONTENTS NOTE
Contents TABLE OF CONTENTS
Title
Preface
1. Preliminaries
  1.1 What Is This Book About?
    What Kinds of Data?
  1.2 Why Python for Data Analysis?
    Python as Glue
    Solving the “Two-Language” Problem
    Why Not Python?
  1.3 Essential Python Libraries
    NumPy
    pandas
    matplotlib
    IPython and Jupyter
    SciPy
    scikit-learn
    statsmodels
  1.4 Installation and Setup
    Windows
    Apple (OS X, macOS)
    GNU/Linux
    Installing or Updating Python Packages
    Python 2 and Python 3
    Integrated Development Environments (IDEs) and Text Editors
  1.5 Community and Conferences
  1.6 Navigating This Book
    Code Examples
    Data for Examples
    Import Conventions
    Jargon
2. Python Language Basics, IPython, and Jupyter Notebooks
  2.1 The Python Interpreter
  2.2 IPython Basics
    Running the IPython Shell
    Running the Jupyter Notebook
    Tab Completion
    Introspection
    The %run Command
    Executing Code from the Clipboard
    Terminal Keyboard Shortcuts
    About Magic Commands
    Matplotlib Integration
  2.3 Python Language Basics
    Language Semantics
    Scalar Types
    Control Flow
3. Built-in Data Structures, Functions, and Files
  3.1 Data Structures and Sequences
    Tuple
    List
    Built-in Sequence Functions
    dict
    set
    List, Set, and Dict Comprehensions
  3.2 Functions
    Namespaces, Scope, and Local Functions
    Returning Multiple Values
    Functions Are Objects
    Anonymous (Lambda) Functions
    Currying: Partial Argument Application
    Generators
    Errors and Exception Handling
  3.3 Files and the Operating System
    Bytes and Unicode with Files
  3.4 Conclusion
4. NumPy Basics: Arrays and Vectorized Computation
  4.1 The NumPy ndarray: A Multidimensional Array Object
    Creating ndarrays
    Data Types for ndarrays
    Arithmetic with NumPy Arrays
    Basic Indexing and Slicing
    Boolean Indexing
    Fancy Indexing
    Transposing Arrays and Swapping Axes
  4.2 Universal Functions: Fast Element-Wise Array Functions
  4.3 Array-Oriented Programming with Arrays
    Expressing Conditional Logic as Array Operations
    Mathematical and Statistical Methods
    Methods for Boolean Arrays
    Sorting
    Unique and Other Set Logic
  4.4 File Input and Output with Arrays
  4.5 Linear Algebra
  4.6 Pseudorandom Number Generation
  4.7 Example: Random Walks
    Simulating Many Random Walks at Once
  4.8 Conclusion
5. Getting Started with pandas
  5.1 Introduction to pandas Data Structures
    Series
    DataFrame
    Index Objects
  5.2 Essential Functionality
    Reindexing
    Dropping Entries from an Axis
    Indexing, Selection, and Filtering
    Integer Indexes
    Arithmetic and Data Alignment
    Function Application and Mapping
    Sorting and Ranking
    Axis Indexes with Duplicate Labels
  5.3 Summarizing and Computing Descriptive Statistics
    Correlation and Covariance
    Unique Values, Value Counts, and Membership
  5.4 Conclusion
6. Data Loading, Storage, and File Formats
  6.1 Reading and Writing Data in Text Format
    Reading Text Files in Pieces
    Writing Data to Text Format
    Working with Delimited Formats
    JSON Data
    XML and HTML: Web Scraping
  6.2 Binary Data Formats
    Using HDF5 Format
    Reading Microsoft Excel Files
  6.3 Interacting with Web APIs
  6.4 Interacting with Databases
  6.5 Conclusion
7. Data Cleaning and Preparation
  7.1 Handling Missing Data
    Filtering Out Missing Data
    Filling In Missing Data
  7.2 Data Transformation
    Removing Duplicates
    Transforming Data Using a Function or Mapping
    Replacing Values
    Renaming Axis Indexes
    Discretization and Binning
    Detecting and Filtering Outliers
    Permutation and Random Sampling
    Computing Indicator/Dummy Variables
  7.3 String Manipulation
    String Object Methods
    Regular Expressions
    Vectorized String Functions in pandas
  7.4 Conclusion
8. Data Wrangling: Join, Combine, and Reshape
  8.1 Hierarchical Indexing
    Reordering and Sorting Levels
    Summary Statistics by Level
    Indexing with a DataFrame’s columns
  8.2 Combining and Merging Datasets
    Database-Style DataFrame Joins
    Merging on Index
    Concatenating Along an Axis
    Combining Data with Overlap
  8.3 Reshaping and Pivoting
    Reshaping with Hierarchical Indexing
    Pivoting “Long” to “Wide” Format
    Pivoting “Wide” to “Long” Format
  8.4 Conclusion
9. Plotting and Visualization
  9.1 A Brief matplotlib API Primer
    Figures and Subplots
    Colors, Markers, and Line Styles
    Ticks, Labels, and Legends
    Annotations and Drawing on a Subplot
    Saving Plots to File
    matplotlib Configuration
  9.2 Plotting with pandas and seaborn
    Line Plots
    Bar Plots
    Histograms and Density Plots
    Scatter or Point Plots
    Facet Grids and Categorical Data
  9.3 Other Python Visualization Tools
  9.4 Conclusion
10. Data Aggregation and Group Operations
  10.1 GroupBy Mechanics
    Iterating Over Groups
    Selecting a Column or Subset of Columns
    Grouping with Dicts and Series
    Grouping with Functions
    Grouping by Index Levels
  10.2 Data Aggregation
    Column-Wise and Multiple Function Application
    Returning Aggregated Data Without Row Indexes
  10.3 Apply: General split-apply-combine
    Suppressing the Group Keys
    Quantile and Bucket Analysis
    Example: Filling Missing Values with Group-Specific Values
    Example: Random Sampling and Permutation
    Example: Group Weighted Average and Correlation
    Example: Group-Wise Linear Regression
  10.4 Pivot Tables and Cross-Tabulation
    Cross-Tabulations: Crosstab
  10.5 Conclusion
11. Time Series
  11.1 Date and Time Data Types and Tools
    Converting Between String and Datetime
  11.2 Time Series Basics
    Indexing, Selection, Subsetting
    Time Series with Duplicate Indices
  11.3 Date Ranges, Frequencies, and Shifting
    Generating Date Ranges
    Frequencies and Date Offsets
    Shifting (Leading and Lagging) Data
  11.4 Time Zone Handling
    Time Zone Localization and Conversion
    Operations with Time Zone-Aware Timestamp Objects
    Operations Between Different Time Zones
  11.5 Periods and Period Arithmetic
    Period Frequency Conversion
    Quarterly Period Frequencies
    Converting Timestamps to Periods (and Back)
    Creating a PeriodIndex from Arrays
  11.6 Resampling and Frequency Conversion
    Downsampling
    Upsampling and Interpolation
    Resampling with Periods
  11.7 Moving Window Functions
    Exponentially Weighted Functions
    Binary Moving Window Functions
    User-Defined Moving Window Functions
  11.8 Conclusion
12. Advanced pandas
  12.1 Categorical Data
    Background and Motivation
    Categorical Type in pandas
    Computations with Categoricals
    Categorical Methods
  12.2 Advanced GroupBy Use
    Group Transforms and “Unwrapped” GroupBys
    Grouped Time Resampling
  12.3 Techniques for Method Chaining
    The pipe Method
  12.4 Conclusion
13. Introduction to Modeling Libraries in Python
  13.1 Interfacing Between pandas and Model Code
  13.2 Creating Model Descriptions with Patsy
    Data Transformations in Patsy Formulas
    Categorical Data and Patsy
  13.3 Introduction to statsmodels
    Estimating Linear Models
    Estimating Time Series Processes
  13.4 Introduction to scikit-learn
  13.5 Continuing Your Education
14. Data Analysis Examples
  14.1 1.USA.gov Data from Bitly
    Counting Time Zones in Pure Python
    Counting Time Zones with pandas
  14.2 MovieLens 1M Dataset
    Measuring Rating Disagreement
  14.3 US Baby Names 1880–2010
    Analyzing Naming Trends
  14.4 USDA Food Database
  14.5 2012 Federal Election Commission Database
    Donation Statistics by Occupation and Employer
    Bucketing Donation Amounts
    Donation Statistics by State
  14.6 Conclusion
A. Advanced NumPy
  A.1 ndarray Object Internals
    NumPy dtype Hierarchy
  A.2 Advanced Array Manipulation
    Reshaping Arrays
    C Versus Fortran Order
    Concatenating and Splitting Arrays
    Repeating Elements: tile and repeat
    Fancy Indexing Equivalents: take and put
  A.3 Broadcasting
    Broadcasting Over Other Axes
    Setting Array Values by Broadcasting
  A.4 Advanced ufunc Usage
    ufunc Instance Methods
    Writing New ufuncs in Python
  A.5 Structured and Record Arrays
    Nested dtypes and Multidimensional Fields
    Why Use Structured Arrays?
  A.6 More About Sorting
    Indirect Sorts: argsort and lexsort
    Alternative Sort Algorithms
    Partially Sorting Arrays
    numpy.searchsorted: Finding Elements in a Sorted Array
  A.7 Writing Fast NumPy Functions with Numba
    Creating Custom numpy.ufunc Objects with Numba
  A.8 Advanced Array Input and Output
    Memory-Mapped Files
    HDF5 and Other Array Storage Options
  A.9 Performance Tips
    The Importance of Contiguous Memory
B. More on the IPython System
  B.1 Using the Command History
    Searching and Reusing the Command History
    Input and Output Variables
  B.2 Interacting with the Operating System
    Shell Commands and Aliases
    Directory Bookmark System
  B.3 Software Development Tools
    Interactive Debugger
    Timing Code: %time and %timeit
    Basic Profiling: %prun and %run -p
    Profiling a Function Line by Line
  B.4 Tips for Productive Code Development Using IPython
    Reloading Module Dependencies
    Code Design Tips
  B.5 Advanced IPython Features
    Making Your Own Classes IPython-Friendly
    Profiles and Configuration
  B.6 Conclusion
Index
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Chiffrement (Informatique); Coding theory; Computer Security; Computer networks -- Security measures; Computer security; Data encryption (Computer science); Réseaux d'ordinateurs -- Sécurité -- Mesures; Sécurité informatique
690 ## - LOCAL SUBJECT ADDED ENTRY--TOPICAL TERM (OCLC, RLIN)
Department Name COMPUTER SCIENCE
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type General Books
Holdings
Withdrawn status
Lost status
Source of classification or shelving scheme Dewey Decimal Classification
Damaged status
Not for loan
Collection code Non-fiction
Home library CUTN Central Library
Location CUTN Central Library
Shelving location Generalia
Date of Cataloging 02/12/2023
Total Checkouts
Full call number 005.133 MCK
Barcode 47801
Date last seen 02/12/2023
Price effective from 02/12/2023
Koha item type General Books