
Automated data collection with R : (Record no. 12199)

000 -LEADER
fixed length control field	05203nam a22004457a 4500
001 - CONTROL NUMBER
control field	18267914
003 - CONTROL NUMBER IDENTIFIER
control field	OSt
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20180425161311.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	180324b xxu||||| |||| 00| 0 eng d
010 ## - LIBRARY OF CONGRESS CONTROL NUMBER
LC control number	2014032266
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9788126570423 (pbk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781118834817 (hardback)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	111883481X (hardback)
040 ## - CATALOGING SOURCE
Original cataloging agency	DLC
Language of cataloging	eng
Transcribing agency	DLC
Description conventions	rda
Modifying agency	DLC
042 ## - AUTHENTICATION CODE
Authentication code	pcc
050 00 - LIBRARY OF CONGRESS CALL NUMBER
Classification number	QA76.9.D343
Item number	M865 2015
082 00 - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	006.312
Edition number	23
100 1# - MAIN ENTRY--PERSONAL NAME
Personal name	Munzert, Simon.
245 10 - TITLE STATEMENT
Title	Automated data collection with R :
Remainder of title	a practical guide to Web scraping and text mining /
Statement of responsibility, etc.	Simon Munzert, Christian Rubba, Peter Meissner and Dominic Nyhuis.
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Place of production, publication, distribution, manufacture	New Delhi :
--	Chichester, West Sussex, United Kingdom :
Name of producer, publisher, distributor, manufacturer	John Wiley & Sons Inc.,
Date of production, publication, distribution, manufacture, or copyright notice	2015.
300 ## - PHYSICAL DESCRIPTION
Extent	xxii, 452 pages :
Other physical details	illustrations ;
Dimensions	25 cm
336 ## - CONTENT TYPE
Content type term	text
Source	rdacontent
337 ## - MEDIA TYPE
Media type term	unmediated
Source	rdamedia
338 ## - CARRIER TYPE
Carrier type term	volume
Source	rdacarrier
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc. note	Includes bibliographical references and index.
505 8# - FORMATTED CONTENTS NOTE
Formatted contents note	Machine generated contents note: Dedication Table of Contents List of Figures List of Tables Preface 1 Introduction 1.1 Case Study: World Heritage Sites in Danger 1.2 Some Remarks on Web Data Quality 1.3 Technologies for Disseminating, Extracting and Storing Web Data 1.3.1 Technologies for disseminating content on the Web 1.4 Structure of the Book Part One A Primer on Web and Data Technologies 2 HTML 2.1 Browser Presentation and Source Code 2.2 Syntax Rules 2.3 Tags and Attributes 2.4 Parsing Summary Further Reading Problems 3 XML and JSON 3.1 A Short Example XML Document 3.2 XML Syntax Rules 3.3 When Is an XML Document Well-formed or Valid? 3.4 XML Extensions and Technologies 3.5 XML and R in Practice 3.6 A Short Example JSON Document 3.7 JSON Syntax Rules 3.8 JSON and R in Practice Summary Further Reading Problems 4 XPath 4.1 XPath - a Querying Language for Web Documents 4.2 Identifying Node Sets with XPath 4.3 Extracting Node Elements Summary Further Reading Problems 5 HTTP 5.1 HTTP Fundamentals 5.2 Advanced Features of HTTP 5.3 Protocols beyond HTTP 5.4 HTTP in Action Summary Further Reading Problems 6 AJAX 6.1 JavaScript 6.2 XHR 6.3 Exploring AJAX with Web Developer Tools Summary Further Reading Problems 7 SQL and Relational Databases 7.1 Overview and Terminology 7.2 Relational Databases 7.3 SQL: a Language to Communicate with Databases 7.4 Databases in Action Summary Further Reading Problems 8 Regular Expressions and String Functions 8.1 Regular Expressions 8.2 String Processing 8.3 A Word on Character Encodings Summary Further Reading Problems Part Two A Practical Toolbox for Web Scraping and Text Mining 9 Scraping the Web 9.1 Retrieval Scenarios 9.2 Extraction Strategies 9.3 Web Scraping: Good Practice 9.4 Valuable Sources of Inspiration Summary Further Reading Problems 10 Statistical Text Processing 10.1 The running example: classifying press releases of the British government 10.2 Processing Textual Data 10.3 Supervised Learning Techniques 10.4 Unsupervised Learning Techniques Summary Further reading 11 Managing Data Projects 11.1 Interacting with the File System 11.2 Processing Multiple Documents/Links 11.3 Organizing Scraping Procedures 11.4 Executing R Scripts on a Regular Basis Part Three A Bag of Case Studies 12 Collaboration Networks in the U.S. Senate 12.1 Information on the Bills 12.2 Information on the Senators 12.3 Analyzing the network structure 12.4 Conclusion 13 Parsing Information from Semi-Structured Documents 13.1 Downloading Data from the FTP Server 13.2 Parsing Semi-Structured Text Data 13.3 Visualizing station and temperature data 14 Predicting the 2014 Academy Awards using Twitter 14.1 Twitter APIs: Overview 14.2 Twitter-based Forecast of the 2014 Academy Awards 14.3 Conclusion 15 Mapping the Geographic Distribution of Names 15.1 Developing a Data Collection Strategy 15.2 Web Site Inspection 15.3 Data Retrieval and Information Extraction 15.4 Mapping Names 15.5 Automating the Process 15.6 Summary 16 Gathering Data on Mobile Phones 16.1 Page Exploration 16.2 Scraping Procedure 16.3 Graphical Analysis 16.4 Data storage 17 Analyzing Sentiments of Product Reviews 17.1 Introduction 17.2 Collecting the data 17.3 Analyzing the Data 17.4 Conclusion References Bibliography Indices General Index Package Index Function Index.
520 ## - SUMMARY, ETC.
Summary, etc.	"This book provides a unified framework of web scraping and information extraction from text data with R for the social sciences"--
Assigning source	Provided by publisher.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	Data mining.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	Automatic data collection systems.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	Social sciences
General subdivision	Research
--	Data processing.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	R (Computer program language)
650 #7 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	COMPUTERS / Database Management / Data Mining.
Source of heading or term	bisacsh
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Rubba, Christian.
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Meißner, Peter.
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Nyhuis, Dominic.
776 08 - ADDITIONAL PHYSICAL FORM ENTRY
Relationship information	Online version:
Main entry heading	Munzert, Simon.
Title	Automated data collection with R
Place, publisher, and date of publication	Hoboken ; Chichester, West Sussex, United Kingdom : John Wiley & Sons Inc., 2014
International Standard Book Number	9781118834787
Record control number	(DLC) 2014035023
856 42 - ELECTRONIC LOCATION AND ACCESS
Materials specified	Cover image
Uniform Resource Identifier	http://catalogimages.wiley.com/images/db/jimages/9781118834817.jpg
906 ## - LOCAL DATA ELEMENT F, LDF (RLIN)
a	7
b	cbc
c	orignew
d	1
e	ecip
f	20
g	y-gencatlg
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme
Koha item type	Monograph

Holdings
Withdrawn status	Lost status	Source of classification or shelving scheme	Damaged status	Not for loan	Permanent Location	Current Location	Shelving location	Date acquired	Source of acquisition	Cost, normal purchase price	Inventory number	Total Checkouts	Full call number	Barcode	Date last seen	Date last checked out	Copy number	Cost, replacement price	Price effective from	Koha item type
					Indian Institute of Management Udaipur	Indian Institute of Management Udaipur	A3/1	2018-03-24	Niranjan Associates	599.00	6221 - 10/03/2018	1	006.312	004433	2018-10-04	2018-05-25	1	799.00	2018-04-25	Monograph

Indian Institute of Management Udaipur Library
