A Complete Guide to Web Scraping with Ruby - PIA S5 Proxy


Rose . 2024-07-12

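A web crawler is an automated tool for extracting information from websites. With its concise syntax and mature library ecosystem, Ruby is well suited to writing one. This article walks through building a simple web crawler in Ruby so you can get started with data extraction quickly.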

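Step 1: Install the required libraries

Before writing the crawler, install the Ruby libraries that simplify scraping. The two main ones are `Nokogiri` (HTML parsing) and `HTTParty` (HTTP requests).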

```shell
gem install nokogiri
gem install httparty
```

Step 2: Send an HTTP request

First, use the `HTTParty` library to send an HTTP request and retrieve the HTML content of the target page.

```ruby
require 'httparty'
require 'nokogiri'

url = 'https://example.com'

# Fetch the page; response.body holds the raw HTML as a string
response = HTTParty.get(url)
html_content = response.body
```

Step 3: Parse the HTML content

Next, parse the HTML with the `Nokogiri` library so the data you need can be extracted.

```ruby
# Build a searchable document tree from the HTML string
doc = Nokogiri::HTML(html_content)
```

Step 4: Extract the data

Use CSS selectors or XPath to pull the information you need out of the parsed HTML.

```ruby
# Collect the text of every <h1> element
titles = doc.css('h1').map(&:text)
puts titles
```

Complete example

The following complete program fetches the example site and prints the text of every `h1` heading:

```ruby
require 'httparty'
require 'nokogiri'

url = 'https://example.com'

# Fetch the page and parse its HTML
response = HTTParty.get(url)
html_content = response.body
doc = Nokogiri::HTML(html_content)

# Extract the text of every <h1> and print each on its own line
titles = doc.css('h1').map(&:text)
titles.each do |title|
  puts title
end
```

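Writing a web crawler in Ruby is a simple and enjoyable process. Libraries such as `HTTParty` and `Nokogiri` take care of the HTTP requests and HTML parsing, so data can be scraped with very little code. Whether you are a beginner or an experienced developer, Ruby is a solid choice for completing a crawler project efficiently.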
