[VB] How to extract data from a web page
Visual Basic code that uses XMLHTTP and an HTMLDocument object
Private Sub Command1_Click()
Dim XMLObject As Object, HTMLDoc As Object
Dim SendStr As String, HTMLStr As String
Dim DataInfo As String, S As Long, E As Long
Dim Info(66) As String, TempArray() As String
Dim X As Long, Y As Long, I As Long, TempStr As String
Dim TitleMaxByte As Long, TitleByte As Long
'Initialize variables
Y = 0
I = 0
TitleMaxByte = 0
TempStr = ""
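'Fetch the page content through XMLHTTP (synchronous GET)
Set XMLObject = CreateObject("Microsoft.XMLHTTP")
Set HTMLDoc = CreateObject("htmlfile")
XMLObject.open "GET", "http://quotes.money.163.com/corp/1034/code=600221.html", False
XMLObject.setRequestHeader "CONTENT-TYPE", "application/x-www-form-urlencoded"
XMLObject.Send SendStr
If XMLObject.Status <> 200 Then MsgBox "Download failed: HTTP " & XMLObject.Status: Exit Sub
'Convert the raw response bytes to a Unicode string using the system ANSI code page
'(the page is presumably GB2312/GBK-encoded, so this relies on a Chinese-locale system)
HTMLStr = StrConv(XMLObject.ResponseBody, vbUnicode)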
'Let the HTMLDocument object work out the text contained in the page
HTMLDoc.body.innerHTML = HTMLStr
DataInfo = HTMLDoc.body.innerText 'All of the page's text
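'Locate the data block: it starts at "报表日期" (report date) and ends before "主编信箱" (editor's mailbox)
S = InStr(1, DataInfo, "报表日期")
If S = 0 Then MsgBox "Start marker not found": Exit Sub
E = InStr(S, DataInfo, "主编信箱")
If E = 0 Then MsgBox "End marker not found": Exit Sub
'Extract the data text (the 4 characters just before the end marker are dropped as well)
DataInfo = Mid(DataInfo, S, E - S - 4)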
'Split the text into an array of lines
TempArray = Split(DataInfo, vbCrLf)
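'The code expects the first 67 lines to be field titles; stop if the page layout differs
If UBound(TempArray) < 66 Then MsgBox "Unexpected page layout": Exit Sub
'To make the output line up, find the widest field title in ANSI bytes and use it as the padding width
'(a Chinese character counts as 2 bytes, which roughly matches its display width)
For X = 0 To 66
    Info(X) = RTrim(TempArray(X)) 'Strip trailing spaces
    TitleByte = LenB(StrConv(Info(X), vbFromUnicode)) 'Byte length of the field title
    If TitleByte > TitleMaxByte Then TitleMaxByte = TitleByte 'Remember the maximum
Next X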
'Pad every title with spaces to the maximum byte width
For X = 0 To 66
    'Skip category headings (lines ending in ":")
    If Right(Info(X), 1) <> ":" Then
        TitleByte = LenB(StrConv(Info(X), vbFromUnicode)) 'Byte length of this title
        Info(X) = Info(X) & String(TitleMaxByte - TitleByte, " ") & vbTab 'Pad with spaces, then a tab
    End If
Next X
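'Append the data values to the field rows: the values follow in groups of 67,
'in the same order as the titles (presumably one group per reporting period)
For X = 67 To UBound(TempArray)
    If Y >= 67 Then
        Y = 0
        I = I + 1
    End If
    'Category headings get no value, but still use up a slot in the group
    If Right(Info(Y), 1) <> ":" Then
        If I = 0 Then
            Info(Y) = Info(Y) & TempArray(X)
        Else
            Info(Y) = Info(Y) & "," & TempArray(X)
        End If
    End If
    Y = Y + 1
Next X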
'Join the finished lines into a single string
TempStr = Join(Info, vbCrLf)
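'Show the result in the text box
Text1.Text = TempStr
End Sub

In practice the two approaches are about equally fast. This one only saves the time the WebBrowser control spends downloading images and rendering the page. In my test, the WebBrowser approach took 7 seconds and this approach took 5 seconds. In theory, though, this approach should be the faster of the two.

This article was edited on 2014/3/25 0:19:13.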
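For readers not using VB6, the parsing steps of Command1_Click can be approximated in Python as below. This is a sketch under stated assumptions, not the author's code.

- The page is assumed to be GBK-encoded. That is what StrConv(..., vbUnicode) decodes on a Chinese-locale Windows system.
- A small html.parser-based extractor stands in for HTMLDoc.body.innerText. It will not reproduce IE's line breaks exactly.
- All function names are hypothetical.
- Instead of the character arithmetic in Mid(DataInfo, S, E - S - 4), the sketch keeps whole lines up to the line containing the end marker.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextLines(HTMLParser):
    """Approximation of HTMLDoc.body.innerText: one stripped line per text run."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        for line in data.splitlines():
            if line.strip():
                self.lines.append(line.strip())


def fetch_html(url, encoding="gbk"):
    """Download and decode a page. GBK is an assumption mirroring StrConv(..., vbUnicode)."""
    with urlopen(url) as resp:
        return resp.read().decode(encoding, errors="replace")


def html_to_lines(html):
    parser = _TextLines()
    parser.feed(html)
    parser.close()
    return parser.lines


def slice_between(lines, start_marker, end_marker):
    """Lines from the one containing start_marker up to (not including) the one with end_marker."""
    start = next((i for i, s in enumerate(lines) if start_marker in s), None)
    if start is None:
        raise ValueError("start marker not found")
    end = next((i for i in range(start, len(lines)) if end_marker in lines[i]), None)
    if end is None:
        raise ValueError("end marker not found")
    return lines[start:end]


def ansi_width(text, encoding="gbk"):
    """Byte length in the ANSI code page, like LenB(StrConv(text, vbFromUnicode))."""
    return len(text.encode(encoding, errors="replace"))


def build_rows(lines, field_count):
    """Pad the first field_count titles, then deal the remaining lines out to them in turn."""
    if len(lines) < field_count:
        raise ValueError("fewer lines than expected field titles")
    titles = [s.rstrip() for s in lines[:field_count]]
    width = max(ansi_width(t) for t in titles)
    # Category headings (ending in ":") are left unpadded, as in the VB code.
    rows = [t if t.endswith(":") else t + " " * (width - ansi_width(t)) + "\t" for t in titles]
    for k, value in enumerate(lines[field_count:]):
        y, i = k % field_count, k // field_count  # field slot (Y) and group index (I)
        if titles[y].endswith(":"):
            continue  # headings get no value but still consume a slot
        rows[y] += value if i == 0 else "," + value
    return rows


def format_report(html, field_count=67):
    lines = slice_between(html_to_lines(html), "报表日期", "主编信箱")
    return "\r\n".join(build_rows(lines, field_count))
```

Against the page from the article, this would be called as format_report(fetch_html("http://quotes.money.163.com/corp/1034/code=600221.html")). That call assumes the page still has the 2014 layout with 67 title lines. The field_count parameter keeps that layout assumption in one place rather than scattering the literal 66/67 through the loops.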