(SOLVED) How to read the ENTIRE HTML source code

BlitzMX

Good evening,

I'm trying to read and extract data from Google Finance. To do that, I'm trying to read the page source, strip some characters from it, and then use predefined strings to locate the numeric values I need.

I can already extract the company name for each ticker symbol I enter, but the values don't show up after I save the source as a txt file or run it through a filter that strips some symbols.

Here is part of the code that's giving me trouble. After running everything through a filter to clean out symbols and other unnecessary bits, I lose the entire lower part; only the upper part remains :D

EXAMPLE:

ref_664730_c>+0.090</span></span><span class=fjfe-perc><span class=up id=ref_664730_cp>0.42%</span></span><td class=del-btn><b class="del-btn-wrapper gf-table-delete" style="display:none"><img class="SP_delete button" alt="Remove PFE from list" title="Remove PFE from list" src="data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="/></b><tbody><tr><td class=symbol data-id="14456"><a href="/finance?q=NYSE:GSK&hl=en"title="NYSE:GSK">GSK</a><td class=price><span id=ref_14456_l>44.75</span><td class="change fjfe-toggle-button"><span class=fjfe-abs><span class=up id=ref_14456_c>+0.520</span></span><span class=fjfe-perc><span class=up id=ref_14456_cp>1.18%</span></span><td class=del-btn><b class="del-btn-wrapper gf-table-delete" style="display:none"><img class="SP_delete button" alt="Remove GSK from list" title="Remove GSK from list" src="data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="/></b></table></div></div></div><div class=fjfe-nav-ra><div class=fjfe-recentactivities><h4>Recent Activity</h4><ul id=ra-list><li><a href="/finance?q=NYSEAMEX:GBG&fstype=ii&hl=en">GBG: Financials</a></li><li><a href="/finance?q=NYSE:A&fstype=ii&hl=en">A: Financials</a></li><li><a href="/finance?q=NASDAQ:MSFT&fstype=ii&hl=en">MSFT: Financials</a></li><li><a href="/finance?q=NYSEAMEX:GSX&fstype=ii&hl=en">GSX: Financials</a></li><li><a href="/finance?q=NASDAQ:GOOG&fstype=ii&hl=en">GOOG: Financials</a></li><li><a href="/finance?q=NASDAQ:HSOL&fstype=ii&hl=en">HSOL: Financials</a></li><li><a href="/finance?q=NYSE:PT&fstype=ii&hl=en">PT: Financials</a></li></ul></div></div></div><script>var _cleardot = 'data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==';</script><script src="/finance/f/sfe-opt-3634592431.js"></script><script>
      _regOnLoad = function(f) { f && f(); };
    
      google.finance.renderRecentActivities = function() {};
      google.finance.renderRecentQuotes = function() {};
    </script></div><div class=g-unit id=gf-viewc><div class=fjfe-content>
<div class="g-section sfe-break-bottom-16 overflow-floatfix">
<div class="hdg top appbar-hide">
<h3>Agilent Technologies Inc. financials</h3>  
<a href="/finance/portfolio?action=add&addticker=NYSE%3AA" class=norm>Watch this stock</a>
</div>
</div>
<div class=rgt>
</div>
<div id=fs-type-tabs class="id-fs-type-tabs goog-tab-bar">
<div class="goog-tab goog-tab-selected"
>
<a class=t><b class=t><b class=t>Income Statement</b></b></a>
</div>
<div class=goog-tab
>
<a class=t><b class=t><b class=t>Balance Sheet</b></b></a>
</div>
<div class=goog-tab

Remaining code (the part above it can be seen in the page source at http://www.google.com/finance?q=NYSE%3AA&fstype=ii&hl=en):

=ref_664730_c>+0.090</span></span><span class=fjfe-perc><span class=up id=ref_664730_cp>0.42%</span></span><td class=del-btn><b class="del-btn-wrapper gf-table-delete" style="display:none"><img class="SP_delete button" alt="Remove PFE from list" title="Remove PFE from list" src="data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="/></b><tbody><tr><td class=symbol data-id="14456"><a href="/finance?q=NYSE:GSK&hl=en"title="NYSE:GSK">GSK</a><td class=price><span id=ref_14456_l>44.75</span><td class="change fjfe-toggle-button"><span class=fjfe-abs><span class=up id=ref_14456_c>+0.520</span></span><span class=fjfe-perc><span class=up id=ref_14456_cp>1.18%</span></span><td class=del-btn><b class="del-btn-wrapper gf-table-delete" style="display:none"><img class="SP_delete button" alt="Remove GSK from list" title="Remove GSK from list" src="data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="/></b></table></div></div></div><div class=fjfe-nav-ra><div class=fjfe-recentactivities><h4>Recent Activity</h4><ul id=ra-list><li><a href="/finance?q=NYSEAMEX:GBG&fstype=ii&hl=en">GBG: Financials</a></li><li><a href="/finance?q=NYSE:A&fstype=ii&hl=en">A: Financials</a></li><li><a href="/finance?q=NASDAQ:MSFT&fstype=ii&hl=en">MSFT: Financials</a></li><li><a href="/finance?q=NYSEAMEX:GSX&fstype=ii&hl=en">GSX: Financials</a></li><li><a href="/finance?q=NASDAQ:GOOG&fstype=ii&hl=en">GOOG: Financials</a></li><li><a href="/finance?q=NASDAQ:HSOL&fstype=ii&hl=en">HSOL: Financials</a></li><li><a href="/finance?q=NYSE:PT&fstype=ii&hl=en">PT: Financials</a></li></ul></div></div></div><script>var _cleardot = 'data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==';</script><script src="/finance/f/sfe-opt-3634592431.js"></script><script>
      _regOnLoad = function(f) { f && f(); };
    
      google.finance.renderRecentActivities = function() {};
      google.finance.renderRecentQuotes = function() {};
    </script></div><div class=g-unit id=gf-viewc><div class=fjfe-content>

Code that disappears:

<div class="g-section sfe-break-bottom-16 overflow-floatfix">
<div class="hdg top appbar-hide">
<h3>Agilent Technologies Inc. financials</h3>  
<a href="/finance/portfolio?action=add&addticker=NYSE%3AA" class=norm>Watch this stock</a>
</div>
</div>
<div class=rgt>
</div>
<div id=fs-type-tabs class="id-fs-type-tabs goog-tab-bar">
<div class="goog-tab goog-tab-selected"
>
<a class=t><b class=t><b class=t>Income Statement</b></b></a>
</div>
<div class=goog-tab
>
<a class=t><b class=t><b class=t>Balance Sheet</b></b></a>
</div>
<div class=goog-tab
>
<a class=t><b class=t><b class=t>Cash Flow</b></b></a>
</div>
</div>
<div class=gf-table-control-plain>
<div class="g-section g-tpl-67-33 g-split">
<div class="g-unit g-first">
View:
<a id=interim class="id-interim ac">Quarterly Data</a> | 
<a id=annual class="id-annual nac">Annual Data</a>
</div>
<div class="g-unit">
<a id="viz_switch" class="norm" href="#"
                 onClick="google.finance.toggleShowHideLink('Show charts','Hide charts');">
Hide charts

Further down, the values I need:

<td class="lft lm">Revenue
</td>
<td class="r">1,635.00</td>
<td class="r">1,728.00</td>
<td class="r">1,691.00</td>
<td class="r">1,677.00</td>
<td class="r rm">1,519.00</td>
</tr>
<tr>
<td class="lft lm">Other Revenue, Total
</td>
<td class="r">-</td>
<td class="r">-</td>
<td class="r">-</td>
<td class="r">-</td>
<td class="r rm">-</td>
</tr>
<tr class=hilite>
<td class="lft lm bld">Total Revenue
</td>
<td class="r bld">1,635.00</td>
<td class="r bld">1,728.00</td>
<td class="r bld">1,691.00</td>
<td class="r bld">1,677.00</td>
<td class="r bld rm">1,519.00</td>
</tr>
<tr>
<td class="lft lm">Cost of Revenue, Total
</td>
<td class="r">761.00</td>
<td class="r">807.00</td>
<td class="r">799.00</td>
<td class="r">777.00</td>
<td class="r rm">703.00</td>
</tr>

The code shown here is not complete. To see all of it, search for a symbol on Google Finance, right-click, and choose View Source.

http://www.google.com/finance?q=NYSE%3AA&fstype=ii&hl=en

I can't extract the values directly the way I do the title, because some symbols in the HTML are interpreted by VB.NET as its own code and cause an error, so the program doesn't run.

What I ultimately want is to put everything the server sends into a single variable, so that I can then pull out what I need.

Regards
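
If the symbols causing the error are the double quotes inside the HTML (an assumption; the post does not say which symbols they are), a common workaround is to double each quote inside the VB.NET string literal. The sketch below shows this with a shortened piece of the table markup quoted above. The module name and the `TextBetween` helper are hypothetical names used only for illustration.

```vbnet
Imports System

Module QuoteEscapingSketch
    ' Hypothetical helper: returns the text between startMarker and endMarker,
    ' or Nothing if either marker is not found.
    Function TextBetween(source As String, startMarker As String, endMarker As String) As String
        Dim startIndex As Integer = source.IndexOf(startMarker, StringComparison.Ordinal)
        If startIndex < 0 Then Return Nothing
        startIndex += startMarker.Length
        Dim endIndex As Integer = source.IndexOf(endMarker, startIndex, StringComparison.Ordinal)
        If endIndex < 0 Then Return Nothing
        Return source.Substring(startIndex, endIndex - startIndex)
    End Function

    Sub Main()
        ' Inside a VB.NET string literal, each " from the HTML is written as "".
        Dim html As String = "<td class=""lft lm"">Revenue</td><td class=""r"">1,635.00</td>"
        Dim marker As String = "<td class=""r"">"
        ' Prints the content of the first cell that starts with the marker.
        Console.WriteLine(TextBetween(html, marker, "</td>"))
    End Sub
End Module
```

The doubling is only needed for markers typed into the source code. HTML read from the server at run time is ordinary string data and needs no escaping.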

BlitzMX

Got it, hehe, without any help ;)

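        ' Download the complete response body into SourceCode (a String declared elsewhere).
        Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(txtBoxUrl.Text), System.Net.HttpWebRequest)
        Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
            Using sr As New System.IO.StreamReader(response.GetResponseStream())
                SourceCode = sr.ReadToEnd()
            End Using
        End Using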
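
Once the whole page is in a string, the numeric values can be located with regular expressions instead of a character filter. The following is only a sketch: it assumes the rows look like the table excerpt quoted in the first post (a label cell with a class starting with `lft`, followed by value cells with a class starting with `r`), and the module name and `ExtractRowValues` are hypothetical names introduced for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RowExtractionSketch
    ' Hypothetical helper: returns the value cells of the row whose label cell is rowLabel.
    Function ExtractRowValues(html As String, rowLabel As String) As List(Of String)
        Dim values As New List(Of String)()
        ' Match the label cell, then capture everything up to the closing </tr>.
        Dim rowPattern As String = "<td class=""lft[^""]*"">" & Regex.Escape(rowLabel) & "\s*</td>(.*?)</tr>"
        Dim rowMatch As Match = Regex.Match(html, rowPattern, RegexOptions.Singleline)
        If Not rowMatch.Success Then Return values
        ' Each value cell is assumed to look like <td class="r ...">1,635.00</td>.
        For Each cell As Match In Regex.Matches(rowMatch.Groups(1).Value, "<td class=""r[^""]*"">([^<]*)</td>")
            values.Add(cell.Groups(1).Value.Trim())
        Next
        Return values
    End Function

    Sub Main()
        ' Shortened sample in the same layout as the excerpt from the first post.
        Dim sample As String =
            "<td class=""lft lm"">Revenue" & Environment.NewLine & "</td>" &
            "<td class=""r"">1,635.00</td><td class=""r rm"">1,519.00</td></tr>"
        Console.WriteLine(String.Join(" | ", ExtractRowValues(sample, "Revenue")))
    End Sub
End Module
```

In the real program, `SourceCode` would be passed in place of `sample`. Because the label must come immediately after the opening tag of its cell, asking for "Revenue" does not pick up the "Total Revenue" or "Other Revenue, Total" rows.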
