  1.    #1  
    I have an application that uses the webview widget in WebOS to pull up web pages and it works just fine.

    My problem is that it pulls in the entire page, and I only want to pull out a portion of the page.

    Can anyone explain to me, or point me to some guide that explains how I can capture the HTML, and pull out only the relevant portions?

    I'd greatly appreciate it.
  2. #2  
    Use the DOM. It's totally possible, but it can break in a heartbeat the moment the web content creator changes the page structure. If you own said web content/server, parsing the DOM is an ugly solution. What type of app are you building? Surely a better solution exists.
  3. #3 (nhavar)
    Quote (originally posted by taalibeen):
    I have an application that uses the webview widget in WebOS to pull up web pages and it works just fine.

    My problem is that it pulls in the entire page, and I only want to pull out a portion of the page.

    Can anyone explain to me, or point me to some guide that explains how I can capture the HTML, and pull out only the relevant portions?

    I'd greatly appreciate it.
    Is it information that you can't get via a regular AJAX call as JSON or in one of the various XML formats (RSS, RDF)? I'd start there if possible. If not, it's going to be more painful. You'll need to pull the page apart at the DOM level, looking for only the pieces you need. That relies on a couple of things: 1) a consistently structured page (e.g. it doesn't change EVER) and 2) easily identified structures (elements with IDs or classNames).

    You could make use of Prototype's helper methods or even pull in jQuery to make it a little easier. The problem is that unless you own the content and don't change it, it's going to be a very fragile system.
  4. #4  
    Hey taalibeen!

    I had a similar problem a while ago. I solved it pretty easily by using JavaScript to parse through the text.

    I wanted to use the DOM, cuz it SEEMS easy, but I didn't know how to load a file up as a DOM. I can use the AJAX get method to pull in the entire HTML page as a text string. From there, I used this guide to help me.

    The methods are really simple, but once you figure out some anchor points, you can start pulling out data from there and chop it off at different points that you like.
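A minimal sketch of that anchor-point approach, assuming the page has already been fetched into a string (e.g. an XHR's responseText). The function name `extractBetween` and the sample markup are hypothetical, for illustration only. Returning `null` when a marker is missing gives the app a way to notice that the page layout has changed instead of silently showing garbage.

```javascript
// Return the text between the first startMarker (searching from fromIndex)
// and the next endMarker after it, or null if either marker is missing.
// A null result is a cheap signal that the page layout has changed.
function extractBetween(html, startMarker, endMarker, fromIndex) {
  const start = html.indexOf(startMarker, fromIndex || 0);
  if (start === -1) return null;
  const contentStart = start + startMarker.length;
  const end = html.indexOf(endMarker, contentStart);
  if (end === -1) return null;
  return html.substring(contentStart, end);
}

// Hypothetical page fragment and anchor points, for illustration only.
const samplePage =
  '<div id="result"><b>hola</b> = hello</div><div id="ads">...</div>';
console.log(extractBetween(samplePage, '<div id="result">', "</div>"));
// prints: <b>hola</b> = hello
```

Passing a `fromIndex` lets you step through repeated blocks: search again from just past the previous end marker.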

    Browse around that site if those few functions don't cut it for you.

    They were more than enough for my purpose, but you might need something more, not too sure.

    Good Luck!

    *edit*

    As stated by everyone else, the page MUST remain the same for this method to work. You are dependent on the website admin to make sure it stays the same. I contacted the webmaster of the site I pull from and spoke with him. He said it would be OK, so I am confident in my app. YMMV.
  5. #5  
    Yep, SiratXero's link is what you should look into. You'll need to learn regex for more advanced parsing, though. Regex looks difficult in the beginning, but don't let it fool you; it's fairly easy once you get the hang of it. And here's the best advice for your problems: learn to use Google. Parsing is not some Pre-only JavaScript super uber method; it's been done for ages in many different languages, and a simple Google search for "javascript parsing" should give you more than enough information. Love.
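For the regex route, here is a sketch that collects one capture group from every match in a page string. The `<td class="trans">` markup is made up; substitute whatever repeating structure the real page uses. The `extractAll` name is hypothetical.

```javascript
// Collect capture group 1 from every match of a global (g-flagged) regex.
function extractAll(html, pattern) {
  if (!pattern.global) throw new Error("extractAll needs a regex with the g flag");
  const results = [];
  pattern.lastIndex = 0; // a g-flagged regex remembers where it stopped last time
  let match;
  while ((match = pattern.exec(html)) !== null) {
    results.push(match[1]);
    if (match[0] === "") pattern.lastIndex++; // guard against empty matches looping forever
  }
  return results;
}

// Hypothetical markup: each translation sits in <td class="trans">...</td>.
// .*? is non-greedy, so each match stops at the first </td> after it.
// Use [\s\S]*? instead if the content can span several lines.
const transPattern = /<td class="trans">(.*?)<\/td>/g;
const sampleRows =
  '<tr><td class="trans">casa</td></tr><tr><td class="trans">hogar</td></tr>';
console.log(extractAll(sampleRows, transPattern)); // [ 'casa', 'hogar' ]
```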
  6.    #6  
    Quote (originally posted by nohatter):
    And here's the best advice for your problems, learn to use google.
    Google? What's that? Does my Pre have to be rooted to use it? Where can I get the .ipk for Google from?


    ANYWAY: yes, I know that parsing is nothing Pre-specific. Were this a straight UNIX environment, I'd have no problem accomplishing the task, as I've written hundreds of sed/awk scripts to parse through files.

    Yes, I have googled Javascript parsing, but compared to sed/awk, what's available seems to be VERY limited in functionality.

    I'm not simply trying to parse one single string, but a whole page of HTML, and I didn't find anything on, what was that thing called again? Oh yeah, Google. I didn't find anything on Google that pointed me in the direction of solving my particular problem.
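For anyone coming from sed/awk, rough JavaScript counterparts of a few shell idioms are sketched below. They are approximations (for instance, `field` mimics awk's default whitespace splitting but not a custom `FS`), and the helper names are hypothetical.

```javascript
// grep 'pattern' file  ->  grepLines(text, /pattern/)
// Use a regex WITHOUT the g flag here: test() on a g-flagged regex keeps
// state in lastIndex between calls and can skip matching lines.
function grepLines(text, pattern) {
  return text.split(/\r?\n/).filter(function (line) {
    return pattern.test(line);
  });
}

// sed 's/<[^>]*>//g' file  ->  stripTags(text)  (same crude rule: drop <...>)
function stripTags(text) {
  return text.replace(/<[^>]*>/g, "");
}

// awk '{ print $n }'  ->  field(line, n)  (1-based, whitespace-separated)
function field(line, n) {
  const parts = line.trim().split(/\s+/);
  return parts[n - 1] !== undefined ? parts[n - 1] : "";
}

const sampleText = "<p>Price:</p>\n<p><b>12</b> carbs</p>\n<p>footer</p>";
const hits = grepLines(sampleText, /carbs/).map(stripTags);
console.log(hits);              // [ '12 carbs' ]
console.log(field(hits[0], 1)); // 12
```

Chaining `split`, `filter`, `map` and `replace` like this covers much of what a short sed/awk pipeline does, one stage per call.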
  7. #7  
    Geez: google.com/search?q=dom+parsing&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
  8. #8  
    Quote (originally posted by taalibeen):
    Yes, I have googled Javascript parsing, but compared to sed/awk, what's available seems to be VERY limited in functionality.
    boo effin hoo, what you gonna do? cry about it.

    I don't understand your dilemma, though. You've got the entire HTML in a variable; now write a script that parses through the beotch.
  9.    #9  
    Quote (originally posted by nohatter):
    boo effin hoo, what you gonna do? cry about it.

    I don't understand your dilemma though, you got the entire html in a variable now write a script that parses through the beotch.
    Last edited by taalibeen; 07/19/2009 at 03:53 PM. Reason: Allowed self to be drawn into juvenile drivel; deleted disparaging remarks.
  10. #10  
    Chill, man. People are trying to help. nohatter is right: if you've used the AJAX get/post method, your entire HTML file should be saved as a string in responseText. All you need to do to prove this is use the innerHTML property to write the entire string into a div element. It should show you (most of) the HTML page right in your application. You can then (just TRY it) parse through it. Just as an experiment, pull out whatever is inside the <title> tags. If all you see on your page is the title, you know that responseText holds the entire HTML as one long string. I'm saying this because this is exactly what happened to me. It literally took me 4 or so DAYS to figure out that's what was going on.
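The <title> experiment can be sketched like this, assuming the page text is already in `responseText`. With Prototype (which ships with webOS), the string would typically arrive as `transport.responseText` in an `Ajax.Request` success callback; only the string handling is run here, so the sketch works anywhere. `getTitle` is a hypothetical helper name.

```javascript
// Return the trimmed text inside <title>...</title>, or null if absent.
// [\s\S]*? also matches newlines (a plain . would not); the i flag accepts <TITLE>.
function getTitle(responseText) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(responseText);
  return match ? match[1].trim() : null;
}

// In a webOS scene the string would come from the XHR, e.g. (not run here):
//   new Ajax.Request(url, {
//     method: "get",
//     onSuccess: function (transport) { Mojo.Log.info(getTitle(transport.responseText)); }
//   });
console.log(getTitle("<html><head>\n<TITLE>\n  My Page \n</TITLE></head></html>"));
// prints: My Page
```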

    Good Luck!
  11.    #11  
    Well, perhaps it's just me, but if someone asks a question and you come back with "learn how to use Google", that's not helpful, but rather condescending.

    Your post, however, Sirata, was indeed helpful, and I've managed to accomplish a few things based on the link you provided.

    As an Oracle DBA, I find that some of the JavaScript functions are similar to common SQL functions.

    Still, UNIX's awk/sed commands are much easier for me, simply because I have years of shell scripting experience.

    But, I think if I keep playing with it, I'll be able to do it.
  12. #12  
    Quote (originally posted by taalibeen):
    I have to admit, this is one of the more entertaining coded messages that translates to: I'm a code geek that is socially inept, still living at home with my momma, whose idea of a fantastic weekend consists of hacking p0rn sites and playing the networked version of some game with other dudes who also are unable to capture the interest of humans of the opposite gender.
    Lol, see, you got pretty close to the truth there, son. I must admit, I did use to play CoD4 almost every weekday a few months ago, until my desktop's display card broke. The funny part is that the reason I used to play CoD on weekdays was that all of my friends and my girlfriend work 9-5. But wait, I don't live in my mom's basement; instead, what I do is put on the script-kiddie hat I got after 6 months of teaching myself web languages, and I only have to work like 2 to 0 hours a day to make the same paycheck my friends do. Want to buy my ebook? PM, love.

    However, I think my attitude might be kind of harsh for you. That is because, before the Palm Pre and PreCentral, the only forum I'd ever spent any non-spammy time on was Wickedfire, since my success springs from there.

    Now, as for my social skills, women, partying and all that, let's just say it's a hell of a lot easier with some money in your pockets than it used to be when I didn't know the nerd stuff that I do now and had to work 9-5. But I guess it never was too hard, since I was born a black man in a Scandinavian country.

    btw: Who haXors pr0n sites anyways? If you took some time away from your Xbox, you would realize that there are free sites like youPr0n, and that with your own pr0n, casino and poker sites etc. you could be making some decent bank, son. LOVE.
  13. #13  
    Wow this is the weirdest topic I've seen since joining this forum

    Anyway, traversing the DOM is indeed the way to go. Regular expressions are only ever so slightly more 'future proof' when it comes to changes by whoever runs the website you're parsing, but they are a lot harder to write. I'd therefore recommend creating a DOM object and traversing it to find the stuff you need from the page (which is apparently porn?! LOL )
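A sketch of the traversal part, assuming (as in post #10) that the fetched HTML is parsed by assigning `responseText` to the `innerHTML` of a detached `<div>`. The walker only reads `nodeType`, `className` and `childNodes`, so it works on real DOM nodes and, for testing, on plain objects. The class name `result` and the `findByClass` helper are hypothetical. Note that a browser may start fetching images referenced in the markup as soon as it is parsed.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in the DOM

// Depth-first walk collecting element nodes whose class list contains className.
function findByClass(root, className) {
  const found = [];
  (function walk(node) {
    const classes = (" " + node.className + " ").replace(/\s+/g, " ");
    if (node.nodeType === ELEMENT_NODE && classes.indexOf(" " + className + " ") !== -1) {
      found.push(node);
    }
    const children = node.childNodes || [];
    for (let i = 0; i < children.length; i++) {
      walk(children[i]);
    }
  })(root);
  return found;
}

// Browser / webOS usage (not run here):
//   var holder = document.createElement("div");
//   holder.innerHTML = transport.responseText;  // parse into a detached tree
//   var rows = findByClass(holder, "result");   // "result" is a hypothetical class
//   rows[0].innerHTML                           // markup of the first hit

// Plain-object stand-in for a parsed page, so the sketch runs without a browser.
const fakePage = {
  nodeType: 1, className: "", childNodes: [
    { nodeType: 1, className: "row result", childNodes: [{ nodeType: 3 }] },
    { nodeType: 1, className: "results-ad", childNodes: [] },
  ],
};
console.log(findByClass(fakePage, "result").length); // 1
```

Matching on whole class names (padded with spaces) avoids false hits such as `results-ad` when looking for `result`.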

    Note that this is not as 'stupid' as it seems. An example is something I've been looking into: My wife has gestational diabetes and I was looking into a handy dandy way for her to quickly get information about carbohydrates in foods. Calorieking.com offers a mobile lookup but it really sucks. They have an API too but it seems it cannot be used for mobile applications.

    Therefore I've been considering 'going the DOM way' here: pulling in CalorieKing mobile results pages, parsing them and presenting the results in a nice Mojo-ified interface. Not sure if I'm actually going to do this since my wife's diabetes is only temporary (pregnancy related) but maybe someone else wants to?
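Here is a sketch of the parse step for that kind of app. The row markup (`td class="food"` / `td class="carbs"`) is invented, since the real CalorieKing pages aren't shown here, and the helper names are hypothetical. The `{ items: [...] }` shape is meant to feed a Mojo List widget model, but treat that wiring as an assumption to check against the Mojo docs.

```javascript
// Read one <td class="..."> cell out of a row's HTML; cls must be a plain word
// (it is spliced into a regex without escaping).
function cellText(rowHtml, cls) {
  const m = new RegExp('<td class="' + cls + '">([\\s\\S]*?)</td>').exec(rowHtml);
  return m ? m[1].trim() : null;
}

// Turn hypothetical result rows like
//   <tr><td class="food">Apple</td><td class="carbs">25 g</td></tr>
// into a { items: [...] } model; rows lacking either cell (headers, ads) are skipped.
function parseFoodRows(html) {
  const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/g; // fresh regex per call, so lastIndex starts at 0
  const items = [];
  let row;
  while ((row = rowPattern.exec(html)) !== null) {
    const food = cellText(row[1], "food");
    const carbs = cellText(row[1], "carbs");
    if (food !== null && carbs !== null) {
      items.push({ food: food, carbsGrams: parseFloat(carbs) }); // "25 g" -> 25
    }
  }
  return { items: items };
}

const sampleTable =
  '<table><tr><th>Food</th><th>Carbs</th></tr>' +
  '<tr><td class="food">Apple</td><td class="carbs">25 g</td></tr></table>';
console.log(JSON.stringify(parseFoodRows(sampleTable)));
// prints: {"items":[{"food":"Apple","carbsGrams":25}]}
```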

    Anyway, I figured it would be nice to mention this as an example where parsing HTML makes perfect sense for a WebOS app.
  14. #14  
    The jQuery.js library is far easier to work with than with the DOM directly. And it has functions for loading and parsing HTML. For doing this stuff, it's good to know XPath and regex.
  15.    #15  
    Quote (originally posted by nohatter):
    If you took some time from your xbox, you would realize that there is free sites like youPr0n and that with your own pr0n, casino and poker sites etc. you could be making some decent bank, son. LOVE.
    For the record, we have a PS3, and the only time I play it is when my son wants his dad to play a game with him. Otherwise, I don't touch it.

    Regarding "decent bank" - I'm an Oracle DBA and have been since '97, and my wife is an NT administrator.

    In short, I'm doing just fine in the bank department.
  16. #16  
    Quote (originally posted by taalibeen):
    For the record, we have a PS3, and the only time I play it is when my son wants his dad to play a game with him. Otherwise, I don't touch it.

    Regarding "decent bank" - I'm an Oracle DBA and have been since '97, and my wife is an NT administrator.

    In short, I'm doing just fine in the bank department.
    See, this is the problem right here. You're not playing enough with your son. You got old. Now you assume that if you ask stupid questions and get put in your place, the one who does it must be a kid in a basement who's no match for you, and that you can insult him and get away with it. Unfortunately, you're wrong. See, while you got stuck in '97 (I don't even want to mention your wife's job here), the world kept moving forward. Now for the payback. Let's just say that I think you shouldn't hate pr0n so much. Why? Because if you had bought just one good pr0n domain a year before your Oracle job, your bank would look just a teeny bit better than it does today. And that's just one pr0n domain. I bet your wife would be a lot happier, and you would perhaps not feel so old. LOVE.
  17. #17  
    Quote (originally posted by sivan):
    The jQuery.js library is far easier to work with than with the DOM directly. And it has functions for loading and parsing HTML. For doing this stuff, it's good to know XPath and regex.
    Prototype does too. I would recommend against loading a whole other JS library when Prototype is there by default (it comes with WebOS).

    I myself favor YUI but I don't plan on using it in WebOS apps because it makes the app heavier for no good reason.

    Read this: Prototype JavaScript framework: API (the Prototype API documentation).

    Prototype has everything you need.
  18. #18  
    I love how this thread has useful information intertwined with grown men acting like children spouting degrading comments. Gentlemen, please, back to your corners.

    taalibeen, I'm glad my post helped. I try.

    Quote (originally posted by TheMarco):
    Anyway, traversing the DOM is indeed the way to go...I'd therefore recommend to create a DOM object and traverse it to find the stuff you need from the page (which is apparently porn?! LOL )
    LOL. Be whatever the page may, TheMarco, I like your suggestion a lot. As stated earlier, I wanted to go the DOM way myself, but I wasn't able to load the page as a DOM object. Could you please elaborate on how to do that? I THINK it might be easier to parse through a DOM than to load up an entire page as a string and parse through that.

    I tried using Ajax get and then playing with the responseText, but if you could please give me the direct code (trust me, I have searched), I would REALLY appreciate it.

    Also, I'm sorry to hear about your wife's condition. I really hope it gets better after the pregnancy. In all honesty, I don't think I would mind writing up a quick app for you that pulls information from CalorieKing. I have just recently written an app that pulls out a timetable and displays it "Mojo-ified". It looks rather nice, but seems terribly trivial as of now, so I'm working on adding a lot more functionality, but I believe I have the basics down. And if an app that pulls data from CalorieKing can be useful for your wife's health, I think it might be a better use of my time. haha. So let me know about the DOM loading technique, and I'll begin working on CalorieKing for you, if you'd like.

    Quote (originally posted by TheMarco):
    ... Prototype has everything you need.
    I completely agree. I've been able to use simple string parsing functions (that I posted in the link earlier) and extract meaningful information out of an entire HTML page.
  19.    #19  
    TheMarco,

    The example you mentioned is VERY similar to what I'm trying to do with a very good English to other language dictionary site.

    If you look in the Homebrew section, I made an app called Habla that employs the Webview widget to display the results of the word search, but what I wanted to do was to capture the HTML itself, parse out the parts I needed, and display them in the scene's .html page directly as opposed to being in the widget.
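If the parsed results are written into the scene's own HTML (for example via `this.controller.get("results").innerHTML`, where `results` is a hypothetical element id), scraped text should be escaped first so stray `<` or `&` characters aren't treated as markup. A minimal sketch follows, with hypothetical helper names; it is meant for plain text (tags already stripped, entities decoded), since escaping a raw HTML fragment would double-escape its existing entities. Prototype's `String#escapeHTML` does a similar job for `&`, `<` and `>` if you'd rather not roll your own.

```javascript
// Escape the characters that are significant in HTML so scraped text is
// displayed as text rather than interpreted as markup. & must go first.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build markup for a hypothetical results area in the scene's view.
function renderTranslations(word, translations) {
  let out = "<h2>" + escapeHtml(word) + "</h2><ul>";
  translations.forEach(function (t) {
    out += "<li>" + escapeHtml(t) + "</li>";
  });
  return out + "</ul>";
}

// Scene usage (not run here; "results" is a hypothetical element id):
//   this.controller.get("results").innerHTML = renderTranslations(word, list);
console.log(renderTranslations("house", ["casa", "R&D <beta>"]));
// prints: <h2>house</h2><ul><li>casa</li><li>R&amp;D &lt;beta&gt;</li></ul>
```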
  20. #20 (as147)
    Quote (originally posted by TheMarco):
    Wow this is the weirdest topic I've seen since joining this forum
    ...
    Developers, you gotta luv 'em, but stand by for more of this stuff.
    Rock on, come on you developers!!