  1.    #81  
    Hmm, I don't remember where I found the original Google corpus, but it had even more than 15,000 words; I just cut it off there. I didn't re-sort googwords.txt, so if you just take the first 5,000 lines of that file, you'll have the top 5,000 words, more or less.

    Looks like this cuts the file size to about a third of the previous size, which makes sense (5,000 of the 15,000 words). I can't SSH into my Pre here at work, but if you try this, I'd be interested to hear how much the performance improves.
  2. diomark
    #82  
    OK, so with the correct googwords.txt list (~1.2 MB), it takes an additional 10 seconds to open emails and 5 seconds to open web pages. Still too slow for me.

    Has anyone managed to find a good list with ~2000-3000 words?
    -mark
  3. diomark
    #83  
    Quote Originally Posted by dimfeld (post #81)
    I'll test with the list I found (top 500 words) and with a cut of the 15k Google words.

    The 5,000 Google words (not a good list, since there's no sorting; it's just the first 5,000 words in that file) come to 468 KB after being run through your script. The 500-word list is 100 KB.

    The top-500 list adds an extra ~3 seconds to email and ~1-2 seconds to web pages.

    Testing now with the 5,000-word list. EDIT: the 5,000-word list took about the same time, 3-4 seconds or less, which is good enough for me (468 KB file size). So maybe that's the sweet spot?

    -mark
    PS: I'm measuring email delay from the time the header shows up to the time the email body loads, and web delay from the time I click a bookmark (with the web card already open) to the time the spinning wheel starts.
    Last edited by diomark; 07/02/2009 at 01:09 PM.
  4.    #84  
    Yeah, I measure delays the same way.

    Sounds good, I'll have to try the 5000-word list when I get home. The googwords file should be sorted in rough order of frequency. The original file also had numbers indicating each word's frequency, and they decreased going down the list. My googwords.txt preserves the order of the original file, so taking the first x lines of it should give you the top x words. Of course, some words like "information" seem to be much higher on the list than one would expect; I'm not sure how to explain that.
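    Because the order is preserved, the top x words are just a prefix of the file, which is what head -5000 extracts. A minimal Python sketch of the same cut, assuming a hypothetical line format of a word optionally followed by its frequency count (googwords.txt itself may hold only the words). When counts are present, it also checks that they never increase, a cheap way to confirm the file really is in frequency order:

```python
def top_words(lines, n, check_order=True):
    """Return the first n words of a frequency-ordered word list.

    Each line is assumed to hold a word, optionally followed by a
    whitespace-separated frequency count (a hypothetical format).
    """
    words = []
    last_count = None
    for line in lines:
        if len(words) >= n:
            break  # like `head -n`, stop once we have enough words
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word = fields[0]
        if check_order and len(fields) > 1:
            count = int(fields[1])
            # Counts should never go up as we move down a frequency-sorted list.
            if last_count is not None and count > last_count:
                raise ValueError(f"not in frequency order at {word!r}")
            last_count = count
        words.append(word)
    return words


# Usage (file name from the thread):
# with open("googwords.txt") as f:
#     top5000 = top_words(f, 5000)
```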
  5. diomark
    #85  
    Quote Originally Posted by dimfeld (post #84)
    I actually did a 'head -5000' against that file, so hopefully I got the right end of it.
    -mark
  6. wprater
    #86  
    Quote Originally Posted by diomark
    I actually did a 'head -5000' against that file, so hopefully I got the right end of it.
    I just did the same trick. It's noticeably faster opening emails for me now.
  7. pullingj
    #87  
    Quote Originally Posted by diomark
    Is there a way to test this on the device without rebooting the Pre?
    -m
    This isn't exactly quick, but it is quicker than rebooting the Pre. I was able to check easily because I had removed lansing|landing from the file. Being in a town named East Lansing right now made that correction particularly annoying.

    I first tried the luna rescan command:

    luna-send -n 1 palm://com.palm.applicationManager/rescan {}

    This works when installing new apps, but it didn't work for the autoreplace file.

    So, since I was trying to get LunaSysMgr to reload its configuration, let's try a HUP. Typically kill -1 <PID> or kill -HUP <PID> will get this done. When I tried it, LunaSysMgr immediately restarted with a new PID (not what I was hoping for), and then I got the throbbing Palm logo on my Pre for about 15-20 seconds. After the UI loaded, my new autoreplace file, based on the first 5000 lines of the Google words list, was loaded.
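    kill -HUP <PID> needs the PID first. For anyone who wants to script this, here is a sketch of the same two steps in Python: look the process up by name in /proc, then send SIGHUP with os.kill. It assumes a Linux /proc filesystem and a Python interpreter wherever it runs; the helper names are made up for illustration. The kernel truncates command names to 15 characters, which LunaSysMgr fits within.

```python
import os
import signal


def find_pids(name):
    """Return the PIDs of processes whose command name is exactly `name`.

    Assumes a Linux /proc filesystem: the second field of
    /proc/<pid>/stat is the command name in parentheses.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(f"/proc/{entry}/stat", "rb") as f:
                stat = f.read().decode(errors="replace")
        except OSError:
            continue  # the process exited while we were scanning
        comm = stat[stat.find("(") + 1 : stat.rfind(")")]
        if comm == name:
            pids.append(int(entry))
    return pids


def hup(pid):
    """Send SIGHUP to one process, like `kill -HUP <PID>`."""
    os.kill(pid, signal.SIGHUP)


def hup_by_name(name):
    """Send SIGHUP to every process called `name`; return the PIDs signalled."""
    signalled = []
    for pid in find_pids(name):
        try:
            hup(pid)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # it exited before we got to it
    return signalled


# Usage, as root: hup_by_name("LunaSysMgr")
```

    As the post describes, Luna responded by restarting with a new PID rather than reloading in place, so expect the same 15-20 second restart.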

    PS: I think this is referred to as "respringing" in another world...
    Last edited by pullingj; 07/03/2009 at 01:38 AM.
  8.    #88  
    I've uploaded a new ZIP file that now also includes 5000-word lists, both from the Google list and the British National Corpus list, as well as a combined list of the two, which has 6,730 words. I've also included the autoreplace files that I've generated from each of these lists and my personal extra_words file. The autoreplace file from the combined list is 635 KB, compared with 492 KB for the Google-only list. I haven't tried the Google-only list on my Pre, but with the combined list I also see email loading times of between 3 and 4 seconds. Much better than the 15K word list!
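    The post doesn't say how the combined list was built. One plausible way, sketched here as an assumption, is an order-preserving union: keep the first list in its frequency order, then append words from the second list that aren't already present. Two 5,000-word lists sharing 3,270 words would give the 6,730-word total.

```python
def merge_word_lists(*lists):
    """Order-preserving union: earlier lists win, later lists only add new words."""
    seen = set()
    merged = []
    for words in lists:
        for word in words:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


# Hypothetical usage: combined = merge_word_lists(google_5000, bnc_5000)
```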

    Other changes:
    This ZIP file also includes the updated Python script that handles newlines correctly on all platforms.
    Changed TRUE to true and FALSE to false in the Google list.
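    The updated script itself isn't shown in this thread, so this is only a sketch of one common way to make line handling platform-independent in Python: read the raw bytes and split with str.splitlines(). The UTF-8 encoding is an assumption about the word lists.

```python
def read_word_list(data, encoding="utf-8"):
    """Split raw word-list bytes into words, whatever the line endings."""
    text = data.decode(encoding)
    # str.splitlines() treats \n, \r\n and \r all as line breaks, whereas
    # split("\n") would leave a stray \r on every Windows-style line.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Usage:
# with open("googwords.txt", "rb") as f:
#     words = read_word_list(f.read())
```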

    As usual, it's available at http://drop.io/dimfeld/asset/autoreplace-common-zip

    It's good to know that reloading Luna reloads the autoreplace list too. Thanks, pullingj!

    And thanks also to diomark for the idea of trimming the word list to 5,000 words. Although I was willing to live with the email loading delay before, I'm much happier with it now.
  9. tcbeutler
    #89  
    I feel really dumb asking this, but I'm having a lot more trouble than I should getting this python script to work. Invalid src error...

    Could someone upload or send me the generated text file after running one of these scripts? Would be GREATLY appreciated.
  10.    #90  
    Hi tcbeutler,
    The ZIP file I've uploaded contains a number of pregenerated autoreplace files. Each file name takes the form text-edit-autoreplace-xxxx, where xxxx is the name of the word list it's based on. Just rename whichever one you want to text-edit-autoreplace, replace the existing file on your Pre with it, and you should be ready to go.

    Here's the link again: drop.io dimfeld

    If you later decide you want to try generating your own list, you can look in the README file for a sample command line. Or, if you post your command line here along with the error you're getting, I might be able to tell you what you're doing wrong.
  11. tcbeutler
    #91  
    Thanks, I was looking at a different ZIP file than the one you posted on July 3rd.
  12. #92  
    EDIT2: And here's another file with a word list generated from a list of the 15000 most common English words. I recommend using this one as it also has other improvements. See posts #21 through #24 for more details. drop.io dimfeld

    To run the script:
    python generate_autoreplace.py original-autoreplace wordlist new-autoreplace

    Upload the output file to
    /etc/palm/autoreplace/en_us/text-edit-autoreplace

    Then reboot your Pre and you're all set.

    Hi, this is great. I just want to be sure this is the most current file and script to use.

    Where do you download the file to in order for the script to run?

    Do you extract the ZIP files before running the script?

    Do you run the script at the root prompt?

    Thanks,
  13.    #93  
    Every time I've uploaded a new one I've just replaced the existing ZIP file, so if you've downloaded from that link within the past week it should be the right version.

    Go ahead and extract the ZIP file on your computer, and you'll see the script, some word lists, and some sample autoreplace files. There's a readme file there too that describes the various files. You can just use one of these autoreplace files if you want, or you can run the script and make your own.

    The script is actually designed to be run on your own computer. You'll need Python to run it; if you don't have it already, you can get it from ActivePython. The README file goes into more detail about how to use the script, so I recommend reading over that. If you read it and have any other questions, I'll be glad to try to answer them.
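    The script's internals aren't shown in this thread, and the README is the authority on what it actually does. Purely as a hypothetical illustration of the kind of entries such a generator could produce: entries in this thread use a typed|replacement form, and lansing|landing is landing with one key slipped to its neighbour. The sketch below builds those one-key slips on a made-up QWERTY grid (not the Pre's real layout), skips typos that are themselves real words, and skips the comma key, which a later post reports the Pre doesn't parse.

```python
# Hypothetical keyboard grid; the Pre's real layout and key spacing differ.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm,"]


def neighbours(ch):
    """Return the keys left, right, above and below `ch` on the grid above."""
    for r, row in enumerate(ROWS):
        c = row.find(ch)
        if c == -1:
            continue
        found = set()
        for dr, dc in ((0, -1), (0, 1), (-1, 0), (1, 0)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(ROWS) and 0 <= cc < len(ROWS[rr]):
                found.add(ROWS[rr][cc])
        return found
    return set()  # characters not on the grid get no typos


def typo_entries(word, vocabulary=frozenset(), skip=","):
    """Return 'typo|word' lines for every single-key slip in `word`.

    Typos that are real words (in `vocabulary`) or that would use a key
    listed in `skip` are left out.
    """
    entries = []
    for i, ch in enumerate(word):
        for key in sorted(neighbours(ch)):
            if key in skip:
                continue
            typo = word[:i] + key + word[i + 1:]
            if typo not in vocabulary:
                entries.append(f"{typo}|{word}")
    return entries


# Hypothetical usage for a whole word list:
# vocab = set(words)
# new_lines = [e for w in words for e in typo_entries(w, vocab)]
```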
  14. #94  
    OK, thanks, I almost got it.

    How do I get the script into Python for it to run?

    Do I just open Python and click on the script?

    Will it give me choices as to which autoreplace file to use?
  15. #95  
    dimfeld, I think you have done a great job; I have been following this since you first posted. I tried out some of the smaller files, which are still 10x the size of the original, but I concluded that the extra load time won't work for me, so I put Palm's file back. To fix the issue of mistyped and misspelled words, I have resolved to use good old-fashioned hard work and practice: become a faster, better texter and speller :P

    Thank you for your work, you have done a great job. Kudos~
  16. #96  
    Thanks for all your work, dimfeld. I got my Pre rooted recently, and this was one of the first things on my list to get cracking on. For those old Treo users out there, this thread contains the steps needed to essentially recreate the autoreplace functionality of the app called Textras. With the work detailed in this thread, my Pre keyboard is now much more usable, and I can type really quickly.

    I use this more for word replacement, rather than for spelling correction. I'm pretty accurate, having used the Treo keyboards for years (since the 300), so I use it more for shorthand auto replacements. Some examples are:
    th = the
    wi = with
    wo = without
    tue = Tuesday
    fri = Friday
    hr = hour
    dnr = dinner
    hm = home
    ure = you're
    uve = you've

    This helps me to write my emails tons faster.

    Some notes from my experiences so far...

    1) The ordering isn't important. Although the python script outputs in an alphabetically sorted order, the list need not be sorted. I chose one of the pre-generated files and added my own auto-replacements at the end of the file as two alphabetized lists, one right after the other. (BTW: you can have UNIX sort it using the sort command if you really want it sorted.)

    2) Although the list I chose has autoreplacements (or autocorrects) that use a comma character (,), the Pre doesn't actually parse those entries properly. The Python script could probably be updated to not use the comma when generating mistyped letters, which would also shave the file sizes down a bit. I deleted any entries that had commas in my file.

    3) Has anyone figured out how to autocorrect a word that starts with a parenthesis?
    e.g. (3 = (312)
    This would allow quick entry of phone numbers in your local area code (a useful feature on the Treo). I've tested using regular expressions, but I don't think the Pre parses them that way (\(3 = (312)). The backslash (\) escapes the opening parenthesis on the left-hand side; based on my testing, the right-hand side doesn't need an escape character.
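    Notes 1 and 2 above amount to a small post-processing step on the autoreplace file. A minimal Python sketch, assuming the file is plain typed|replacement lines like the entries quoted in this thread (lansing|landing, r|are). Letting a personal entry replace an existing entry for the same typed text is a choice of this sketch, since how the Pre treats duplicate keys isn't established here.

```python
def update_entries(lines, extra, sort=False):
    """Post-process autoreplace lines of the form "typed|replacement".

    - Lines containing a comma are dropped (note 2).
    - Entries whose typed text is a key in `extra` are dropped, so the
      personal entry appended at the end is the only one for that key.
    - Order doesn't matter to the Pre (note 1); sort=True just keeps the
      file tidy, much like running it through `sort`.
    """
    out = []
    for line in lines:
        if "," in line:
            continue
        typed, sep, _ = line.partition("|")
        if sep and typed in extra:
            continue
        out.append(line)
    out.extend(f"{typed}|{repl}" for typed, repl in extra.items())
    return sorted(out) if sort else out


# A few of the shorthand entries listed above:
SHORTHAND = {"th": "the", "wi": "with", "wo": "without", "ure": "you're"}
```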

    Thanks again. Wonderful work here.
  17. #97  
    By the way, I noticed that the Pre auto-formats phone numbers, so note 3 can be disregarded.

    e.g. typing in the phone number 1234567890 auto-formats it to (123) 456-7890 once you click off that phone number field. How it shows up in my Google contacts is another question. We'll see.
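    The formatting described here is simple to reproduce. This sketch only mimics the example in the post for a plain 10-digit number; it is not the Pre's own code, and leaving other inputs untouched is an assumption.

```python
def format_us_phone(number):
    """Format a 10-digit number as (XXX) XXX-XXXX; leave anything else as typed."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) != 10:
        return number
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
```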
  18. #98  
    So, in the downloaded ZIP file, I just rename the file "text-edit-autoreplace-combined5k" to "text-edit-autoreplace",

    copy it to the Pre, and that's it?

    How do I make a backup of the original file?

    What is the command to get the newly named file onto the Pre?
  19. #99  
    Quote Originally Posted by itakexrays (post #98)
    You need to copy it to the right directory of your rooted Pre.
    As root, I changed the group and owner of the file to my personal user so I can update it easily. I also created a symbolic link to the directory so I can get there easily from my home directory too.

    To make a backup of the original file, you can use the cp command to copy.
    cp text-edit-autoreplace text-edit-autoreplace.backup (for example)
    You'll need to get the paths to the files right, though.
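    The same backup-then-replace steps, sketched in Python for anyone scripting the update on a machine that has Python; on the Pre's own shell, the cp commands do the job. In this sketch, the backup is made only if one doesn't already exist, so rerunning it won't overwrite the original with an already-modified file. The example path is the one given in the install instructions earlier in the thread.

```python
import shutil
from pathlib import Path


def install_autoreplace(new_file, target, backup_suffix=".backup"):
    """Back up `target` once, then copy `new_file` over it; return the backup path."""
    new_file, target = Path(new_file), Path(target)
    backup = target.with_name(target.name + backup_suffix)
    if target.exists() and not backup.exists():
        # Same as: cp text-edit-autoreplace text-edit-autoreplace.backup
        shutil.copy2(target, backup)
    shutil.copy2(new_file, target)
    return backup


# Usage:
# install_autoreplace("text-edit-autoreplace-combined5k",
#                     "/etc/palm/autoreplace/en_us/text-edit-autoreplace")
```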

    To get the newly named file onto the Pre, you can use wget to pull it off the web, but you must have it on the web somewhere to pull it from.
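    One way to put the file "on the web" without a hosting service is to serve it from your own computer over the local network. Python 3's standard library can do this with `python -m http.server 8000` run from the folder holding the file, or from a script as sketched below. The port and the computer's IP address are placeholders, and this assumes the Pre and the computer are on the same Wi-Fi network. The Pre would then fetch the file with something like `wget http://<computer-ip>:8000/text-edit-autoreplace`.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP in a background thread; return the server."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage: server = serve_directory("C:/pre-files")
# ...run wget on the Pre, then stop serving:
# server.shutdown()
```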

    I may be creating more questions than answering. PM me and I can give you some unix tips.
  20. #100  
    Well, I just used vi to add r|are and that works, so I guess all I need to know is how to copy a file from my C: drive to the Pre.
