
yaplej
Premium
join:2001-02-10
White City, OR

reply to cdru

Re: How to deal with a lot of data?

Had no idea about how many atoms are in the universe. Never counted them.

So let's take this in another direction. How about analyzing each Wireshark stream individually for any possible patterns and only storing those, if any? Then analyze the collection of matched patterns for the most common ones.

It's a lot less storage if you only find a few patterns per session. There might not be any patterns found in a VoIP session, in which case there is nothing to store. Packet captures of test calls generally are not that big, so they might be able to upload a capture, analyze that call, save any patterns, and be done.

Seems like if you had 1000 calls to analyze you could get an idea quickly if there are any common patterns for a particular codec.
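
A rough sketch of that idea in Python, assuming the payload bytes of each call have already been extracted from the capture. The function names, the fixed 4-byte pattern length, and the sample data are all made up for illustration; real codec payloads may well show no repeats at all.

```python
from collections import Counter

def repeated_patterns(payload, length=4, min_count=2):
    """Return the byte n-grams of `length` seen at least `min_count` times in one call."""
    counts = Counter(payload[i:i + length] for i in range(len(payload) - length + 1))
    return {pattern for pattern, n in counts.items() if n >= min_count}

def common_patterns(calls, top=10, **kwargs):
    """Count in how many calls each per-call pattern shows up; return the most common."""
    seen_in_calls = Counter()
    for payload in calls:
        # Only the patterns found in this call are kept; the raw payload is not stored.
        seen_in_calls.update(repeated_patterns(payload, **kwargs))
    return seen_in_calls.most_common(top)

# Hypothetical stand-ins for the payload bytes of three test calls.
calls = [
    b"\x01\x02\x03\x04" * 3 + b"\xaa\xbb",
    b"\x01\x02\x03\x04" * 2,
    b"\x10\x20\x30\x40\x50",
]
print(common_patterns(calls, top=3))
```

A call with no repeated pattern contributes nothing, so the stored result only grows with the number of distinct patterns, not with the size of the captures.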
--
sk_buff what?

Open Source Network Accelerators
»www.trafficsqueezer.org
»www.opennop.org


jfmezei
Premium
join:2007-01-03
Pointe-Claire, QC
kudos:22

You use RAM to do the pattern matching and then a database to store the number of occurrences of the patterns that are of interest.
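
As a minimal sketch of that split, assuming Python with its bundled sqlite3 module: the counting happens in memory, and only the totals are written to the database. The table name, column names, and sample patterns are hypothetical.

```python
import sqlite3
from collections import Counter

def store_counts(conn, counts):
    """Add in-memory pattern counts to a running total kept in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pattern_counts ("
        "pattern BLOB PRIMARY KEY, occurrences INTEGER NOT NULL)"
    )
    for pattern, n in counts.items():
        # Create the row if it is new, then add this batch's count to it.
        conn.execute(
            "INSERT OR IGNORE INTO pattern_counts (pattern, occurrences) VALUES (?, 0)",
            (pattern,),
        )
        conn.execute(
            "UPDATE pattern_counts SET occurrences = occurrences + ? WHERE pattern = ?",
            (n, pattern),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")  # a file path would be used for real storage
store_counts(conn, Counter({b"\x01\x02": 3, b"\x03\x04": 1}))  # counts from one call
store_counts(conn, Counter({b"\x01\x02": 2}))                  # counts from another
rows = conn.execute(
    "SELECT pattern, occurrences FROM pattern_counts ORDER BY occurrences DESC"
).fetchall()
print(rows)
```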

Remember that you should decompress the data in RAM before you do the pattern matching. Trying to compress compressed data often results in more data.
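
A quick way to see the second point, sketched with Python's zlib module. Seeded pseudo-random bytes stand in here for data that has already been compressed (an assumption for the demo, since well-compressed output looks close to random); the function name is made up.

```python
import random
import zlib

def recompress_sizes(n=10_000, seed=0):
    """Return (input size, zlib output size) for n bytes of incompressible data."""
    rng = random.Random(seed)
    # Stand-in for already-compressed data: no redundancy left for zlib to exploit.
    data = bytes(rng.getrandbits(8) for _ in range(n))
    return len(data), len(zlib.compress(data, 9))

before, after = recompress_sizes()
print(before, after)
```

The output comes back slightly larger than the input, because the zlib header, checksum, and block framing are added while nothing is saved.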



cdru
Go Colts
Premium,MVM
join:2003-05-14
Fort Wayne, IN
kudos:7

reply to yaplej

said by yaplej:

So let's take this in another direction. How about analyzing each Wireshark stream individually for any possible patterns and only storing those, if any? Then analyze the collection of matched patterns for the most common ones.

You realize you are describing almost any general data compression algorithm out there. For reference, see PKZIP, zlib, DEFLATE, etc.

It's a lot less storage if you only find a few patterns per session.

So after a few packets you notice a pattern that could be substituted. First, it's a lousy codec if that pattern is obvious and not just a function of the particular voice characteristics of that call. But even aside from that, by the time you realize "Hey, there's a pattern here", the packets should already have been sent. If they haven't, they are now useless as part of the conversation.

But even if you do realize that 001010...1010 is a repeating pattern, you now must tell the other end that this particular pattern repeats and that if they see a particular bit pattern "key" come across, it should be replaced with the expanded value. Of course, this takes some extra data across the line. But it might save us some in the long run.

But that particular bit pattern that we've noticed repeats...may never repeat again. We don't know if it will or won't, as it's a "real time" protocol and we can't look ahead or go back into the past. We can only look at a very small bit of time. And the overhead of dictionary-based compression (such as system resources, sharing the dictionary across the link, and encoding the substituted values) becomes far more than what we would save.
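
To put rough numbers on that trade-off, here is a back-of-the-envelope model in Python. It is only a sketch under simple assumptions that are not from any real protocol: announcing a dictionary entry costs the pattern plus its key plus some fixed framing overhead, and every later occurrence saves the difference between the pattern length and the key length. It ignores CPU cost and latency entirely.

```python
import math

def net_savings(pattern_len, key_len, announce_overhead, repeats):
    """Bytes saved (negative means lost) after `repeats` substituted occurrences."""
    cost = pattern_len + key_len + announce_overhead  # sending the dictionary entry
    saved = repeats * (pattern_len - key_len)         # each substitution afterwards
    return saved - cost

def break_even_repeats(pattern_len, key_len, announce_overhead=0):
    """Smallest number of later occurrences at which the entry stops losing bytes."""
    if key_len >= pattern_len:
        raise ValueError("the key must be shorter than the pattern it replaces")
    return math.ceil((pattern_len + key_len + announce_overhead) / (pattern_len - key_len))

# Hypothetical sizes: an 8-byte pattern, a 2-byte key, 4 bytes of framing overhead.
print(break_even_repeats(8, 2, 4))
print(net_savings(8, 2, 4, repeats=2), net_savings(8, 2, 4, repeats=3))
```

With these made-up sizes the pattern has to come back three more times before the entry pays for itself, and in a real-time stream there is no way to know in advance that it will.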

Seems like if you had 1000 calls to analyze you could get an idea quickly if there are any common patterns for a particular codec.

I would suggest reading up on data compression algorithms first. I have a feeling that there is a lot more to compression than what you might know already.

Wednesday, 19-Jun 15:36:40 Terms of Use & Privacy | feedback | contact | Hosting by nac.net - DSL,Hosting & Co-lo
over 13.5 years online © 1999-2013 dslreports.com.